Build an Invoice Extraction API with FastAPI and Python

Build a FastAPI invoice extraction endpoint with the Python SDK. Covers file uploads, Pydantic response models, async batch processing, and deployment.


A FastAPI invoice extraction API is an HTTP endpoint that accepts uploaded invoices (PDF or image), passes them to a managed extraction SDK, and returns structured invoice data as a typed JSON response. The Invoice Data Extraction Python SDK reduces this to a single extract() call: your endpoint receives the file, the SDK handles all OCR, document interpretation, and field extraction, and your FastAPI route returns the result as a Pydantic model containing the invoice number, date, vendor details, line items, and totals. No Tesseract pipeline, no LLM prompt chains, no post-processing glue code.

FastAPI is well suited to this architecture. According to the Python Developers Survey 2025, FastAPI was the biggest winner among Python web frameworks, jumping from 29% to 38% adoption, a 30% year-over-year increase. That growth reflects what backend developers already know: FastAPI's async-first design and built-in Pydantic validation make it a natural fit for file-processing APIs.

This tutorial builds a web service, not a standalone script. If you need to extract invoice data from a local directory of files without an HTTP layer, see the guide on extracting invoice data with standalone Python scripts. The approach here covers everything a web service requires: UploadFile handling for multipart form data, Pydantic response schemas for typed output, dependency injection for SDK client management, and async endpoint patterns for batch workloads.

By the end of this tutorial, you will have a production-ready FastAPI endpoint that accepts PDF and image uploads, extracts structured invoice data through the SDK, and returns typed JSON responses. You will also implement async batch processing for high-volume workloads and structured error handling for production deployment.


Setting Up FastAPI with the Extraction SDK

A FastAPI project for invoice extraction needs only four dependencies. Install them into your virtual environment:

pip install fastapi uvicorn python-multipart invoicedataextraction-sdk

fastapi provides the framework and uvicorn the ASGI server that runs it. python-multipart is required for FastAPI's UploadFile handling (without it, FastAPI raises an error for any endpoint that declares form data). invoicedataextraction-sdk is the Python SDK that wraps the same extraction engine available through the Invoice Data Extraction API, giving you document parsing, field extraction, and structured output through a single method call.

For this tutorial, a single-file structure keeps things focused:

invoice-api/
├── main.py
├── .env
└── requirements.txt

In a production microservice, you would split route handlers, Pydantic models, and dependency providers into separate modules. For now, everything lives in main.py.

SDK Client Initialization

The SDK client needs an API key, which you generate and manage from your account dashboard. Store it as an environment variable and never hardcode it:

import os
from fastapi import FastAPI, Depends
from invoicedataextraction import InvoiceDataExtraction

app = FastAPI()

def get_extraction_client() -> InvoiceDataExtraction:
    return InvoiceDataExtraction(
        api_key=os.environ.get("INVOICE_DATA_EXTRACTION_API_KEY"),
    )

This get_extraction_client function is a FastAPI dependency. Inject it into any route handler with Depends(), and every endpoint shares the same client configuration without importing or instantiating it directly:

@app.post("/extract")
async def extract_invoice(
    client: InvoiceDataExtraction = Depends(get_extraction_client),
):
    # client is ready to use
    ...

This pattern gives you a single place to change configuration (swap API keys per environment, point to a staging URL) and makes testing straightforward. In your test suite, override the dependency to return a mock client instead of hitting the live API.

Why a Managed SDK Instead of DIY OCR

If you have spent any time comparing Python OCR libraries for invoice processing, you know what a DIY invoice extraction pipeline looks like: Tesseract or PaddleOCR to convert scans to raw text, custom parsing logic to locate invoice numbers and totals in varying layouts across vendors, table extraction to handle line items that span page breaks, and eventually an LLM layer to handle the edge cases that rule-based parsing misses. Each layer adds dependencies, failure modes, and code you have to own.

A managed extraction SDK collapses that entire stack into one function call. The SDK handles document interpretation, field extraction, and data structuring on the server side, including multi-language invoices and low-quality scans. Your FastAPI service sends files and receives structured JSON, with no model weights to host, no prompt engineering to maintain, and no OCR engine versions to track. Your endpoint code stays focused on request handling and business logic rather than extraction internals.

Developers who prefer working with raw HTTP requests can call the REST API directly instead of using the SDK wrapper. The SDK is a convenience layer over the same endpoints.


From File Upload to Structured Invoice Data

The endpoint you are building has three jobs: accept an uploaded invoice file, run it through the extraction SDK, and return typed invoice data that FastAPI can document automatically. Start with the data models that define what "structured invoice data" looks like in your API.

Defining Pydantic Response Models

Pydantic models serve double duty here. They validate the extraction output at runtime and generate the OpenAPI schema that FastAPI exposes at /docs. Define a model hierarchy that mirrors real invoice structure:

from pydantic import BaseModel
from typing import Optional

class InvoiceLineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    line_total: float

class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    vendor_name: str
    subtotal: Optional[float] = None
    tax: Optional[float] = None
    total: float
    currency: Optional[str] = None
    line_items: list[InvoiceLineItem] = []

class ExtractionResponse(BaseModel):
    success: bool
    extraction_id: str
    data: InvoiceData

InvoiceLineItem captures each product or service row. InvoiceData holds the header-level fields plus a list of line items. ExtractionResponse wraps everything with a success flag and a unique extraction ID for traceability. Optional fields like currency and subtotal account for invoices that omit them — the SDK returns all fields it can extract, and missing or unreadable values come back as null. This is typed Python, not a raw dictionary or an unstructured text dump from an OCR pipeline. Downstream API consumers get a guaranteed contract: every response has an invoice_number string and a total float, and the OpenAPI spec FastAPI generates from these models serves as live documentation.
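To see that contract enforcement in action, you can instantiate the models directly. This sketch repeats the model definitions above so it runs standalone:

```python
from pydantic import BaseModel, ValidationError
from typing import Optional

class InvoiceLineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    line_total: float

class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    vendor_name: str
    subtotal: Optional[float] = None
    tax: Optional[float] = None
    total: float
    currency: Optional[str] = None
    line_items: list[InvoiceLineItem] = []

# A payload with optional fields omitted validates cleanly...
inv = InvoiceData(
    invoice_number="INV-2026-0042",
    date="2026-03-15",
    vendor_name="Acme Office Supplies Ltd",
    total=540.0,
)

# ...but omitting a required field is rejected at construction time
caught = False
try:
    InvoiceData(invoice_number="INV-1", date="2026-01-01", vendor_name="X")
except ValidationError:
    caught = True  # the error names the missing 'total' field
```

The same validation runs automatically on every response FastAPI serializes, so a malformed extraction result fails loudly on the server instead of silently reaching a client.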

Building the Extraction Endpoint

The POST route accepts a file via FastAPI's UploadFile, which gives you the file object, its filename, and the declared content type in a single parameter. Before touching the SDK, validate that the upload is a format the extraction engine supports:

import os
import json
import tempfile
from fastapi import FastAPI, UploadFile, HTTPException
from invoicedataextraction import InvoiceDataExtraction

app = FastAPI(title="Invoice Extraction API")

client = InvoiceDataExtraction(
    api_key=os.environ.get("INVOICE_DATA_EXTRACTION_API_KEY"),
)

ALLOWED_TYPES = {
    "application/pdf",
    "image/jpeg",
    "image/png",
}

@app.post("/extract", response_model=ExtractionResponse)
async def extract_invoice(file: UploadFile):
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(
            status_code=422,
            detail=f"Unsupported file type: {file.content_type}. "
                   f"Accepted formats: PDF, JPG, PNG.",
        )

The 422 response follows HTTP semantics for unprocessable content (415 Unsupported Media Type is a defensible alternative for format rejections). Clients sending a Word document or a TIFF get an immediate, descriptive rejection before any processing begins.

Calling the SDK

With the file validated, save it to a temporary location and pass the path to the SDK's extract() method. The method's prompt parameter accepts a plain string (up to 2,500 characters) describing what to extract in natural language, or a dict with structured field definitions where each field has a name and an optional prompt for field-specific instructions. The string form keeps the code readable for a single-endpoint service:

    with tempfile.NamedTemporaryFile(
        delete=False, suffix=os.path.splitext(file.filename)[1]
    ) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    try:
        result = client.extract(
            files=[tmp_path],
            prompt="Extract invoice number, date, vendor name, line items with description quantity unit price and line total, subtotal, tax, total, and currency",
            output_structure="per_invoice",
            download={
                "formats": ["json"],
                "output_path": "./output",
            },
        )
    finally:
        os.unlink(tmp_path)

The output_structure parameter controls extraction granularity. Setting it to "per_invoice" produces one result object per invoice in the file. If you needed row-level detail instead (one object per product line across all invoices), you would switch to "per_line_item". The SDK handles uploading the file to the extraction engine, polling for completion, and downloading the JSON result to the output path you specify.
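The difference in granularity is easiest to see with sample data. The sketch below is purely illustrative: the field names mirror this tutorial's models, not a guaranteed SDK output shape.

```python
# One "per_invoice" result object holding nested line items...
per_invoice = {
    "invoice_number": "INV-2026-0042",
    "vendor_name": "Acme Office Supplies Ltd",
    "line_items": [
        {"description": "Printer Paper A4 (5 reams)", "line_total": 60.0},
        {"description": "Ergonomic Desk Chair", "line_total": 390.0},
    ],
}

# ...versus the flat rows "per_line_item" would give you instead:
# each row repeats the header fields alongside one product line.
per_line_item = [
    {**{k: v for k, v in per_invoice.items() if k != "line_items"}, **item}
    for item in per_invoice["line_items"]
]
```

Row-level output is convenient when the destination is a spreadsheet or a database table, since no further flattening is needed downstream.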

Parsing the Result Into Your Response Model

The SDK returns a result object with the extraction status, ID, and output URLs. When you request JSON format via the download parameter, the SDK writes the structured extraction data to your output directory. Read that file and map it onto your Pydantic models:

    if not result.get("success"):
        raise HTTPException(status_code=502, detail="Extraction failed.")

    output_dir = "./output"
    # os.listdir order is arbitrary; sort by mtime so [-1] is the newest result
    json_files = sorted(
        (f for f in os.listdir(output_dir) if f.endswith(".json")),
        key=lambda f: os.path.getmtime(os.path.join(output_dir, f)),
    )
    if not json_files:
        raise HTTPException(status_code=502, detail="No extraction output.")

    with open(os.path.join(output_dir, json_files[-1])) as f:
        extracted = json.load(f)

    invoice_raw = extracted[0] if isinstance(extracted, list) else extracted

    line_items = [
        InvoiceLineItem(**item)
        for item in invoice_raw.get("line_items", [])
    ]

    invoice_data = InvoiceData(
        invoice_number=invoice_raw.get("invoice_number", ""),
        date=invoice_raw.get("date", ""),
        vendor_name=invoice_raw.get("vendor_name", ""),
        subtotal=invoice_raw.get("subtotal"),
        tax=invoice_raw.get("tax"),
        total=invoice_raw.get("total", 0),
        currency=invoice_raw.get("currency"),
        line_items=line_items,
    )

    return ExtractionResponse(
        success=True,
        extraction_id=result.get("extraction_id", ""),
        data=invoice_data,
    )

Because ExtractionResponse is declared as the response_model on the route, FastAPI validates the return value against the schema before serializing it to JSON. Any missing required fields or type mismatches raise an error rather than sending malformed data to the client. A successful extraction returns a response like this:

{
  "success": true,
  "extraction_id": "ext_8f2a1b3c",
  "data": {
    "invoice_number": "INV-2026-0042",
    "date": "2026-03-15",
    "vendor_name": "Acme Office Supplies Ltd",
    "subtotal": 450.00,
    "tax": 90.00,
    "total": 540.00,
    "currency": "USD",
    "line_items": [
      {
        "description": "Printer Paper A4 (5 reams)",
        "quantity": 5,
        "unit_price": 12.00,
        "line_total": 60.00
      },
      {
        "description": "Ergonomic Desk Chair",
        "quantity": 1,
        "unit_price": 390.00,
        "line_total": 390.00
      }
    ]
  }
}

Complete Runnable Example

The full endpoint in a single file:

import os
import json
import tempfile
from fastapi import FastAPI, UploadFile, HTTPException
from pydantic import BaseModel
from typing import Optional
from invoicedataextraction import InvoiceDataExtraction

# --- Pydantic models ---

class InvoiceLineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    line_total: float

class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    vendor_name: str
    subtotal: Optional[float] = None
    tax: Optional[float] = None
    total: float
    currency: Optional[str] = None
    line_items: list[InvoiceLineItem] = []

class ExtractionResponse(BaseModel):
    success: bool
    extraction_id: str
    data: InvoiceData

# --- App and SDK client ---

app = FastAPI(title="Invoice Extraction API")

client = InvoiceDataExtraction(
    api_key=os.environ.get("INVOICE_DATA_EXTRACTION_API_KEY"),
)

ALLOWED_TYPES = {"application/pdf", "image/jpeg", "image/png"}

# --- Extraction endpoint ---

@app.post("/extract", response_model=ExtractionResponse)
async def extract_invoice(file: UploadFile):
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(
            status_code=422,
            detail=f"Unsupported file type: {file.content_type}. "
                   f"Accepted formats: PDF, JPG, PNG.",
        )

    with tempfile.NamedTemporaryFile(
        delete=False, suffix=os.path.splitext(file.filename)[1]
    ) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    try:
        result = client.extract(
            files=[tmp_path],
            prompt="Extract invoice number, date, vendor name, line items with description quantity unit price and line total, subtotal, tax, total, and currency",
            output_structure="per_invoice",
            download={
                "formats": ["json"],
                "output_path": "./output",
            },
        )
    finally:
        os.unlink(tmp_path)

    if not result.get("success"):
        raise HTTPException(status_code=502, detail="Extraction failed.")

    output_dir = "./output"
    # os.listdir order is arbitrary; sort by mtime so [-1] is the newest result
    json_files = sorted(
        (f for f in os.listdir(output_dir) if f.endswith(".json")),
        key=lambda f: os.path.getmtime(os.path.join(output_dir, f)),
    )
    if not json_files:
        raise HTTPException(status_code=502, detail="No extraction output.")

    with open(os.path.join(output_dir, json_files[-1])) as f:
        extracted = json.load(f)

    invoice_raw = extracted[0] if isinstance(extracted, list) else extracted

    line_items = [
        InvoiceLineItem(**item)
        for item in invoice_raw.get("line_items", [])
    ]

    invoice_data = InvoiceData(
        invoice_number=invoice_raw.get("invoice_number", ""),
        date=invoice_raw.get("date", ""),
        vendor_name=invoice_raw.get("vendor_name", ""),
        subtotal=invoice_raw.get("subtotal"),
        tax=invoice_raw.get("tax"),
        total=invoice_raw.get("total", 0),
        currency=invoice_raw.get("currency"),
        line_items=line_items,
    )

    return ExtractionResponse(
        success=True,
        extraction_id=result.get("extraction_id", ""),
        data=invoice_data,
    )

# --- Entry point ---

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Run it with python main.py or uvicorn main:app --reload during development. Once the server starts, open http://localhost:8000/docs in your browser. FastAPI generates interactive Swagger UI documentation directly from your Pydantic models. Every field in ExtractionResponse, InvoiceData, and InvoiceLineItem appears in the schema panel with its type and optionality. API consumers can test the file upload endpoint from that same page, inspect the exact response shape, and use the OpenAPI spec to generate client libraries in any language.
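You can also exercise the endpoint from the command line. Assuming the server is running locally and an invoice.pdf sits in your working directory:

```shell
# Multipart upload to the extraction endpoint; the type hint ensures
# the content_type check in the handler passes.
curl -X POST http://localhost:8000/extract \
  -F "file=@invoice.pdf;type=application/pdf"
```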


Async Batch Processing for Large Workloads

The single-file extraction endpoint works well for on-demand requests, but production workloads rarely arrive one invoice at a time. An accounts payable team uploads 200 supplier invoices at month-end. A document ingestion pipeline feeds in multi-page PDFs that take minutes to process. In both cases, a synchronous endpoint that blocks until extraction finishes will either time out or starve your API of worker threads.

The fix is a job-based pattern: accept the files, return a job ID immediately, and process the extraction in the background. The SDK's staged workflow methods are built for exactly this.

The Staged Workflow

Instead of the single extract() call, the SDK exposes individual methods that map directly onto a job-based processing flow:

  1. upload_files() sends documents to the extraction service and returns a session ID plus a list of file IDs.
  2. submit_extraction() kicks off the extraction job against those uploaded files and returns an extraction_id, which becomes your job reference.
  3. wait_for_extraction_to_finish() polls the service until the job reaches a terminal state. You configure the polling loop with an interval_ms (minimum 5,000 ms, default 10,000 ms) and an optional timeout_ms (defaults to None for no timeout).
  4. check_extraction() performs a single status check without entering a polling loop, returning the current state and a progress percentage for in-flight jobs.
  5. download_output() retrieves the finished results as JSON, CSV, or XLSX.

This separation gives you full control over each stage. In a FastAPI background task, wait_for_extraction_to_finish() handles the polling loop internally so your background worker can fire-and-forget.

Building the Batch Endpoint

The pattern pairs FastAPI's BackgroundTasks with the staged workflow. A POST endpoint accepts multiple files, validates them against the SDK's batch limits, and returns a 202 response with the extraction ID. A background task handles the waiting. A separate GET endpoint lets callers poll for results.

import os
import tempfile
from fastapi import FastAPI, UploadFile, BackgroundTasks, HTTPException
from fastapi.responses import JSONResponse
from invoicedataextraction import InvoiceDataExtraction

app = FastAPI()
client = InvoiceDataExtraction(
    api_key=os.environ.get("INVOICE_DATA_EXTRACTION_API_KEY"),
)

MAX_FILES = 6000
MAX_TOTAL_BYTES = 2 * 1024 * 1024 * 1024  # 2 GB
MAX_PDF_BYTES = 150 * 1024 * 1024          # 150 MB per PDF
MAX_IMAGE_BYTES = 5 * 1024 * 1024          # 5 MB per image


def validate_batch(files: list[UploadFile]) -> None:
    if len(files) > MAX_FILES:
        raise HTTPException(400, f"Batch exceeds {MAX_FILES} file limit")
    total_size = 0
    for f in files:
        f.file.seek(0, 2)
        size = f.file.tell()
        f.file.seek(0)
        total_size += size
        ext = f.filename.lower().rsplit(".", 1)[-1] if f.filename else ""
        if ext == "pdf" and size > MAX_PDF_BYTES:
            raise HTTPException(400, f"{f.filename} exceeds 150 MB PDF limit")
        if ext in ("jpg", "jpeg", "png") and size > MAX_IMAGE_BYTES:
            raise HTTPException(400, f"{f.filename} exceeds 5 MB image limit")
    if total_size > MAX_TOTAL_BYTES:
        raise HTTPException(400, "Total upload exceeds 2 GB limit")


@app.post("/extractions", status_code=202)
async def create_extraction(
    files: list[UploadFile],
    background_tasks: BackgroundTasks,
    prompt: str = "Extract invoice number, date, vendor name, line items, tax, and total",
):
    validate_batch(files)
    tmp_dir = tempfile.mkdtemp()
    file_paths = []
    for f in files:
        # basename guards against path traversal through a crafted filename
        path = os.path.join(tmp_dir, os.path.basename(f.filename or "upload"))
        with open(path, "wb") as out:
            out.write(await f.read())
        file_paths.append(path)

    upload = client.upload_files(files=file_paths)
    submission = client.submit_extraction(
        upload_session_id=upload["upload_session_id"],
        file_ids=upload["file_ids"],
        prompt=prompt,
        output_structure="per_invoice",
    )
    extraction_id = submission["extraction_id"]

    background_tasks.add_task(
        client.wait_for_extraction_to_finish,
        extraction_id=extraction_id,
        polling={"interval_ms": 10000},
    )

    return JSONResponse(
        status_code=202,
        content={"extraction_id": extraction_id, "status": "processing"},
    )

The status endpoint uses check_extraction() to return the current state without blocking:

@app.get("/extractions/{extraction_id}/status")
async def get_extraction_status(extraction_id: str):
    result = client.check_extraction(extraction_id=extraction_id)
    return result

When the caller sees a completed status, they can hit a download endpoint:

from fastapi.responses import FileResponse

@app.get("/extractions/{extraction_id}/download")
async def download_extraction(extraction_id: str, format: str = "json"):
    if format not in ("json", "csv", "xlsx"):
        raise HTTPException(
            status_code=422, detail="format must be json, csv, or xlsx"
        )
    output_path = f"./outputs/{extraction_id}.{format}"
    client.download_output(
        extraction_id=extraction_id,
        format=format,
        file_path=output_path,
    )
    return FileResponse(output_path, filename=f"extraction.{format}")

Polling Configuration

The polling parameter on wait_for_extraction_to_finish() accepts two fields:

  • interval_ms controls how frequently the SDK checks for completion. The minimum is 5,000 ms. For large batches, the default of 10,000 ms avoids unnecessary requests while still surfacing results promptly.
  • timeout_ms sets a ceiling on total wait time. When set to None (the default), the method polls indefinitely until the extraction completes or fails. If you set a timeout and the job exceeds it, the SDK raises an error, but the extraction continues server-side. You can pick it up later with check_extraction().

For background tasks where no HTTP connection is waiting, leaving timeout_ms at None is typically the right choice. The background worker simply waits until the job finishes.
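If you do set a timeout, the resume pattern is a loop around single status checks. The helper below is a generic sketch: check_fn stands in for a call to client.check_extraction(), and the "completed"/"failed" status strings are assumptions about the service's terminal states, not documented values.

```python
import time

def poll_until_done(check_fn, interval_s=10.0, max_wait_s=None):
    """Poll check_fn() until it reports a terminal state.

    check_fn: zero-arg callable returning a dict with a 'status' key
              (a stand-in for client.check_extraction(extraction_id=...)).
    Returns the final status dict, or the last in-flight one if
    max_wait_s elapses first -- the job keeps running server-side.
    """
    deadline = None if max_wait_s is None else time.monotonic() + max_wait_s
    while True:
        result = check_fn()
        if result.get("status") in ("completed", "failed"):
            return result
        if deadline is not None and time.monotonic() >= deadline:
            return result  # local timeout; resume later with another check
        time.sleep(interval_s)

# Simulated job that finishes on the third status check
states = iter(["processing", "processing", "completed"])
final = poll_until_done(lambda: {"status": next(states)}, interval_s=0.01)
```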

Batch Limits and Credit Consumption

The validation logic above mirrors the SDK's actual constraints: up to 6,000 files per upload session, a 2 GB total upload size, individual PDFs up to 150 MB, and images up to 5 MB each. Enforcing these at the endpoint level gives callers clear error messages instead of opaque SDK failures.

On the cost side, the SDK and web platform share the same credit pool with no separate API subscription fees. Each successfully processed page consumes 1 credit, whether it arrives through your FastAPI endpoint or the web interface. Every account receives 50 free pages per month, and additional credits are available pay-as-you-go. If you are building a batch service that processes hundreds or thousands of pages, factor per-page credit consumption into your capacity planning and pass costs through to your users accordingly.
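For capacity planning, the per-page math is simple enough to put in a helper. The 1-credit-per-page rate and 50-page free allowance come from the paragraph above; the function itself is just illustrative arithmetic:

```python
def billable_credits(pages_per_month: int, free_pages: int = 50) -> int:
    """Credits consumed per month after the free allowance (1 credit/page)."""
    return max(0, pages_per_month - free_pages)

# 200 supplier invoices averaging 2 pages each = 400 pages/month
monthly_pages = 200 * 2
cost = billable_credits(monthly_pages)  # 350 credits after the free 50
```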

For developers who want to understand the HTTP API that the SDK wraps underneath these method calls, the invoice extraction API developer quickstart covers the raw endpoints and authentication flow.


Error Handling and Production Deployment

A working endpoint is not a production-ready endpoint. The difference comes down to how your service handles failures, validates input before burning credits, and scales under real traffic. This section covers the patterns specific to running an extraction-heavy FastAPI service.

Structured SDK Error Handling

The Python SDK exposes two exception classes you need to catch: SdkError for client-side failures (file system issues, network timeouts, upload orchestration) and ApiResponseError for server-side rejections (invalid files, extraction failures). Import both from the SDK's errors module:

from invoicedataextraction.errors import SdkError, ApiResponseError

Both exceptions attach a structured body with a consistent shape:

{
  "success": false,
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description",
    "retryable": true,
    "details": null
  }
}

The retryable flag is the key field for your error-handling logic. When it is true, the failure is transient (a network blip, a temporary service issue) and the caller should retry. When false, the request itself is invalid and retrying will produce the same result. Map this directly to your HTTP responses:

from fastapi import HTTPException
from invoicedataextraction.errors import SdkError, ApiResponseError

async def handle_extraction(client, file_path: str, prompt: str):
    try:
        result = client.extract(
            files=[file_path],
            prompt=prompt,
            output_structure="per_invoice",
        )
        return result
    except ApiResponseError as e:
        error_body = e.body["error"]
        if error_body["retryable"]:
            raise HTTPException(
                status_code=503,
                detail={
                    "error": error_body["code"],
                    "message": error_body["message"],
                    "retry": True,
                },
            )
        raise HTTPException(
            status_code=422,
            detail={
                "error": error_body["code"],
                "message": error_body["message"],
                "retry": False,
            },
        )
    except SdkError as e:
        error_body = e.body["error"]
        code = error_body["code"]
        if code == "SDK_NETWORK_ERROR":
            raise HTTPException(
                status_code=502,
                detail="Extraction service unreachable. Retry shortly.",
            )
        if code == "SDK_UPLOAD_ERROR":
            raise HTTPException(
                status_code=502,
                detail="File upload to extraction service failed.",
            )
        raise HTTPException(
            status_code=500,
            detail=f"Extraction failed: {error_body['message']}",
        )

This gives callers actionable information: a 503 with a retry flag means "back off and try again," a 422 means "fix your request," and a 502 means the upstream extraction service had a transient issue.
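On the caller's side, the retry flag maps naturally onto a backoff loop. This sketch uses an injected request function rather than a real HTTP client, so the names and the simulated responses are illustrative:

```python
import time

def call_with_retry(request_fn, max_attempts=3, base_delay_s=1.0):
    """Retry request_fn() only when the API marks the failure retryable.

    request_fn: zero-arg callable returning (status_code, body_dict),
                standing in for an HTTP POST to the /extract endpoint.
    """
    for attempt in range(max_attempts):
        status, body = request_fn()
        if status < 400:
            return body
        detail = body.get("detail")
        retryable = isinstance(detail, dict) and detail.get("retry")
        if not retryable or attempt == max_attempts - 1:
            raise RuntimeError(f"extraction failed with HTTP {status}: {body}")
        time.sleep(base_delay_s * (2 ** attempt))  # exponential backoff

# Simulated service: one transient 503, then success
responses = iter([
    (503, {"detail": {"error": "TEMPORARY", "retry": True}}),
    (200, {"success": True, "extraction_id": "ext_1"}),
])
result = call_with_retry(lambda: next(responses), base_delay_s=0.01)
```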

Input Validation Before Extraction

Validate files before they reach the SDK. This avoids wasting time on uploads that will be rejected and gives callers immediate feedback with proper HTTP 422 responses.

from fastapi import UploadFile, HTTPException

PDF_MAX_SIZE = 150 * 1024 * 1024      # 150 MB
IMAGE_MAX_SIZE = 5 * 1024 * 1024       # 5 MB
BATCH_MAX_SIZE = 2 * 1024 * 1024 * 1024  # 2 GB
ALLOWED_TYPES = {
    "application/pdf": PDF_MAX_SIZE,
    "image/jpeg": IMAGE_MAX_SIZE,
    "image/png": IMAGE_MAX_SIZE,
}

async def validate_upload(file: UploadFile):
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(
            status_code=422,
            detail=f"Unsupported file type: {file.content_type}. "
                   f"Accepted: PDF, JPEG, PNG.",
        )
    contents = await file.read()
    await file.seek(0)
    if len(contents) == 0:
        raise HTTPException(status_code=422, detail="Empty file.")
    max_size = ALLOWED_TYPES[file.content_type]
    if len(contents) > max_size:
        limit_mb = max_size // (1024 * 1024)
        raise HTTPException(
            status_code=422,
            detail=f"File exceeds {limit_mb} MB limit for "
                   f"{file.content_type}.",
        )
    return contents

For batch endpoints, accumulate the total size across all files in the request and reject the batch if it exceeds the 2 GB limit. Doing this at the FastAPI layer means invalid requests never leave your server.

Rate Limiting Awareness

The extraction API enforces per-key rate limits: 600 upload requests/min, 30 submit requests/min, 120 poll requests/min, and 30 download requests/min. For a single-user tool, these limits are generous. For a high-throughput service handling concurrent users, the submit limit of 30 per minute is the bottleneck you will hit first.

For high-throughput services, queue or throttle extraction requests to stay within limits, either through an async queue (asyncio.Queue, Celery) or timestamp-based tracking per endpoint category. If you do exceed a rate limit, the SDK raises an ApiResponseError with retryable set to true, so the error handling above already covers it.
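A minimal in-process throttle for the submit budget can be a sliding-window counter. This is a sketch for a single-worker deployment; with multiple Uvicorn workers you would need shared state (Redis, for example) instead:

```python
import asyncio
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_calls per window_s seconds (single process only)."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls: deque = deque()
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        # Holding the lock while sleeping serializes waiters, which is
        # acceptable here since the goal is to pace submits anyway.
        async with self.lock:
            now = time.monotonic()
            # Drop timestamps that have left the window
            while self.calls and now - self.calls[0] >= self.window_s:
                self.calls.popleft()
            if len(self.calls) >= self.max_calls:
                await asyncio.sleep(self.window_s - (now - self.calls[0]))
            self.calls.append(time.monotonic())

# 30 submit requests per minute, matching the documented limit
submit_limiter = SlidingWindowLimiter(max_calls=30, window_s=60.0)

async def throttled_submit(submit_fn):
    await submit_limiter.acquire()
    return submit_fn()
```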

Deploying with Uvicorn

Invoice extraction endpoints are I/O-bound. Your FastAPI handlers spend most of their time waiting on HTTP calls to the extraction API, not doing CPU work. This means async handlers paired with multiple Uvicorn workers give you the best throughput:

uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4

Four workers can handle four concurrent extraction requests that are each awaiting SDK responses. Adjust the count based on your expected concurrency and the rate limits above. On Linux, Gunicorn with Uvicorn workers provides process management (automatic restarts, graceful shutdown):

gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Once deployed, verify the endpoint at /docs, where FastAPI's auto-generated Swagger UI lets you upload files, trigger extractions, and inspect responses. This same interface serves as live documentation for other developers consuming your API.

About the author

David Harding

Founder, Invoice Data Extraction

David Harding is the founder of Invoice Data Extraction and a software developer with experience building finance-related systems. He oversees the product and the site's editorial process, with a focus on practical invoice workflows, document automation, and software-specific processing guidance.

Editorial process

This page is reviewed as part of Invoice Data Extraction's editorial process.

If this page discusses tax, legal, or regulatory requirements, treat it as general information only and confirm current requirements with official guidance before acting. The updated date shown above is the latest editorial review date for this page.
