Pydantic AI Invoice Extraction: Build a Typed Agent

Two related but distinct Python packages share the Pydantic name, and search engines still confuse them. Pydantic-the-library validates data (including JSON) against BaseModel classes; Pydantic AI (at ai.pydantic.dev) is the agent framework built on top of it that runs typed agents against pluggable LLM providers. This article is about Pydantic AI as the agent framework. If what you actually wanted is post-extraction JSON validation with the original library, validating extracted invoice JSON with Pydantic the library is the article you came for.

Pydantic AI invoice extraction works by defining a typed Pydantic BaseModel for the invoice schema, passing it to an Agent as output_type, and providing the PDF via BinaryContent(media_type='application/pdf') or DocumentUrl. The framework calls the chosen provider — OpenAI, Anthropic, Gemini, DeepSeek, or Grok — validates the returned data against the model, and automatically retries with the validation error attached when fields fail constraints. That last part is the differentiator for invoice work: when the LLM drifts on totals arithmetic, the framework re-asks with the specific error rather than handing your downstream code a quietly broken record.

Here is the minimal working version. It runs against a real PDF, returns a typed Invoice object, and is the snippet the rest of this guide evolves.

from pathlib import Path
from pydantic import BaseModel
from pydantic_ai import Agent, BinaryContent


class LineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    line_total: float


class Invoice(BaseModel):
    invoice_number: str
    issue_date: str
    due_date: str | None = None
    vendor_name: str
    vendor_vat_id: str | None = None
    currency: str
    net_total: float
    vat_amount: float
    grand_total: float
    line_items: list[LineItem]


agent = Agent(
    'anthropic:claude-sonnet-4-6',
    output_type=Invoice,
    system_prompt='Extract structured invoice data. Be exact about totals.',
)

result = agent.run_sync([
    'Extract this invoice.',
    BinaryContent(
        data=Path('invoice.pdf').read_bytes(),
        media_type='application/pdf',
    ),
])

invoice: Invoice = result.output
print(invoice.vendor_name, invoice.grand_total)
for line in invoice.line_items:
    print(line.description, line.line_total)

Two things to notice. First, result.output is an actual Invoice instance, not a dict you have to parse and pray over. Any FastAPI handler, queue worker, or data pipeline that receives this object gets autocomplete, type-checking, and every validation rule defined on the model — the typing isn't a contract you maintain at the boundary, it's the object itself. Second, the agent definition carries no provider-specific code. The string 'anthropic:claude-sonnet-4-6' is the only line that names a vendor; swap it to 'google:gemini-2.5-pro' or 'openai:gpt-5.2' and the rest of the program does not move. That portability is what most invoice-extraction tutorials skip past, and it's worth showing in code rather than describing in prose.

PDF input as a first-class primitive with BinaryContent and DocumentUrl

Invoice PDFs reach an extraction agent two ways: as bytes already on disk or in memory, or as URLs in object storage. Pydantic AI exposes a primitive for each. BinaryContent carries the raw bytes inline with the prompt; DocumentUrl hands the provider a URL and lets the provider fetch the file directly. The right choice is rarely ambiguous — what matters is knowing both exist and which providers handle each cleanly.

The local-bytes path is what the opening example showed:

from pathlib import Path
from pydantic_ai import BinaryContent

result = agent.run_sync([
    'Extract this invoice.',
    BinaryContent(
        data=Path('invoice.pdf').read_bytes(),
        media_type='application/pdf',
    ),
])

Use it for any PDF already accessible to the process — files uploaded to a FastAPI endpoint and held in memory, files pulled from a message-queue payload, files generated by an upstream step in the same pipeline. The bytes travel inside the request to the provider.

The hosted-PDF path uses DocumentUrl:

from pydantic_ai import DocumentUrl

result = agent.run_sync([
    'Extract this invoice.',
    DocumentUrl(url='https://your-bucket.s3.amazonaws.com/invoices/2026-05-12-acme.pdf'),
])

Use it when invoices are already sitting in S3, GCS, Azure Blob, or a vendor portal URL — the provider downloads the file on its end rather than the developer reading bytes into the process. For batch workloads where invoices land in object storage and the agent runs nearby, this skips a round trip.

The technical basis for choosing Anthropic as the default model for PDF invoice work is concrete: Anthropic's Claude API accepts native PDF input via the application/pdf media type, processing requests up to 32 MB with each page converted to both text and image for analysis, per the native PDF support in the Claude API documentation. That dual text-plus-image processing is what lets Claude reason about tabular line items and visual layout cues — invoice numbers in headers, totals in footers, amounts in table cells — without the developer pre-rasterising anything.
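A pre-flight check can reject files that are not PDFs, or that would exceed the request ceiling, before any provider call is made. The sketch below uses only the standard library. The function name check_pdf_bytes is hypothetical, the 32 MB default mirrors the Claude figure quoted above, and the base64 size estimate assumes inline bytes are base64-encoded in transit, which is an assumption about transport rather than something this article documents.

```python
from typing import Optional

MAX_REQUEST_BYTES = 32 * 1024 * 1024  # 32 MB request ceiling quoted above


def check_pdf_bytes(data: bytes, max_request_bytes: int = MAX_REQUEST_BYTES) -> Optional[str]:
    """Return None if the bytes look sendable, else a human-readable reason."""
    # Every PDF file starts with the "%PDF-" magic header.
    if not data.startswith(b'%PDF-'):
        return 'not a PDF (missing %PDF- header)'
    # Base64 turns every 3 bytes into 4 characters, padded to a multiple of 4.
    encoded_size = 4 * ((len(data) + 2) // 3)
    if encoded_size > max_request_bytes:
        return f'too large: {encoded_size} encoded bytes exceeds {max_request_bytes}'
    return None
```

A caller would run this on Path('invoice.pdf').read_bytes() and route oversized files to the DocumentUrl path or to manual handling. The prompt and schema also consume part of the request budget, so a production check would likely leave headroom below the ceiling.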

Google Gemini handles native PDF the same way. OpenAI's image-input endpoints historically required rasterising PDF pages to per-page images before submission, which Pydantic AI does not do for you; check the current OpenAI vision documentation before assuming BinaryContent(media_type='application/pdf') will be forwarded usefully to a GPT-class model. The framework passes the content through faithfully — what the provider does with it on the other end varies, and the difference matters most for PDFs.

BinaryContent and DocumentUrl are part of a family. ImageUrl covers hosted images for scanned-only invoices, AudioUrl and VideoUrl cover audio and video sources, and TextContent exists for explicit text-only inputs alongside the prompt. For invoice work the two PDF primitives carry almost every case; the image primitive matters when the source is a phone photo or a scan that arrived as JPG.

Designing a typed Invoice schema that survives real PDFs

The Invoice model from the opening snippet looks simple, but every field is a decision. The choices that hold up against real invoice variability come down to four questions: what's required, what's nested, what type the money fields are, and what type the dates are.

Required versus optional. The required fields should be the ones every invoice actually has: invoice_number, issue_date, vendor_name, currency, net_total, vat_amount, and grand_total. Everything else — due_date, vendor_vat_id, PO numbers, payment terms, billing addresses — should be Optional with a default of None. Marking a non-universal field as required forces the LLM into one of two failure modes: it fabricates a plausible value, or it fails validation and burns a retry on something nothing in the document can fix. Optional fields with None defaults let the model report absence honestly:

from pydantic import BaseModel

class Invoice(BaseModel):
    invoice_number: str
    issue_date: str
    due_date: str | None = None
    vendor_name: str
    vendor_vat_id: str | None = None
    po_number: str | None = None
    payment_terms: str | None = None
    currency: str
    net_total: float
    vat_amount: float
    grand_total: float
    line_items: list[LineItem]
    notes: str | None = None

The notes field at the bottom is an escape valve for context the LLM noticed but the schema doesn't formally model — "VAT applied on shipping only," "credit note offset against invoice 4421," that kind of thing. It costs nothing and prevents the model from cramming context into other string fields.
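The effect of the required-versus-optional split can be checked with plain Pydantic, no LLM involved. The sketch below uses a cut-down, hypothetical InvoiceHeader model rather than the full schema, and made-up sample values.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class InvoiceHeader(BaseModel):
    # Required: present on every invoice.
    invoice_number: str
    vendor_name: str
    grand_total: float
    # Optional: absent on many invoices, so None is a legitimate answer.
    due_date: Optional[str] = None
    po_number: Optional[str] = None


# A document with no due date or PO number still validates.
header = InvoiceHeader.model_validate(
    {'invoice_number': 'INV-001', 'vendor_name': 'Acme Ltd', 'grand_total': 120.0}
)

# A document missing a required field is rejected, and the error names the field.
try:
    InvoiceHeader.model_validate({'invoice_number': 'INV-002', 'grand_total': 50.0})
    missing_fields = []
except ValidationError as exc:
    missing_fields = [err['loc'][0] for err in exc.errors() if err['type'] == 'missing']
```

Optional[str] is used here instead of str | None only so the sketch also runs on older Python versions; the two spellings mean the same thing to Pydantic.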

Nested line items. list[LineItem] is the right shape for invoice rows. Each LineItem is its own BaseModel, validated independently, with the same required-versus-optional discipline — extending the opening shape with two optional fields worth carrying once the schema goes beyond a minimal example:

class LineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    line_total: float
    product_code: str | None = None
    line_tax: float | None = None

When a single line item fails validation, the framework's error message points at the specific row and field rather than the entire invoice, which makes the LLM's correction prompt narrower and more effective.
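The row-level error path is visible in plain Pydantic. The sketch below validates a hand-written payload (the sample rows are invented) in which the second line item has a non-numeric quantity, then collects the location of each error.

```python
from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    line_total: float


class Invoice(BaseModel):
    invoice_number: str
    line_items: list[LineItem]


payload = {
    'invoice_number': 'INV-001',
    'line_items': [
        {'description': 'Widget', 'quantity': 2, 'unit_price': 5.0, 'line_total': 10.0},
        # Second row: quantity is not a number.
        {'description': 'Gadget', 'quantity': 'two', 'unit_price': 7.5, 'line_total': 15.0},
    ],
}

try:
    Invoice.model_validate(payload)
    error_locations = []
except ValidationError as exc:
    # Each error carries a path: (field, list index, nested field).
    error_locations = [err['loc'] for err in exc.errors()]
```

Here the only reported location is the quantity field of the row at index 1, so a correction prompt built from this error can point at one cell rather than the whole invoice.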

Money typing. Use float for LLM-output money fields, not Decimal. LLMs return JSON numbers, and Decimal requires either a custom validator or string-typed fields that the LLM then quotes inconsistently — both add friction without buying accuracy at the extraction layer. The right place for Decimal is downstream, in the accounting boundary: convert float to Decimal once when the validated Invoice enters the system of record. Currency belongs as a 3-letter ISO 4217 string ('USD', 'EUR', 'GBP'); a field_validator enforcing length and uppercase is one line and catches the LLM occasionally returning 'us dollars'.
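The downstream float-to-Decimal conversion can be as small as the sketch below. The helper name to_money is hypothetical, and the two-decimal quantisation with half-up rounding is an assumption that suits currencies such as USD, EUR, and GBP; currencies with zero or three minor-unit digits would need a different exponent.

```python
from decimal import ROUND_HALF_UP, Decimal


def to_money(value: float) -> Decimal:
    """Convert a validated float amount to a two-decimal Decimal."""
    # Going through str() uses the float's shortest repr ('19.99'),
    # not its full binary expansion (19.989999999999998...).
    return Decimal(str(value)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
```

Calling to_money(invoice.grand_total) once, at the point where the validated Invoice enters the system of record, keeps float arithmetic out of the ledger without complicating the extraction schema.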

Date typing. Same logic for dates. Keep them as ISO 8601 strings ('2026-05-12') at the LLM-output stage, parse to date or datetime downstream if the strict type matters. A field_validator can enforce the format and reject anything that doesn't parse, but typing the field as str keeps the model permissive at the boundary and the rejection explicit at validation time.
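A format-enforcing validator on a str date field might look like the sketch below. The DatedInvoice model and the is_valid helper are hypothetical names used only for illustration. The check is deliberately two-step, shape first and calendar second, because date.fromisoformat alone accepts additional ISO 8601 spellings on newer Python versions.

```python
import re
from datetime import date

from pydantic import BaseModel, ValidationError, field_validator

ISO_DATE = re.compile(r'[0-9]{4}-[0-9]{2}-[0-9]{2}')


class DatedInvoice(BaseModel):
    issue_date: str

    @field_validator('issue_date')
    @classmethod
    def issue_date_is_iso(cls, v: str) -> str:
        # Shape check first, then a calendar check (rejects 2026-02-30).
        if not ISO_DATE.fullmatch(v):
            raise ValueError(f'issue_date must be YYYY-MM-DD, got {v!r}')
        try:
            date.fromisoformat(v)
        except ValueError:
            raise ValueError(f'issue_date {v!r} is not a real calendar date') from None
        return v


def is_valid(raw: str) -> bool:
    """Return True when raw passes the issue_date validator."""
    try:
        DatedInvoice(issue_date=raw)
        return True
    except ValidationError:
        return False
```

The error messages echo the offending value, which gives a retry prompt something concrete to correct.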

Pushing hints to the LLM. Pydantic AI sends Field(description=...) text to the model as part of the JSON schema it generates from the output type. That's where you disambiguate the fields the LLM regularly gets wrong:

from pydantic import BaseModel, Field

class Invoice(BaseModel):
    invoice_number: str = Field(description='The unique vendor invoice reference, usually top-right or in the header')
    net_total: float = Field(description='Pre-tax invoice total — sum of line items before VAT')
    vat_amount: float = Field(description='Total VAT charged on the invoice')
    grand_total: float = Field(description='Final invoice total — net_total plus vat_amount')

The descriptions are cheap, the LLM uses them, and they're the right place to encode the domain knowledge that distinguishes net total from grand total, or invoice date from due date. A focused schema with good field descriptions outperforms a sprawling one with no guidance every time.
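Field descriptions can be inspected without calling any provider, because Pydantic includes them in the JSON schema it generates for the model class. The sketch below uses a small, hypothetical Totals model and reads the descriptions back out of model_json_schema().

```python
from pydantic import BaseModel, Field


class Totals(BaseModel):
    net_total: float = Field(description='Pre-tax invoice total, before VAT')
    grand_total: float = Field(description='Final invoice total, net plus VAT')


schema = Totals.model_json_schema()

# Map each field name to the description the schema carries for it.
descriptions = {
    name: prop['description'] for name, prop in schema['properties'].items()
}
```

Printing the schema during development is a quick way to review exactly which hints accompany each field.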

Dependency injection for AP context: vendor master, currency, and tools

A real AP integration doesn't just extract invoice fields and write them somewhere. It looks up the vendor in the master to confirm identity, converts foreign-currency amounts to the company base currency, and reaches for tax-rate tables when the document is ambiguous. Each of those is a dependency the agent needs while it's reasoning about the invoice — not after.

The naive options are both bad. Embedding the calls as raw imports inside tool function bodies couples the agent to global state, breaks testing, and erases any type information about what each tool actually needs. Running everything in a post-processing layer after extraction strips the LLM of the ability to use those lookups mid-reasoning — it can't notice the VAT ID doesn't match any vendor in the master and ask the user a clarifying question, because the master is invisible to it.

Pydantic AI's deps_type resolves both problems by typing the dependencies and injecting them into every tool call through RunContext:

from dataclasses import dataclass
from pydantic_ai import Agent, RunContext


# VendorDatabase and CurrencyRates are application-defined clients, not part of Pydantic AI.
@dataclass
class APDeps:
    vendor_db: VendorDatabase
    currency_rates: CurrencyRates
    api_key: str  # used in the extraction-API fallback tool, shown later


agent = Agent(
    'anthropic:claude-sonnet-4-6',
    deps_type=APDeps,
    output_type=Invoice,
    system_prompt='Extract structured invoice data. Use the available tools to verify vendor identity and convert foreign-currency totals.',
)


@agent.tool
async def lookup_vendor(ctx: RunContext[APDeps], vat_id: str) -> dict:
    """Look up vendor master data by VAT ID. Use this to confirm vendor
    identity when the invoice has a VAT ID, especially when the vendor
    name on the invoice doesn't obviously match a known supplier."""
    return await ctx.deps.vendor_db.get(vat_id)


@agent.tool
async def convert_to_base_currency(
    ctx: RunContext[APDeps],
    amount: float,
    currency: str,
) -> float:
    """Convert an amount from the invoice currency to USD using current rates.
    Use this when the invoice currency differs from USD and the caller needs
    a base-currency figure alongside the original."""
    return await ctx.deps.currency_rates.convert(amount, currency, 'USD')

The call site builds APDeps once and passes it in:

import os

deps = APDeps(
    vendor_db=VendorDatabase(...),
    currency_rates=CurrencyRates(...),
    api_key=os.environ['IDE_API_KEY'],
)

# Run inside an async function (or a notebook cell that supports top-level await).
result = await agent.run(
    [
        'Extract this invoice.',
        BinaryContent(data=pdf_bytes, media_type='application/pdf'),
    ],
    deps=deps,
)

The framework wires deps into every @agent.tool invocation through ctx.deps, fully typed. Static type checkers see ctx.deps.vendor_db and ctx.deps.currency_rates as their actual types, not as Any.
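One practical payoff of injected dependencies is that a tool body can be exercised with in-memory fakes. The sketch below does not import Pydantic AI at all: StubContext is a hypothetical stand-in that exposes only the .deps attribute the tool reads, FakeVendorDatabase is an invented in-memory client, and the {'found': False} result for unknown IDs is an assumption about how a vendor client might behave.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


class FakeVendorDatabase:
    """In-memory stand-in for the real vendor master client (hypothetical)."""

    def __init__(self, vendors: dict[str, dict[str, Any]]) -> None:
        self._vendors = vendors

    async def get(self, vat_id: str) -> dict[str, Any]:
        return self._vendors.get(vat_id, {'found': False})


@dataclass
class APDeps:
    vendor_db: FakeVendorDatabase


@dataclass
class StubContext:
    """Minimal stand-in exposing only the .deps attribute the tool body reads."""

    deps: APDeps


async def lookup_vendor(ctx: StubContext, vat_id: str) -> dict[str, Any]:
    # Same body as the tool in the article, minus the @agent.tool decorator.
    return await ctx.deps.vendor_db.get(vat_id)


fake_db = FakeVendorDatabase({'GB123456789': {'found': True, 'name': 'Acme Ltd'}})
ctx = StubContext(deps=APDeps(vendor_db=fake_db))

known = asyncio.run(lookup_vendor(ctx, 'GB123456789'))
unknown = asyncio.run(lookup_vendor(ctx, 'DE000000000'))
```

Because the tool touches the outside world only through ctx.deps, swapping the real client for a fake needs no patching of globals.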

Two things about tools that the documentation tends to underemphasise. First, the LLM decides when to call each tool based on the docstring, not on a separate routing config. The docstring is the tool's contract with the model. A docstring like "Look up vendor master data by VAT ID. Use this to confirm vendor identity..." gives the model a clear signal about intent; a docstring like "Look up a vendor" is functionally invisible — the model has no reason to call it. Treat docstrings as part of the prompt surface area, not as developer notes.

Second, the type annotations on tool parameters are part of the contract too. The framework generates the JSON schema for tool arguments from the function signature, and the LLM uses that schema to format its call. A vat_id: str parameter is unambiguous; a vat_id parameter with no annotation leaves the schema without a type constraint, and the LLM's tool calls become less reliable.
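To see why annotations matter, it helps to build a rough argument schema by hand. The sketch below is not Pydantic AI's implementation; argument_schema is a hypothetical helper that combines inspect.signature with Pydantic's TypeAdapter, and it ignores defaults and docstrings for brevity.

```python
import inspect
from typing import Any, Callable, get_type_hints

from pydantic import TypeAdapter


def argument_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Build a rough JSON schema for a function's parameters."""
    hints = get_type_hints(func)
    properties = {}
    for name in inspect.signature(func).parameters:
        # Unannotated parameters are treated as Any, which yields an empty schema.
        properties[name] = TypeAdapter(hints.get(name, Any)).json_schema()
    return {'type': 'object', 'properties': properties, 'required': list(properties)}


def convert(amount: float, currency: str) -> float:
    return amount


def convert_untyped(amount, currency):
    return amount


typed = argument_schema(convert)
untyped = argument_schema(convert_untyped)
```

The typed function produces a schema that names a number and a string; the untyped one produces empty property schemas, which tell a model nothing about what to send.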

This DI pattern translates almost directly when the agent gets wrapped behind an HTTP API — FastAPI's Depends() system builds per-request dependencies the same way, and an APDeps instance can be constructed once per request and passed straight through to agent.run(deps=deps). The article on exposing the Pydantic AI agent over HTTP with FastAPI walks that pattern end to end for readers ready to serve the agent over a real endpoint.

Self-healing extraction with model_validator and the retry loop

LLMs drift on arithmetic. On a typical batch you'll see invoices where grand_total doesn't equal net_total + vat_amount, where line items don't sum to the net total, or where a 10% VAT line gets recorded as the gross VAT-inclusive figure. The drift isn't random — different invoice layouts trigger different mistakes — but it's frequent enough that handing the validated output straight into an accounting system is unsafe without checking.

Pydantic validators catch these mistakes at extraction time. Pydantic AI does something more useful: when a validator raises, the framework catches the error, feeds the validation message back to the LLM as part of the conversation, and asks for a corrected output. The validator's error message becomes the correction prompt. This is the closest thing to self-healing extraction the current generation of frameworks offers, and it's the most differentiated feature for invoice work.

The canonical case is totals reconciliation:

from pydantic import BaseModel, model_validator

class Invoice(BaseModel):
    # ... field declarations from the schema above ...

    @model_validator(mode='after')
    def totals_consistent(self):
        expected = self.net_total + self.vat_amount
        if abs(self.grand_total - expected) > 0.02:
            raise ValueError(
                f'grand_total ({self.grand_total}) does not equal '
                f'net_total ({self.net_total}) + vat_amount ({self.vat_amount}). '
                f'Re-check the invoice for the correct figures.'
            )
        return self

The two-cent tolerance absorbs floating-point error and ordinary cent-level rounding without admitting larger drift. The error message includes the actual values, and that specificity matters: the LLM uses it to work out which field to recheck rather than guessing.
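The totals check can be exercised on its own, without an agent. The sketch below uses a reduced, hypothetical Totals model with the same validator logic and invented figures: one consistent record, and one where the VAT-inclusive amount was recorded as net_total.

```python
from pydantic import BaseModel, ValidationError, model_validator


class Totals(BaseModel):
    net_total: float
    vat_amount: float
    grand_total: float

    @model_validator(mode='after')
    def totals_consistent(self):
        expected = self.net_total + self.vat_amount
        if abs(self.grand_total - expected) > 0.02:
            raise ValueError(
                f'grand_total ({self.grand_total}) does not equal '
                f'net_total ({self.net_total}) + vat_amount ({self.vat_amount}).'
            )
        return self


# Consistent figures pass (100.00 + 20.00 = 120.00).
ok = Totals(net_total=100.0, vat_amount=20.0, grand_total=120.0)

# The VAT-inclusive figure was recorded as net_total, so the check fails.
try:
    Totals(net_total=120.0, vat_amount=20.0, grand_total=120.0)
    message = ''
except ValidationError as exc:
    message = exc.errors()[0]['msg']
```

The captured message contains all three figures, which is the text a correction prompt would carry back to the model.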

A second validator on currency keeps the ISO 4217 contract honest:

from pydantic import field_validator

class Invoice(BaseModel):
    # ... field declarations ...

    @field_validator('currency')
    @classmethod
    def currency_is_iso_4217(cls, v: str) -> str:
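        # isupper() alone accepts strings like 'U1$', so require ASCII letters too.
        if not (len(v) == 3 and v.isascii() and v.isalpha() and v.isupper()):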
            raise ValueError(
                f'currency must be a 3-letter ISO 4217 code in uppercase, got {v!r}'
            )
        return v

These two validators between them catch a meaningful share of real extraction errors before they leave the agent.
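The other arithmetic drift mentioned earlier, line items that do not sum to the net total, can be caught the same way. The sketch below is an illustrative addition rather than part of the article's schema: it uses reduced models, reuses the two-cent tolerance from the totals check above, and assumes the invoice has no discounts, shipping, or other adjustments outside the line items. The reconciles helper is hypothetical.

```python
from pydantic import BaseModel, ValidationError, model_validator


class LineItem(BaseModel):
    description: str
    line_total: float


class Invoice(BaseModel):
    net_total: float
    line_items: list[LineItem]

    @model_validator(mode='after')
    def line_items_sum_to_net(self):
        line_sum = round(sum(item.line_total for item in self.line_items), 2)
        # Same two-cent tolerance as the totals check above.
        if abs(line_sum - self.net_total) > 0.02:
            raise ValueError(
                f'line_items sum to {line_sum} but net_total is {self.net_total}. '
                f'Re-check for missing or duplicated rows.'
            )
        return self


def reconciles(net_total: float, line_totals: list[float]) -> bool:
    """Return True when the given rows pass the line-sum validator."""
    items = [{'description': f'row {i}', 'line_total': t} for i, t in enumerate(line_totals)]
    try:
        Invoice.model_validate({'net_total': net_total, 'line_items': items})
        return True
    except ValidationError:
        return False
```

If real invoices carry header-level discounts or freight, this check would need matching fields in the schema before it is safe to enforce.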

The retry budget lives on the Agent constructor:

agent = Agent(
    'anthropic:claude-sonnet-4-6',
    deps_type=APDeps,
    output_type=Invoice,
    system_prompt='Extract structured invoice data. Be exact about totals.',
    retries=3,
)

The default is 1, which is too tight for invoice work. A value of 2 or 3 is usually right. After three corrections, if totals still don't reconcile, the invoice probably has a structural problem the model can't recover from on its own — a credit note misclassified as an invoice, a multi-page document where line items spill across pages — and the right action is human review or escalation to a more capable extraction path, not more LLM cycles. Each retry is another full request to the provider, with the corresponding latency and token cost; tuning the budget is a real production decision, not a knob to crank to maximum.
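The retry budget is easier to reason about with the loop written out. The sketch below is a conceptual illustration, not Pydantic AI's internals: extract_with_retries and fake_model are hypothetical, the "model" is a plain function that returns wrong JSON until it receives feedback, and the budget is interpreted as one initial attempt plus up to `retries` corrections.

```python
import json
from typing import Callable, Optional

from pydantic import BaseModel, ValidationError, model_validator


class Totals(BaseModel):
    net_total: float
    vat_amount: float
    grand_total: float

    @model_validator(mode='after')
    def totals_consistent(self):
        if abs(self.grand_total - (self.net_total + self.vat_amount)) > 0.02:
            raise ValueError('grand_total does not equal net_total + vat_amount')
        return self


def extract_with_retries(
    ask_model: Callable[[Optional[str]], str], retries: int
) -> tuple[Totals, int]:
    """Call ask_model until its JSON validates; return the result and attempt count."""
    feedback: Optional[str] = None
    for attempt in range(1, retries + 2):  # first attempt plus `retries` corrections
        raw = ask_model(feedback)
        try:
            return Totals.model_validate_json(raw), attempt
        except ValidationError as exc:
            # The validation error becomes the next correction prompt.
            feedback = str(exc)
    raise RuntimeError(f'still invalid after {retries} retries: {feedback}')


def fake_model(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: wrong first, corrected once it sees feedback."""
    grand_total = 120.0 if feedback else 100.0
    return json.dumps({'net_total': 100.0, 'vat_amount': 20.0, 'grand_total': grand_total})


totals, attempts = extract_with_retries(fake_model, retries=3)
```

Each pass through the loop corresponds to a full provider request, which is why the budget shows up directly in latency and token cost.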

One discipline about where to draw the validator line. Validators should catch problems the LLM can plausibly fix by re-reading the document — arithmetic, format compliance, internally inconsistent fields. They should not catch problems that need external lookup, like "this VAT ID doesn't match any vendor in the master." That belongs in a tool (the lookup_vendor from the dependency-injection section), where the LLM can call it, get the answer, and reason about what to do. Putting an external-lookup check in a validator just burns retries on a problem more model passes can't solve.

Provider portability: one model string, five providers

The model-agnostic claim is Pydantic AI's marketing line, and it holds up in code. The agent definition built up so far — the Invoice schema, the validators, the APDeps dependencies, the lookup_vendor and convert_to_base_currency tools — runs against any supported provider by changing one string:

# OpenAI
agent = Agent('openai:gpt-5.2', deps_type=APDeps, output_type=Invoice, system_prompt=...)

# Anthropic (current default for PDF invoice work)
agent = Agent('anthropic:claude-sonnet-4-6', deps_type=APDeps, output_type=Invoice, system_prompt=...)

# Google Gemini (native PDF, large context window)
agent = Agent('google:gemini-2.5-pro', deps_type=APDeps, output_type=Invoice, system_prompt=...)

# xAI Grok
agent = Agent('xai:grok-4', deps_type=APDeps, output_type=Invoice, system_prompt=...)

# DeepSeek
agent = Agent('deepseek:deepseek-v3.1', deps_type=APDeps, output_type=Invoice, system_prompt=...)

What changes: one string. What doesn't: the BaseModel schema, the model_validator and field_validator definitions, the @dataclass APDeps, every @agent.tool function, the agent.run call site, and any downstream code consuming result.output. That's the value the abstraction buys.

Per-provider notes worth knowing for invoice extraction specifically:

Anthropic — anthropic:claude-sonnet-4-6. Native PDF via BinaryContent(media_type='application/pdf') is the path of least resistance, and Claude's tabular reasoning handles line-item extraction reliably. Generally the default choice for invoice work with Pydantic AI today. Authentication via ANTHROPIC_API_KEY.

Google Gemini — google:gemini-2.5-pro. Native PDF support, and the large context window makes it the practical choice for genuinely long invoices — multi-page line-item statements, consolidated month-end vendor statements with embedded invoices. Generous free tier for development. Authentication via GOOGLE_API_KEY (or the Vertex AI variant depending on deployment).

OpenAI — openai:gpt-5.2. Strong structured-output mode, but historically the vision endpoints required rasterising PDFs to per-page images before submission. Pydantic AI passes content through faithfully; whether GPT-class models accept application/pdf directly depends on the current state of OpenAI's vision support. Verify against current OpenAI docs before assuming the same BinaryContent path that works for Anthropic and Gemini works here. Authentication via OPENAI_API_KEY.

xAI Grok — xai:grok-4. Structured outputs supported, useful where the rate-limit or cost profile fits a particular workload. Less commonly deployed for document workflows than the three above, so test the specific invoice formats your pipeline will see before committing.

DeepSeek — deepseek:deepseek-v3.1. Cost-aggressive option for batch processing, which makes it attractive when invoice volume is high and per-page cost dominates. Multimodal-input support evolves quickly; check current docs against the document types your workload includes.

Authentication is environment-variable based by default — Pydantic AI looks for the provider's standard key name (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, XAI_API_KEY, DEEPSEEK_API_KEY) and uses what it finds. Swapping providers requires setting the corresponding key, nothing more.
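A startup check can confirm that the key for the configured provider is present before the first invoice arrives. The sketch below is stdlib-only. The missing_key helper is hypothetical, and the prefix-to-variable table simply restates the names listed in the paragraph above rather than querying Pydantic AI.

```python
from typing import Mapping, Optional

# Provider prefix -> environment variable, as listed in the paragraph above.
PROVIDER_KEYS = {
    'openai': 'OPENAI_API_KEY',
    'anthropic': 'ANTHROPIC_API_KEY',
    'google': 'GOOGLE_API_KEY',
    'xai': 'XAI_API_KEY',
    'deepseek': 'DEEPSEEK_API_KEY',
}


def missing_key(model: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the env var the model string needs but env lacks, else None."""
    provider, sep, model_name = model.partition(':')
    if not sep or not model_name:
        raise ValueError(f'expected "provider:model", got {model!r}')
    if provider not in PROVIDER_KEYS:
        raise ValueError(f'unknown provider prefix {provider!r}')
    key = PROVIDER_KEYS[provider]
    return None if env.get(key) else key
```

In a service this would be called once at boot with os.environ, failing fast with the name of the variable to set rather than surfacing an authentication error on the first extraction.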

The honest framing about portability in production: most teams don't swap daily. The value is having the option when a provider degrades on a document type, when pricing shifts, when a customer's data-residency rules require a specific cloud, or when a competitive renegotiation depends on credible willingness to switch. The same agent code keeps working through each of those events, which is what the abstraction buys you and what tying the same code directly to one provider's SDK does not.

Production tracing with Pydantic Logfire

Once the agent runs in production against real invoice traffic, two questions surface every week. Why did invoice 4421 get extracted with vendor "Acme Holdings" instead of "Acme Ltd"? And why did the token cost for last Tuesday's batch double against the prior week? Both questions need traces — the prompt the LLM actually saw, the tool calls it made, the validation failures and retries, and the token usage per step.

Pydantic Logfire wires into Pydantic AI with two lines at bootstrap:

import logfire

logfire.configure()
logfire.instrument_pydantic_ai()

After that, every agent.run and agent.run_sync produces a structured trace automatically — no per-call instrumentation, no decorators on tool functions.

The trace view for a single invoice extraction shows the full sequence: the system prompt and any field descriptions the framework merged in, the user prompt and the PDF input as a content reference, each @agent.tool invocation with its arguments and return value, the raw model output before validation, any validator failures with the exact error message, every retry attempt and its corrected output, and per-call token usage rolled up into a total. Latency is captured per span, so the slow step in a multi-tool run is visible without manual timing.

That's what closes the audit-trail question. When a finance team asks why a particular invoice was extracted the way it was, the trace shows what the LLM saw, what it asked, what the tools returned, and what the final validated Invoice looked like. The evidence is structured, queryable, and tied to the specific run — exactly the kind of audit record accounting workflows expect from any system touching financial data.

Logfire is built by the Pydantic team specifically with Pydantic AI in mind, which is why the integration is two lines. Under the hood it emits OpenTelemetry-compatible spans, so a team already running an OTel-based observability stack (Jaeger, Grafana Tempo, Datadog APM) can consume the same traces without adopting Logfire as the storage and UI layer. The first-party path is the path of least resistance; the OTel path is there when the platform team needs the traces to land alongside everything else they observe.

When the model can't crack the invoice: an extraction API fallback tool

Direct LLM extraction is fast and cheap on the standard invoice — single page, clean layout, ten or twenty line items, a tidy VAT summary at the bottom. It degrades on the documents AP teams actually see at volume: multi-page invoices with hundreds of line items rolling across page breaks, consolidated vendor statements with multiple invoices embedded, complex multi-rate VAT breakdowns across product categories, and low-quality scans where field positions wander. Throwing more retries at those documents doesn't help — the model's failure isn't arithmetic drift, it's losing the document's structure.

The honest production pattern is to give the agent an escalation tool. When the model encounters complexity it can't handle reliably alone, it hands the document off to a service purpose-built for that complexity. The invoice data extraction API is the specialised path for those cases — a service that converts invoices into structured Excel, CSV, or JSON output through a prompt-based interface, handles batches up to 6,000 files and single PDFs up to 5,000 pages in a single job, and processes at one to eight seconds per page (often around two seconds per page on larger batches). The same prompt produces the same structured result whether the input is 10 invoices or 10,000, which is the property that direct LLM extraction loses at volume.

The fallback tool wraps the official Python SDK (pip install invoicedataextraction-sdk) rather than the REST API directly. The SDK's one-call extract method handles upload, submit, polling, and result retrieval in a single function call:

from invoicedataextraction import InvoiceDataExtraction
import asyncio


@agent.tool
async def extract_with_specialised_api(
    ctx: RunContext[APDeps],
    local_pdf_path: str,
    extraction_prompt: str,
) -> dict:
    """Extract invoice data using the specialised extraction API for complex
    invoices the model cannot handle reliably alone. Use this for multi-page
    invoices with many line items, consolidated vendor statements with multiple
    invoices, complex multi-rate VAT breakdowns, or low-quality scans where
    direct extraction has produced inconsistent results."""
    client = InvoiceDataExtraction(api_key=ctx.deps.api_key)
    result = await asyncio.to_thread(
        client.extract,
        files=[local_pdf_path],
        prompt=extraction_prompt,
        output_structure='per_invoice',
    )
    return result

The asyncio.to_thread wrap matters. The SDK's extract method is synchronous and runs the full upload-submit-poll-download cycle internally, which would block the event loop in an async agent. Wrapping it in asyncio.to_thread keeps the loop free for other concurrent agent runs without forcing a custom async HTTP client just for this one tool.
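The effect of asyncio.to_thread can be observed with a stand-in for the SDK call. In the sketch below, blocking_extract is a hypothetical function that sleeps to imitate a synchronous upload-and-poll cycle, and count_ticks measures how often the event loop gets a turn while that call is in flight.

```python
import asyncio
import time


def blocking_extract(seconds: float) -> str:
    """Stand-in for a synchronous SDK call that blocks its thread."""
    time.sleep(seconds)
    return 'done'


async def count_ticks(stop: asyncio.Event) -> int:
    """Count how often the event loop gets a turn while other work runs."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> tuple[str, int]:
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    # The blocking call runs in a worker thread, so the ticker keeps running.
    result = await asyncio.to_thread(blocking_extract, 0.2)
    stop.set()
    return result, await ticker


result, ticks = asyncio.run(main())
```

The ticker advances many times during the 0.2-second call. Calling blocking_extract directly inside main would hold the loop for the full duration, and other agent runs sharing that loop would stall with it.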

The tool's docstring is the LLM's routing logic. Listing the specific failure modes — multi-page invoices, consolidated statements, complex VAT, low-quality scans — gives the model concrete signal about when to escalate. Without that intent in the docstring, the LLM has no reason to choose this tool over its own native extraction. Treat the docstring as the most important line of the tool definition, not as a comment.

For workloads that need finer control than the one-call extract method — separate upload sessions, custom polling intervals, batch staging — the official invoice extraction Python SDK documents the staged-workflow methods (upload_files, submit_extraction, wait_for_extraction_to_finish, download_output) that the one-call method composes internally. Credits are shared between web and API usage from a single account balance, so the same key works whether the team's extraction is running through the dashboard or through this agent's fallback tool.

The result the tool returns is a structured object the agent can feed back into its reasoning — either projected onto the same Invoice schema for downstream consistency, or returned as-is when the agent's job is just to route the document to the right extraction path and report the outcome. Both shapes are reasonable; the right choice depends on whether downstream code expects every result to be a typed Invoice or whether it's prepared to handle a mix.

When not to use Pydantic AI for invoice extraction

Pydantic AI earns its place when typed agents with self-healing extraction, clean dependency injection, and provider portability are what the workload actually needs. That's a real fit for a large share of invoice-extraction problems, but not all of them. A few honest counter-cases:

If the team is deeply committed to Anthropic. Claude's native skills, tool-use ergonomics, and Anthropic-specific primitives are first-class in the Anthropic-native equivalent built on the Claude Agent SDK and Skills. Provider portability stops being a benefit when you're never going to switch, and the Anthropic-native path gives you closer access to features Pydantic AI abstracts away by design.

If the AP workflow is genuinely stateful with multi-step human approval. Extract, route for approver A, route for approver B if amount exceeds a threshold, re-extract on rejection, post to the ledger when approved — that's a workflow graph with persistent state and human-in-the-loop interrupts, not a single agentic call. Pydantic AI handles the extraction step beautifully but isn't where you want the orchestration. LangGraph for stateful AP workflows with human-in-the-loop approval is the more honest fit when the multi-step state machine is the central problem.

If the task is genuinely one-shot vision-LLM extraction without the agent loop. When there are no tools to call, no DI to wire, no validator-retry loop to benefit from, and the developer is calling a vision model directly through its provider SDK to extract data from one document at a time, the agent framework's abstractions cost more than they buy. Calling vision LLMs directly for invoice extraction without an agent framework shows the bare-metal path. Pydantic AI adds weight that earns its keep on the typed-tools-and-retry workloads; for a one-shot script it's overhead.

If the team has standardised on LangChain across the codebase. Matching the existing stack often matters more than picking the technically optimal framework in isolation. The same workflow built with LangChain chains covers that path. Pydantic AI and LangChain solve overlapping problems with different philosophies — typed Python first versus chain composition first — and a team that's already fluent in chains will move faster staying there.

If the team hasn't picked a framework yet and wants the design-space view first. Framework-neutral agentic invoice processing patterns walks the architectural choices — agent loops, tool design, state, evaluation — without committing to any specific framework, so the team can pick from a position of understanding rather than recommendation.

The decision usually shakes out into four cases. Pick Pydantic AI when typed Python and model portability matter and the workload is single-call agentic extraction with tools and validation. Pick the Claude Agent SDK when you're Anthropic-committed and want Anthropic-native primitives. Pick LangGraph when the workflow is stateful multi-step with human approval as a first-class concern. Pick Pydantic-the-library alone when there's no agent loop and you just need JSON validation on whatever an LLM happens to return. Each of those is the right answer for some teams; this article showed what Pydantic AI does best when it is.
