Build AP Automation with the Claude Agent SDK and Skills

Building accounts payable automation on the Claude Agent SDK is a two-layer composition problem. The Claude Agent SDK is the runtime: Claude Code's agent loop, built-in tools, hooks, AgentDefinition subagents, MCP servers, and sessions, exposed as a library in Python and TypeScript. Claude Skills are the procedural-knowledge layer: a SKILL.md plus referenced detail files and scripts that the agent reads on demand through three-level progressive disclosure. The SDK runs the loop; Skills provide the AP know-how the loop consults only when an invoice warrants it.

That separation is what lets the rest of the architecture fall into place. A custom accounts-payable Skill holds the three-way match rules and the approval-threshold matrix as procedural content, not as ten thousand tokens of system prompt. A PostToolUse hook writes a structured row to an audit_log table on every tool the agent runs, which is the SOC 2 and SOX evidence shape an AP function actually needs. Three subagents declared as AgentDefinitions — an extractor, a matcher, and an approver — split the workflow so each role's reasoning surface stays small and its tool allow-list stays tight. Upstream of all that, Anthropic's pre-built pdf Skill handles general PDF intake, and the custom Skill's extract_invoice.py script is the specialised step that calls our extraction API to produce the structured per-invoice data the rest of the loop consumes.

This is a practitioner's map of how those primitives compose into a working AP system. It is not a generic agent tutorial dressed up as invoice content, not a reintroduction to Claude Code, and not a defence of choosing Claude over another provider; the reader has already made that choice. Framework-neutral agentic invoice processing patterns need an Anthropic-specific treatment because the SDK's primitives and the Skill packaging model are specific enough that "use an agent framework" is not actionable advice. The rest of the article walks the SDK skeleton, the Skill anatomy, the pre-built pdf composition, the hook layer, the subagent split, MCP integration, session resumption, and the hyperscaler routing, with each section adding to a single agent definition rather than presenting an isolated demo.

Setting up the agent: query() and ClaudeAgentOptions for AP

The SDK installs as a single package per language:

pip install claude-agent-sdk
# or
npm install @anthropic-ai/claude-agent-sdk

It ships as a wrapper over the Claude Code CLI, which is the peer install the runtime expects on the same machine, and it reads ANTHROPIC_API_KEY from the environment (or a hyperscaler equivalent — covered later). Once both are in place, the entire AP agent starts from a single query() call driven by ClaudeAgentOptions. The Python shape:

import anyio
from claude_agent_sdk import (
    AgentDefinition,
    AssistantMessage,
    ClaudeAgentOptions,
    TextBlock,
    query,
)

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    system_prompt=(
        "You are an accounts payable agent. Process invoices end to end: "
        "extract, three-way match, approve within policy, post to ERP. "
        "Delegate to the extractor, matcher, and approver subagents. "
        "Consult the accounts-payable Skill for matching rules and thresholds."
    ),
    allowed_tools=[
        "Read", "Glob", "Bash", "AskUserQuestion", "Monitor",
        "mcp__erp__get_po", "mcp__erp__get_receipt",
        "mcp__erp__post_invoice", "mcp__vendor_master__lookup",
    ],
    permission_mode="default",
    mcp_servers={
        "invoice_extraction": {"command": "python", "args": ["-m", "ide_mcp"]},
        # Remote servers declare their transport explicitly.
        "erp": {"type": "sse", "url": "https://erp-mcp.internal.example/sse"},
        "vendor_master": {"command": "node", "args": ["vendor-master-mcp.js"]},
    },
    # AgentDefinition stubs - full role definitions in the subagents section below.
    agents={
        "extractor": AgentDefinition(description="...", prompt="...", tools=[...]),
        "matcher":   AgentDefinition(description="...", prompt="...", tools=[...]),
        "approver":  AgentDefinition(description="...", prompt="...", tools=[...]),
    },
    # HookMatcher instances - defined in the hooks section below.
    hooks={
        "PreToolUse":  [pre_post_to_erp_matcher],
        "PostToolUse": [audit_log_matcher],
    },
)

async def main():
    async for msg in query(prompt="Process the invoices in ./inbox", options=options):
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if isinstance(block, TextBlock):
                    print(block.text)

anyio.run(main)

Each field on ClaudeAgentOptions carries a specific AP-shaped reason for being set the way it is. allowed_tools is an explicit allow-list, not the SDK default of "all tools available." A production AP agent should not have unfettered file-system write or web-fetch access — the principle is that everything the agent does against the books goes through a script or an MCP call that you wrote and audited. Write is omitted in favour of routing all ERP writes through the post_to_erp script tool and the corresponding mcp__erp__post_invoice MCP path; freeform Write would punch a hole in that discipline.

permission_mode governs how the agent reacts to a tool call that's outside policy. The default mode lets hooks veto specific calls without freezing the entire loop — the agent receives the block decision, reasons about it, and (typically) routes to the escalation path the Skill's ESCALATION-PLAYBOOK.md defines. This is what makes the PreToolUse veto on over-threshold ERP posts (covered later) actually useful: the agent doesn't crash on a blocked tool call; it adapts.

mcp_servers is the integration surface. Existing invoice-extraction, ERP, and vendor-master MCP servers attach as-is without being rewritten as native SDK tools — the section on MCP composition covers this in depth. agents is where the AP role split lives. The full extractor, matcher, and approver AgentDefinitions ship with the application rather than living on the operator's filesystem. hooks is where the SOC 2 audit trail comes from. The two handlers referenced above — one for PreToolUse, one for PostToolUse — are the entire compliance instrumentation layer.

The built-in tools cover most of the AP intake plumbing without a single custom-tool definition. Read opens invoice PDFs once the pre-built pdf Skill or the extractor script has identified them. Glob scans ./inbox (or a network watched folder) for new invoices arriving as a batch. Bash runs the Skill's extract_invoice.py, three_way_match.py, and post_to_erp.py scripts and captures their stdout. AskUserQuestion is how the agent surfaces a borderline vendor-name match to the AP clerk for human confirmation. Monitor watches the background ERP import process and streams progress events back into the loop. None of these tools needs to be written; they ship with the SDK and behave the same way they do in Claude Code interactive sessions, which means the same patterns developers already trust apply directly here.

The same skeleton works against AWS Bedrock, Google Vertex AI, Azure AI Foundry, or Claude Platform on AWS — the four CLAUDE_CODE_USE_* environment variables route the model call without changing anything else in this configuration. The detail is in the deployment section near the end.

This skeleton is what every subsequent section extends. The Skill goes onto disk where the SDK's Skill discovery finds it. The hooks get their handler implementations. The agents dict gets three filled-in AgentDefinitions. The path from the Anthropic Agent SDK to our invoice extraction API surfaces as both a script call and an MCP tool. Nothing in ClaudeAgentOptions needs to be rewritten as the article progresses; it accretes.

Building the accounts-payable Skill: SKILL.md, scripts, and progressive disclosure

The procedural knowledge an AP function uses every day — three-way match tolerances, the company-specific approval matrix, the escalation playbook — does not belong in the agent's system prompt. It changes when finance policy changes, it varies by entity, and most of it is irrelevant to any single invoice. Packaging it as a Skill is what lets the agent read only the part it needs for the invoice in front of it.

The Skill folder layout sits under .claude/skills/accounts-payable/:

.claude/skills/accounts-payable/
├── SKILL.md
├── THREE-WAY-MATCH.md
├── APPROVAL-THRESHOLDS.md
├── ESCALATION-PLAYBOOK.md
└── scripts/
    ├── extract_invoice.py
    ├── three_way_match.py
    └── post_to_erp.py

SKILL.md is the entry point. The YAML frontmatter declares the Skill to the SDK and tells Claude when it's relevant; the body is the workflow phrased as instructions the agent reads, decides on, and executes:

---
name: accounts-payable
description: Process supplier invoices end to end. Use this skill when an
  invoice arrives for intake, when three-way matching against a PO and
  receipt is needed, or when an extracted invoice needs to be approved and
  posted to the ERP. Covers the matching tolerances, the approval matrix,
  and the escalation rules.
---

# Accounts payable workflow

1. **Intake.** Use the `pdf` Skill to identify invoice boundaries in a
   combined PDF if needed. Then call `scripts/extract_invoice.py
   <path>` to produce structured JSON for each invoice.
2. **Three-way match.** Read THREE-WAY-MATCH.md before running this step
   the first time in a session. Call `scripts/three_way_match.py
   <invoice-json>` and review the discrepancy report.
3. **Approval.** Read APPROVAL-THRESHOLDS.md to determine whether this
   invoice clears the autonomous approval limit or needs human sign-off.
   For human sign-off use the `AskUserQuestion` tool.
4. **Post.** Once approved, call `scripts/post_to_erp.py <invoice-json>`
   to write the invoice to the ERP. This step is audit-logged and
   subject to PreToolUse veto if it exceeds policy.
5. **Escalation.** If any step fails or returns ambiguous results, read
   ESCALATION-PLAYBOOK.md and follow the routing rules there.

Skills load through what Anthropic's Skills architecture overview describes as three-level progressive disclosure. The name and description of every installed Skill are pre-loaded into the agent's system prompt at startup; the full SKILL.md is read into context only when Claude judges the Skill relevant to the current task; and additional bundled files referenced from SKILL.md are opened only as needed. The mental model is an onboarding folder for an AP analyst: the cover page is always visible, the workflow document opens when AP work shows up, and the specialist reference files come off the shelf only when a specific step calls for them. A 2,000-token SKILL.md plus 4,000 tokens of detail files costs only its name and description in context until an invoice arrives; then SKILL.md loads, and THREE-WAY-MATCH.md loads only if the agent reaches step 2.
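To make the first disclosure level concrete, the sketch below pulls only the frontmatter fields out of a SKILL.md, which is roughly the slice of the Skill that sits in context before any invoice arrives. It is an illustration written for this article, not the SDK's loader: read_skill_metadata is a hypothetical helper, and it handles only the simple key: value form with indented continuation lines used in the example above, not the full YAML grammar.

```python
def read_skill_metadata(skill_md: str) -> dict[str, str]:
    """Return the frontmatter fields of a SKILL.md as a flat dict.

    Hypothetical helper for illustration: supports `key: value` lines and
    indented continuation lines only, not full YAML.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")

    metadata: dict[str, str] = {}
    current_key = None
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing fence: stop before the body
        if line[:1].isspace() and current_key is not None:
            # Indented line: continuation of the previous value.
            metadata[current_key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            metadata[current_key] = value.strip()
    raise ValueError("frontmatter fence was never closed")


EXAMPLE = """---
name: accounts-payable
description: Process supplier invoices end to end. Use this skill when an
  invoice arrives for intake.
---

# Accounts payable workflow
"""

meta = read_skill_metadata(EXAMPLE)
print(meta["name"])
print(meta["description"])
```

Everything below the closing fence, including the numbered workflow, stays out of the returned metadata, which mirrors the point that the body costs nothing until the Skill is judged relevant.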

The detail files are domain content rather than agent instructions. THREE-WAY-MATCH.md documents the price-and-quantity tolerances the matcher subagent enforces, the unit-of-measure normalisation rules, and what counts as a legitimate partial-receipt scenario. APPROVAL-THRESHOLDS.md is the company's approval matrix: autonomous for invoices under one threshold, single-approver for the next band, two-approver above another, escalation above some hard ceiling. ESCALATION-PLAYBOOK.md names who gets the ticket and through which channel. Finance policy lives in these files, not in the agent code, which is the entire reason Skills exist as a layer separate from the SDK.
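The four-band approval matrix is easier to reason about once it is written down as a function. The sketch below is one possible encoding; the three limits are invented for illustration (the real values belong in APPROVAL-THRESHOLDS.md and will vary by entity), and approval_route is a hypothetical name. Band boundaries are treated as inclusive, so an invoice exactly at a limit stays in the lower band.

```python
from decimal import Decimal

# Hypothetical band limits; real values live in APPROVAL-THRESHOLDS.md.
AUTONOMOUS_LIMIT = Decimal("1000")
SINGLE_APPROVER_LIMIT = Decimal("10000")
HARD_CEILING = Decimal("100000")


def approval_route(total: Decimal) -> str:
    """Map an invoice total onto one of the four approval bands."""
    if total < 0:
        raise ValueError("invoice total cannot be negative")
    if total <= AUTONOMOUS_LIMIT:
        return "autonomous"
    if total <= SINGLE_APPROVER_LIMIT:
        return "single-approver"
    if total <= HARD_CEILING:
        return "two-approver"
    return "escalate"


for amount in ("250.00", "1000.00", "4800.50", "52000", "250000"):
    print(amount, "->", approval_route(Decimal(amount)))
```

Decimal is used rather than float so that a total such as 1000.00 compares exactly against the limit instead of depending on binary rounding.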

The scripts are where the Skill's procedural knowledge meets executable code. extract_invoice.py is the structured-output step — the script that calls our AI invoice data extraction API to turn an invoice PDF into the per-invoice or per-line-item JSON the matcher subagent operates on. It uses the Python SDK for invoice extraction, which wraps the upload, submit, poll, and download workflow in a single extract() call so the script can stay short. If the developer prefers calling the HTTP API directly rather than via the SDK, the invoice extraction API quickstart has the equivalent endpoint walkthrough.

# scripts/extract_invoice.py
import json
import os
import sys
from invoicedataextraction import InvoiceDataExtraction

def main(invoice_path: str) -> int:
    client = InvoiceDataExtraction(
        api_key=os.environ["INVOICE_DATA_EXTRACTION_API_KEY"]
    )
    result = client.extract(
        files=[invoice_path],
        prompt=(
            "Extract invoice number, invoice date, vendor legal name, "
            "PO number, currency, net amount, tax amount, total amount, "
            "and per-line-item description, quantity, unit price, line total."
        ),
        output_structure="per_invoice",
        download={"formats": ["json"], "output_path": "./extracted"},
        console_output=False,
    )
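    # Treat a missing page summary as zero failures rather than raising KeyError.
    failed_pages = (result.get("pages") or {}).get("failed_count", 0)
    if not result.get("success") or failed_pages > 0:
        print(json.dumps({"status": "failed", "detail": result}), file=sys.stderr)
        return 1
    print(json.dumps({"status": "ok", "result": result}))
    return 0

if __name__ == "__main__":
    # Fail with a usage message instead of an IndexError when the path is missing.
    if len(sys.argv) != 2:
        print("usage: extract_invoice.py <invoice-path>", file=sys.stderr)
        raise SystemExit(2)
    raise SystemExit(main(sys.argv[1]))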

The script reads one invoice path from sys.argv, calls the API through the SDK, and prints structured JSON to stdout. That is the entire contract: the agent's Bash tool runs python scripts/extract_invoice.py <path>, captures stdout, and reasons about the JSON. Failures land on stderr with a non-zero exit code so the extractor subagent can surface them to the parent agent for escalation rather than treating empty output as a clean extraction.
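The exit-code-plus-stdout contract can be exercised outside the agent, which is useful when testing the Skill scripts on their own. The sketch below is a hypothetical harness, not part of the SDK: run_skill_script is an invented name, and the two child commands are stand-ins so the example runs without the real script or an API key.

```python
import json
import subprocess
import sys


def run_skill_script(argv: list[str]) -> dict:
    """Run a Skill script and normalise its outcome into one dict.

    Success means exit code 0 and parseable JSON on stdout; anything
    else is reported as a failure so empty output is never read as clean.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        return {"status": "failed", "exit_code": proc.returncode,
                "detail": proc.stderr.strip()}
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        return {"status": "failed", "exit_code": 0,
                "detail": "stdout was not valid JSON"}


# Stand-in child processes: one prints JSON and exits 0, one fails with code 1.
ok_child = [sys.executable, "-c",
            "import json; print(json.dumps({'status': 'ok', 'result': {}}))"]
bad_child = [sys.executable, "-c",
             "import sys; print('upload failed', file=sys.stderr); sys.exit(1)"]

print(run_skill_script(ok_child))
print(run_skill_script(bad_child))
```

The third branch is the one the contract exists for: a script that exits 0 but prints nothing is reported as failed rather than passed downstream as an empty invoice.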

One Anthropic-side constraint is worth naming because it catches developers when they try to port the Skill to the hosted Skills API later: Skills running inside Anthropic's hosted code-execution sandbox have no network access and no runtime package installation. That doesn't affect the configuration in this article — the local Claude Agent SDK runs the Skill scripts on the developer's machine via Bash, so the invoicedataextraction-sdk import and the outbound HTTPS call both work fine. It does mean that if you later want the same Skill to run inside Anthropic's hosted code-execution environment, the extract_invoice.py step needs a different bridge — calling our API via a tool the agent has access to from outside the sandbox, rather than from inside it. Worth knowing now; not a blocker for the build the article walks through.

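Skill discovery is automatic once the SDK is loading filesystem settings. The SDK scans .claude/skills/ at agent startup and registers every Skill it finds. ClaudeAgentOptions does not need a skills field pointing at the folder: once the directory is in place, the Skill's name and description join the system-prompt metadata at startup and its body loads on demand when the description matches the task at hand. Some SDK releases do not read filesystem settings by default, in which case setting_sources must include the project scope and the Skill tool must be allowed before discovery happens, so check the Skills page of the SDK documentation for the version you install.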

Composing the pre-built pdf Skill with the custom AP Skill

The obvious question after building a custom Skill is why Anthropic's pre-built pdf Skill doesn't already cover invoice processing. The answer matters because it shapes the rest of the intake design: pre-built pdf is a general-purpose document Skill that reads, navigates, and excerpts PDFs. It is not a structured invoice-extraction engine, and it isn't trying to be. The right pattern is composition — let pdf do general PDF work, then hand off to the custom accounts-payable Skill for the specialised step that produces invoice-shaped JSON.

The cases where pre-built pdf earns its place in the AP loop are the ones where the input isn't a clean one-invoice-per-file batch. A scanned-and-merged supplier PDF with forty invoices concatenated into a single 200-page file needs to be sliced into per-invoice ranges before per-invoice extraction makes sense. A scanned PDF where the text is embedded as images needs OCR-aware reading before any structured field extraction can run. A statement of account with embedded invoice tables needs table identification before each row can be treated as its own document. These are exactly the things a general document Skill is good at, and exactly the things you do not want to reimplement inside an invoice extractor.

For the local Claude Agent SDK runtime this article is built around, enabling the pre-built pdf Skill is a matter of having it installed in the discovery path alongside the custom accounts-payable Skill — the SDK loads it the same way it loads any other Skill, and the beta-headers handshake is handled by the underlying client. Nothing extra to wire up.

The picture is different if you're building against the Claude API directly rather than through the Agent SDK. Three beta flags, sent together in the anthropic-beta header, are required:

from anthropic import Anthropic

anthropic = Anthropic(
    default_headers={
        "anthropic-beta": (
            "code-execution-2025-08-25,"
            "skills-2025-10-02,"
            "files-api-2025-04-14"
        )
    },
)

This is the configuration that catches developers when they wire it up for the first time outside the Agent SDK: if the pdf Skill is failing to load and the rest of the configuration looks right, a missing beta flag is almost always the cause.

A composed intake flow runs like this. The agent calls Glob against ./inbox and finds combined-2026-Q2.pdf, a 200-page file that the supplier ships once a quarter with every invoice for the period concatenated. The agent recognises this as a job for the pdf Skill (its description matches "navigate and excerpt PDFs") and delegates the boundary-identification step there. The pdf Skill returns the page ranges where each invoice begins and ends. The agent then iterates over the ranges, splitting the file or passing per-range references to Bash, and calls python scripts/extract_invoice.py <slice> for each. Each script call hits our extraction API and returns one structured invoice JSON; the matcher subagent picks up from there. Pre-built pdf handled the general-document work; the custom Skill's script handled the structured invoice extraction at production accuracy. Each Skill did the job it's good at, in the same loop.
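The hand-off between boundary identification and per-invoice extraction comes down to a small piece of range arithmetic. The sketch below assumes the boundary step yields the 1-based page on which each invoice starts; that output shape is an assumption made for illustration, and invoice_page_ranges is a hypothetical helper rather than anything the pdf Skill provides.

```python
def invoice_page_ranges(start_pages: list[int], total_pages: int) -> list[tuple[int, int]]:
    """Turn 1-based invoice start pages into inclusive (first, last) ranges."""
    if not start_pages:
        return []
    starts = sorted(set(start_pages))
    if starts[0] < 1 or starts[-1] > total_pages:
        raise ValueError("start page outside the document")
    # Each invoice ends on the page before the next one starts;
    # the final invoice runs to the end of the file.
    ends = [nxt - 1 for nxt in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


ranges = invoice_page_ranges([1, 4, 9, 15], total_pages=20)
print(ranges)  # [(1, 3), (4, 8), (9, 14), (15, 20)]
```

Pages before the first start page, such as a supplier cover sheet, fall outside every range and are simply not sent for extraction.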

Anthropic ships other pre-built Skills — xlsx, docx, pptx — and the composition pattern holds wherever an AP input occasionally arrives in a non-PDF format: a vendor statement as Excel, a remittance advice as Word. The custom Skill stays the structured-output specialist; the pre-built Skill handles whatever document shape the inbox happens to contain.

Wiring up the audit trail: PreToolUse and PostToolUse hooks

Hooks are how the SDK turns the AP function's audit-log requirement from an aspiration into a control surface. The mechanism attaches to lifecycle events — PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, UserPromptSubmit — through the hooks field on ClaudeAgentOptions. Each entry is a HookMatcher that pairs a tool-name pattern (or a wildcard) with an async handler. The handler receives the tool's input or its result depending on the lifecycle stage, and it can either record what it sees, modify the data on its way through, or — for PreToolUse — block the call outright. The matcher is what makes a PostToolUse hook fire on every tool the agent runs without the developer enumerating them.

The audit handler an AP function actually needs writes a structured row on every tool call, not a print() to stdout. The row shape is what an internal auditor or external SOC 2 examiner will read three months from now:

import json
from datetime import datetime, timezone
from claude_agent_sdk import HookMatcher
import db  # your project's database module
from app_state import subagent_for_parent  # your delegation -> subagent map

async def audit_log_handler(input_data, tool_use_id, context):
    tool_name   = input_data["tool_name"]
    tool_input  = input_data.get("tool_input")
    tool_result = input_data.get("tool_response")
    parent_id   = input_data.get("parent_tool_use_id")
    invoice_id  = (
        tool_input.get("invoice_id")
        if isinstance(tool_input, dict) else None
    )
    # Resolve which subagent ran this call by looking up the delegation
    # tool_use_id in app state. The SDK exposes parent_tool_use_id; the
    # map from there to a subagent name is the developer's responsibility.
    agent_definition = subagent_for_parent(parent_id) if parent_id else "main"

    await db.execute(
        """
        INSERT INTO audit_log (
            ts, session_id, parent_tool_use_id, tool_use_id,
            agent_definition, invoice_id, tool_name,
            tool_input, tool_result, status
        ) VALUES (
            :ts, :session_id, :parent_tool_use_id, :tool_use_id,
            :agent_definition, :invoice_id, :tool_name,
            :tool_input, :tool_result, :status
        )
        """,
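        {
            "ts": datetime.now(timezone.utc),
            # session_id arrives in the hook input, not on the context object.
            "session_id": input_data.get("session_id"),
            "parent_tool_use_id": parent_id,
            "tool_use_id": tool_use_id,
            "agent_definition": agent_definition,
            "invoice_id": invoice_id,
            "tool_name": tool_name,
            # default=str keeps non-JSON values (dates, Decimals) loggable.
            "tool_input": json.dumps(tool_input, default=str)[:8000],
            "tool_result": (
                json.dumps(tool_result, default=str)[:8000]
                if tool_result is not None else None
            ),
            # An empty but present result still counts as a completed call.
            "status": "ok" if tool_result is not None else "pending",
        },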
    )
    return {}  # no modification to the tool call

audit_log_matcher = HookMatcher(matcher="*", hooks=[audit_log_handler])

The matcher "*" is what attaches the handler to every tool. The parent_tool_use_id field is the thread that lets an auditor trace which subagent ran which call — when the matcher subagent delegates a three-way-match script run, the resulting Bash invocation's audit row carries the parent ID pointing at the matcher's delegation, and the subagent_for_parent lookup resolves it to "matcher". The SDK exposes the linkage; mapping a delegation tool_use_id back to a subagent name is the developer's responsibility and is usually a small dict populated as the main agent delegates. That linkage is what makes the audit log meaningful at role granularity rather than as a flat tool-call stream.
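The small dict behind subagent_for_parent might look like the sketch below. DelegationRegistry and its method names are hypothetical, the IDs are made up, and where record() gets called from (for example, wherever the application observes the main agent's delegation call) is left open because it depends on how the surrounding code consumes SDK messages.

```python
from typing import Optional


class DelegationRegistry:
    """Maps a delegation tool_use_id to the subagent it was sent to."""

    def __init__(self) -> None:
        self._by_delegation: dict[str, str] = {}

    def record(self, delegation_tool_use_id: str, subagent_name: str) -> None:
        """Call when the main agent's delegation tool call is observed."""
        self._by_delegation[delegation_tool_use_id] = subagent_name

    def subagent_for_parent(self, parent_tool_use_id: Optional[str]) -> str:
        """Resolve a parent ID to a role; top-level calls map to 'main'."""
        if parent_tool_use_id is None:
            return "main"
        # An unknown parent is labelled rather than guessed, so gaps in
        # the registry show up in the audit log instead of hiding.
        return self._by_delegation.get(parent_tool_use_id, "unknown")


registry = DelegationRegistry()
registry.record("toolu_delegate_01", "matcher")   # made-up IDs for illustration
registry.record("toolu_delegate_02", "approver")

# Module-level alias matching the name the audit handler imports.
subagent_for_parent = registry.subagent_for_parent

print(subagent_for_parent("toolu_delegate_01"))
print(subagent_for_parent(None))
print(subagent_for_parent("toolu_missing"))
```

Returning "unknown" instead of raising keeps the audit write from failing on a lookup miss, at the cost of needing a periodic check for rows carrying that label.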

The veto half of the audit story is PreToolUse. The post-to-ERP case is the canonical example: any invoice posting that exceeds the approval threshold without a matching approver sign-off should be blocked before the script ever runs, regardless of what the agent's reasoning trace claimed:

async def pre_post_to_erp_handler(input_data, tool_use_id, context):
    tool_name  = input_data["tool_name"]
    tool_input = input_data["tool_input"]

    if tool_name != "Bash":
        return {}
    command = tool_input.get("command", "")
    if "post_to_erp.py" not in command:
        return {}

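    # _parse_arg, _inspect_invoice, and current_entity_id are helpers from
    # your own codebase, not SDK functions.
    invoice_json_path = _parse_arg(command)
    amount, has_approver_signoff = _inspect_invoice(invoice_json_path)
    row = await db.fetchone(
        "SELECT autonomous_limit FROM ap_policy WHERE entity_id = :e",
        {"e": current_entity_id()},  # from your app state, not the SDK context
    )
    # Compare against the column value, not the row object. Assumes the
    # driver returns a mapping-like row; a missing policy row fails closed.
    threshold = row["autonomous_limit"] if row else 0

    if amount > threshold and not has_approver_signoff: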
        return {
            "decision": "block",
            "reason": (
                f"Invoice amount {amount} exceeds autonomous approval limit "
                f"{threshold} without approver subagent sign-off. Route to "
                f"escalation per ESCALATION-PLAYBOOK.md."
            ),
        }
    return {}

pre_post_to_erp_matcher = HookMatcher(
    matcher="Bash",
    hooks=[pre_post_to_erp_handler],
)

When the block fires, the agent receives the reason string as the tool's response and continues the loop with that information in hand. The escalation playbook the SKILL.md references picks up from there — the agent reads the playbook, decides which approver to notify, and uses AskUserQuestion or an MCP-exposed ticketing tool to surface the case. The hook did not stop the agent; it stopped one specific tool call.

The veto layer sits downstream of the matcher subagent's structural validation. The matcher applies the substance — tolerances, unit-of-measure normalisation, PO-receipt-invoice three-way checks built around validating extracted invoice data in API workflows — and the PreToolUse hook is the last-mile policy gate that catches what the matcher missed or wasn't designed to enforce. Neither layer replaces the other; the matcher is about whether the invoice is internally consistent, and the hook is about whether the agent has the authority to post it.
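For a sense of what the matcher's substance looks like in code, here is a minimal line-level sketch of the kind of check scripts/three_way_match.py could perform. The article does not show that script, so everything here is illustrative: the 2% price tolerance, the dict field names, and the three_way_match function are assumptions. The three verdict strings are the ones the matcher subagent's prompt uses.

```python
from decimal import Decimal

# Hypothetical tolerance; the real value belongs in THREE-WAY-MATCH.md.
PRICE_TOLERANCE = Decimal("0.02")  # 2% of the PO unit price


def three_way_match(po: dict, receipt: dict, invoice: dict) -> tuple[str, list[str]]:
    """Compare one invoice line with its PO and receipt lines.

    Returns a verdict ('matched', 'tolerance', or 'discrepancy') and the
    list of discrepancy messages behind it.
    """
    issues: list[str] = []
    price_gap = abs(invoice["unit_price"] - po["unit_price"])
    if price_gap > po["unit_price"] * PRICE_TOLERANCE:
        issues.append(f"unit price {invoice['unit_price']} vs PO {po['unit_price']}")
    if invoice["quantity"] > receipt["quantity"]:
        issues.append(f"invoiced {invoice['quantity']} but received {receipt['quantity']}")
    if receipt["quantity"] > po["quantity"]:
        issues.append(f"received {receipt['quantity']} exceeds ordered {po['quantity']}")
    if issues:
        return "discrepancy", issues
    return ("matched" if price_gap == 0 else "tolerance"), issues


po = {"unit_price": Decimal("10.00"), "quantity": Decimal("100")}
receipt = {"quantity": Decimal("60")}  # partial receipt against the PO
invoice = {"unit_price": Decimal("10.10"), "quantity": Decimal("60")}

print(three_way_match(po, receipt, invoice))
```

A partial receipt passes as long as the invoice bills no more than what arrived, which is one reading of the "legitimate partial-receipt scenario" the detail file is meant to define.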

SessionStart and SessionEnd round out the trail with the bookends an auditor wants. The SessionStart row carries who initiated the batch, the batch identifier, the entity, and the operator's username; the SessionEnd row carries the outcome counts: invoices processed, escalations raised, postings completed, postings blocked. Each handler is structurally identical to the audit handler above, with the row schema tailored to batch-level facts rather than per-tool ones, so the PostToolUse example extrapolates directly. Hook event coverage has not always been identical between the TypeScript and Python SDKs, so confirm that the release you install exposes both events before relying on them.

Splitting the workflow across extractor, matcher, and approver subagents

A single agent loop holding the entire AP workflow (extraction, matching, approval, posting, escalation) has a tool surface and a reasoning surface large enough that compliance review and behaviour testing both get harder. Splitting the workflow across three subagents declared as AgentDefinitions in ClaudeAgentOptions.agents keeps each role's tool allow-list tight, its system prompt focused on one job, and its tool calls easy to attribute. This is the current shape for SDK subagents in production code; the older .claude/agents/*.md Claude Code file convention still works for interactive sessions, but production agent code that ships with the application uses AgentDefinition so the role definitions live in source control rather than on each operator's filesystem.

The three roles map directly onto the AP workflow:

from claude_agent_sdk import AgentDefinition

# Each role is an AgentDefinition instance rather than a plain dict.
agents = {
    "extractor": AgentDefinition(
        description=(
            "Turn a raw invoice PDF (or PDF page range identified by the "
            "pdf Skill) into structured invoice JSON. Use this for intake "
            "before any matching or approval work."
        ),
        prompt=(
            "You are the extractor subagent. Given one invoice PDF path or "
            "page range, call scripts/extract_invoice.py to produce "
            "structured JSON. If the script returns a non-zero status, "
            "surface the failure to the parent agent. Do not attempt to "
            "match, approve, or post."
        ),
        tools=["Read", "Glob", "Bash"],
    ),
    "matcher": AgentDefinition(
        description=(
            "Three-way match an extracted invoice against the corresponding "
            "PO and receipt records. Returns a discrepancy report and a "
            "match-status verdict."
        ),
        prompt=(
            "You are the matcher subagent. Read THREE-WAY-MATCH.md from the "
            "accounts-payable Skill the first time you run in a session. "
            "Given an invoice JSON, fetch the PO via mcp__erp__get_po and "
            "the receipt via mcp__erp__get_receipt, then call "
            "scripts/three_way_match.py to compute the discrepancy report. "
            "Return the report and a verdict of 'matched', 'tolerance', "
            "or 'discrepancy'. Do not approve or post."
        ),
        tools=[
            "Read", "Bash",
            "mcp__erp__get_po", "mcp__erp__get_receipt",
        ],
    ),
    "approver": AgentDefinition(
        description=(
            "Decide whether a matched invoice clears autonomous approval "
            "or needs human sign-off. Posts approved invoices to the ERP "
            "or escalates per the playbook."
        ),
        prompt=(
            "You are the approver subagent. Read APPROVAL-THRESHOLDS.md "
            "from the accounts-payable Skill. Given an invoice JSON and a "
            "matcher verdict, decide: autonomous approve, single-approver, "
            "two-approver, or escalate. For human approvals use "
            "AskUserQuestion. Once approved, call "
            "scripts/post_to_erp.py to write the invoice to the ERP. "
            "Respect the PreToolUse veto."
        ),
        tools=["Read", "AskUserQuestion", "Bash"],
    ),
}

Each subagent's description is what the main agent reads when deciding which subagent to delegate to via the built-in Agent tool. The prompt is the subagent's system prompt: the role definition, the constraints, the explicit boundaries on what the subagent will and will not do. The tools list is the allow-list the subagent runs against; nothing else is callable from inside that subagent's loop. The matcher's list carries no ERP write tool, the approver's carries no PO-fetch MCP, and the extractor's carries neither. Bash sits on all three lists, though, so the allow-list by itself does not stop the extractor or matcher from invoking post_to_erp.py; that boundary is held by each role's prompt and, as the hard stop, by the PreToolUse hook on Bash. The audit trail at role granularity falls out of this naturally because the tools each subagent can call are a small, known subset of the AP surface.

parent_tool_use_id is the thread that ties the subagent execution back to its delegation. When the main agent calls the Agent tool to delegate to, say, the matcher, that delegation call is itself a tool use with its own tool_use_id. Every message and tool call the matcher subagent produces carries parent_tool_use_id set to that delegation ID. The audit handler from the previous section reads that field and resolves it to the subagent's name through the application-side subagent_for_parent lookup, populating the agent_definition column on each audit row. The result is that an auditor can pull every row tagged agent_definition = 'matcher' and see exactly the tool calls the matcher subagent made during a given session, in order, with their inputs and results.
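The auditor's query is short enough to show in full. The sketch below uses an in-memory SQLite database with the same column names as the audit_log insert in the hooks section; the choice of SQLite, the sample rows, and the calls_by_role helper are all stand-ins for illustration, not the production store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audit_log (
        ts TEXT, session_id TEXT, parent_tool_use_id TEXT, tool_use_id TEXT,
        agent_definition TEXT, invoice_id TEXT, tool_name TEXT,
        tool_input TEXT, tool_result TEXT, status TEXT
    )
    """
)

# Made-up sample rows: one delegation by the main agent, two matcher calls
# in the same session (inserted out of order), and one from another session.
rows = [
    ("2026-04-01T09:00:00Z", "sess-1", None, "toolu_a", "main", "INV-1", "Agent", "{}", "{}", "ok"),
    ("2026-04-01T09:00:05Z", "sess-1", "toolu_a", "toolu_c", "matcher", "INV-1", "Bash", "{}", "{}", "ok"),
    ("2026-04-01T09:00:02Z", "sess-1", "toolu_a", "toolu_b", "matcher", "INV-1", "mcp__erp__get_po", "{}", "{}", "ok"),
    ("2026-04-01T09:00:09Z", "sess-2", "toolu_x", "toolu_y", "matcher", "INV-2", "Bash", "{}", "{}", "ok"),
]
conn.executemany("INSERT INTO audit_log VALUES (?,?,?,?,?,?,?,?,?,?)", rows)


def calls_by_role(conn: sqlite3.Connection, session_id: str, role: str) -> list[tuple]:
    """Every tool call one subagent made in one session, oldest first."""
    cursor = conn.execute(
        "SELECT ts, tool_name, tool_use_id FROM audit_log "
        "WHERE session_id = ? AND agent_definition = ? ORDER BY ts",
        (session_id, role),
    )
    return cursor.fetchall()


for row in calls_by_role(conn, "sess-1", "matcher"):
    print(row)
```

Ordering by timestamp works here because the sample timestamps are ISO 8601 strings in one timezone; a production table would more likely order on a native timestamp or a sequence column.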

The matcher's structural validation work is upstream of the PreToolUse veto covered in the previous section. The matcher decides whether the invoice is internally consistent against its PO and receipt; the hook decides whether the approver has the authority to post the result. The matcher catches data problems; the hook catches policy violations. The two are designed to overlap deliberately — the same over-threshold post might be flagged by the matcher as needing escalation and blocked by the hook if it slips through anyway. Belt and braces is the right shape for an AP control surface.

Plugging in MCP servers alongside the Skill and built-in tools

Model Context Protocol servers attach to the AP agent through ClaudeAgentOptions.mcp_servers. The composition surface is what makes the integration story concrete: an existing invoice-extraction MCP, an ERP MCP exposing PO/receipt reads and the post-invoice write, and a vendor-master MCP exposing supplier lookup, all running alongside the built-in tools and the custom accounts-payable Skill in the same loop. Each MCP server stays in its own process and exposes its own tools; the SDK is the surface that lets the agent call any of them as if they were native.

The configuration is a dict (Python) or object (TypeScript) keyed by server name:

import os

mcp_servers = {
    "invoice_extraction": {
        "command": "python",
        "args": ["-m", "invoice_extraction_mcp.server"],
        "env": {
            "INVOICE_DATA_EXTRACTION_API_KEY": os.environ[
                "INVOICE_DATA_EXTRACTION_API_KEY"
            ],
        },
    },
    "erp": {
        # Remote servers declare their transport explicitly.
        "type": "sse",
        "url": "https://erp-mcp.internal.example/sse",
        "headers": {"Authorization": f"Bearer {os.environ['ERP_MCP_TOKEN']}"},
    },
    "vendor_master": {
        "command": "node",
        "args": ["vendor-master-mcp.js"],
    },
}

Local MCP servers (the invoice_extraction and vendor_master entries above) are launched as subprocesses by the SDK at agent start; remote MCP servers (the erp entry) connect over HTTP/SSE. The mcp_servers field accepts both shapes, which means a team can mix self-hosted internal MCPs with whatever their ERP vendor exposes without writing an adapter layer.

The way MCP-exposed tools surface inside the agent is what makes the audit-trail story carry over without modification. Each MCP tool appears in the agent's tool list with a server-prefixed name: mcp__invoice_extraction__extract, mcp__erp__get_po, mcp__erp__get_receipt, mcp__erp__post_invoice, mcp__vendor_master__lookup. These names participate in allowed_tools decisions the same way Read or Bash do, and HookMatcher patterns can match them by exact name or, because matcher strings are treated as regular expressions, by prefix (mcp__erp__.*). The PostToolUse audit handler from the hooks section captures MCP calls automatically because its matcher is *; the PreToolUse veto for over-threshold posts can be written to match either Bash (when the post happens via scripts/post_to_erp.py) or mcp__erp__post_invoice (when the post happens via the ERP MCP). Either path is policed by the same control surface.
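Matcher patterns are worth checking against the real tool names before they go into a hook. The sketch below does that offline with Python's re module, under the assumption that a matcher string behaves like a regular expression applied to the whole tool name; how the SDK applies the pattern in the release you install should be verified. tools_matching and split_mcp_tool are hypothetical helpers.

```python
import re

TOOL_NAMES = [
    "Read", "Bash",
    "mcp__invoice_extraction__extract",
    "mcp__erp__get_po", "mcp__erp__get_receipt", "mcp__erp__post_invoice",
    "mcp__vendor_master__lookup",
]


def tools_matching(pattern: str, names: list[str]) -> list[str]:
    """Names a pattern selects when applied as a whole-name regex."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


def split_mcp_tool(name: str):
    """Split 'mcp__<server>__<tool>' into (server, tool); None for built-ins."""
    match = re.fullmatch(r"mcp__(.+?)__(.+)", name)
    return (match.group(1), match.group(2)) if match else None


print(tools_matching(r"mcp__erp__.*", TOOL_NAMES))
print(tools_matching(r"Bash|mcp__erp__post_invoice", TOOL_NAMES))
print(tools_matching(r"mcp__erp__*", TOOL_NAMES))  # glob-style star selects nothing here
print(split_mcp_tool("mcp__vendor_master__lookup"))
```

The second pattern is the one that covers both posting paths with a single veto hook; the third shows why a glob-style trailing star is not a safe substitute for .* under whole-name regex matching.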

A concrete composition in the loop: the extractor subagent calls Bash to run scripts/extract_invoice.py against an invoice and uses mcp__invoice_extraction__extract as a fallback path when the script isn't installed in the local environment — same upstream API, two integration shapes for resilience. The matcher subagent calls mcp__erp__get_po and mcp__erp__get_receipt to fetch the matching records, then runs scripts/three_way_match.py via Bash to compute the discrepancy report. The approver subagent calls mcp__erp__post_invoice directly for invoices cleared autonomously, falling back to Bash and the post script for entities whose ERP isn't reachable through the MCP server. MCP integration into the agent is composition rather than a separate architecture — built-in tools, custom Skill scripts, and MCP servers all share one tool list and one audit surface, and the invoice-extraction MCP slots in alongside the rest without special handling.
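The primary and fallback pairing for each role can be written down as data. This is a rough sketch of the routing described above rather than anything the SDK provides: in the running system the agent makes this choice itself, and ROUTES and pick_tool are hypothetical names used only to make the pairing explicit.

```python
from typing import Dict, Tuple

# (primary, fallback) tool per role, taken from the composition described above
ROUTES: Dict[str, Tuple[str, str]] = {
    # Script first; the MCP tool when the script is not installed locally
    "extractor": ("Bash", "mcp__invoice_extraction__extract"),
    # MCP tool first; the post script when the ERP is not reachable via MCP
    "approver": ("mcp__erp__post_invoice", "Bash"),
}


def pick_tool(role: str, primary_available: bool) -> str:
    """Return the tool a role should use, given whether its primary path works."""
    primary, fallback = ROUTES[role]
    return primary if primary_available else fallback


print(pick_tool("extractor", primary_available=False))
print(pick_tool("approver", primary_available=True))
```

Whichever path is chosen, the call still appears under a name the hooks can match, which is why the fallback does not open a gap in the audit trail.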

Construction of the underlying MCP servers is its own topic and worth treating separately. The companion guide to building an MCP server for invoice extraction walks through what's involved when building a new server from scratch: the protocol shape, the tool-schema declarations, the transport choices. This section is about how an existing MCP server plugs into the Agent SDK; the construction guide covers how to get a new one onto the bus in the first place.

Session resumption and hyperscaler authentication

Two operational concerns separate a working demo from a deployable AP agent: what happens when a long-running batch is interrupted mid-flight, and how the same code routes through whichever cloud provider the enterprise's InfoSec team has standardised on. Both are configured outside the agent's logic; neither changes anything in the Skill, the hooks, or the subagents.

Each query() call can be associated with a session. Capturing the session ID at batch start lets a restart resume the same conversation state rather than re-process invoices the agent had already partly worked through:

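from claude_agent_sdk import query, ClaudeAgentOptions

# `checkpoint` is the batch worker's own persistence helper, not part of the SDK.


async def start_batch(batch_id: str) -> None:
    # At batch start
    options = ClaudeAgentOptions(
        model="claude-sonnet-4-6",
        # ... remaining options as configured in the earlier sections
    )
    async for msg in query(
        prompt="Process the invoices in ./inbox/q2-2026",
        options=options,
    ):
        # Not every message type carries a session ID, so read it defensively
        session_id = getattr(msg, "session_id", None)
        if session_id:
            await checkpoint.save_session_id(batch_id, session_id)
        # ... handle messages


async def resume_batch(batch_id: str) -> None:
    # After worker restart
    session_id = await checkpoint.load_session_id(batch_id)
    async for msg in query(
        prompt="Continue processing.",
        options=ClaudeAgentOptions(
            model="claude-sonnet-4-6",
            resume=session_id,
            # ... remaining options as at batch start
        ),
    ):
        ...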

Resume restores the conversation thread the agent was running before the restart: its tool-call history, the prior subagent delegations, the Skill content it had already loaded. The batch picks up where it left off rather than starting the inbox scan over. For long-running AP work (month-end batches, quarterly statement processing, year-end true-ups), session resumption is the difference between a worker that can be restarted safely and one that spends credits twice on the same invoices and risks duplicate ERP posts.
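The checkpoint object in the snippet above is left to the reader. One minimal way to back it, sketched here under the assumption of a single worker and a local JSON file, is shown below. FileCheckpoint is a hypothetical class, and the batch and session values in the demo are placeholders.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Optional


class FileCheckpoint:
    """Stores batch_id -> session_id in one JSON file (single-worker sketch)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _read(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    async def save_session_id(self, batch_id: str, session_id: str) -> None:
        data = self._read()
        data[batch_id] = session_id
        # Write to a temp file, then rename, so a crash never leaves a half-written file
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(data))
        tmp.replace(self.path)

    async def load_session_id(self, batch_id: str) -> Optional[str]:
        return self._read().get(batch_id)


async def demo() -> Optional[str]:
    with tempfile.TemporaryDirectory() as d:
        checkpoint = FileCheckpoint(Path(d) / "sessions.json")
        await checkpoint.save_session_id("q2-2026", "session-abc")
        return await checkpoint.load_session_id("q2-2026")


print(asyncio.run(demo()))
```

A file is enough for one worker on one host. With several workers, the same two methods would more plausibly sit on top of a database row keyed by batch ID.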

Session forking is the lighter operational tool from the same primitive. Forking a session lets a developer or an automated test branch the conversation at a known state, explore a what-if path (auto-approve a borderline invoice, see how the downstream loop reacts, observe the audit-log rows), and discard the fork without affecting the main session. It's the agent equivalent of branching a git history to try something.

The four CLAUDE_CODE_USE_* environment variables route the same agent code through different hyperscaler tenancies. CLAUDE_CODE_USE_BEDROCK=1 routes through AWS Bedrock and expects AWS credentials in the environment (access key, secret, region) the way the AWS SDK expects them. CLAUDE_CODE_USE_VERTEX=1 routes through Google Vertex AI and expects application default credentials configured for the project. CLAUDE_CODE_USE_FOUNDRY=1 routes through Azure AI Foundry and expects an Azure credential (a service principal, a managed identity, or a developer credential) visible to the runtime. CLAUDE_CODE_USE_ANTHROPIC_AWS=1 routes through Claude Platform on AWS and expects AWS credentials similar to the Bedrock case. The two AWS-resident options differ in who runs the platform: Bedrock is AWS's managed model service (Anthropic is one of several model providers on it, billed and managed by AWS), while Claude Platform on AWS is Anthropic's own platform deployed on AWS infrastructure (billed and managed by Anthropic, with AWS only providing the underlying hosting). The variable choice plus the matching cloud SDK credentials is the entire change.

Everything else stays the same. The model string claude-sonnet-4-6 (or whatever the model ID is on the target hyperscaler), the query() call signature, the ClaudeAgentOptions fields, the Skills folder, the hook handlers, the subagent definitions, the MCP server list — none of these are touched. The agent code is hyperscaler-portable by design, which is the entire reason the environment-variable switch exists rather than four separate client classes. An enterprise team that develops against the direct Anthropic API and deploys against Bedrock for InfoSec reasons sets one env var in the deployment environment and ships the same code.
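The one-variable switch can be made explicit in deployment tooling. The sketch below only builds and checks the environment mapping; it does not call the SDK. The variable names are the four listed above, while ROUTING_VARS, routing_env, and check_single_route are hypothetical helpers, and the rule that at most one variable should be set is an assumption of this sketch.

```python
from typing import Dict, List, Mapping, Optional

# Target name -> routing variable, as listed in the text above
ROUTING_VARS: Dict[str, str] = {
    "bedrock": "CLAUDE_CODE_USE_BEDROCK",
    "vertex": "CLAUDE_CODE_USE_VERTEX",
    "foundry": "CLAUDE_CODE_USE_FOUNDRY",
    "anthropic_aws": "CLAUDE_CODE_USE_ANTHROPIC_AWS",
}


def routing_env(target: Optional[str]) -> Dict[str, str]:
    """Return the variables to set for a target; None means the direct Anthropic API."""
    if target is None:
        return {}
    return {ROUTING_VARS[target]: "1"}


def active_routes(env: Mapping[str, str]) -> List[str]:
    """List every target whose routing variable is set to 1 in env."""
    return [name for name, var in ROUTING_VARS.items() if env.get(var) == "1"]


def check_single_route(env: Mapping[str, str]) -> Optional[str]:
    """Fail fast if more than one routing variable is set (assumed to be a mistake)."""
    routes = active_routes(env)
    if len(routes) > 1:
        raise ValueError(f"conflicting routing variables: {routes}")
    return routes[0] if routes else None


print(routing_env("bedrock"))
print(check_single_route({"CLAUDE_CODE_USE_VERTEX": "1"}))
```

Calling check_single_route(os.environ) at worker start would catch a deployment that inherited a stale variable from a previous configuration. The cloud credentials themselves are still supplied the way each provider's SDK expects.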

When the Claude Agent SDK isn't the right tool

The composition this article walks through fits a specific reader profile — Anthropic-committed, document-extraction-heavy, in need of procedural-knowledge packaging and a hook-driven audit trail. Three adjacent options are better fits for adjacent problems.

LangGraph is the better fit when the AP workflow is best expressed as a stateful graph with explicit interrupt points where human approvers re-enter the loop. LangGraph's persistence model treats interrupts as first-class primitives, with checkpoints at each graph node and a clean way to pause a graph waiting for human input that can take hours or days — the end-to-end LangGraph AP build with a StateGraph and interrupt-based approval gate walks through exactly that shape, with the Postgres checkpointer and idempotency rules attached. The Agent SDK can model human-in-the-loop work through AskUserQuestion and session resumption, but if the workflow is genuinely graph-shaped — fixed nodes, explicit transitions, long pauses waiting on approvers — picking the right shape upstream saves rebuilding it later.

Pydantic AI is the better fit when model portability is a real requirement. If the AP system needs to run against multiple model providers without code rewrites — for cost optimisation, for vendor risk distribution, for the ability to swap providers if commercial terms change — Pydantic AI's typed agent definitions for invoice extraction abstract the provider and the Agent SDK does not. The Agent SDK is Anthropic-native by design; that's a strength when the team has committed to Anthropic and a weakness when the team needs to stay provider-neutral.

The OpenAI Agents SDK is the obvious choice when the rest of the team's stack is OpenAI-based. The marginal advantage of Skills as a procedural-knowledge packaging mechanism does not outweigh the cost of running two model-provider integrations in parallel. Pick the SDK that matches the rest of the platform and accept that the procedural-knowledge layer needs to be solved differently — the equivalent AP automation build on the OpenAI Agents SDK maps the same extractor/matcher/approver roles onto Agent, @function_tool, handoffs, guardrails, and Runner.

For a build that's Anthropic-committed, document-extraction-heavy, and in need of Skills-style procedural-knowledge packaging plus the hook-driven audit trail a compliance team will actually accept, the composition in this article — query() with a tight ClaudeAgentOptions, a custom accounts-payable Skill, the pre-built pdf Skill upstream, PreToolUse and PostToolUse hooks for the SOC 2 trail, three AgentDefinition subagents, MCP servers for ERP and vendor-master integration, sessions for batch durability, and the hyperscaler env vars for enterprise routing — is the working AP system the reader came for.
