How to Build an MCP Server for Invoice Extraction

An MCP server for invoice extraction does one thing well: it exposes a document extraction API as a tool that any MCP-compatible AI assistant can call on demand. When an assistant receives an invoice from a user, it calls the MCP tool with the document, the server forwards it to the extraction API, and structured JSON comes back containing the invoice number, vendor name, dates, line items, tax amounts, and totals. The assistant can then act on that data immediately, routing it into payment approval workflows, generating bookkeeping entries, or running spend analysis across multiple invoices. No copy-pasting, no manual data entry, no intermediate file formats.

The Model Context Protocol is gaining traction in document processing because it solves a real integration problem. Before MCP, connecting an AI assistant to an external capability like invoice extraction meant building custom tool-calling logic for each assistant platform. MCP standardizes that interface. Define a tool once, and any compatible client (Claude Desktop, Cursor, a custom agent framework) can discover and invoke it. For invoice processing specifically, this opens up a conversational pattern: a finance team member drops a PDF into a chat, the assistant extracts the data, and the user can ask follow-up questions like "What's the total before tax?" or "Does this vendor match our records?" The assistant has the structured data to answer.

Developer interest in AI tools is broad, but daily agent use is still early: the 2025 Stack Overflow Developer Survey reports that 84% of developers use or plan to use AI tools, while 14.1% use AI agents daily at work. One blocker is the integration work needed to connect assistants to domain-specific systems, which is the gap MCP is designed to close.

Most MCP extraction tutorials are locked to a single vendor's platform or cover generic document processing rather than invoices. This guide builds a TypeScript MCP server around a purpose-built invoice extraction API: define the tool schema, accept document input from the assistant, send the file to the extraction API, and return structured JSON the assistant can pass to downstream tools.

Structured Extraction vs. Raw OCR Wrappers

The backend you wire into your MCP tool determines whether your AI assistant receives usable data or a wall of text it has to decipher on every call.

The raw OCR approach looks straightforward at first. Wrap Tesseract or a cloud OCR service as an MCP tool, pass it an invoice image, and return the recognized text. The problem: what comes back is unstructured text with no semantic labels. The AI assistant must figure out which string is the invoice number, which is the date, where line items start and end. That parsing logic lives in the model's reasoning layer, where it breaks differently for every invoice layout.

A purpose-built extraction API changes what the MCP tool returns. Instead of raw text, the tool responds with typed, structured JSON containing named fields: invoice_number, vendor_name, invoice_date, a line_items array (each with description, quantity, unit_price, and line-level tax), tax_amount, and total. The AI assistant receives data it can immediately act on, pass to downstream tools, or present to the user with zero parsing.

The practical difference shows up the moment your assistant handles real queries. When a user asks "What's the total across these three invoices?", structured output lets the assistant sum three total fields directly. With raw OCR text, the model has to re-interpret each document's unstructured content, locate what it thinks is the total, and hope it parsed correctly. The same problem multiplies for questions like "Which vendor charged the most?" or "Show me all line items over $500." Every query becomes an unreliable text-extraction exercise repeated at inference time.
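To make that concrete, here is a minimal sketch of how those three questions reduce to plain data operations once the tool returns structured JSON. The `Invoice` and `LineItem` shapes and the sample values are assumptions for illustration, based on the fields named in this guide; they are not the API's documented schema.

```typescript
// Hypothetical shapes based on the fields named in this guide; the real
// response may use different names or carry more fields.
interface LineItem {
  description: string;
  amount: number;
}

interface Invoice {
  vendor_name: string;
  total: number;
  line_items: LineItem[];
}

// Convert to integer cents so sums do not pick up floating-point drift.
const toCents = (value: number): number => Math.round(value * 100);

// "What's the total across these invoices?"
function sumTotals(invoices: Invoice[]): number {
  return invoices.reduce((acc, inv) => acc + toCents(inv.total), 0) / 100;
}

// "Which vendor charged the most?"
function topVendor(invoices: Invoice[]): string | undefined {
  const centsByVendor = new Map<string, number>();
  for (const inv of invoices) {
    const soFar = centsByVendor.get(inv.vendor_name) ?? 0;
    centsByVendor.set(inv.vendor_name, soFar + toCents(inv.total));
  }
  let best: string | undefined;
  let bestCents = -Infinity;
  for (const [vendor, cents] of centsByVendor) {
    if (cents > bestCents) {
      best = vendor;
      bestCents = cents;
    }
  }
  return best;
}

// "Show me all line items over $500."
function lineItemsOver(invoices: Invoice[], threshold: number): LineItem[] {
  return invoices.flatMap((inv) =>
    inv.line_items.filter((item) => item.amount > threshold)
  );
}

// Made-up sample data standing in for three tool responses.
const invoices: Invoice[] = [
  {
    vendor_name: "Acme Supplies Ltd.",
    total: 4632.5,
    line_items: [{ description: "Standing desk", amount: 4250 }],
  },
  {
    vendor_name: "Northwind Paper",
    total: 1200,
    line_items: [{ description: "Copy paper pallet", amount: 1200 }],
  },
  {
    vendor_name: "Northwind Paper",
    total: 310.25,
    line_items: [{ description: "Toner cartridge", amount: 310.25 }],
  },
];

console.log(sumTotals(invoices)); // 6142.75
console.log(topVendor(invoices)); // "Acme Supplies Ltd."
console.log(lineItemsOver(invoices, 500).length); // 2
```

None of these functions needs the model to re-read the document. With raw OCR text, each one would start with a fragile parsing step.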

This distinction maps directly to what a purpose-built extraction engine does differently under the hood. Rather than converting images to text and stopping there, the AI system understands document types, recognizes the relationships between data fields (invoice date vs. due date, subtotal vs. total after tax), and extracts invoice-level headers alongside individual line items with product codes, descriptions, quantities, and amounts. That is the kind of structured JSON output an MCP tool needs to return.

Structured extraction also handles the cases that break naive OCR wrappers in production:

Multi-page invoices where line item tables span page breaks and headers repeat
Tables with merged cells that confuse layout-based text extraction
Invoices in different languages and scripts, including Arabic, East Asian, Cyrillic, and Devanagari, where OCR accuracy drops sharply without language-aware models
Mixed document batches where non-invoice pages (cover letters, packing slips, remittance advice) need to be identified and filtered before extraction

For an MCP server handling AI agent document processing, this is the difference between a tool that returns reliable structured data and one that offloads the hardest part of the problem onto the language model calling it.

Defining the MCP Invoice Extraction Tool

Project Setup

Start by initializing a TypeScript project configured for ESM, since the Invoice Data Extraction Node SDK is ESM-only.

mkdir mcp-invoice-server && cd mcp-invoice-server
npm init -y

Set the project to use ES modules and install your dependencies: FastMCP (a TypeScript framework for building MCP servers), Zod for the tool's input schema, and the Invoice Data Extraction Node SDK for the actual extraction work.

npm install fastmcp zod @invoicedataextraction/sdk
npm install -D typescript @types/node

Update your package.json to include "type": "module" and ensure you are running Node.js 18+. Your tsconfig.json should target ESNext with module resolution set to NodeNext:

{
  "compilerOptions": {
    "target": "ESNext",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "outDir": "./dist",
    "strict": true,
    "declaration": true
  },
  "include": ["src/**/*"]
}

If you have used the SDK for invoice extraction in Node.js before, the setup will feel familiar. The difference here is that you are wrapping it inside an MCP tool that AI assistants can discover and invoke autonomously.

Defining the Tool

An MCP tool definition has three components: a name that agents use to call it, a description that tells the AI assistant when and why to use it, and an input schema that declares the parameters the tool accepts. The description matters more than you might expect. It is the primary mechanism by which an AI assistant decides whether this tool fits the user's request.

Here is the complete tool definition with server setup:

import { FastMCP } from "fastmcp";
import { z } from "zod";
import InvoiceDataExtraction from "@invoicedataextraction/sdk";
import { writeFile, rm, mkdtemp } from "fs/promises";
import { basename, join } from "path";
import { tmpdir } from "os";

const client = new InvoiceDataExtraction({
  api_key: process.env.INVOICE_EXTRACTION_API_KEY,
});

const server = new FastMCP({
  name: "invoice-extraction-server",
  version: "1.0.0",
});

server.addTool({
  name: "extract_invoice_data",
  description:
    "Extracts structured data from invoice documents (PDF, PNG, JPG). " +
    "Returns parsed fields such as vendor name, invoice number, dates, " +
    "line items, totals, and tax amounts as structured JSON. " +
    "Accepts base64-encoded file content.",
  parameters: z.object({
    file_content: z
      .string()
      .describe("Base64-encoded content of the invoice file"),
    filename: z
      .string()
      .describe("Original filename with extension (e.g., 'invoice.pdf')"),
    prompt: z
      .string()
      .optional()
      .describe("Optional extraction prompt to control what data to extract"),
    output_structure: z
      .enum(["per_invoice", "per_line_item"])
      .default("per_invoice")
      .describe(
        "Controls extraction granularity: 'per_invoice' returns one record " +
        "per document, 'per_line_item' returns one record per line item"
      ),
  }),
  execute: async ({ file_content, filename, prompt, output_structure }) => {
    // Full handler implementation below
  },
});

server.start({ transportType: "stdio" });

The input schema uses Zod (which FastMCP converts to JSON Schema for the MCP protocol). Each parameter is typed and described so the AI assistant knows exactly what to provide. The output_structure parameter gives the calling agent control over extraction granularity, which is critical when one workflow needs invoice-level summaries and another needs individual line items for reconciliation.

One constraint to plan for: MCP tools receive file data from AI assistants as base64-encoded content, but the Invoice Data Extraction Node SDK v1 accepts only local file paths (not buffers or streams). The tool handler needs to decode the incoming base64 data, write it to a temporary file, pass that path to the SDK's extract() method, and clean up the temp file afterward.
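Before writing the temp file, it is worth validating what the assistant sent. The sketch below is one possible pre-check, not part of the SDK: `decodeInvoiceUpload`, the extension allowlist, and the 20 MB cap are all assumptions for illustration, so check the extraction API's documented limits before adopting them.

```typescript
import { Buffer } from "node:buffer";
import { extname } from "node:path";

// Assumed values for illustration; the formats mirror the tool description
// in this guide, and the size cap is a hypothetical limit.
const ALLOWED_EXTENSIONS = new Set([".pdf", ".png", ".jpg", ".jpeg"]);
const MAX_BYTES = 20 * 1024 * 1024;

function decodeInvoiceUpload(fileContent: string, filename: string): Buffer {
  const ext = extname(filename).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    throw new Error(
      `Unsupported file type "${ext || "(none)"}"; expected PDF, PNG, or JPG`
    );
  }

  // Some clients send a data URL ("data:application/pdf;base64,...");
  // keep only the payload and drop any whitespace or line breaks.
  const payload = fileContent.replace(/^data:[^,]*,/, "").replace(/\s+/g, "");

  // Buffer.from() does not throw on malformed base64, so check the
  // standard alphabet and padding explicitly before decoding.
  const looksLikeBase64 =
    payload.length > 0 &&
    payload.length % 4 !== 1 &&
    /^[A-Za-z0-9+/]*={0,2}$/.test(payload);
  if (!looksLikeBase64) {
    throw new Error("file_content is not valid base64");
  }

  const buffer = Buffer.from(payload, "base64");
  if (buffer.length > MAX_BYTES) {
    throw new Error(`File is ${buffer.length} bytes; limit is ${MAX_BYTES}`);
  }
  return buffer;
}
```

In the tool handler, the returned buffer would replace a direct `Buffer.from(file_content, "base64")` call. A thrown error can be caught and returned to the assistant as a readable message.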

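To connect this server to an MCP client, register it in the client's configuration. For Claude Desktop, add an entry to your claude_desktop_config.json. Use an absolute path to the compiled entry point, since the client may not launch the server from your project directory:

{
  "mcpServers": {
    "invoice-extraction": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-invoice-server/dist/index.js"],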
      "env": {
        "INVOICE_EXTRACTION_API_KEY": "your-api-key"
      }
    }
  }
}

Once registered, the AI assistant discovers the tool automatically and can call it whenever a user's request involves invoice data.
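Under the hood, discovery and invocation are JSON-RPC 2.0 messages: the client sends `tools/list` to learn which tools exist and `tools/call` to run one. The sketch below builds those messages by hand to show their shape. The ids, the truncated base64 string, and the `buildToolCall` helper are illustrative assumptions; in practice the MCP client and FastMCP construct and parse these for you.

```typescript
// Discovery request: the client asks the server which tools it offers.
const listToolsRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",
} as const;

// Hypothetical helper that builds a tools/call request for a named tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

const callRequest = buildToolCall(2, "extract_invoice_data", {
  file_content: "JVBERi0xLjcK...", // truncated base64 placeholder
  filename: "invoice.pdf",
  output_structure: "per_invoice",
});

// A string returned from execute() reaches the client as a text content
// item in the tool result.
const exampleResponse = {
  jsonrpc: "2.0",
  id: 2,
  result: {
    content: [{ type: "text", text: '{ "invoices": [ ... ] }' }],
  },
};

console.log(JSON.stringify(listToolsRequest));
console.log(JSON.stringify(callRequest, null, 2));
console.log(JSON.stringify(exampleResponse, null, 2));
```

Seeing the wire format helps when debugging: if a tool call fails, logging the `arguments` object usually shows whether the assistant supplied the parameters the schema asked for.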

Connecting the Extraction API

With the tool defined, the handler needs to receive MCP tool calls, run the extraction, and return structured invoice data. The invoice extraction API and its Node SDK handle the heavy lifting: uploading files, submitting the job, polling for completion, and retrieving results all collapse into a single async call.

Here is the full execute handler for the tool defined above:

execute: async ({ file_content, filename, prompt, output_structure }) => {
  let tempDir: string | null = null;

  try {
    // Decode base64 and write to a temp file
    tempDir = await mkdtemp(join(tmpdir(), "mcp-invoice-"));
    const safeFilename = basename(filename);
    const tempFilePath = join(tempDir, safeFilename);
    const fileBuffer = Buffer.from(file_content, "base64");
    await writeFile(tempFilePath, fileBuffer);

    const result = await client.extract({
      files: [tempFilePath],
      prompt:
        prompt ||
        "Extract invoice number, date, vendor name, line items, tax, and total",
      output_structure: output_structure,
      download: {
        formats: ["json"],
        output_path: tempDir,
      },
    });

    return JSON.stringify(result, null, 2);
  } finally {
    if (tempDir) {
      await rm(tempDir, { recursive: true, force: true });
    }
  }
},

The extract() method manages the full lifecycle internally (upload, submission, polling, result retrieval), so your MCP server never touches job state.

The prompt parameter is flexible. You can specify exact fields like the default above, or provide a goal-oriented instruction such as "I need to extract data for 1099 reporting." If the assistant omits it, the handler above falls back to its default field list. In an MCP workflow, the assistant can pass the user's natural-language request through as the extraction prompt.

The download parameter with formats: ["json"] ensures you get structured JSON back rather than a spreadsheet file. JSON is the natural choice for MCP tool responses since the calling AI assistant needs to parse and reason over the data directly. Alternatively, you can call getDownloadUrl() with format "json" to retrieve a download URL if you prefer to defer retrieval.

The structured JSON response the AI assistant receives looks like this:

{
  "invoices": [
    {
      "invoice_number": "INV-2024-0847",
      "vendor_name": "Acme Supplies Ltd.",
      "invoice_date": "2024-11-15",
      "due_date": "2024-12-15",
      "subtotal": 4250.00,
      "tax": 382.50,
      "total": 4632.50,
      "line_items": [
        {
          "description": "Office furniture - standing desk",
          "quantity": 5,
          "unit_price": 850.00,
          "amount": 4250.00
        }
      ]
    }
  ]
}

Field-level values like invoice number, vendor, dates, amounts, and line items arrive already parsed and typed.
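Because the values are numeric rather than text, the server or a downstream tool can sanity-check them with ordinary arithmetic before acting on them. The following is a sketch under assumptions: the `ExtractedInvoice` interface mirrors the sample response above, not a documented schema, and `findInconsistencies` is a hypothetical helper.

```typescript
// Shapes copied from the sample response above (an assumption, not a
// published schema).
interface ExtractedLineItem {
  description: string;
  quantity: number;
  unit_price: number;
  amount: number;
}

interface ExtractedInvoice {
  invoice_number: string;
  vendor_name: string;
  invoice_date: string;
  due_date: string;
  subtotal: number;
  tax: number;
  total: number;
  line_items: ExtractedLineItem[];
}

// Compare money in integer cents to avoid floating-point noise.
const cents = (value: number): number => Math.round(value * 100);

// Returns a list of human-readable problems; an empty list means the
// figures are internally consistent.
function findInconsistencies(invoice: ExtractedInvoice): string[] {
  const problems: string[] = [];

  for (const item of invoice.line_items) {
    if (cents(item.quantity * item.unit_price) !== cents(item.amount)) {
      problems.push(`Line "${item.description}": quantity x unit_price != amount`);
    }
  }

  const lineSum = invoice.line_items.reduce((acc, item) => acc + cents(item.amount), 0);
  if (lineSum !== cents(invoice.subtotal)) {
    problems.push("Line item amounts do not add up to subtotal");
  }

  if (cents(invoice.subtotal) + cents(invoice.tax) !== cents(invoice.total)) {
    problems.push("subtotal + tax != total");
  }

  return problems;
}

const sample: ExtractedInvoice = {
  invoice_number: "INV-2024-0847",
  vendor_name: "Acme Supplies Ltd.",
  invoice_date: "2024-11-15",
  due_date: "2024-12-15",
  subtotal: 4250.0,
  tax: 382.5,
  total: 4632.5,
  line_items: [
    {
      description: "Office furniture - standing desk",
      quantity: 5,
      unit_price: 850.0,
      amount: 4250.0,
    },
  ],
};

console.log(findInconsistencies(sample)); // []
```

Real invoices can legitimately fail these checks (discounts, shipping, rounding rules), so treat a non-empty result as a prompt for review rather than proof of an extraction error.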

Consider the practical flow: a user uploads an invoice in Claude and asks, "What's the total on this invoice and when is it due?" Claude calls the extraction tool through MCP, receives the structured data above, and answers immediately with the exact figures. It can also trigger follow-up workflows, such as comparing the total against a purchase order or flagging invoices past their due date. For teams building a complete agentic workflow around this MCP tool, the Claude Agent SDK paired with custom Skills for AP automation composes the query loop, hooks, and subagents that turn one-off extractions into an end-to-end accounts payable pipeline. If you prefer defining tools natively in Python rather than over MCP, the same flow can be built with the OpenAI Agents SDK using @function_tool, handoffs, and guardrails for AP automation.

If you are new to the SDK, the guide on getting started with the extraction API covers authentication setup and your first extraction call outside the MCP context. The patterns transfer directly into the handler above. If you prefer to call the extraction API directly over HTTP rather than through the SDK, the REST endpoints support the same workflow (upload, submit, poll, download), and the handler would make fetch calls instead of using extract().

Production Patterns for MCP Extraction Servers

A working MCP server is a starting point. Before exposing it to real users and AI assistants, you need to handle authentication, failures, and operational limits that only surface under actual load.

API Key Management

The extraction API authenticates every request with a Bearer token. Store this key as an environment variable and read it at server startup. Never hardcode it in your tool definitions or commit it to source control.

const apiKey = process.env.INVOICE_EXTRACTION_API_KEY;
if (!apiKey) {
  throw new Error("INVOICE_EXTRACTION_API_KEY environment variable is required");
}

If your MCP server supports multiple users, you have two options: a single shared API key configured at the server level, or per-session keys passed by each connecting client. MCP handles authentication at the transport level between the AI assistant and the server. The server manages its own extraction API credentials internally — the assistant never sees or handles them.
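The two options can coexist: prefer a per-session key when one is present and fall back to the shared key otherwise. This is a minimal sketch in which `SessionContext` and `resolveApiKey` are hypothetical names; how a session key actually reaches your server depends on the transport and authentication setup you choose.

```typescript
// Hypothetical session shape; populate it from whatever your transport's
// authentication step provides.
interface SessionContext {
  extractionApiKey?: string;
}

// Prefer the caller's own key, fall back to the server-wide key, and fail
// loudly if neither is configured.
function resolveApiKey(
  session: SessionContext | undefined,
  sharedKey: string | undefined
): string {
  const key = session?.extractionApiKey || sharedKey;
  if (!key) {
    throw new Error("No extraction API key configured for this session");
  }
  return key;
}

console.log(resolveApiKey({ extractionApiKey: "user-key" }, "shared-key")); // "user-key"
console.log(resolveApiKey(undefined, "shared-key")); // "shared-key"
```

Using `||` rather than `??` means an empty-string session key also falls back to the shared key instead of being sent to the API.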

Error Handling

The Node SDK distinguishes between two error categories, and your tool handler needs to catch both.

SdkError covers filesystem, network, and timeout issues — problems between your server and the extraction API. ApiResponseError covers API-level rejections: insufficient credits, rate limiting, encrypted files.

The critical pattern: catch these errors and return descriptive MCP tool responses instead of letting exceptions propagate and crash the server process.

try {
  const result = await client.extract({ /* ... */ });

  if (result.status === "failed") {
    return `Extraction failed: ${result.error_message}. ` +
      "The document may be corrupted or in an unsupported format.";
  }

  return JSON.stringify(result, null, 2);
} catch (error: unknown) {
  const apiError = error as {
    body?: { error?: { code?: string } };
    message?: string;
  };

  if (apiError.body?.error?.code === "INSUFFICIENT_CREDITS") {
    return "Extraction requires credits. The account has " +
      "insufficient balance — add credits to continue.";
  }
  if (apiError.body?.error?.code === "ENCRYPTED_FILE") {
    return "This PDF is password-protected. " +
      "Remove encryption before extracting.";
  }

  return `Extraction service error: ${apiError.message || "Unknown error"}. ` +
    "Retry in a few moments.";
}

Note that extraction task failures — where the API processes the request but the extraction itself fails — come back as a result with status "failed", not as thrown exceptions. Always check the result status before formatting the response.

The SDK auto-retries on rate limiting and transient internal errors, so most throttling resolves without your intervention. For persistent rate limit failures, surface the error clearly so the assistant can inform the user rather than silently hanging. The API allows 30 extraction submissions per minute and 120 polling requests per minute, which is generous for interactive use but worth monitoring if you batch requests.
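If you do batch requests, one way to stay under the submission limit is a client-side sliding-window limiter. The class below is a sketch, not an SDK feature: `SubmissionLimiter` is a hypothetical name, and the clock is passed in explicitly so the logic can be tested without real waiting.

```typescript
// Sliding-window limiter: allows at most `limit` submissions within any
// span of `windowMs` milliseconds.
class SubmissionLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: number = 30, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Milliseconds to wait before another submission is allowed (0 = go now).
  delayFor(now: number): number {
    this.prune(now);
    const oldest = this.timestamps[0];
    if (oldest === undefined || this.timestamps.length < this.limit) {
      return 0;
    }
    return oldest + this.windowMs - now;
  }

  // Records a submission at `now`; returns false if the window is full.
  tryAcquire(now: number): boolean {
    if (this.delayFor(now) > 0) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }

  // Drops timestamps that have aged out of the window.
  private prune(now: number): void {
    this.timestamps = this.timestamps.filter((t) => t > now - this.windowMs);
  }
}

// Usage with a tiny window so the behavior is easy to follow.
const limiter = new SubmissionLimiter(2, 1000);
console.log(limiter.tryAcquire(0)); // true
console.log(limiter.tryAcquire(100)); // true
console.log(limiter.tryAcquire(200)); // false
console.log(limiter.delayFor(200)); // 800
```

In the server, you would call `delayFor(Date.now())` before each extraction, sleep for that long if it is non-zero, and then call `tryAcquire(Date.now())`.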

Timeout and Polling Configuration

The SDK's extract() method accepts a polling parameter that controls how it waits for results. Two values matter:

interval_ms — how often to poll for completion (minimum 5000ms)
timeout_ms — when to give up waiting

Since the AI assistant's user is waiting interactively, set a reasonable ceiling. A 120-second timeout works well for single documents:

const result = await client.extract({
  files: [tempFilePath],
  prompt: "Extract invoice number, date, vendor name, line items, tax, and total",
  output_structure: "per_invoice",
  polling: {
    interval_ms: 5000,
    timeout_ms: 120000,
  },
});

If the timeout fires, the SDK throws an error — handle it the same way as other SDK errors with a message suggesting the user retry.
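To make the two settings concrete, here is a generic poll-until-done loop. It is an illustration of the pattern only, not the SDK's actual implementation: `pollUntilDone`, `JobStatus`, and the injectable `sleep` and `now` options are assumptions introduced for this sketch.

```typescript
type JobStatus = "processing" | "completed" | "failed";

interface PollOptions {
  interval_ms: number;
  timeout_ms: number;
  // Injectable for tests; both default to real time.
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

// Calls checkStatus until the job leaves "processing" or the deadline
// would pass before the next check.
async function pollUntilDone(
  checkStatus: () => Promise<JobStatus>,
  options: PollOptions
): Promise<JobStatus> {
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const now = options.now ?? (() => Date.now());
  const deadline = now() + options.timeout_ms;

  while (true) {
    const status = await checkStatus();
    if (status !== "processing") {
      return status;
    }
    if (now() + options.interval_ms > deadline) {
      throw new Error(
        `Timed out after ${options.timeout_ms} ms waiting for extraction`
      );
    }
    await sleep(options.interval_ms);
  }
}
```

With a 5000 ms interval and a 120000 ms timeout, this loop makes at most 25 status checks for one document, which sits well inside the 120 polling requests per minute noted earlier.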

Expanding Beyond Single Extraction

A production MCP server benefits from more than one tool. Each additional capability becomes a separate tool definition, turning your server into a full extraction toolkit the assistant can draw from.

Credit balance check. Expose getCreditsBalance() as its own tool. It returns available and reserved credits, letting the assistant verify capacity before attempting a large batch. Each processed page consumes 1 credit; failed pages cost nothing. Surfacing this information lets the assistant decide whether to proceed.

Batch extraction. A tool that accepts multiple documents and processes them sequentially, aggregating results into a single response. Space requests to respect the 30 submissions-per-minute rate limit.

Specialized extraction prompts. Expose distinct tools for line-item extraction, header-only extraction, or custom field targeting. Same underlying API, different prompt instructions, each registered as its own MCP tool.

server.addTool({
  name: "check_extraction_credits",
  description: "Check available credits for invoice extraction",
  parameters: z.object({}),
  execute: async () => {
    const balance = await client.getCreditsBalance();
    return `Available credits: ${balance.credits_balance} | ` +
      `Reserved: ${balance.credits_reserved}`;
  },
});
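The batch extraction tool described above could follow a similar pattern. The sketch below keeps the extraction call abstract so it stands alone: `extractBatch`, `ExtractOne`, and `BatchEntry` are hypothetical names, and in the real server `extractOne` would wrap the decode, temp-file, and client.extract() steps from the single-document handler.

```typescript
// Hypothetical single-file extractor; in the server this would wrap
// client.extract() for one document.
type ExtractOne = (filePath: string) => Promise<unknown>;

interface BatchEntry {
  file: string;
  ok: boolean;
  data?: unknown;
  error?: string;
}

// Processes files one at a time, pausing between submissions. A 2000 ms
// gap keeps the submission rate at or below 30 per minute.
async function extractBatch(
  files: string[],
  extractOne: ExtractOne,
  spacingMs: number = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<BatchEntry[]> {
  const results: BatchEntry[] = [];

  for (let i = 0; i < files.length; i++) {
    const file = files[i] as string;
    if (i > 0) {
      await sleep(spacingMs);
    }
    try {
      results.push({ file, ok: true, data: await extractOne(file) });
    } catch (error) {
      // Record the failure and continue so one bad document does not
      // discard the rest of the batch.
      const message = error instanceof Error ? error.message : String(error);
      results.push({ file, ok: false, error: message });
    }
  }

  return results;
}
```

Returning a per-file `ok` flag lets the assistant report partial success ("two of three invoices extracted, one was password-protected") instead of failing the whole call.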

The result is a focused MCP server: the assistant gets a dependable invoice-extraction tool, and downstream finance workflows receive typed JSON instead of model-interpreted OCR text.