How to Build an MCP Server for Invoice Extraction

Build an MCP server that exposes invoice extraction as a tool for AI assistants. Covers tool definition, API integration, and structured JSON responses.

Reading Time: 14 min
Topics: API & Developer Integration, MCP, AI agents, tool calling

An MCP server for invoice extraction does one thing well: it exposes a document extraction API as a tool that any MCP-compatible AI assistant can call on demand. When an assistant receives an invoice from a user, it calls the MCP tool with the document, the server forwards it to the extraction API, and structured JSON comes back containing the invoice number, vendor name, dates, line items, tax amounts, and totals. The assistant can then act on that data immediately, routing it into payment approval workflows, generating bookkeeping entries, or running spend analysis across multiple invoices. No copy-pasting, no manual data entry, no intermediate file formats.

The Model Context Protocol is gaining traction in document processing because it solves a real integration problem. Before MCP, connecting an AI assistant to an external capability like invoice extraction meant building custom tool-calling logic for each assistant platform. MCP standardizes that interface. Define a tool once, and any compatible client (Claude Desktop, Cursor, a custom agent framework) can discover and invoke it. For invoice processing specifically, this opens up a conversational pattern: a finance team member drops a PDF into a chat, the assistant extracts the data, and the user can ask follow-up questions like "What's the total before tax?" or "Does this vendor match our records?" The assistant has the structured data to answer.

The demand for this kind of agent integration is large and largely unmet. According to the 2025 Stack Overflow Developer Survey, 84% of developers now use or plan to use AI tools in their development process, yet only 14.1% use AI agents daily at work. That gap exists partly because connecting AI assistants to domain-specific tools still requires custom integration work, which is exactly what MCP standardizes.

Most MCP extraction tutorials are locked to a single vendor's platform or cover generic document processing rather than invoices specifically. This guide takes a different approach: building an MCP server in TypeScript that wraps a purpose-built invoice extraction API. You will define the tool schema, handle document input from the assistant, send documents to the extraction API, and return structured JSON that AI assistants can reason about and pass to downstream tools. The focus is on a working implementation you can adapt to your own agentic workflows.


Structured Extraction vs. Raw OCR Wrappers

The backend you wire into your MCP tool determines whether your AI assistant receives usable data or a wall of text it has to decipher on every call.

The raw OCR approach looks straightforward at first. Wrap Tesseract or a cloud OCR service as an MCP tool, pass it an invoice image, and return the recognized text. The problem: what comes back is unstructured text with no semantic labels. The AI assistant must figure out which string is the invoice number, which is the date, where line items start and end. That parsing logic lives in the model's reasoning layer, where it breaks differently for every invoice layout.

A purpose-built extraction API changes what the MCP tool returns. Instead of raw text, the tool responds with typed, structured JSON containing named fields: invoice_number, vendor_name, invoice_date, a line_items array (each with description, quantity, unit_price, and line-level tax), tax_amount, and total. The AI assistant receives data it can immediately act on, pass to downstream tools, or present to the user with zero parsing.

The practical difference shows up the moment your assistant handles real queries. When a user asks "What's the total across these three invoices?", structured output lets the assistant sum three total fields directly. With raw OCR text, the model has to re-interpret each document's unstructured content, locate what it thinks is the total, and hope it parsed correctly. The same problem multiplies for questions like "Which vendor charged the most?" or "Show me all line items over $500." Every query becomes an unreliable text-extraction exercise repeated at inference time.
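To make the contrast concrete, here is a minimal sketch of what "sum three total fields directly" can look like once the data is structured. The `ExtractedInvoice` shape, the helper names, and the sample values are illustrative assumptions, not the extraction API's documented schema.

```typescript
// Hypothetical record shape for illustration; the real response may differ.
interface ExtractedInvoice {
  vendor_name: string;
  total: number;
}

// Convert a currency amount to integer cents so sums do not drift
// because of floating-point rounding.
function toCents(value: number): number {
  return Math.round(value * 100);
}

// "What's the total across these invoices?"
function sumTotals(invoices: ExtractedInvoice[]): number {
  const cents = invoices.reduce((acc, inv) => acc + toCents(inv.total), 0);
  return cents / 100;
}

// "Which vendor charged the most?" Returns null for an empty list.
function topVendor(invoices: ExtractedInvoice[]): string | null {
  const centsByVendor = new Map<string, number>();
  invoices.forEach((inv) => {
    const current = centsByVendor.get(inv.vendor_name) ?? 0;
    centsByVendor.set(inv.vendor_name, current + toCents(inv.total));
  });

  let best: string | null = null;
  let bestCents = -Infinity;
  centsByVendor.forEach((cents, vendor) => {
    if (cents > bestCents) {
      best = vendor;
      bestCents = cents;
    }
  });
  return best;
}

// Made-up sample data, not real extraction output.
const sampleInvoices: ExtractedInvoice[] = [
  { vendor_name: "Acme Supplies Ltd.", total: 4632.5 },
  { vendor_name: "Northwind Traders", total: 1200.1 },
  { vendor_name: "Acme Supplies Ltd.", total: 310.2 },
];

console.log(sumTotals(sampleInvoices), topVendor(sampleInvoices));
```

With raw OCR text there is no equivalent of these few lines: each question would require the model to locate and re-parse the figures first.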

This distinction maps directly to what a purpose-built extraction engine does differently under the hood. Rather than converting images to text and stopping there, the AI system understands document types, recognizes the relationships between data fields (invoice date vs. due date, subtotal vs. total after tax), and extracts invoice-level headers alongside individual line items with product codes, descriptions, quantities, and amounts. That is the kind of structured JSON output an MCP tool needs to return.

Structured extraction also handles the cases that break naive OCR wrappers in production:

  • Multi-page invoices where line item tables span page breaks and headers repeat
  • Tables with merged cells that confuse layout-based text extraction
  • Invoices in different languages and scripts, including Arabic, East Asian, Cyrillic, and Devanagari, where OCR accuracy drops sharply without language-aware models
  • Mixed document batches where non-invoice pages (cover letters, packing slips, remittance advice) need to be identified and filtered before extraction

For an MCP server handling AI agent document processing, this is the difference between a tool that returns reliable structured data and one that offloads the hardest part of the problem onto the language model calling it.


Defining the MCP Invoice Extraction Tool

Project Setup

Start by initializing a TypeScript project configured for ESM, since the Invoice Data Extraction Node SDK is ESM-only.

mkdir mcp-invoice-server && cd mcp-invoice-server
npm init -y

Set the project to use ES modules and install your dependencies: FastMCP (a TypeScript framework built on top of the official MCP SDK) for building the server, Zod for the tool's input schema, and the Invoice Data Extraction Node SDK for the actual extraction work.

npm install fastmcp zod @invoicedataextraction/sdk
npm install -D typescript @types/node

Update your package.json to include "type": "module" and ensure you are running Node.js 18+. Your tsconfig.json should target ESNext with module resolution set to NodeNext:

{
  "compilerOptions": {
    "target": "ESNext",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "outDir": "./dist",
    "strict": true,
    "declaration": true
  },
  "include": ["src/**/*"]
}

If you have used the SDK for invoice extraction in Node.js before, the setup will feel familiar. The difference here is that you are wrapping it inside an MCP tool that AI assistants can discover and invoke autonomously.

Defining the Tool

An MCP tool definition has three components: a name that agents use to call it, a description that tells the AI assistant when and why to use it, and an input schema that declares the parameters the tool accepts. The description matters more than you might expect. It is the primary mechanism by which an AI assistant decides whether this tool fits the user's request.

Here is the complete tool definition with server setup:

import { FastMCP } from "fastmcp";
import { z } from "zod";
import InvoiceDataExtraction from "@invoicedataextraction/sdk";
import { writeFile, unlink, rm, mkdtemp } from "fs/promises";
import { basename, join } from "path";
import { tmpdir } from "os";

const client = new InvoiceDataExtraction({
  api_key: process.env.INVOICE_EXTRACTION_API_KEY,
});

const server = new FastMCP({
  name: "invoice-extraction-server",
  version: "1.0.0",
});

server.addTool({
  name: "extract_invoice_data",
  description:
    "Extracts structured data from invoice documents (PDF, PNG, JPG). " +
    "Returns parsed fields such as vendor name, invoice number, dates, " +
    "line items, totals, and tax amounts as structured JSON. " +
    "Accepts base64-encoded file content.",
  parameters: z.object({
    file_content: z
      .string()
      .describe("Base64-encoded content of the invoice file"),
    filename: z
      .string()
      .describe("Original filename with extension (e.g., 'invoice.pdf')"),
    prompt: z
      .string()
      .optional()
      .describe("Optional extraction prompt to control what data to extract"),
    output_structure: z
      .enum(["per_invoice", "per_line_item"])
      .default("per_invoice")
      .describe(
        "Controls extraction granularity: 'per_invoice' returns one record " +
        "per document, 'per_line_item' returns one record per line item"
      ),
  }),
  execute: async ({ file_content, filename, prompt, output_structure }) => {
    // Full handler implementation below
  },
});

server.start({ transportType: "stdio" });

The input schema uses Zod (which FastMCP converts to JSON Schema for the MCP protocol). Each parameter is typed and described so the AI assistant knows exactly what to provide. The output_structure parameter gives the calling agent control over extraction granularity, which is critical when one workflow needs invoice-level summaries and another needs individual line items for reconciliation.

One constraint to plan for: this tool receives file data from the AI assistant as base64-encoded content, but the Invoice Data Extraction Node SDK v1 accepts only local file paths (not buffers or streams). The tool handler needs to decode the incoming base64 data, write it to a temporary file, pass that path to the SDK's extract() method, and clean up the temp file afterward.
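Because both the filename and the base64 payload arrive from the calling assistant, it can help to validate them before anything touches the filesystem. The following is a minimal sketch under stated assumptions: `decodeUpload`, the 20 MB cap, and the extension allowlist (taken from the formats named in the tool description) are illustrative choices, not part of the SDK or of MCP.

```typescript
import { posix } from "node:path";

// Assumed limits for illustration; adjust to your own deployment.
const ALLOWED_EXTENSIONS = [".pdf", ".png", ".jpg", ".jpeg"];
const MAX_BYTES = 20 * 1024 * 1024;

interface DecodedUpload {
  safeName: string; // filename with all directory components removed
  bytes: Buffer;    // decoded file content
}

function decodeUpload(fileContent: string, filename: string): DecodedUpload {
  // Normalise Windows separators, then keep only the last path segment so a
  // name like "../../etc/passwd" cannot escape the temp directory.
  const safeName = posix.basename(filename.replace(/\\/g, "/"));
  const ext = posix.extname(safeName).toLowerCase();
  if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) {
    throw new Error(`Unsupported file extension: "${ext}"`);
  }

  // Buffer.from(..., "base64") silently skips invalid characters, so check
  // the alphabet explicitly before decoding.
  const compact = fileContent.replace(/\s+/g, "");
  if (compact.length === 0 || !/^[A-Za-z0-9+/]*={0,2}$/.test(compact)) {
    throw new Error("file_content is not valid base64");
  }

  const bytes = Buffer.from(compact, "base64");
  if (bytes.length > MAX_BYTES) {
    throw new Error(`File exceeds ${MAX_BYTES} bytes`);
  }
  return { safeName, bytes };
}
```

A handler could call this first and write `bytes` to `join(tempDir, safeName)`, returning the thrown message to the assistant as the tool response when validation fails.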

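To connect this server to an MCP client, compile it (for example with npx tsc) and register it in the client's configuration. For Claude Desktop, add an entry to your claude_desktop_config.json. Use an absolute path to the compiled entry point, since the client does not launch the server from your project directory:

{
  "mcpServers": {
    "invoice-extraction": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-invoice-server/dist/index.js"],
      "env": {
        "INVOICE_EXTRACTION_API_KEY": "your-api-key"
      }
    }
  }
}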

Once registered, the AI assistant discovers the tool automatically and can call it whenever a user's request involves invoice data.
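For orientation, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below builds such a request for the tool defined above. The `buildToolCall` helper, the `id`, and the argument values are placeholders for illustration; in practice the MCP client constructs and sends this message for you.

```typescript
// Shape of a JSON-RPC 2.0 request that invokes an MCP tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper: assembles the request an MCP client would send.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const exampleCall = buildToolCall(1, "extract_invoice_data", {
  file_content: "JVBERi0xLjc=", // placeholder base64, not a real invoice
  filename: "invoice.pdf",
  output_structure: "per_invoice",
});

console.log(JSON.stringify(exampleCall, null, 2));
```

The argument keys match the Zod schema in the tool definition, which is why accurate parameter descriptions matter: they are what the assistant uses to fill in this object.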


Connecting the Extraction API

With the tool defined, the handler needs to receive MCP tool calls, run the extraction, and return structured invoice data. The invoice extraction API and its Node SDK handle the heavy lifting: uploading files, submitting the job, polling for completion, and retrieving results all collapse into a single async call.

Here is the full execute handler for the tool defined above:

execute: async ({ file_content, filename, prompt, output_structure }) => {
  // Requires `rm` from "fs/promises" and `basename` from "path" in the
  // server file's imports.
  let tempDir: string | null = null;

  try {
    // Decode base64 and write to a temp file. basename() drops any directory
    // components so a crafted filename cannot escape the temp directory.
    tempDir = await mkdtemp(join(tmpdir(), "mcp-invoice-"));
    const tempFilePath = join(tempDir, basename(filename));
    const fileBuffer = Buffer.from(file_content, "base64");
    await writeFile(tempFilePath, fileBuffer);

    const result = await client.extract({
      files: [tempFilePath],
      prompt:
        prompt ||
        "Extract invoice number, date, vendor name, line items, tax, and total",
      output_structure: output_structure,
      download: {
        formats: ["json"],
        output_path: tempDir,
      },
    });

    return JSON.stringify(result, null, 2);
  } finally {
    // Remove the whole temp directory, including the uploaded file and any
    // JSON the SDK downloaded into it.
    if (tempDir) {
      await rm(tempDir, { recursive: true, force: true }).catch(() => {});
    }
  }
},

The extract() method manages the full lifecycle internally (upload, submission, polling, result retrieval), so your MCP server never touches job state.

The prompt parameter is flexible. You can specify exact fields like the default above, provide a goal-oriented instruction such as "I need to extract data for 1099 reporting," or omit it entirely and let the AI determine the relevant fields (note that the handler above substitutes a default prompt when the tool call leaves it out). This fits MCP usage well, since an AI assistant might pass along a user's natural-language request as the extraction prompt.

The download parameter with formats: ["json"] ensures you get structured JSON back rather than a spreadsheet file. JSON is the natural choice for MCP tool responses since the calling AI assistant needs to parse and reason over the data directly. Alternatively, you can call getDownloadUrl() with format "json" to retrieve a download URL if you prefer to defer retrieval.

The structured JSON response the AI assistant receives looks like this:

{
  "invoices": [
    {
      "invoice_number": "INV-2024-0847",
      "vendor_name": "Acme Supplies Ltd.",
      "invoice_date": "2024-11-15",
      "due_date": "2024-12-15",
      "subtotal": 4250.00,
      "tax": 382.50,
      "total": 4632.50,
      "line_items": [
        {
          "description": "Office furniture - standing desk",
          "quantity": 5,
          "unit_price": 850.00,
          "amount": 4250.00
        }
      ]
    }
  ]
}

Field-level values like invoice number, vendor, dates, amounts, and line items arrive already parsed and typed.
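Because the fields are typed, a consumer can run arithmetic checks before trusting the figures. The sketch below mirrors the field names in the sample response above; treat the interfaces as an assumption about the schema, and `findInconsistencies` as a hypothetical helper rather than part of the SDK.

```typescript
// Interfaces mirror the sample response; the real schema may include more
// fields or differ in detail.
interface LineItem {
  description: string;
  quantity: number;
  unit_price: number;
  amount: number;
}

interface Invoice {
  invoice_number: string;
  vendor_name: string;
  invoice_date: string;
  due_date: string;
  subtotal: number;
  tax: number;
  total: number;
  line_items: LineItem[];
}

// Compare in integer cents to avoid floating-point noise.
const cents = (value: number): number => Math.round(value * 100);

// Returns a list of human-readable mismatches; an empty list means the
// invoice is internally consistent.
function findInconsistencies(invoice: Invoice): string[] {
  const problems: string[] = [];

  invoice.line_items.forEach((item, index) => {
    if (cents(item.quantity * item.unit_price) !== cents(item.amount)) {
      problems.push(`line ${index + 1}: quantity x unit_price differs from amount`);
    }
  });

  const lineSum = invoice.line_items.reduce((acc, item) => acc + cents(item.amount), 0);
  if (lineSum !== cents(invoice.subtotal)) {
    problems.push("line item amounts do not sum to subtotal");
  }
  if (cents(invoice.subtotal) + cents(invoice.tax) !== cents(invoice.total)) {
    problems.push("subtotal + tax does not equal total");
  }
  return problems;
}

// Values copied from the sample response.
const sampleInvoice: Invoice = {
  invoice_number: "INV-2024-0847",
  vendor_name: "Acme Supplies Ltd.",
  invoice_date: "2024-11-15",
  due_date: "2024-12-15",
  subtotal: 4250.0,
  tax: 382.5,
  total: 4632.5,
  line_items: [
    { description: "Office furniture - standing desk", quantity: 5, unit_price: 850.0, amount: 4250.0 },
  ],
};

console.log(findInconsistencies(sampleInvoice)); // prints []
```

Real invoices can legitimately include discounts, shipping, or rounding adjustments, so a mismatch is better treated as a flag for review than as an extraction error.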

Consider the practical flow: a user uploads an invoice in Claude and asks "What's the total on this invoice and when is it due?" Claude calls the extraction tool through MCP, receives the structured data above, and answers immediately with the exact figures. It can also trigger follow-up workflows, such as comparing the total against a purchase order or flagging invoices past their due date.

If you are new to the SDK, the guide on getting started with the extraction API covers authentication setup and your first extraction call outside the MCP context. The patterns transfer directly into the handler above. If you prefer to call the extraction API directly over HTTP rather than through the SDK, the REST endpoints support the same workflow (upload, submit, poll, download), and the handler would make fetch calls instead of using extract().


Production Patterns for MCP Extraction Servers

A working MCP server is a starting point. Before exposing it to real users and AI assistants, you need to handle authentication, failures, and operational limits that only surface under actual load.

API Key Management

The extraction API authenticates every request with a Bearer token. Store this key as an environment variable and read it at server startup — never hardcode it in your tool definitions or source control.

const apiKey = process.env.INVOICE_EXTRACTION_API_KEY;
if (!apiKey) {
  throw new Error("INVOICE_EXTRACTION_API_KEY environment variable is required");
}

If your MCP server supports multiple users, you have two options: a single shared API key configured at the server level, or per-session keys passed by each connecting client. MCP handles authentication at the transport level between the AI assistant and the server. The server manages its own extraction API credentials internally — the assistant never sees or handles them.

Error Handling

The Node SDK distinguishes between two error categories, and your tool handler needs to catch both.

SdkError covers filesystem, network, and timeout issues — problems between your server and the extraction API. ApiResponseError covers API-level rejections: insufficient credits, rate limiting, encrypted files.

The critical pattern: catch these errors and return descriptive MCP tool responses instead of letting exceptions propagate and crash the server process.

try {
  const result = await client.extract({ /* ... */ });

  if (result.status === "failed") {
    return `Extraction failed: ${result.error_message}. ` +
      "The document may be corrupted or in an unsupported format.";
  }

  return JSON.stringify(result, null, 2);
} catch (error: unknown) {
  // With "strict" enabled, a caught value is `unknown`, so narrow it before
  // reading any fields.
  const code = (error as { body?: { error?: { code?: string } } })?.body?.error?.code;
  const message = error instanceof Error ? error.message : String(error);

  if (code === "INSUFFICIENT_CREDITS") {
    return "Extraction requires credits. The account has " +
      "insufficient balance. Add credits to continue.";
  }
  if (code === "ENCRYPTED_FILE") {
    return "This PDF is password-protected. " +
      "Remove encryption before extracting.";
  }

  return `Extraction service error: ${message}. Retry in a few moments.`;
}

Note that extraction task failures — where the API processes the request but the extraction itself fails — come back as a result with status "failed", not as thrown exceptions. Always check the result status before formatting the response.

The SDK auto-retries on rate limiting and transient internal errors, so most throttling resolves without your intervention. For persistent rate limit failures, surface the error clearly so the assistant can inform the user rather than silently hanging. The API allows 30 extraction submissions per minute and 120 polling requests per minute, which is generous for interactive use but worth monitoring if you batch requests.

Timeout and Polling Configuration

The SDK's extract() method accepts a polling parameter that controls how it waits for results. Two values matter:

  • interval_ms — how often to poll for completion (minimum 5000ms)
  • timeout_ms — when to give up waiting

Since the AI assistant's user is waiting interactively, set a reasonable ceiling. A 120-second timeout works well for single documents:

const result = await client.extract({
  files: [tempFilePath],
  prompt: "Extract invoice number, date, vendor name, line items, tax, and total",
  output_structure: "per_invoice",
  polling: {
    interval_ms: 5000,
    timeout_ms: 120000,
  },
});

If the timeout fires, the SDK throws an error — handle it the same way as other SDK errors with a message suggesting the user retry.
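It is worth checking how a polling configuration relates to the 120 polling requests per minute mentioned earlier. The sketch below is plain arithmetic under two assumptions that the source does not confirm: that the SDK polls at a fixed interval, and that the polling limit is shared across all jobs on one API key. The helper names are hypothetical.

```typescript
interface PollingConfig {
  interval_ms: number;
  timeout_ms: number;
}

// Upper bound on status checks for a single job before the timeout fires.
function maxPollsPerJob(polling: PollingConfig): number {
  return Math.floor(polling.timeout_ms / polling.interval_ms);
}

// How many jobs could poll at the same time without exceeding a shared
// per-minute polling limit (assumption: fixed-interval polling).
function maxConcurrentJobs(polling: PollingConfig, pollLimitPerMinute: number): number {
  const pollsPerMinutePerJob = 60000 / polling.interval_ms;
  return Math.floor(pollLimitPerMinute / pollsPerMinutePerJob);
}

const interactivePolling: PollingConfig = { interval_ms: 5000, timeout_ms: 120000 };

console.log(maxPollsPerJob(interactivePolling));         // 24
console.log(maxConcurrentJobs(interactivePolling, 120)); // 10
```

Under those assumptions, a 5-second interval means each job polls 12 times per minute, so roughly ten concurrent extractions fit within the limit before throttling becomes likely.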

Expanding Beyond Single Extraction

A production MCP server benefits from more than one tool. Each additional capability becomes a separate tool definition, turning your server into a full extraction toolkit the assistant can draw from.

Credit balance check. Expose getCreditsBalance() as its own tool. It returns available and reserved credits, letting the assistant verify capacity before attempting a large batch. Each processed page consumes 1 credit; failed pages cost nothing. Surfacing this information lets the assistant decide whether to proceed.

Batch extraction. A tool that accepts multiple documents and processes them sequentially, aggregating results into a single response. Space requests to respect the 30 submissions-per-minute rate limit.

Specialized extraction prompts. Expose distinct tools for line-item extraction, header-only extraction, or custom field targeting. Same underlying API, different prompt instructions, each registered as its own MCP tool.

server.addTool({
  name: "check_extraction_credits",
  description: "Check available credits for invoice extraction",
  parameters: z.object({}),
  execute: async () => {
    const balance = await client.getCreditsBalance();
    return `Available credits: ${balance.credits_balance} | ` +
      `Reserved: ${balance.credits_reserved}`;
  },
});
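The batch extraction tool described above can be sketched along the same lines. This is a minimal illustration, not the SDK's own batching: `extractOne` is a hypothetical wrapper you would write around client.extract(), and the pacing simply spaces submissions so they stay under the stated 30-per-minute limit.

```typescript
// Hypothetical signature standing in for a wrapper around client.extract().
type ExtractOne = (filePath: string) => Promise<unknown>;

interface BatchItemResult {
  file: string;
  ok: boolean;
  result?: unknown;
  error?: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Processes files one at a time, leaving at least 60000 / submissionsPerMinute
// milliseconds between submission starts. A failure on one file is recorded
// and the batch continues. `wait` is injectable so tests can skip real delays.
async function extractBatch(
  files: string[],
  extractOne: ExtractOne,
  submissionsPerMinute = 30,
  wait: (ms: number) => Promise<void> = sleep
): Promise<BatchItemResult[]> {
  const minGapMs = Math.ceil(60000 / submissionsPerMinute);
  const results: BatchItemResult[] = [];

  for (let i = 0; i < files.length; i++) {
    const file = files[i];
    const startedAt = Date.now();
    try {
      results.push({ file, ok: true, result: await extractOne(file) });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      results.push({ file, ok: false, error: message });
    }

    // Wait out the remainder of the gap before the next submission.
    const elapsed = Date.now() - startedAt;
    if (i < files.length - 1 && elapsed < minGapMs) {
      await wait(minGapMs - elapsed);
    }
  }
  return results;
}
```

Inside an MCP tool, the handler would return `JSON.stringify(results, null, 2)` so the assistant sees which documents succeeded and which need attention. Since the user is waiting interactively, it may also be worth capping the batch size.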

As MCP adoption grows, connecting an AI assistant to an invoice API through a dedicated MCP server becomes a standard integration pattern, and a purpose-built extraction backend that returns typed, structured data is a natural fit for it.

About the author

David Harding

Founder, Invoice Data Extraction

David Harding is the founder of Invoice Data Extraction and a software developer with experience building finance-related systems. He oversees the product and the site's editorial process, with a focus on practical invoice workflows, document automation, and software-specific processing guidance.

