C# Invoice Extraction API: .NET REST Integration Guide

REST guide for .NET developers integrating invoice extraction: upload files with HttpClient, submit jobs, poll safely, and map JSON, CSV, or XLSX results.

Topics: API & Developer Integration, C#, .NET, HttpClient, ASP.NET Core, REST API

A C# invoice extraction API integration works by calling a REST API with HttpClient: create an upload session, send the invoice files, submit an extraction task, poll until processing finishes, and then download JSON, CSV, or XLSX results. When you use Invoice Data Extraction from C#, that is the intended model. The platform has official SDKs for Python and Node.js, while .NET teams integrate directly through HTTP without leaving the Microsoft stack.

That matters because the problem is not really OCR in isolation. Hosted invoice extraction from .NET usually sits inside a wider finance workflow where extracted data feeds AP automation, approval routing, reconciliation, or ERP integration. According to JetBrains' State of .NET 2025, 98% of surveyed .NET professionals use C# in .NET projects, so guidance for Microsoft-stack teams is not a fringe need. This guide fills a practical gap between generic vendor pages and local OCR tutorials.

If you are evaluating an invoice extraction REST API for .NET backends, the useful question is not whether C# can call it. It can. The real question is how to structure upload, submission, polling, and result handling so the integration fits a typed backend and a real finance workflow. That is why this guide stays focused on the staged REST path rather than drifting into local OCR setup or a broad product comparison.

Direct REST is usually the right path when the surrounding application is already in C#, the team wants full control over HttpClient, background processing, and typed result handling, and a Python or Node sidecar would add operational surface area without much value.

Create the Upload Session and Send Files with HttpClient

In a .NET service, the upload layer should do one job well: register files, stream bytes to the returned storage URLs, and hand completed file references to the extraction layer. The flow starts with POST /uploads/sessions. Your app sends a unique upload_session_id and a files array that includes each file_id, file_name, and file_size_bytes. Invoice Data Extraction then returns the upload metadata your service needs, including the part_size value that determines whether a file can be uploaded in one request or needs chunking.

The practical constraints belong in your application design, not buried in a troubleshooting guide. One upload session can contain 1 to 6,000 files. PDFs can be up to 150 MB, image files up to 5 MB, and the total batch can reach 2 GB. In an ASP.NET Core service, those are good validation boundaries to enforce before a job enters the worker queue, because predictable size errors are cheaper to reject early than to discover after orchestration has started.
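
Those limits translate directly into a pre-flight check. The sketch below encodes them as constants; the UploadLimits class and its Validate shape are illustrative assumptions, not part of the API:

public static class UploadLimits
{
    public const int MaxFilesPerSession = 6_000;
    public const long MaxPdfBytes = 150L * 1024 * 1024;        // 150 MB per PDF
    public const long MaxImageBytes = 5L * 1024 * 1024;        // 5 MB per image
    public const long MaxBatchBytes = 2L * 1024 * 1024 * 1024; // 2 GB per batch

    public static void Validate(IReadOnlyList<(string Name, long Size)> files)
    {
        if (files.Count is < 1 or > MaxFilesPerSession)
            throw new ArgumentException($"A session must contain 1 to {MaxFilesPerSession} files.");

        long total = 0;
        foreach (var (name, size) in files)
        {
            var isPdf = name.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase);
            if (size > (isPdf ? MaxPdfBytes : MaxImageBytes))
                throw new ArgumentException($"{name} exceeds the {(isPdf ? "150 MB PDF" : "5 MB image")} limit.");
            total += size;
        }

        if (total > MaxBatchBytes)
            throw new ArgumentException("Batch exceeds the 2 GB total size limit.");
    }
}

Rejecting at this boundary keeps predictably oversized batches out of the worker queue entirely.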

From there, HttpClient handles two different request types:

  1. JSON POST requests to create the session and request part upload URLs.
  2. Raw PUT requests that stream each file or file chunk to the presigned URLs returned by the API.

For small files, that may be a single PUT. For larger PDFs, the C# HttpClient multipart upload API pattern is to split the file according to the returned part_size, upload each chunk to its matching URL, capture the ETag response header for every successful PUT, and then call the completion endpoint with the part numbers and quoted ETag values. The presigned upload URLs are short-lived, so background workers should request them only when they are ready to send bytes immediately.

If you want the platform-agnostic sequence before translating it into C# service boundaries, the generic invoice extraction REST quickstart is a useful companion. For production code, though, it helps to keep upload orchestration separate from extraction submission so retries, telemetry, and fault handling stay predictable.

A practical .NET request shape usually looks like this:

  • CreateUploadSessionRequest with upload_session_id and a files collection containing file_id, file_name, and file_size_bytes
  • GetPartUploadUrls request for the file_id and required part numbers
  • CompleteUploadRequest with the same file_id plus a parts collection containing each part_number and quoted e_tag

A compact C# upload boundary can keep those concepts explicit without turning the integration into a framework:

// Request and completion shapes for the upload lifecycle. Property names
// mirror the API's JSON fields, so System.Text.Json needs no extra mapping.
public sealed record UploadFile(string file_id, string file_name, long file_size_bytes);
public sealed record CreateUploadSessionRequest(string upload_session_id, UploadFile[] files);
public sealed record UploadedPart(int part_number, string e_tag);
public sealed record CompleteUploadRequest(string file_id, UploadedPart[] parts);

// 1. Register the file in a new upload session.
var session = new CreateUploadSessionRequest(
    uploadSessionId,
    new[] { new UploadFile(fileId, fileName, fileLength) });

// PostAsJsonAsync comes from System.Net.Http.Json.
using var createResponse = await http.PostAsJsonAsync(
    "/uploads/sessions",
    session,
    cancellationToken);
createResponse.EnsureSuccessStatusCode();

// 2. Stream the bytes to the presigned URL with a raw PUT (no JSON wrapper).
await using var stream = File.OpenRead(path);
using var put = new HttpRequestMessage(HttpMethod.Put, presignedUrl)
{
    Content = new StreamContent(stream)
};

using var uploadResponse = await http.SendAsync(put, cancellationToken);
uploadResponse.EnsureSuccessStatusCode();

// 3. Capture the quoted ETag; the completion call must echo it verbatim.
var etag = uploadResponse.Headers.ETag?.Tag
    ?? throw new InvalidOperationException("Upload response did not include an ETag.");

var complete = new CompleteUploadRequest(fileId, new[] { new UploadedPart(1, etag) });

For multi-part PDFs, repeat the PUT for each returned part URL, keep every part_number and e_tag, and submit the complete ordered list to the completion endpoint.
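
A sketch of that loop follows, splitting by the part_size from the session response; GetPartUploadUrlAsync is an assumed helper wrapping the part-URL endpoint, and Stream.ReadAtLeastAsync needs .NET 7 or later:

await using var file = File.OpenRead(path);
var buffer = new byte[partSize]; // part_size returned when the session was created
var parts = new List<UploadedPart>();

for (var partNumber = 1; ; partNumber++)
{
    // Fill the buffer, or read whatever remains at the end of the file.
    var read = await file.ReadAtLeastAsync(
        buffer, buffer.Length, throwOnEndOfStream: false, cancellationToken);
    if (read == 0) break;

    // Request the presigned URL only when ready to send; the URLs are short-lived.
    var url = await GetPartUploadUrlAsync(http, fileId, partNumber, cancellationToken);

    using var put = new HttpRequestMessage(HttpMethod.Put, url)
    {
        Content = new ByteArrayContent(buffer, 0, read)
    };
    using var response = await http.SendAsync(put, cancellationToken);
    response.EnsureSuccessStatusCode();

    // Keep every part number with its quoted ETag for the completion call.
    var etag = response.Headers.ETag?.Tag
        ?? throw new InvalidOperationException($"Part {partNumber} returned no ETag.");
    parts.Add(new UploadedPart(partNumber, etag));
}

var complete = new CompleteUploadRequest(fileId, parts.ToArray());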

Submit Extraction Jobs and Poll Long-Running Tasks Safely

Once every file is completed, the next step is not another upload call. It is a task submission. POST /extractions takes a unique submission_id, the upload_session_id, the completed file_ids, a task name, a prompt, and the output_structure you want back. This is where a .NET service stops being a transport wrapper and starts acting like an extraction orchestrator.

The prompt design matters. Invoice Data Extraction accepts either natural-language instructions or a structured prompt object with named fields and general rules. Natural language is fine when the use case is flexible, but most finance teams are better served by the structured form because it yields stable column names and cleaner downstream handling. If you know you need invoice number, invoice date, supplier name, totals, taxes, or line items to land in specific fields, defining that explicitly reduces ambiguity before the result ever reaches your C# models.

The output_structure choice should also be deliberate:

  • automatic when you want the API to choose the best shape
  • per_invoice when one row or object per invoice fits the workflow
  • per_line_item when the downstream task is spend analysis, coding, or reconciliation at line level

If you submit with automatic, the completed response tells you which structure the service actually used, so your .NET pipeline can branch on the final shape instead of guessing.
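
Translated into request types, a submission stays small. Property spellings beyond the fields named above (task_name here, and the SubmittedExtraction response shape) are assumptions to check against the API reference:

public sealed record SubmitExtractionRequest(
    string submission_id,     // unique per logical job; makes retries idempotent
    string upload_session_id,
    string[] file_ids,        // only files that completed their upload
    string task_name,
    string prompt,
    string output_structure); // "automatic", "per_invoice", or "per_line_item"

public sealed record SubmittedExtraction(string extraction_id);

var request = new SubmitExtractionRequest(
    submissionId,
    uploadSessionId,
    new[] { fileId },
    "AP invoice batch",
    "Extract invoice number, invoice date, supplier name, net, tax, and gross totals.",
    "per_invoice");

using var response = await http.PostAsJsonAsync("/extractions", request, cancellationToken);
response.EnsureSuccessStatusCode();

var submitted = await response.Content.ReadFromJsonAsync<SubmittedExtraction>(cancellationToken)
    ?? throw new InvalidOperationException("Submission returned no extraction_id.");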

After submission, persist the returned extraction_id and poll from a worker, not from a long-running controller action. Honor Retry-After when present and keep a minimum delay while the job is still processing:

public sealed record ExtractionStatus(string status);

static async Task<ExtractionStatus> PollExtractionAsync(
    HttpClient http,
    string extractionId,
    CancellationToken cancellationToken)
{
    while (true)
    {
        using var response = await http.GetAsync(
            $"/extractions/{extractionId}",
            cancellationToken);

        // Honor Retry-After when the server sends it; otherwise keep a floor.
        var retryAfter = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(5);

        // HttpStatusCode.TooManyRequests requires using System.Net.
        if (response.StatusCode == HttpStatusCode.TooManyRequests)
        {
            await Task.Delay(retryAfter, cancellationToken);
            continue;
        }

        response.EnsureSuccessStatusCode();

        var current = await response.Content.ReadFromJsonAsync<ExtractionStatus>(
            cancellationToken);

        if (current?.status == "completed")
        {
            return current;
        }

        if (current?.status == "failed")
        {
            throw new InvalidOperationException("Invoice extraction failed.");
        }

        // Still processing: wait at least the minimum delay before the next poll.
        await Task.Delay(retryAfter, cancellationToken);
    }
}

Keep submission_id with the job record. If the request times out after the API receives it, resending the same submission_id avoids a duplicate extraction job; the web dashboard can also display API-submitted tasks, which helps verify behavior during rollout.
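
A minimal sketch of that idempotent retry, reusing the SubmitExtractionRequest shape from the earlier example (the attempt count and backoff are arbitrary choices, not API requirements):

static async Task<HttpResponseMessage> SubmitWithRetryAsync(
    HttpClient http,
    SubmitExtractionRequest request, // same submission_id on every attempt
    CancellationToken cancellationToken)
{
    for (var attempt = 1; attempt <= 3; attempt++)
    {
        try
        {
            var response = await http.PostAsJsonAsync("/extractions", request, cancellationToken);
            if (response.IsSuccessStatusCode)
                return response;

            var status = (int)response.StatusCode;
            response.Dispose();

            // Non-retryable client errors (e.g. a malformed prompt) surface immediately.
            if (status is >= 400 and < 500 and not 429)
                throw new InvalidOperationException($"Submission rejected with status {status}.");

            // 429/5xx: fall through and retry with the same submission_id.
        }
        catch (HttpRequestException)
        {
            // Transport failure after the API may already have received the request;
            // resending the same submission_id cannot create a duplicate job.
        }

        await Task.Delay(TimeSpan.FromSeconds(5 * attempt), cancellationToken);
    }

    throw new InvalidOperationException("Extraction submission kept failing after retries.");
}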

Turn JSON, CSV, and XLSX Output into Typed Finance Workflows

A completed extraction response gives your .NET service more than a file to download. It tells you which output structure came back, how many credits were consumed, how many pages succeeded or failed, whether the AI raised uncertainty notes, and where to fetch JSON, CSV, or XLSX output. That context matters because invoice automation should not treat every extraction as equally trustworthy. The format URLs are also temporary, so long-running workflows should download promptly or request a fresh format-specific link when needed.

For most C# systems, JSON is the best default because it maps cleanly into typed models, validators, and service contracts. A practical C# invoice parser should define models for header fields, supplier identity, totals, tax lines, line items, and source references. If you asked for per_invoice output, your objects can stay compact and document-oriented. If you asked for per_line_item output, repeat the invoice-level context intentionally so downstream reconciliation, spend analysis, and exception handling do not lose the document relationship.
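
As an illustration, assuming per_line_item JSON with roughly the fields discussed above (the property names and the jsonUrl variable are assumptions about your own mapping, not the API's exact schema):

// Invoice-level context repeats on every row in per_line_item output,
// so each line keeps its document relationship for downstream handling.
public sealed record InvoiceLineItem(
    string invoice_number,
    string invoice_date,
    string supplier_name,
    string description,
    decimal quantity,
    decimal unit_price,
    decimal line_total);

// Download promptly: the format-specific URLs expire.
var lineItems = await http.GetFromJsonAsync<InvoiceLineItem[]>(jsonUrl, cancellationToken)
    ?? throw new InvalidOperationException("Extraction output was empty.");

// Regroup lines by invoice for reconciliation or spend analysis.
foreach (var invoice in lineItems.GroupBy(l => l.invoice_number))
{
    Console.WriteLine($"{invoice.Key}: {invoice.Sum(l => l.line_total):0.00}");
}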

The failed-page list and AI uncertainty notes should flow into operational handling, not disappear after deserialization. If a damaged page failed processing, you may need a manual review queue. If the API flags uncertainty about which pages represented the invoice versus an attached delivery note, that belongs in your accounts payable automation rules, audit trail, or exception workflow before data moves into approvals or payments.

Each output format fits a different downstream job:

  • JSON is usually best for application logic, validation, and ERP integration.
  • CSV works well for lightweight ingestion into import tools or intermediate data pipelines.
  • XLSX is useful when finance analysts want to review the result in a spreadsheet before posting or reconciling it.

Production Patterns for Retries, Rate Limits, and Security

Production reliability depends on three controls: persisted recovery state, per-endpoint throughput limits, and security review.

For recovery state, persist two identifiers throughout the lifecycle: upload_session_id for the upload phase and submission_id for the extraction phase. Those IDs make recovery much cleaner when a worker restarts or a network call times out.

The upload completion step is designed for that reality. If a file has already been completed, calling the completion endpoint again returns success instead of duplicating the upload. That lets a C# worker retry safely after a connection drop, as long as it kept the original session and part metadata.

Rate limits should shape your worker design from day one:

  • Upload endpoints: 600 requests per minute
  • Submit extraction: 30 requests per minute
  • Poll status: 120 requests per minute
  • Download output: 30 requests per minute
  • Credit balance: 60 requests per minute

Those numbers argue for background workers, bounded concurrency, and per-endpoint backoff policies instead of long-running web requests. Respect the Retry-After header when it appears, and treat retryable and non-retryable failures differently so a malformed prompt does not get the same handling as a transient infrastructure problem.
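
One way to encode those budgets in .NET 7 and later is System.Threading.RateLimiting; the permit counts below come from the table above, while the wiring itself is just one reasonable option:

using System.Threading.RateLimiting;

// One limiter per endpoint family, sized from the documented limits.
static RateLimiter PerMinute(int permits) => new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit = permits,
        Window = TimeSpan.FromMinutes(1),
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = permits // bound the backlog instead of letting it grow
    });

var uploadLimiter   = PerMinute(600);
var submitLimiter   = PerMinute(30);
var pollLimiter     = PerMinute(120);
var downloadLimiter = PerMinute(30);
var creditLimiter   = PerMinute(60);

// Acquire a permit before each call; queued callers wait their turn.
using RateLimitLease lease = await pollLimiter.AcquireAsync(1, cancellationToken);
if (lease.IsAcquired)
{
    using var response = await http.GetAsync($"/extractions/{extractionId}", cancellationToken);
    // ... handle as in the polling loop above
}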

Security belongs in the same architecture discussion. Before rollout, review the service's retention and access model the same way you would for any other finance-data pipeline. Invoice Data Extraction applies HTTPS/TLS in transit and AES-256 at rest, and the API follows the same handling model as the web product. Uploaded source documents and processing logs are automatically deleted within 24 hours, while generated outputs are retained for 90 days unless a user deletes them sooner. The document extraction API security checklist is a practical companion when you need to turn those facts into an internal review or vendor questionnaire.

If the pilot works, the next design question is scale, not language choice. The batch invoice processing architecture guide is the right next read for teams moving from a single-run prototype to sustained finance workloads.
