Invoice Data Extraction API
Overview
Extracting data from invoices is a three-step process:
- Upload — Create an upload session, upload your files in chunks, then complete each upload.
- Submit — Submit an extraction task referencing your uploaded files.
- Poll — Check the task status until processing completes, then download your results.
Each file in the session is uploaded and completed independently. If a file fails at any stage, you can still upload, complete, and submit the other files.
Extraction tasks submitted via API appear in your web dashboard alongside tasks submitted from the web app — you can view progress, results, and download output from either.
Authentication
All API requests require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
Generate and manage your API keys from your dashboard at https://invoicedataextraction.com/dashboard?view=API. Every account includes 50 free pages per month.
Error Responses
All endpoints return errors in this format:
{
"success": false,
"error": {
"code": "ERROR_CODE",
"message": "Human-readable error message.",
"retryable": false,
"details": null
}
}
retryable indicates whether the same request can be retried. When true, the error is transient (e.g., a temporary server issue) and retrying after a short delay may succeed. When false, the request itself is invalid and retrying will produce the same error.
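A client can act on this flag mechanically. A minimal sketch in JavaScript (the helper names, MAX_ATTEMPTS, and the linear backoff are our own choices, not part of the API):

```javascript
// Decide whether a failed call should be retried, and how long to wait.
// `error` is the error object from a response body; `attempt` is 1-based.
// MAX_ATTEMPTS and the backoff policy are our own, not the API's.
const MAX_ATTEMPTS = 3;

function shouldRetry(error, attempt) {
  return error.retryable === true && attempt < MAX_ATTEMPTS;
}

function retryDelayMs(attempt, retryAfterSeconds = null) {
  // Prefer a server-provided Retry-After value (in seconds); otherwise
  // back off linearly: 1s, 2s, 3s, ...
  if (retryAfterSeconds !== null) return retryAfterSeconds * 1000;
  return 1000 * attempt;
}
```

A non-retryable error should instead be surfaced immediately, since resending the same request will produce the same failure.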
The following errors can be returned by any endpoint:
| Code | Status | Retryable | Message |
|---|---|---|---|
| UNAUTHENTICATED | 401 | No | Missing or invalid bearer token. |
| API_KEY_EXPIRED | 401 | No | API key has expired. Please create a new key. |
| API_KEY_REVOKED | 401 | No | API key has been revoked. Please create a new key. |
| NOT_FOUND | 404 | No | The requested endpoint does not exist. |
| RATE_LIMITED | 429 | Yes | Too many requests. Retry after the period indicated in the Retry-After header. |
| INTERNAL_ERROR | 500 | Yes | An unexpected error occurred. |
details is always present. It is either null (no additional context) or an object with error-specific information. For example, INVALID_INPUT errors include validation issues:
{
"success": false,
"error": {
"code": "INVALID_INPUT",
"message": "Request validation failed. Check details for specific issues.",
"retryable": false,
"details": {
"issues": [
{ "message": "file_name must end with a supported extension: .pdf, .jpg, .jpeg, or .png.", "path": ["files", 0, "file_name"] }
]
}
}
}
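For logging, the issues array can be flattened into readable one-line messages. A small sketch (the formatIssues helper is illustrative, not part of the API):

```javascript
// Turn INVALID_INPUT `details.issues` into human-readable strings like
// "files[0].file_name: <message>". The `path` array mixes object keys
// and array indices, so indices are rendered in brackets.
function formatIssues(details) {
  if (!details || !Array.isArray(details.issues)) return [];
  return details.issues.map((issue) => {
    const path = issue.path
      .map((seg) => (typeof seg === "number" ? `[${seg}]` : seg))
      .join(".")
      .replace(/\.\[/g, "["); // "files.[0]" -> "files[0]"
    return `${path}: ${issue.message}`;
  });
}
```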
Rate Limits
All endpoints are rate limited per API key. If you exceed the limit, the API returns a 429 status with a Retry-After header indicating how many seconds to wait before retrying.
| Endpoints | Limit |
|---|---|
| Upload endpoints (create session, get part URLs, complete upload) | 600 requests per minute |
| Submit extraction | 30 requests per minute |
| Poll extraction status | 120 requests per minute |
| Download output | 30 requests per minute |
| Delete extraction | 30 requests per minute |
| Check credit balance | 60 requests per minute |
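One way to honor the Retry-After header in a client, sketched in JavaScript (the retryAfterMs helper and its 5-second fallback are our own choices):

```javascript
// Turn a Retry-After header value (seconds, as a string) into a wait in
// milliseconds, falling back to a default when the header is missing or
// not a number. The 5-second default is our own choice.
function retryAfterMs(headerValue, defaultMs = 5000) {
  const seconds = Number.parseInt(headerValue ?? "", 10);
  if (Number.isNaN(seconds) || seconds < 0) return defaultMs;
  return seconds * 1000;
}

// With fetch (built into Node.js 18+):
// const res = await fetch(url, opts);
// if (res.status === 429) {
//   const waitMs = retryAfterMs(res.headers.get("Retry-After"));
//   await new Promise((resolve) => setTimeout(resolve, waitMs));
//   // ...retry the request
// }
```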
Step 1: Create Upload Session
Creates an upload session for one or more files. Returns the part size you should use when chunking files for upload.
Endpoint
POST https://api.invoicedataextraction.com/v1/uploads/sessions
Authentication: Bearer token in the Authorization header.
Authorization: Bearer YOUR_API_KEY
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| upload_session_id | string | Yes | Your unique identifier for this upload session. Use a different ID for each new session. If a request fails or times out, you can safely retry with the same ID and files — the existing session will be returned without creating duplicates. |
| files | array | Yes | The files you want to upload (1 to 6,000 files). |
Each item in files:
| Field | Type | Required | Description |
|---|---|---|---|
| file_id | string | Yes | Your unique identifier for this file within the session. Only letters, numbers, dots, underscores, colons, and hyphens (1-200 characters). You'll use this ID to reference the file when requesting part URLs and completing the upload. |
| file_name | string | Yes | The file name, including extension. Must end in .pdf, .jpg, .jpeg, or .png. |
| file_size_bytes | integer | Yes | The exact size of the file in bytes. |
File Limits
| Type | Max Size |
|---|---|
| PDF | 150 MB |
| JPG / JPEG / PNG | 5 MB |
| Total batch size | 2 GB |
| Max files per session | 6,000 |
Example Request
curl -X POST "https://api.invoicedataextraction.com/v1/uploads/sessions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"upload_session_id": "sess_001",
"files": [
{
"file_id": "file_001",
"file_name": "invoice-1.pdf",
"file_size_bytes": 120450
},
{
"file_id": "file_002",
"file_name": "receipt.jpg",
"file_size_bytes": 84200
}
]
}'
Success Response (200)
{
"success": true,
"upload_session_id": "sess_001",
"files": [
{
"file_id": "file_001",
"file_name": "invoice-1.pdf",
"part_size": 8388608
},
{
"file_id": "file_002",
"file_name": "receipt.jpg",
"part_size": 8388608
}
]
}
part_size is the chunk size in bytes to use when splitting files for multipart upload. This value is the same for all files in the session. Files smaller than part_size are uploaded as a single part.
Error Codes
| Code | Status | Retryable | Message |
|---|---|---|---|
| DUPLICATE_FILE_NAME | 400 | No | Each file must have a unique file_name. Check details for the duplicates. |
| DUPLICATE_FILE_ID | 400 | No | Each file must have a unique file_id. Check details for the duplicates. |
| FILE_TOO_LARGE | 400 | No | A file exceeds the maximum size for its type. Check details for the file and size limit. |
| TOTAL_UPLOAD_SIZE_LIMIT_EXCEEDED | 400 | No | The combined size of all files exceeds the maximum upload size. Check details for the limit. |
| INSUFFICIENT_CREDITS | 402 | No | Not enough credits for this upload session. Each file requires at least one credit. Check details for your balance. credits_reserved are credits held by extractions currently being processed. |
| SESSION_ALREADY_INITIALIZED | 409 | No | This upload_session_id is already in use. Please use a different upload_session_id. |
Idempotency
You can safely retry a failed or timed-out request using the same upload_session_id and files. If the session was already created, the existing session is returned. If you need a new session with different files, use a different upload_session_id.
Next Step
After creating the upload session, request presigned part URLs for each file to begin uploading.
Step 2: Get Part Upload URLs
For each file, request presigned URLs for the parts you need to upload. You then PUT your file bytes directly to these URLs.
Endpoint
POST https://api.invoicedataextraction.com/v1/uploads/sessions/{upload_session_id}/parts
{upload_session_id} is the ID you provided when creating the upload session in Step 1.
Authentication: Bearer token in the Authorization header.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| file_id | string | Yes | The file ID you used when creating the upload session. |
| part_numbers | array of integers | Yes | The part numbers you want upload URLs for (1-indexed). |
How to calculate part numbers
Use the part_size from the Step 1 response to determine how many parts your file needs:
total_parts = ceil(file_size_bytes / part_size)
part_numbers = [1, 2, 3, ..., total_parts]
Files smaller than part_size need only one part: [1].
In the examples below, part_size is 8388608 (8 MB):
- A 120 KB file is smaller than 8 MB, so it needs only part [1].
- A 20 MB file needs ceil(20_000_000 / 8_388_608) = 3 parts: [1, 2, 3].
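The same calculation as a small JavaScript helper (the function name is ours):

```javascript
// Given a file's size and the part_size from the Step 1 response, return
// the 1-indexed part numbers to request upload URLs for.
function computePartNumbers(fileSizeBytes, partSize) {
  const totalParts = Math.ceil(fileSizeBytes / partSize);
  return Array.from({ length: totalParts }, (_, i) => i + 1);
}
```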
Example: Small file (single part)
curl -X POST "https://api.invoicedataextraction.com/v1/uploads/sessions/sess_001/parts" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"file_id": "file_001",
"part_numbers": [1]
}'
{
"success": true,
"upload_session_id": "sess_001",
"file_id": "file_001",
"file_name": "invoice-1.pdf",
"part_size": 8388608,
"part_urls": [
{
"part_number": 1,
"url": "https://storage.example.com/...?X-Amz-Signature=..."
}
]
}
Example: Large file (multiple parts)
This example assumes a larger file, e.g. a 20 MB PDF registered in the session as file_003 with file_name large-report.pdf, which needs three parts.
curl -X POST "https://api.invoicedataextraction.com/v1/uploads/sessions/sess_001/parts" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"file_id": "file_003",
"part_numbers": [1, 2, 3]
}'
{
"success": true,
"upload_session_id": "sess_001",
"file_id": "file_003",
"file_name": "large-report.pdf",
"part_size": 8388608,
"part_urls": [
{
"part_number": 1,
"url": "https://storage.example.com/...?X-Amz-Signature=..."
},
{
"part_number": 2,
"url": "https://storage.example.com/...?X-Amz-Signature=..."
},
{
"part_number": 3,
"url": "https://storage.example.com/...?X-Amz-Signature=..."
}
]
}
Uploading parts
Once you have the presigned URLs, split your file into chunks and upload each one. Each presigned URL is valid for 15 minutes.
How it works
- Read the file as binary (Buffer, ArrayBuffer, Uint8Array, etc.).
- Slice into chunks of part_size bytes (returned in the Step 1 response). The last chunk will usually be smaller — that's fine.
- PUT each chunk to the corresponding presigned URL. Send the raw bytes as the request body — no special headers or encoding needed.
- Capture the ETag response header from each PUT response. The ETag is a quoted string (e.g., "d41d8cd98f00b204e9800998ecf8427e"). Keep the quotes — you'll need the exact value in Step 3.
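The chunking step can be sketched like this (the helper name is ours; it accepts any byte array, e.g. a Node.js Buffer or Uint8Array):

```javascript
// Split a file's bytes into part_size chunks for multipart upload.
// Returns [{ partNumber, chunk }]; the last chunk may be smaller.
function splitIntoParts(fileBytes, partSize) {
  const parts = [];
  for (let offset = 0; offset < fileBytes.length; offset += partSize) {
    parts.push({
      partNumber: parts.length + 1,
      chunk: fileBytes.subarray(offset, offset + partSize),
    });
  }
  return parts;
}
```

Each chunk would then be sent with a plain PUT (e.g. fetch(url, { method: "PUT", body: chunk })) and its ETag response header recorded for Step 3.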
See the full Node.js example at the end of this document.
Error Codes
| Code | Status | Retryable | Message |
|---|---|---|---|
| FILE_NOT_FOUND | 404 | No | This file_id was not registered when the upload session was created. Check the file_id and upload_session_id. |
| FILE_NOT_UPLOADABLE | 409 | No | This file has already been completed or aborted. |
Next Step
After uploading all parts for a file, complete the upload with the ETags from each part.
Step 3: Complete File Upload
After uploading all parts for a file, call this endpoint with the ETags to finalize the upload. Call this once per file.
Endpoint
POST https://api.invoicedataextraction.com/v1/uploads/sessions/{upload_session_id}/complete
{upload_session_id} is the ID you provided when creating the upload session in Step 1.
Authentication: Bearer token in the Authorization header.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| file_id | string | Yes | The file ID you used when creating the upload session. |
| parts | array | Yes | The part numbers and ETags from your part uploads. |
Each item in parts:
| Field | Type | Required | Description |
|---|---|---|---|
| part_number | integer | Yes | The part number (matches what you requested in Step 2). |
| e_tag | string | Yes | The ETag returned in the response header when you uploaded this part. Include the surrounding quotes (e.g., "\"a1b2c3...\""). |
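If you captured ETags into a map keyed by part number while uploading, the parts array can be assembled like this (a sketch; the helper name and the map shape are our own):

```javascript
// Build the `parts` array for the complete-upload request from ETags
// captured during part uploads. `etagsByPart` maps part number -> ETag
// string exactly as returned in the ETag response header (quotes kept).
function buildCompletionParts(etagsByPart) {
  return Object.entries(etagsByPart)
    .map(([partNumber, eTag]) => ({ part_number: Number(partNumber), e_tag: eTag }))
    .sort((a, b) => a.part_number - b.part_number);
}
```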
Example Request
curl -X POST "https://api.invoicedataextraction.com/v1/uploads/sessions/sess_001/complete" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"file_id": "file_001",
"parts": [
{
"part_number": 1,
"e_tag": "\"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4\""
}
]
}'
For a multi-part file (here a hypothetical file registered as file_003 and uploaded in three parts):
curl -X POST "https://api.invoicedataextraction.com/v1/uploads/sessions/sess_001/complete" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"file_id": "file_003",
"parts": [
{ "part_number": 1, "e_tag": "\"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4\"" },
{ "part_number": 2, "e_tag": "\"f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3\"" },
{ "part_number": 3, "e_tag": "\"9876543210ab9876543210ab9876543210\"" }
]
}'
Success Response (200)
{
"success": true,
"upload_session_id": "sess_001",
"file_id": "file_001",
"file_name": "invoice-1.pdf"
}
Idempotency
If a file has already been completed, calling this endpoint again returns a success response. This makes it safe to retry if your connection drops before you receive the response.
Error Codes
| Code | Status | Retryable | Message |
|---|---|---|---|
| FILE_NOT_FOUND | 404 | No | This file_id was not registered when the upload session was created. Check the file_id and upload_session_id. |
| FILE_ABORTED | 409 | No | This file has been aborted and can no longer be completed. |
| INVALID_COMPLETION_PARTS | 400 | No | The parts provided to complete this file upload are invalid. Check details for the specific reason. |
| OBJECT_SIZE_MISMATCH | 422 | No | The uploaded file size does not match the file_size_bytes declared when the upload session was created. Check details for the declared and actual sizes. |
| UPLOAD_ID_NOT_FOUND | 409 | No | This upload session is no longer available. Please create a new upload session and re-upload your files. |
| UPLOAD_COMPLETE_FAILED | 502 | Yes | File upload completion failed. This may be a temporary issue — please retry. |
Next Step
After completing all files, submit an extraction task.
Step 4: Submit Extraction Task
Submit an extraction task referencing your uploaded files. You tell the API what data to extract using a prompt.
Endpoint
POST https://api.invoicedataextraction.com/v1/extractions
Authentication: Bearer token in the Authorization header.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| submission_id | string | Yes | Your unique identifier for this submission. If a request fails or times out, retry with the same submission_id to safely retrieve the existing task instead of creating a duplicate. Use a different ID for each new extraction task (e.g., a UUID). |
| upload_session_id | string | Yes | The upload session ID from Step 1. |
| file_ids | array of strings | Yes | The file IDs to include in this extraction. Must reference files that were completed in Step 3. |
| task_name | string | Yes | Your own label for this extraction task, for your internal reference (3-40 characters). |
| prompt | string or object | Yes | Your extraction instructions. See below. |
| output_structure | string | Yes | "automatic", "per_invoice", or "per_line_item". |
| options | object | No | Configuration options. See below. |
Output structure
Controls how the extracted data is structured:
| Value | Meaning |
|---|---|
| automatic | The AI decides based on your prompt and documents. |
| per_invoice | Each invoice becomes a single row (spreadsheet/CSV) or object (JSON). |
| per_line_item | Each individual product/service listed within an invoice becomes its own row (spreadsheet/CSV) or object (JSON). |
Prompt
The prompt field tells the AI what data to extract from your documents. It can be either a string or an object.
As a string — describe what you want in natural language:
"prompt": "Extract invoice number, date, vendor name, total amount, and all line items with descriptions and amounts"
As an object — define exact output field names, with optional per-field and general instructions:
"prompt": {
"fields": [
{ "name": "Invoice Number" },
{ "name": "Invoice Date", "prompt": "The date the invoice was issued, NOT the payment due date" },
{ "name": "Vendor Name" },
{ "name": "Total Amount", "prompt": "Do not include currency symbol, use 2 decimal places" }
],
"general_prompt": "One row for each product. Do not extract shipping lines."
}
Use an object when you need exact output field names — each name is guaranteed to appear exactly as written in the extracted data. With a string, the AI chooses field names based on your instructions.
For guidance on writing effective prompts, see the Prompt Guide.
Each item in fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | The name for this data point in the output (2-50 characters). Prefer clear, descriptive names (e.g., "Invoice Number", not "Field A"). |
| prompt | string | No | Specific instructions for extracting this data point. Use this to clarify ambiguities or instruct special handling. |
The prompt object also accepts:
| Field | Type | Required | Description |
|---|---|---|---|
| general_prompt | string | No | Instructions that apply to the full task (and across all fields). Use this to provide special handling instructions, specify output structure/formatting, or describe the extraction goal. |
Options
The options object is optional. All fields within it are optional and have sensible defaults.
| Field | Type | Default | Description |
|---|---|---|---|
| exclude_columns | array of strings | [] | System-generated columns to exclude from output files. By default, a "Source File" column is added to every row indicating which uploaded file/page the data was extracted from. If your workflow requires an exact output structure, you can exclude it. Valid values: "source_file". |
Example: String prompt
curl -X POST "https://api.invoicedataextraction.com/v1/extractions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"submission_id": "sub_001",
"upload_session_id": "sess_001",
"file_ids": ["file_001", "file_002"],
"task_name": "January invoices",
"prompt": "Extract invoice number, date, vendor name, and total amount",
"output_structure": "per_invoice"
}'
Example: Object prompt
curl -X POST "https://api.invoicedataextraction.com/v1/extractions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"submission_id": "sub_002",
"upload_session_id": "sess_001",
"file_ids": ["file_001", "file_002"],
"task_name": "January invoices",
"prompt": {
"fields": [
{ "name": "Invoice Number" },
{ "name": "Invoice Date", "prompt": "The date the invoice was issued, NOT the payment due date" },
{ "name": "Vendor Name" },
{ "name": "Line Item Description" },
{ "name": "Line Item Amount", "prompt": "Do not include currency symbol, use 2 decimal places" }
],
"general_prompt": "Dates should be in YYYY-MM-DD format. Ignore email cover letters."
},
"output_structure": "per_line_item"
}'
Success Response (202)
{
"success": true,
"extraction_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"submission_state": "received"
}
The task is now queued for processing. Use the extraction_id to poll for results (Step 5).
Once submitted, the extraction task also appears in the web dashboard alongside tasks submitted from the web app — you can view its progress and results there.
Idempotency
If a request fails or times out, you can safely retry with the same submission_id. If the task was already created, the existing task is returned without creating a duplicate. Use a different submission_id for each new extraction task.
Next Step
After submitting, poll the task status until processing completes.
Step 5: Poll for Results
After submitting an extraction task, poll this endpoint until the task completes or fails.
Endpoint
GET https://api.invoicedataextraction.com/v1/extractions/{extraction_id}
{extraction_id} is returned in the Step 4 response.
Authentication: Bearer token in the Authorization header.
Response
The response always includes success, status, and extraction_id. The rest of the response depends on the status.
Processing (keep polling)
{
"success": true,
"status": "processing",
"extraction_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"progress": 42
}
progress is an integer from 0 to 100 indicating approximate completion. The task is still being processed — wait a few seconds and poll again.
Completed
{
"success": true,
"status": "completed",
"extraction_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"credits_deducted": 25,
"output_structure": "per_invoice",
"pages": {
"successful_count": 10,
"failed_count": 2,
"successful": [
{ "file_name": "invoice-1.pdf", "page": 1 },
{ "file_name": "invoice-1.pdf", "page": 2 }
],
"failed": [
{ "file_name": "damaged.pdf", "page": 1 }
]
},
"ai_uncertainty_notes": [
{
"topic": "Documents to extract from",
"description": "Your files often contain a 'Tax Invoice' with an attached 'Delivery Note'. I treated the 'Tax Invoice' pages as the main source of data, and ignored the attached 'Delivery Note' pages as supporting context.",
"suggested_prompt_additions": [
{
"purpose": "To confirm this handling",
"instructions": ["Extract from 'Tax Invoice' only"]
},
{
"purpose": "To extract from both",
"instructions": ["Extract from 'Tax Invoice' and 'Delivery Note'"]
}
]
}
],
"output": {
"xlsx_url": "https://storage.example.com/...?X-Amz-Signature=...",
"csv_url": "https://storage.example.com/...?X-Amz-Signature=...",
"json_url": "https://storage.example.com/...?X-Amz-Signature=..."
}
}
| Field | Description |
|---|---|
| credits_deducted | The number of credits charged for this extraction (one credit per successful page). |
| output_structure | The output structure used: "per_invoice" or "per_line_item". If you submitted with "automatic", this tells you what the AI chose. |
| pages.successful_count | Number of pages successfully processed. |
| pages.failed_count | Number of pages that failed processing. |
| pages.successful | List of successfully processed pages. Each item has file_name (the uploaded file name) and page (the page number within that file). |
| pages.failed | List of pages that failed processing. Same shape as successful. |
| ai_uncertainty_notes | Areas where the AI made assumptions due to ambiguity in your prompt. Empty array if none. Each note has a topic, a description of what was assumed, and a suggested_prompt_additions array of prompt additions you can use to remove the ambiguity in future extractions. Each item has a purpose (why you'd add it) and instructions (prompt text you can add). |
| output.xlsx_url | Presigned download URL for the Excel (.xlsx) file. null if not available. |
| output.csv_url | Presigned download URL for the CSV file. null if not available. |
| output.json_url | Presigned download URL for the JSON file. null if not available. |
Download URLs are temporary, pre-authenticated URLs. To download a file, make a plain GET request to the URL — no Authorization header or other authentication needed. URLs expire after 5 minutes. If a URL has expired, use the download endpoint to get a fresh one.
Failed
When an extraction fails, the response uses the standard error format plus status: "failed" and extraction_id.
INSUFFICIENT_CREDITS — credits_balance is your total credit balance. credits_reserved are credits held by extractions currently being processed (your available credits = balance minus reserved).
{
"success": false,
"status": "failed",
"extraction_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"error": {
"code": "INSUFFICIENT_CREDITS",
"message": "Insufficient credits to process this extraction. Check details for your balance and required credits.",
"retryable": false,
"details": {
"credits_required": 25,
"credits_balance": 15,
"credits_reserved": 10
}
}
}
FILE_PAGE_LIMIT_EXCEEDED / ENCRYPTED_FILE — details.file_names lists the affected files.
{
"success": false,
"status": "failed",
"extraction_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"error": {
"code": "ENCRYPTED_FILE",
"message": "One or more files are encrypted. Remove the encryption and re-upload. Check details for the affected files.",
"retryable": false,
"details": {
"file_names": ["protected-invoice.pdf"]
}
}
}
All other error codes have details: null:
| Code | Retryable | Message |
|---|---|---|
| CONCURRENT_TASK_LIMIT | Yes | Too many extractions running at once. Wait for one to complete, then retry. |
| NO_PAGES_FOUND | No | No extractable pages found. Files may be empty or corrupted. |
| PROMPT_REJECTED | No | The prompt did not describe data extraction. Please revise your prompt. |
| PROMPT_UNCLEAR | No | The AI could not understand the prompt well enough. Please adjust your instructions. |
| FILE_SIZE_LIMIT_EXCEEDED | No | A file exceeded the size limit during processing. Split large files and retry. |
| SUBMISSION_STALLED | Yes | This extraction was not picked up for processing. Please resubmit. |
| EXTRACTION_NOT_FOUND | No | No extraction found for this extraction_id. |
| INTERNAL_ERROR | Yes | An unexpected error occurred. Retry after a short delay. |
Polling Strategy
Poll no more frequently than every 5 seconds. Processing time depends on the number and size of your files.
while status == "processing":
wait 5+ seconds
GET /extractions/{extraction_id}
if success == false:
check error.retryable — if true, wait and resubmit; if false, fix the issue first
else if status == "completed":
download output files
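The loop above in JavaScript, with the request and the delay injected as functions so the sketch stays self-contained (names are ours, not part of the API):

```javascript
// Poll until the extraction completes or fails.
// `getStatus` performs GET /extractions/{extraction_id} and returns the
// parsed JSON body; `sleep` waits between polls (injected for testability).
async function pollUntilDone(getStatus, sleep, intervalMs = 5000) {
  for (;;) {
    const body = await getStatus();
    if (body.success === false) return body; // failed: inspect body.error
    if (body.status === "completed") return body; // done: download body.output
    await sleep(intervalMs); // still processing: wait and poll again
  }
}
```

A real getStatus would call fetch with your Bearer token; a real sleep would be (ms) => new Promise((resolve) => setTimeout(resolve, ms)).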
Next Step
Download the output files using the URLs in the response. If a download URL has expired, request a fresh one.
Step 6: Download Output
If a download URL from the polling response has expired (URLs are valid for 5 minutes), request a fresh one.
Endpoint
GET https://api.invoicedataextraction.com/v1/extractions/{extraction_id}/output?format={format}
Authentication: Bearer token in the Authorization header.
Query Parameters
| Parameter | Required | Description |
|---|---|---|
| format | Yes | xlsx, csv, or json |
Example Request
curl "https://api.invoicedataextraction.com/v1/extractions/a1b2c3d4-e5f6-7890-abcd-ef1234567890/output?format=xlsx" \
-H "Authorization: Bearer $API_KEY"
Success Response (200)
{
"download_url": "https://storage.example.com/...?X-Amz-Signature=...",
"format": "xlsx",
"expires_in_seconds": 300
}
Error Codes
| Code | Status | Retryable | Message |
|---|---|---|---|
| EXTRACTION_NOT_FOUND | 404 | No | No extraction found for this extraction_id. |
| OUTPUT_NOT_AVAILABLE | 404 | No | Output is not available. The extraction may not be completed, or this format was not generated. |
Delete Extraction
Permanently deletes an extraction, its output files, and its uploaded source files. Extractions that are currently being processed cannot be deleted.
Note: Deleting an extraction removes the uploaded source files associated with it. If you created multiple extractions from the same upload session, deleting one will not affect the others — source files are only removed when no other extraction is using them.
Our standard data retention policies apply automatically — uploaded documents and processing data are deleted on a schedule. Use this endpoint if you need to delete an extraction and its data immediately rather than waiting for automatic retention.
Endpoint
DELETE https://api.invoicedataextraction.com/v1/extractions/{extraction_id}
Authentication: Bearer token in the Authorization header.
Example Request
curl -X DELETE "https://api.invoicedataextraction.com/v1/extractions/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
-H "Authorization: Bearer $API_KEY"
Success Response (200)
{
"success": true
}
Error Codes
| Code | Status | Retryable | Message |
|---|---|---|---|
| EXTRACTION_NOT_FOUND | 404 | No | No extraction found for this extraction_id. |
| EXTRACTION_IN_PROGRESS | 409 | No | This extraction is currently being processed and cannot be deleted. Wait for it to complete or fail, then try again. |
Check Credit Balance
Returns your current credit balance, including credits reserved by extractions that are currently being processed.
Endpoint
GET https://api.invoicedataextraction.com/v1/credits/balance
Authentication: Bearer token in the Authorization header.
Example Request
curl "https://api.invoicedataextraction.com/v1/credits/balance" \
-H "Authorization: Bearer $API_KEY"
Success Response (200)
{
"success": true,
"credits_balance": 150,
"credits_reserved": 10
}
| Field | Description |
|---|---|
| credits_balance | Your total credit balance (paid + free credits). |
| credits_reserved | Credits reserved by extractions currently being processed. Up to this amount will be deducted when processing completes, depending on the number of successful pages. Your usable balance is credits_balance minus credits_reserved. |
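The usable-balance subtraction as a one-line helper (the name is ours):

```javascript
// Credits actually available for new work: the total balance minus the
// credits reserved by extractions that are still processing.
function usableCredits({ credits_balance, credits_reserved }) {
  return credits_balance - credits_reserved;
}
```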
Node.js Example
A ready-to-run script that handles the full workflow — reads files from a local folder, uploads them, submits an extraction task, polls until completion, and downloads the results. No dependencies beyond Node.js 18+.
Save this as extract.js, set the three configuration variables at the top (API_KEY, FOLDER_PATH, PROMPT), and run with node extract.js. You'll have extraction results in minutes.
import { readdir, readFile, stat, writeFile, mkdir } from "fs/promises";
import { join, extname } from "path";
// ---------------------------------------------------------------------------
// Configuration — set these before running
// ---------------------------------------------------------------------------
// Your API key. Get one at: https://invoicedataextraction.com/dashboard?view=API
// IMPORTANT: This is hardcoded here for simplicity. In production, load from an
// environment variable (e.g. process.env.IDE_API_KEY) and never commit to Git.
const API_KEY = "YOUR_API_KEY";
// Absolute path to the local folder containing the files you want to process.
const FOLDER_PATH = "/Users/you/Documents/invoices";
// Tell the AI what data to extract from each document (plain-text instruction).
const PROMPT = "Extract invoice number, date, vendor name, and total amount";
// For exact output column names, pass an object instead:
// const PROMPT = { fields: [{ name: "Invoice Number" }, { name: "Total", prompt: "No currency symbol" }], general_prompt: "..." };
// A label for this extraction task (3-40 characters). Used in your dashboard and output filenames.
const TASK_NAME = "My extraction task";
// How rows are grouped in the output: "automatic" (AI decides), "per_invoice", or "per_line_item".
const OUTPUT_STRUCTURE = "automatic";
// Which output formats to download. Any combination of "xlsx", "csv", "json".
const DOWNLOAD_FORMATS = ["xlsx", "csv", "json"];
// ---------------------------------------------------------------------------
// Internal constants — no changes needed
// ---------------------------------------------------------------------------
const API_BASE = "https://api.invoicedataextraction.com/v1";
const SUPPORTED_EXTENSIONS = new Set([".pdf", ".jpg", ".jpeg", ".png"]);
const MAX_RETRIES = 3;
async function apiRequest(path, body) {
for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
const response = await fetch(`${API_BASE}${path}`, {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify(body),
});
const text = await response.text();
let data;
try {
data = JSON.parse(text);
} catch {
// Non-JSON response — could be a Cloudflare rate limit or infrastructure error.
// Retry on 429/503, throw on anything else.
if ((response.status === 429 || response.status === 503) && attempt < MAX_RETRIES) {
const delayMs = 5000 * attempt;
console.warn(`Non-JSON ${response.status} response, retrying in ${delayMs / 1000}s...`);
await new Promise((resolve) => setTimeout(resolve, delayMs));
continue;
}
throw new Error(`API returned non-JSON response (${response.status}): ${text.slice(0, 200)}`);
}
if (data.success) return data;
// If the error is retryable and we have attempts left, wait and retry
if (data.error.retryable && attempt < MAX_RETRIES) {
// Use the Retry-After header if present (rate limit responses), otherwise exponential backoff
const retryAfter = response.headers.get("Retry-After");
const delayMs = retryAfter ? parseInt(retryAfter, 10) * 1000 : 1000 * attempt;
console.warn(`Retryable error (${data.error.code}), retrying in ${delayMs / 1000}s...`);
await new Promise((resolve) => setTimeout(resolve, delayMs));
continue;
}
throw new Error(
`API error: ${data.error.code} — ${data.error.message}` +
(data.error.details ? `\nDetails: ${JSON.stringify(data.error.details)}` : "")
);
}
}
// ---------------------------------------------------------------------------
// Step 1: Discover local files and create an upload session
// ---------------------------------------------------------------------------
// Scan the folder for supported file types
const entries = await readdir(FOLDER_PATH);
const files = [];
for (const entry of entries) {
// Skip unsupported file types and subfolders
const ext = extname(entry).toLowerCase();
if (!SUPPORTED_EXTENSIONS.has(ext)) continue;
const filePath = join(FOLDER_PATH, entry);
const fileStat = await stat(filePath);
if (!fileStat.isFile()) continue;
// Add this file to the list with its size in bytes.
// file_id must be unique within the session and can only contain letters, numbers,
// dots, underscores, colons, and hyphens (no spaces). Use your own IDs (e.g., database
// row IDs, UUIDs, or a simple counter).
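// If your IDs come from arbitrary sources, a local pre-check such as
//   /^[A-Za-z0-9._:-]+$/.test(fileId)
// rejects invalid characters before the API does. (This regex is our rendering
// of the allowed set listed above, not a pattern the API publishes.)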
files.push({
file_id: `file_${files.length + 1}`,
file_name: entry,
file_size_bytes: fileStat.size,
localPath: filePath, // kept locally — not sent to the API
});
}
// Abort if the scan found nothing: the session would be empty and the
// part-size lookup below (session.files[0]) would fail.
if (files.length === 0) {
console.error(`No supported files found in ${FOLDER_PATH}.`);
process.exit(1);
}
// Optional: before uploading, you could calculate the credits required and check
// your balance. Each page costs one credit — for PDFs, count the pages; for
// images, each file is one credit. Then call GET /credits/balance to compare
// against your available balance (credits_balance minus credits_reserved).
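// A minimal sketch of that pre-flight check. The apiGet helper is hypothetical
// (this script only defines a POST-style apiRequest, so you would need a GET
// variant); the balance field names come from the comment above.
//
//   const balance = await apiGet("/credits/balance");
//   const available = balance.credits_balance - balance.credits_reserved;
//   const required = files.length; // lower bound: one credit per file; PDFs need a page count
//   if (required > available) {
//     console.error(`Insufficient credits: need at least ${required}, have ${available}.`);
//     process.exit(1);
//   }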
// Generate a unique ID for this upload session (must be different for each new session)
const uploadSessionId = `session_${Date.now()}`;
// Create the upload session — registers all files with the API
let session;
try {
session = await apiRequest("/uploads/sessions", {
upload_session_id: uploadSessionId,
files: files.map(({ file_id, file_name, file_size_bytes }) => ({
file_id,
file_name,
file_size_bytes,
})),
});
} catch (error) {
// Session creation failure is fatal — no files can be uploaded without a session
console.error(`Failed to create upload session: ${error.message}`);
process.exit(1);
}
console.log(`Upload session created: ${session.upload_session_id} (${files.length} files)`);
// The chunk size in bytes — always the same for all files in the session, so we read it from the first
const partSize = session.files[0].part_size;
// ---------------------------------------------------------------------------
// Steps 2 & 3: For each file — upload chunks, then complete the upload
// ---------------------------------------------------------------------------
const completedFileIds = [];
for (const file of files) {
try {
// Read the entire file into memory as a binary buffer
const fileBuffer = await readFile(file.localPath);
// Calculate how many parts this file needs
const totalParts = Math.ceil(fileBuffer.length / partSize);
const partNumbers = Array.from({ length: totalParts }, (_, i) => i + 1);
// Request a presigned upload URL for each part
const partsData = await apiRequest(`/uploads/sessions/${uploadSessionId}/parts`, {
file_id: file.file_id,
part_numbers: partNumbers,
});
// Upload each chunk to its presigned URL via PUT
const completedParts = [];
for (const { part_number, url } of partsData.part_urls) {
// Slice the file buffer into a chunk for this part
const start = (part_number - 1) * partSize;
const end = Math.min(start + partSize, fileBuffer.length);
const chunk = fileBuffer.subarray(start, end);
// PUT the raw bytes directly to the presigned URL
const putResponse = await fetch(url, { method: "PUT", body: chunk });
if (!putResponse.ok) {
const errorBody = await putResponse.text();
throw new Error(
`Upload failed for ${file.file_name} part ${part_number}: ${putResponse.status} ${putResponse.statusText}\n${errorBody}`
);
}
// Save the ETag — needed to complete the upload in Step 3
completedParts.push({
part_number,
e_tag: putResponse.headers.get("etag"),
});
}
console.log(`Uploaded: ${file.file_name} (${totalParts} part${totalParts > 1 ? "s" : ""})`);
// Complete the file upload with the collected ETags
await apiRequest(`/uploads/sessions/${uploadSessionId}/complete`, {
file_id: file.file_id,
parts: completedParts,
});
console.log(`Completed: ${file.file_name}`);
completedFileIds.push(file.file_id);
} catch (error) {
// By default, abort on any file failure to avoid silent partial uploads.
// If you'd prefer to continue with remaining files, remove the process.exit.
console.error(`Failed: ${file.file_name} — ${error.message}`);
process.exit(1);
}
}
// All files uploaded and completed successfully
console.log(`\n${completedFileIds.length} files ready for extraction.`);
// ---------------------------------------------------------------------------
// Steps 4 & 5: Submit the extraction task and poll until it completes
// ---------------------------------------------------------------------------
// Retryable polling errors (e.g., concurrent task limit, temporary server issues) trigger
// a fresh submission. Non-retryable errors require action from you — the log message tells
// you what to fix before re-running the script.
// Optional: human-readable guidance for non-retryable error codes (see error reference above).
// This just improves the console output — the API works the same without it.
const NON_RETRYABLE_GUIDANCE = {
INSUFFICIENT_CREDITS: "Purchase credits at https://invoicedataextraction.com/dashboard?view=Billing then re-run this script.",
FILE_PAGE_LIMIT_EXCEEDED: "Split the affected files into smaller documents and re-upload.",
ENCRYPTED_FILE: "Remove encryption from the affected files and re-upload.",
NO_PAGES_FOUND: "Check that your files are valid and contain extractable content.",
PROMPT_REJECTED: "Revise your prompt to clearly describe what data to extract.",
PROMPT_UNCLEAR: "Revise your prompt with clearer instructions and re-run.",
FILE_SIZE_LIMIT_EXCEEDED: "Split large files into smaller documents and re-upload.",
};
const MAX_SUBMISSION_ATTEMPTS = 2;
const POLL_INTERVAL_MS = 5000;
let result;
for (let attempt = 1; attempt <= MAX_SUBMISSION_ATTEMPTS; attempt++) {
// Each attempt needs a unique submission_id
const submissionId = `sub_${Date.now()}_${attempt}`;
const run = await apiRequest("/extractions", {
submission_id: submissionId,
upload_session_id: uploadSessionId,
file_ids: completedFileIds,
task_name: TASK_NAME,
prompt: PROMPT,
output_structure: OUTPUT_STRUCTURE,
});
console.log(`\nExtraction task submitted (extraction_id: ${run.extraction_id})`);
// Poll until completed or failed
let lastFailureCode = null;
let consecutivePollErrors = 0;
const MAX_CONSECUTIVE_POLL_ERRORS = 10;
while (true) {
const response = await fetch(`${API_BASE}/extractions/${run.extraction_id}`, {
headers: { Authorization: `Bearer ${API_KEY}` },
});
let data;
try {
data = await response.json();
} catch {
// Non-JSON body (e.g., a gateway error page). Leave data without a status
// so the polling-issue branch below handles it.
data = {};
}
if (data.status === "completed") {
result = data;
break;
}
if (data.status === "failed") {
const { code, message, details, retryable } = data.error;
console.error(`\nExtraction failed: ${code} — ${message}`);
if (details) console.error(`Details: ${JSON.stringify(details)}`);
if (!retryable) {
const guidance = NON_RETRYABLE_GUIDANCE[code] || "Check the error above and re-run when resolved.";
console.error(`\nAction required: ${guidance}`);
process.exit(1);
}
// Retryable — wait then submit again.
// Concurrent task limit means we wait longer (5 min) for other processing tasks to finish.
// Other retryable errors are transient, so a short delay (10s) suffices.
const delayMs = code === "CONCURRENT_TASK_LIMIT" ? 300_000 : 10_000;
console.log(`Retrying in ${delayMs / 1000}s (attempt ${attempt}/${MAX_SUBMISSION_ATTEMPTS})...`);
await new Promise((resolve) => setTimeout(resolve, delayMs));
lastFailureCode = code;
break;
}
// Still processing — reset error counter and poll again
if (data.status === "processing") {
consecutivePollErrors = 0;
console.log(`Processing... ${data.progress ?? 0}%`);
} else {
consecutivePollErrors++;
if (consecutivePollErrors >= MAX_CONSECUTIVE_POLL_ERRORS) {
console.error(`\nToo many consecutive polling errors. The extraction may still be processing — check your dashboard or retry later.`);
process.exit(1);
}
console.warn(`Polling issue (HTTP ${response.status}) — retrying in ${POLL_INTERVAL_MS / 1000}s... (${consecutivePollErrors}/${MAX_CONSECUTIVE_POLL_ERRORS})`);
}
await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
}
if (result) break;
if (lastFailureCode && attempt === MAX_SUBMISSION_ATTEMPTS) {
const exitMessage = lastFailureCode === "CONCURRENT_TASK_LIMIT"
? `\nStill hitting the concurrent task limit after ${MAX_SUBMISSION_ATTEMPTS} attempts. Wait for your other extractions to finish, then re-run.`
: `\nGave up after ${MAX_SUBMISSION_ATTEMPTS} attempts. There may be temporary service issues — please wait and try again later.`;
console.error(exitMessage);
process.exit(1);
}
}
console.log(`\nExtraction completed!`);
console.log(`Credits deducted: ${result.credits_deducted}`);
console.log(`Output structure: ${result.output_structure}`);
console.log(`Pages: ${result.pages.successful_count} successful, ${result.pages.failed_count} failed`);
if (result.pages.failed_count > 0) {
console.warn(`\nWarning: ${result.pages.failed_count} page(s) failed to extract. Data from these pages is missing from the output.`);
for (const page of result.pages.failed) {
console.warn(` - ${page.file_name} (page ${page.page})`);
}
}
if ((result.ai_uncertainty_notes ?? []).length > 0) {
console.log(`\n--- AI Uncertainty Notes ---`);
console.log(`The AI made assumptions in ${result.ai_uncertainty_notes.length} area(s). Review these and consider adding the suggested prompt additions to improve future extractions.\n`);
result.ai_uncertainty_notes.forEach((note, i) => {
console.log(` [${i + 1}] ${note.topic}`);
console.log(` ${note.description}`);
for (const suggestion of note.suggested_prompt_additions) {
console.log(` → ${suggestion.purpose}: "${suggestion.instructions}"`);
}
console.log();
});
console.log(`---`);
}
// ---------------------------------------------------------------------------
// Step 6: Download the output files
// ---------------------------------------------------------------------------
const timestamp = new Date().toISOString().replace(/[:.]/g, "-").slice(0, 19);
const safeName = TASK_NAME.replace(/[^a-zA-Z0-9_-]/g, "_");
await mkdir("output", { recursive: true });
for (const format of DOWNLOAD_FORMATS) {
const url = result.output[`${format}_url`];
if (!url) {
console.warn(`No ${format} download available.`);
continue;
}
const response = await fetch(url);
if (!response.ok) {
console.error(`Failed to download ${format}: ${response.status}`);
continue;
}
const buffer = Buffer.from(await response.arrayBuffer());
const outputPath = `output/${safeName}_${timestamp}.${format}`;
await writeFile(outputPath, buffer);
console.log(`Downloaded: ${outputPath}`);
}
console.log(`\nDone. Extraction ${result.extraction_id} completed successfully.`);