Extraction Guide

Upload your documents, describe what you need, and download structured Excel data in just a few clicks.

Quick Start

Upload files

PDFs or images containing the data you want to extract.

Describe what to extract

Optionally provide a prompt in your own words — or let the AI decide.

Download your spreadsheet

Your extracted data, structured in a downloadable Excel file.

Uploads

Upload PDFs and images for extraction — from a single document to thousands at once.

Supported File Types

PDF Files

  • Regular & scanned PDFs
  • Up to 150MB per file
  • Up to 5,000 pages long
  • Can contain multiple invoices

Images

  • JPG and PNG files
  • Up to 5MB per file
  • One invoice per image

Batch Processing

Process up to 6,000 documents in a single extraction task. Need larger batches? Contact us and we can accommodate them.

Upload batches exactly as you receive them. The AI can handle invoices, credit notes, and statements in the same job, applying different rules to each. Email cover pages, remittance advice, and other non-invoice pages can be filtered out automatically through your prompt. See Prompt Controls & Capabilities for examples.

Prompt

Tell the AI what to extract — from a simple list of fields to detailed business rules.

Let AI Write Your Prompt

Not sure where to start? Upload your files, click Suggest prompt, and the AI generates a tailored extraction prompt for you.

How it works

  • The AI examines your documents to identify key data points
  • A tailored extraction prompt is generated automatically
  • Review and adjust, then save to your library for reuse

Writing Prompts

Start simple and add detail when you need precise control. When your prompt leaves something unspecified, the system uses conservative, accounting-friendly judgment.

Simple Prompts

List the fields you need. The AI selects formats and handles document structure.

Extract invoice number, invoice date, vendor name, net amount, tax, total

AI returns exactly these fields, correctly formatted

I’m processing invoices for payment. Extract invoice number, date, vendor, amount due, payment terms

Goal helps AI handle edge cases; listed fields define the output

Extract line items: description, quantity, unit price, line total

AI creates one row per line item with invoice details repeated

Detailed Prompts

Define exact fields, formats, and business rules for repeatable, auditable workflows.

I'm preparing AP data for our month-end close.

Extract:

  • Invoice Number (alphanumeric, top-right)
  • Invoice Date (YYYY-MM-DD)
  • Vendor Legal Name (prefer extracting from footer)
  • Net Amount (pre-tax invoice total)
  • VAT Rate (if no VAT is listed, use 0; format as an Excel percentage)
  • VAT Amount (if no VAT is present, use 0)
  • Total Amount (invoice final total)
  • Document Type (classify as Invoice or Credit Note)

Rules:

  • For credit notes, prefix Invoice Number with 'CR-' and show amounts as negative.
  • One row for each invoice or credit note.
  • Skip any pages that are email cover sheets or summary pages.

Describe Your Goal for Smarter Results

Adding context about your finance process helps the AI handle edge cases you haven't anticipated.

Without a goal

“Extract invoice number, date, and total”

With a goal

“I'm processing supplier invoices for payment. Extract invoice number, date, and total”

Why this helps

Describing your goal — such as the finance process you're performing — helps the AI make smarter decisions about edge cases (like how to handle bundled documents or ambiguous values) so you don't have to anticipate every scenario in your prompt.

Example goals

I'm processing these invoices for payment approvalI need this for our quarterly VAT returnI'm doing a line-item spend analysis across vendorsI'm reconciling bank statement transactions against our invoicesI'm extracting monthly utility charges across our sitesI'm preparing payroll data for our monthly pay run

Prompt Controls & Capabilities

Define exact fields, formats, and rules to control how your data is extracted and structured.

Fields & Output Structure

Define which fields to capture, how to name columns, and what each row represents.

“Extract 'Invoice Number', 'Invoice Date', and 'Total'”

“Use the column header 'Supplier_Name'”

“Create one row per invoice”

“One row per line item, repeat 'Invoice Number' on each row”
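To picture what one row per line item looks like, the Python sketch below flattens a single invoice into rows and repeats the invoice-level fields on each one. The dictionary keys and sample values are hypothetical and only illustrate the shape of the spreadsheet; they are not the product's internal data format.

```python
from typing import Any


def flatten_line_items(invoice: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per line item, repeating invoice-level fields on each row."""
    # Invoice-level fields: everything except the nested line items (hypothetical keys).
    header = {key: value for key, value in invoice.items() if key != "line_items"}
    rows = []
    for item in invoice["line_items"]:
        # Each row = invoice-level fields + this line item's own fields.
        rows.append({**header, **item})
    return rows


# Hypothetical sample invoice with two line items.
invoice = {
    "invoice_number": "INV-1001",
    "invoice_date": "2024-03-31",
    "line_items": [
        {"description": "Paper", "quantity": 10, "unit_price": 4.50, "line_total": 45.00},
        {"description": "Toner", "quantity": 2, "unit_price": 60.00, "line_total": 120.00},
    ],
}

for row in flatten_line_items(invoice):
    print(row["invoice_number"], row["description"], row["line_total"])
```

Each printed row carries the same invoice number, which is what makes line-item output easy to filter or pivot in Excel.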

Business Logic & Rules

Set hints, default values, and conditional logic to handle real-world variations.

Hint: “'Product Code' is in the 'Description' column, begins with 'SKU-'”

Default: “If 'Tax Amount' is missing, set to 0”

Fallback: “Find 'PO Number' in header, else use 'Reference'”

Conditional: “If 'Currency' is 'USD', use 'State Tax'; if 'EUR', use 'VAT'”
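The Default, Fallback, and Conditional examples read like ordinary conditional logic. This Python sketch spells out their intended behavior for one document. It assumes a hypothetical `doc` dictionary mapping labels found on the page to their values, and it combines the rules so that the conditional picks which tax label to read and the default fills in 0 when that value is missing. It is an illustration of the rules' meaning, not how the AI evaluates your prompt.

```python
from typing import Any


def apply_rules(doc: dict[str, Any]) -> dict[str, Any]:
    """Apply the example Conditional, Default, and Fallback rules to one document.

    `doc` maps labels found on the document to their values (hypothetical structure).
    """
    # Conditional: choose which tax label to read based on the currency.
    currency = doc.get("Currency")
    tax_label = {"USD": "State Tax", "EUR": "VAT"}.get(currency)
    tax = doc.get(tax_label) if tax_label else None

    # Default: if the tax amount is missing, set it to 0.
    tax_amount = tax if tax is not None else 0

    # Fallback: prefer 'PO Number' from the header, otherwise use 'Reference'.
    po_number = doc.get("PO Number") or doc.get("Reference")

    return {"Currency": currency, "Tax Amount": tax_amount, "PO Number": po_number}


print(apply_rules({"Currency": "EUR", "VAT": 19.0, "Reference": "REF-77"}))
print(apply_rules({"Currency": "USD", "PO Number": "PO-1"}))
```

In the second call the document has no 'State Tax' value, so the default of 0 applies, and the header 'PO Number' wins over any 'Reference'.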

Document & Page Handling

Apply rules to specific document types or filter out unwanted pages.

“Ignore pages titled 'Email Cover Sheet'”

“For credit notes, prefix Invoice Number with 'CR-' and show amounts as negative”

“From Statements of Account, extract each invoice as a separate row”
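If you want to double-check the credit-note rule in a downloaded file, its intended effect can be written as a small Python function. This is a minimal sketch that assumes the column names from the detailed prompt example (`Document Type`, `Invoice Number`, `Net Amount`, `VAT Amount`, `Total Amount`); it describes the expected output, not how the AI applies the rule.

```python
from decimal import Decimal

# Column names assumed from the detailed prompt example; adjust to your own headers.
AMOUNT_FIELDS = ("Net Amount", "VAT Amount", "Total Amount")


def apply_credit_note_rule(row: dict) -> dict:
    """Prefix credit-note numbers with 'CR-' and make their amounts negative."""
    if row.get("Document Type") != "Credit Note":
        return row  # Invoices pass through unchanged.
    out = dict(row)
    number = str(out["Invoice Number"])
    if not number.startswith("CR-"):
        out["Invoice Number"] = "CR-" + number
    for field in AMOUNT_FIELDS:
        if field in out:
            # -abs() keeps amounts negative even if the document already printed them negative.
            out[field] = -abs(Decimal(str(out[field])))
    return out


print(apply_credit_note_rule({
    "Document Type": "Credit Note",
    "Invoice Number": "4471",
    "Net Amount": "100.00",
    "Total Amount": "120.00",
}))
```

Using `Decimal` rather than `float` avoids binary rounding artifacts when comparing currency values.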

Data Formatting & Classification

Control how values are stored in Excel and automatically categorize transactions.

“Format all dates as YYYY-MM-DD”

“Ensure all currency fields have 2 decimal places”

“Add an 'Expense Category' column — classify as 'Office Supplies', 'Software', 'Travel', or 'Utilities'”

“Add a 'Payment Priority' column — 'Urgent' if overdue or due within 7 days, else 'Standard'”

Note: Excel may display native cell types (e.g. numbers and dates) differently depending on your local Excel settings.
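The Payment Priority and formatting instructions above are precise enough to state as code. This Python sketch shows their intended logic. It assumes that a due date exactly 7 days away counts as "within 7 days" and that amounts are rounded half-up to 2 decimal places; both are assumptions, and your prompt can state either explicitly. `today` is passed in so results are reproducible.

```python
from datetime import date, timedelta
from decimal import ROUND_HALF_UP, Decimal


def payment_priority(due_date: date, today: date) -> str:
    """'Urgent' if overdue or due within 7 days (inclusive), else 'Standard'."""
    return "Urgent" if due_date <= today + timedelta(days=7) else "Standard"


def format_amount(value: str) -> str:
    """Render a currency value with exactly 2 decimal places (half-up rounding assumed)."""
    return str(Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


today = date(2024, 5, 1)
print(payment_priority(date(2024, 4, 20), today))  # Urgent (overdue)
print(payment_priority(date(2024, 5, 6), today))   # Urgent (due in 5 days)
print(payment_priority(date(2024, 6, 1), today))   # Standard
print(date(2024, 5, 6).isoformat())                # 2024-05-06 (YYYY-MM-DD)
print(format_amount("1234.5"))                     # 1234.50
```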

Prompt Library

Save prompts for one-click reuse — ensuring consistent, repeatable results across future batches.

Save and reuse

Save prompts by workflow, client, or document type. Apply any saved prompt with one click to ensure every batch follows the same extraction rules.

When to use separate prompts

You don't need separate prompts for different vendors or layouts — the AI adapts. Use separate prompts when the extraction logic itself differs:

  • Different tasks (VAT reporting vs. expense tracking)
  • Different document types (invoices vs. bank statements)
  • Client-specific output formats
  • Vendor-specific handling rules

Structured Prompt

An alternative way to provide extraction instructions — same capabilities as the free-text prompt, with guaranteed column headers and order.

When to Use Structured Prompt

  • You need exact column headers in a specific order for import into another system
  • You prefer a visual, organized layout for defining your extraction schema
  • You want to provide specific guidance per column alongside task-wide instructions

With Structured Prompt, your spreadsheet is guaranteed to use your exact column headers in your exact order. With the free-text Prompt, column names and order are usually consistent, but the AI may adjust them unless you specify them explicitly.
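Because Structured Prompt fixes header names and order, a downstream import can check them mechanically before loading. A minimal Python sketch using the standard `csv` module on a CSV download: the expected headers reuse the "good column names" examples in this guide, and the trailing `Source File` header in the sample is a hypothetical stand-in for the source file column described under Results, so the check only compares the leading columns.

```python
import csv
import io

# Your structured prompt's column headers, in order.
EXPECTED_HEADERS = ["Invoice Number", "Supplier Name", "Total Amount"]


def headers_match(csv_text: str, expected: list[str]) -> bool:
    """True if the file's first columns are exactly `expected`, in the same order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # Empty file -> no header row.
    # Extra trailing columns (such as the source file column) are allowed.
    return header[: len(expected)] == expected


# 'Source File' is a hypothetical header name used only for this sample.
sample = "Invoice Number,Supplier Name,Total Amount,Source File\nINV-1,Acme,100.00,batch.pdf p1\n"
print(headers_match(sample, EXPECTED_HEADERS))
```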

Components of a Structured Prompt

1. Column Headers

Each column header becomes an exact column name in your spreadsheet. Use clear, descriptive names that convey the data point's meaning.

Good column names

Invoice Number
Supplier Name
Total Amount

Avoid

Column 1
Field A
Data

2. Per-Column Prompt (Optional)

Add specific guidance for individual columns. Use these to clarify ambiguities or specify formatting for that particular data point.

"The date the invoice was issued, NOT the payment due date"

"Do not include currency symbol, use 2 decimal places"

"Extract from the table beneath Description"

"If crossed out and handwritten, use the handwritten value"

3. Additional Task-Wide Prompt (Optional)

Add instructions that apply to the entire extraction, not just a single column. This is where you describe your goal, specify what each row should represent, and set task-wide formatting or handling rules.

Examples of task-wide instructions:

"I'm preparing AP data for month-end close"
"One row per invoice"
"Ignore email cover pages and remittance advice"
"For credit notes, show amounts as negative"
"Format all dates as YYYY-MM-DD"

Tips

Describe your goal in the Task-Wide Prompt

When the AI understands your workflow, it makes smarter decisions about edge cases. For example, knowing you're “processing for payment” tells the AI that bundled documents should be treated as one transaction. Add context like “I need this for quarterly VAT return” or “I'm reconciling against POs.”

Review AI Notes to refine your prompt

After extraction, check AI Notes for any assumptions the AI made. Use this feedback to add clarifying instructions to your structured prompt for future extractions.

Results

Review, download, and verify your extracted data.

Download Formats

Your output can be downloaded as Excel (.xlsx), CSV (.csv), or JSON (.json). Excel files use native spreadsheet cell types where applicable. CSV and JSON return extracted values as text.
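Since CSV and JSON downloads return values as text, you may want to convert them yourself when loading into another tool. A minimal Python sketch for a CSV download, assuming hypothetical `Invoice Date` and `Total Amount` columns, dates requested as YYYY-MM-DD, and amounts without currency symbols or thousands separators (for example via the formatting prompts above):

```python
import csv
import io
from datetime import date
from decimal import Decimal


def load_typed_rows(csv_text: str) -> list[dict]:
    """Parse a CSV download and convert selected text fields to Python types."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = dict(raw)
        # Hypothetical column names; adjust to your prompt's headers.
        row["Invoice Date"] = date.fromisoformat(raw["Invoice Date"])  # expects YYYY-MM-DD
        row["Total Amount"] = Decimal(raw["Total Amount"])  # exact decimal, no float rounding
        rows.append(row)
    return rows


sample = "Invoice Number,Invoice Date,Total Amount\nINV-1,2024-03-31,120.00\n"
rows = load_typed_rows(sample)
print(rows[0]["Invoice Date"].year, rows[0]["Total Amount"] + 1)  # 2024 121.00
```

Columns you do not convert, such as `Invoice Number`, stay as strings, which preserves leading zeros in identifiers.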

Reviewing Your Output

Review failed pages

Failed pages are flagged in the Status & Results column. Click View Pages to see details and re-attempt if needed.

Check for AI Notes

If the AI made assumptions, a badge appears in the AI Notes column — each note includes copyable prompt suggestions so you can be more explicit next time.

Source file column

The last column always shows which uploaded file and page each row was extracted from.

Spot-check for accuracy

Review random samples from the spreadsheet against your original documents to verify extraction accuracy.
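A spot-check is easier to repeat if the sample is drawn reproducibly. This Python sketch picks up to `k` random rows from a CSV download using the standard `random` module; a fixed seed returns the same rows each time, which helps when recording what was reviewed. The `Source File` header in the sample is a hypothetical name for the source file column, used to locate each row's original page.

```python
import csv
import io
import random
from typing import Optional


def spot_check_sample(csv_text: str, k: int = 5, seed: Optional[int] = None) -> list[dict]:
    """Pick up to k random rows to compare against the original documents."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)  # Same seed -> same sample, for reproducible reviews.
    return rng.sample(rows, min(k, len(rows)))


# 'Source File' is a hypothetical header name used only for this sample.
sample = (
    "Invoice Number,Total Amount,Source File\n"
    "INV-1,100.00,a.pdf p1\n"
    "INV-2,50.00,a.pdf p2\n"
    "INV-3,75.00,b.pdf p1\n"
)
for row in spot_check_sample(sample, k=2, seed=42):
    print(row["Invoice Number"], "->", row["Source File"])
```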

AI Uncertainty Notes

When the AI has to make an assumption, it tells you what it assumed — and suggests how to make your prompt more explicit next time. No notes means no assumptions were needed.

Multiple possible matches

Your prompt says ‘Total’ but the document has both a line item total and an invoice total — the AI tells you which one it used.

Inexact field names

Your prompt says ‘Net’ but the document labels it ‘Subtotal’ — the AI tells you how it matched them.

Mixed document types

Your files contain invoices with attached purchase orders — the AI tells you which pages it extracted from and which it ignored.

Unspecified scenarios

Your prompt didn’t mention credit notes but the AI encountered one — it tells you how it handled it, such as treating amounts as negative.

Example AI Notes

AI Extraction Assistant Notes

Post-extraction report · 1 observation · 2 prompt suggestions

During this extraction, I made some interpretations based on your instructions and the documents provided. Review these notes to ensure the results match your expectations.

1. Documents to extract from

Your files often contain a ‘Tax Invoice’ with an attached ‘Delivery Note’. I treated the ‘Tax Invoice’ pages as the main source of data, and ignored the attached ‘Delivery Note’ pages as supporting context.

Suggested prompt additions

To confirm this handling:

Extract from ‘Tax Invoice’ only

To extract from both:

Extract from ‘Tax Invoice’ and ‘Delivery Note’

Ready to extract data from your documents?

50 pages free every month · No credit card required
