Extraction Guide
Upload your documents, describe what you need, and download structured Excel data — it only takes a few clicks
Quick Start
Upload files
PDFs or images containing the data you want to extract.
Describe what to extract
Optionally provide a prompt in your own words — or let the AI decide.
Download your spreadsheet
Your extracted data, structured in a downloadable Excel file.
Uploads
Upload PDFs and images for extraction — from a single document to thousands at once.
Supported File Types
PDF Files
- Regular & scanned PDFs
- Up to 150MB per file
- Up to 5,000 pages long
- Can contain multiple invoices
Images
- JPG and PNG files
- Up to 5MB per file
- One invoice per image
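The size limits above can be checked locally before uploading. The following is a minimal Python sketch under stated assumptions: the function name check_upload, the reading of "MB" as 1024 × 1024 bytes, and the acceptance of the .jpeg extension are illustrative choices, not part of the product. It does not check the 5,000-page limit, which would need a PDF library.

```python
from pathlib import PurePath
from typing import Optional

MB = 1024 * 1024  # assumption: "MB" is read as 1024 * 1024 bytes

# Size limits taken from the lists above, keyed by lowercase file extension.
# Treating ".jpeg" like ".jpg" is an assumption.
SIZE_LIMITS = {
    ".pdf": 150 * MB,
    ".jpg": 5 * MB,
    ".jpeg": 5 * MB,
    ".png": 5 * MB,
}


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return a problem description, or None if the file looks acceptable."""
    ext = PurePath(filename).suffix.lower()
    limit = SIZE_LIMITS.get(ext)
    if limit is None:
        return f"unsupported file type: {ext or '(none)'}"
    if size_bytes > limit:
        return f"file exceeds {limit // MB}MB limit"
    return None
```

For files on disk, the size can be read with os.path.getsize(path) and passed in as size_bytes.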
Batch Processing
Process up to 6,000 documents in a single extraction task. Need larger batches? Message us and we can accommodate your needs.
Upload batches exactly as you receive them. The AI can handle invoices, credit notes, and statements in the same job, applying different rules to each. Email cover pages, remittance advice, and other non-invoice pages can be automatically filtered out via your prompt. See Prompt Capabilities for examples.
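If a collection exceeds the per-task document limit, it can be split into several tasks before uploading. The sketch below is one way to do that in Python. It assumes that each file counts as one document, which the guide does not state, and the name split_into_tasks is hypothetical.

```python
from typing import Iterator, List, Sequence

MAX_DOCS_PER_TASK = 6000  # per-task limit stated in this guide


def split_into_tasks(
    files: Sequence[str], limit: int = MAX_DOCS_PER_TASK
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `limit` files, preserving order."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(files), limit):
        yield list(files[start:start + limit])
```

Keeping the original order within each group makes it easier to match output rows back to the files as they were received.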
Prompt
Tell the AI what to extract — from a simple list of fields to detailed business rules.
Let AI Write Your Prompt
Not sure where to start? Upload your files, click Suggest prompt, and the AI generates a tailored extraction prompt for you.
How it works
- The AI examines your documents to identify key data points
- A tailored extraction prompt is generated automatically
- Review and adjust, then save to your library for reuse
Writing Prompts
Start simple and add detail when you need precise control. When your prompt leaves something unspecified, the system uses conservative, accounting-friendly judgment.
Simple Prompts
List the fields you need. The AI selects formats and handles document structure.
“Extract invoice number, invoice date, vendor name, net amount, tax, total”
→ AI returns exactly these fields, correctly formatted
“I’m processing invoices for payment. Extract invoice number, date, vendor, amount due, payment terms”
→ Goal helps AI handle edge cases; listed fields define the output
“Extract line items: description, quantity, unit price, line total”
→ AI creates one row per line item with invoice details repeated
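To picture the "one row per line item with invoice details repeated" layout, here is a small Python sketch. The invoice data and the function name to_line_item_rows are made up for illustration. This shows the shape of the resulting rows only and is not a description of how the product works internally.

```python
from typing import Dict, List

# Hypothetical invoice, used only to illustrate the row layout.
invoice = {
    "Invoice Number": "INV-001",
    "Invoice Date": "2024-03-15",
    "line_items": [
        {"Description": "Paper", "Quantity": 10, "Unit Price": 4.50, "Line Total": 45.00},
        {"Description": "Toner", "Quantity": 2, "Unit Price": 60.00, "Line Total": 120.00},
    ],
}


def to_line_item_rows(inv: Dict) -> List[Dict]:
    """Build one row per line item, repeating invoice-level fields on each row."""
    header = {k: v for k, v in inv.items() if k != "line_items"}
    return [{**header, **item} for item in inv["line_items"]]


rows = to_line_item_rows(invoice)
```

Each entry in rows carries the same Invoice Number and Invoice Date, followed by the fields of one line item.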
Detailed Prompts
Define exact fields, formats, and business rules for repeatable, auditable workflows.
“I'm preparing AP data for our month-end close.
Extract:
- Invoice Number (alphanumeric, top-right)
- Invoice Date (YYYY-MM-DD)
- Vendor Legal Name (prefer extracting from footer)
- Net Amount (pre-tax invoice total)
- VAT Rate (if no VAT is listed use 0, use Excel type percentage)
- VAT Amount (if no VAT is present use 0)
- Total Amount (invoice final total)
- Document Type (classify as Invoice or Credit Note)
- For credit notes prefix Invoice Number with 'CR-' and show amounts as negative.
- One row for each invoice or credit note.
- Skip any pages that are email cover sheets or summary pages.”
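Rules like the ones in this prompt can also be verified on the downloaded data. The following Python sketch checks a single row against the date format and credit note rules. It assumes the output columns are named exactly as in the prompt and that values arrive as text (as in a CSV download). The function name check_row is hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Dict, List

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
AMOUNT_COLUMNS = ["Net Amount", "VAT Amount", "Total Amount"]


def check_row(row: Dict[str, str]) -> List[str]:
    """Return a list of rule violations for one extracted row (empty if none)."""
    problems = []
    if not ISO_DATE.fullmatch(row.get("Invoice Date", "")):
        problems.append("Invoice Date is not YYYY-MM-DD")
    if row.get("Document Type") == "Credit Note":
        if not row.get("Invoice Number", "").startswith("CR-"):
            problems.append("credit note Invoice Number lacks 'CR-' prefix")
        for col in AMOUNT_COLUMNS:
            try:
                value = Decimal(row.get(col, ""))
            except InvalidOperation:
                problems.append(f"{col} is not a number")
                continue
            # Zero is allowed, e.g. a VAT Amount of 0 when no VAT is present.
            if value > 0:
                problems.append(f"credit note {col} is positive")
    return problems
```

Running this over every row and listing the rows with a non-empty result gives a quick audit of whether the business rules were applied.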
Describe Your Goal for Smarter Results
Adding context about your finance process helps the AI handle edge cases you haven't anticipated.
Without a goal: “Extract invoice number, date, and total”
With a goal: “I'm processing supplier invoices for payment. Extract invoice number, date, and total”
Why this helps
Describing your goal — such as the finance process you're performing — helps the AI make smarter decisions about edge cases (like how to handle bundled documents or ambiguous values) so you don't have to anticipate every scenario in your prompt.
Example goals
Prompt Controls & Capabilities
Define exact fields, formats, and rules to control how your data is extracted and structured.
Fields & Output Structure
Define which fields to capture, how to name columns, and what each row represents.
“Extract 'Invoice Number', 'Invoice Date', and 'Total'”
“Use the column header 'Supplier_Name'”
“Create one row per invoice”
“One row per line item, repeat 'Invoice Number' on each row”
Business Logic & Rules
Set hints, default values, and conditional logic to handle real-world variations.
Hint: “'Product Code' is in the 'Description' column, begins with 'SKU-'”
Default: “If 'Tax Amount' is missing, set to 0”
Fallback: “Find 'PO Number' in header, else use 'Reference'”
Conditional: “If 'Currency' is 'USD', use 'State Tax'; if 'EUR', use 'VAT'”
Document & Page Handling
Apply rules to specific document types or filter out unwanted pages.
“Ignore pages titled 'Email Cover Sheet'”
“For credit notes, prefix Invoice Number with 'CR-' and show amounts as negative”
“From Statements of Account, extract each invoice as a separate row”
Data Formatting & Classification
Control how values are stored in Excel and automatically categorize transactions.
“Format all dates as YYYY-MM-DD”
“Ensure all currency fields have 2 decimal places”
“Add an 'Expense Category' column — classify as 'Office Supplies', 'Software', 'Travel', or 'Utilities'”
“Add a 'Payment Priority' column — 'Urgent' if overdue or due within 7 days, else 'Standard'”
Note: Excel may display native cell types (e.g. numbers, dates) according to your local settings, so values can look different from the format requested in your prompt.
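A classification rule such as the 'Payment Priority' example can be reproduced outside the tool when spot-checking results. The Python sketch below is a reference implementation of that one rule. It assumes that "within 7 days" includes a due date exactly 7 days away. The function name payment_priority is hypothetical, and this is not a description of how the AI evaluates the rule.

```python
from datetime import date, timedelta


def payment_priority(due_date: date, today: date) -> str:
    """Return 'Urgent' if overdue or due within 7 days of `today`, else 'Standard'."""
    # Assumption: "within 7 days" includes a due date exactly 7 days away.
    # Overdue dates (due_date < today) also satisfy this comparison.
    if due_date <= today + timedelta(days=7):
        return "Urgent"
    return "Standard"
```

If your own policy treats the boundary differently, state it in the prompt (for example, "due in 7 days or fewer") so the extracted column and your check agree.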
Prompt Library
Save prompts for one-click reuse — ensuring consistent, repeatable results across future batches.
Save and reuse
Save prompts by workflow, client, or document type. Apply any saved prompt with one click to ensure every batch follows the same extraction rules.
When to use separate prompts
You don't need separate prompts for different vendors or layouts — the AI adapts. Use separate prompts when the extraction logic itself differs:
- Different tasks (VAT reporting vs. expense tracking)
- Different document types (invoices vs. bank statements)
- Client-specific output formats
- Vendor-specific handling rules
Structured Prompt
An alternative way to provide extraction instructions — same capabilities as the free-text prompt, with guaranteed column headers and order.
When to Use Structured Prompt
With Structured Prompt, your spreadsheet is guaranteed to use your exact column headers in your exact order. With the free-text Prompt, column names and order are typically consistent, but the AI may adjust them unless you explicitly specify otherwise.
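If a downstream import depends on fixed headers, the header row of a CSV download can be checked before loading. The Python sketch below is one way to do this. It assumes that additional columns (such as the source file column) may follow your own columns, so it only compares the leading headers. The function name headers_match is hypothetical.

```python
import csv
import io
from typing import List


def headers_match(csv_text: str, expected: List[str]) -> bool:
    """Return True if the CSV header row starts with `expected`, in order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    # Assumption: extra trailing columns are allowed, so only the
    # first len(expected) headers are compared.
    return header[:len(expected)] == expected
```

This check is mainly useful with the free-text Prompt, where column names and order are not guaranteed.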
Components of a Structured Prompt
Column Headers
Each column header becomes an exact column name in your spreadsheet. Use clear, descriptive names that convey the data point's meaning.
Good column names
Avoid
Per-Column Prompt (Optional)
Add specific guidance for individual columns. Use these to clarify ambiguities or specify formatting for that particular data point.
"The date the invoice was issued, NOT the payment due date"
"Do not include currency symbol, use 2 decimal places"
"Extract from the table beneath Description"
"If crossed out and handwritten, use the handwritten value"
Additional Task-Wide Prompt (Optional)
Add instructions that apply to the entire extraction, not just a single column. This is where you describe your goal, specify what each row should represent, and set task-wide formatting or handling rules.
Examples of task-wide instructions:
Tips
Describe your goal in the Task-Wide Prompt
When the AI understands your workflow, it makes smarter decisions about edge cases. For example, knowing you're “processing for payment” tells the AI that bundled documents should be treated as one transaction. Add context like “I need this for quarterly VAT return” or “I'm reconciling against POs.”
Review AI Notes to refine your prompt
After extraction, check AI Notes for any assumptions the AI made. Use this feedback to add clarifying instructions to your structured prompt for future extractions.
Results
Review, download, and verify your extracted data.
Download Formats
Your output can be downloaded as Excel (.xlsx), CSV (.csv), or JSON (.json). Excel files use native spreadsheet cell types where applicable. CSV and JSON return extracted values as text.
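Because CSV values arrive as text, dates and amounts need converting before any arithmetic. The following Python sketch shows one way to do that with the standard library. The CSV content is a made-up stand-in for a downloaded file, and the sketch assumes dates are formatted YYYY-MM-DD and amounts contain no currency symbols or thousands separators (both of which can be requested in the prompt).

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical CSV content standing in for a downloaded file.
csv_text = "Invoice Number,Invoice Date,Total\nINV-001,2024-03-15,165.00\n"

rows = []
for raw in csv.DictReader(io.StringIO(csv_text)):
    rows.append({
        "Invoice Number": raw["Invoice Number"],  # identifiers stay as text
        "Invoice Date": date.fromisoformat(raw["Invoice Date"]),
        "Total": Decimal(raw["Total"]),  # Decimal avoids float rounding
    })
```

Keeping invoice numbers as text preserves leading zeros, and Decimal keeps currency amounts exact.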
Reviewing Your Output
Review failed pages
Failed pages are flagged in the Status & Results column. Click View Pages to see details and re-attempt if needed.
Check for AI Notes
If the AI made assumptions, a badge appears in the AI Notes column — each note includes copyable prompt suggestions so you can be more explicit next time.
Source file column
The last column always shows which uploaded file and page each row was extracted from.
Spot-check for accuracy
Review random samples from the spreadsheet against your original documents to verify extraction accuracy.
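Choosing the sample programmatically avoids unconsciously picking the easy rows. The Python sketch below draws a reproducible random sample from the extracted rows. The function name pick_spot_checks and the fixed default seed are illustrative choices.

```python
import random
from typing import Dict, List


def pick_spot_checks(
    rows: List[Dict[str, str]], sample_size: int, seed: int = 0
) -> List[Dict[str, str]]:
    """Return a reproducible random sample of rows to compare against the source documents."""
    rng = random.Random(seed)
    # Never ask for more rows than exist.
    return rng.sample(rows, min(sample_size, len(rows)))
```

Each sampled row can then be traced to its original page using the source file column described above.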
AI Uncertainty Notes
When the AI has to make an assumption, it tells you what it assumed — and suggests how to make your prompt more explicit next time. No notes means no assumptions were needed.
Multiple possible matches
Your prompt says ‘Total’ but the document has both a line item total and an invoice total — the AI tells you which one it used.
Inexact field names
Your prompt says ‘Net’ but the document labels it ‘Subtotal’ — the AI tells you how it matched them.
Mixed document types
Your files contain invoices with attached purchase orders — the AI tells you which pages it extracted from and which it ignored.
Unspecified scenarios
Your prompt didn’t mention credit notes but the AI encountered one — it tells you how it handled it, such as treating amounts as negative.
AI Extraction Assistant Notes
Post-extraction report · 1 observation · 2 prompt suggestions
During this extraction, I made some interpretations based on your instructions and the documents provided. Review these notes to ensure the results match your expectations.
Documents to extract from
Your files often contain a ‘Tax Invoice’ with an attached ‘Delivery Note’. I treated the ‘Tax Invoice’ pages as the main source of data, and ignored the attached ‘Delivery Note’ pages as supporting context.
To confirm this handling:
Extract from ‘Tax Invoice’ only
To extract from both:
Extract from ‘Tax Invoice’ and ‘Delivery Note’
Ready to extract data from your documents?
50 pages free every month · No credit card required