Bank Statement Extraction Prompt: Fields, JSON, and Checks

A bank statement extraction prompt should define the statement-level fields, transaction-level fields, output format, and validation checks before asking the AI to extract data. It should tell the AI to return the statement-period opening and closing balances, ignore current or available balances unless they clearly belong to that same period, keep one row per posted transaction, and flag any mismatch between opening balance plus credits minus debits and closing balance.

Use this bank statement extraction prompt as a starting point, then adjust the field names to match your bookkeeping, reconciliation, or review workflow:

Prompt: Extract structured data from this bank statement. Return statement-level fields and transaction-level fields separately.

For the statement, return: bank name, account holder, masked account number or account identifier, account type if present, statement period start date, statement period end date, opening balance, closing balance, currency, source file name, and source page for each value.

For transactions, return one row or object per posted transaction. For each transaction, return: posted date, transaction date if separately shown, description, reference or check number, debit amount, credit amount, signed amount, running balance if shown, source page, and category only if I explicitly request categorization.

Return valid JSON with top-level statement, statement_sources, transactions, validation, and exceptions fields.

Use stable field names. Format dates as ISO dates. Return money amounts as numbers, not text. Use null when a field is missing or unreadable. Do not infer missing transactions, invent categories, merge separate rows, or treat pending activity as posted statement activity.

Treat opening balance and closing balance as balances for the statement period. Do not use current balance, available balance, or online banking balance as the statement closing balance unless the document clearly labels it as the closing balance for this statement period.

Validate the result. Check whether opening balance plus total credits minus total debits equals closing balance within a small tolerance. Flag missing pages, duplicate transaction rows, inconsistent running balances, unreadable values, and any balance ambiguity instead of silently correcting them.

The important shift is that the prompt is not just asking for bank statement data extraction. It is defining a small extraction schema, naming the balance rules, and telling the model what uncertainty should look like in the output.

Turn the Prompt Into a Field Schema

A strong bank statement parser prompt separates statement metadata from transaction rows. Statement fields describe the document and account period. Transaction fields describe posted activity inside that period. Mixing them together makes the output harder to validate because balances, dates, and row counts stop having clear boundaries.

At the statement level, ask for the bank name, account holder, masked account number or account identifier fragment, account type when present, statement period start date, statement period end date, opening balance, closing balance, currency, source file, and source page. If the document structure is unfamiliar, a short review of common bank statement fields and components helps clarify which labels belong to the account summary and which belong to transaction detail.

At the transaction level, ask for posted date, transaction date when the statement shows it separately, description, reference or check number, debit, credit, signed amount, running balance, and source page. Use category only when categorization is part of the task. A prompt to extract bank statement transactions should not assign bookkeeping categories by default because categories require judgment beyond reading the statement row.

The row rule matters: one posted transaction should become one row or object. Repeated page headers, account messages, interest summaries, pending transactions, advertisements, carried-forward labels, and subtotal lines should not become transaction rows. If you only need a spreadsheet file and not the prompt design itself, the adjacent workflow is how to convert bank statement PDFs to Excel or CSV. Region-specific variants, such as turning Indian bank statements and UPI records into Tally-ready Excel rows, follow the same pattern. In every case, the extraction prompt still needs the same field discipline.

Balance Labels Are the Part Most Prompts Get Wrong

Opening balance and closing balance should refer to the statement period. Current balance and available balance may reflect the online account state at the moment the statement was viewed, printed, or downloaded. If a prompt does not make that distinction, an AI can return a plausible balance that does not reconcile to the statement's transactions.

A ledger balance is the posted account balance before pending activity. An available balance may subtract holds, pending card purchases, or other items that are not yet posted. A running balance is different again: it is the row-by-row balance after individual posted transactions. For extraction, the safest instruction is to return the balance label exactly as shown, map only statement-period opening and closing balances into opening_balance and closing_balance, and use an exception field when the document only shows current, available, or pending balances.

This field model is not arbitrary. In the United States, under CFPB Regulation E periodic statement requirements, periodic statements for accounts with electronic fund transfers must include transaction information, applicable fees, and the balance at the beginning and close of the statement period. That source is U.S.-specific, but it supports the broader extraction principle: beginning and closing balances are natural statement fields because they tie the account summary to the period's transactions.

When the task is to return opening and closing balances as JSON, the prompt should not ask the AI to pick "the balance" from the page. It should ask for opening_balance, closing_balance, balance_label_used, balance_source_page, and balance_ambiguity_flag. That gives a reviewer a clean path to catch cases where the only visible balance is current, available, or tied to an online banking snapshot rather than the statement period.
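The label rule above can also be enforced after extraction. The sketch below is one possible post-processing check, not part of any bank's format: the label sets and the function name map_balance_label are illustrative assumptions and would need extending for the statements you actually process.

```python
# Hypothetical label lists; real statements use many more variants.
OPENING_LABELS = {"opening balance", "beginning balance"}
CLOSING_LABELS = {"closing balance", "ending balance"}
SNAPSHOT_LABELS = {"current balance", "available balance", "pending balance"}


def map_balance_label(label: str) -> dict:
    """Map a printed balance label to a schema field, or flag it as ambiguous."""
    normalized = " ".join(label.lower().split())
    if normalized in OPENING_LABELS:
        target, reason = "opening_balance", None
    elif normalized in CLOSING_LABELS:
        target, reason = "closing_balance", None
    elif normalized in SNAPSHOT_LABELS:
        target, reason = None, "snapshot balance, not a statement-period balance"
    else:
        target, reason = None, "unrecognized balance label"
    return {
        "balance_label_used": label,  # keep the label exactly as shown
        "mapped_field": target,
        "balance_ambiguity_flag": target is None,
        "reason": reason,
    }
```

Anything that does not map cleanly stays unmapped and flagged, which mirrors the instruction to use an exception instead of guessing.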

Use JSON Rules That Preserve Accounting Meaning

A bank statement JSON prompt should tell the AI how to represent uncertainty, not just which fields to extract. The output needs to be easy to load into a spreadsheet or system, but it also needs to preserve the difference between a blank field, an unreadable field, and a field the statement never provided.

One compact shape covers most bank statement extraction schemas:

{
  "statement": {
    "bank_name": "string",
    "account_holder": "string",
    "masked_account_number": "string or null",
    "statement_period_start": "YYYY-MM-DD",
    "statement_period_end": "YYYY-MM-DD",
    "opening_balance": 0,
    "closing_balance": 0,
    "currency": "string",
    "source_page": 1
  },
  "statement_sources": {
    "bank_name": 1,
    "account_holder": 1,
    "statement_period_start": 1,
    "statement_period_end": 1,
    "opening_balance": 1,
    "closing_balance": 1,
    "currency": 1
  },
  "transactions": [
    {
      "posted_date": "YYYY-MM-DD",
      "transaction_date": "YYYY-MM-DD or null",
      "description": "string",
      "reference_number": "string or null",
      "debit": 0,
      "credit": 0,
      "signed_amount": 0,
      "running_balance": 0,
      "source_page": 1
    }
  ],
  "validation": {
    "opening_plus_credits_minus_debits_equals_closing": true,
    "difference": 0,
    "tolerance": 0.01
  },
  "exceptions": []
}

The prompt should require stable field names, ISO dates, numeric amounts as numbers, and explicit nulls for missing values. Transaction date, reference number, category, and running balance should be null when the statement does not show them clearly. Guessing those fields makes the JSON look complete while making it less trustworthy.
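These format rules are easy to verify mechanically once the JSON comes back. The following is a minimal sketch that assumes the transaction field names from the shape above; check_transaction and is_iso_date are hypothetical helper names, and the check covers only dates, amounts, and explicit nulls.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
DATE_FIELDS = ("posted_date", "transaction_date")
AMOUNT_FIELDS = ("debit", "credit", "signed_amount", "running_balance")


def is_iso_date(value) -> bool:
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_transaction(row: dict) -> list:
    """Return a list of format problems for one transaction object."""
    problems = []
    for field in DATE_FIELDS + AMOUNT_FIELDS:
        if field not in row:
            problems.append(f"{field}: key missing, expected a value or explicit null")
    for field in DATE_FIELDS:
        value = row.get(field)
        if value is not None and not is_iso_date(value):
            problems.append(f"{field}: not an ISO date")
    for field in AMOUNT_FIELDS:
        value = row.get(field)
        if value is None:
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: amount is not a number")
    return problems
```

A row with a null running_balance passes, while a row that omits the key entirely is reported, which keeps "not shown on the statement" distinct from "dropped by the model".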

Source references are part of the schema, not an audit extra. Ask for a source page on statement-level balances and on every transaction row so a reviewer can trace suspicious values back to the document. That matters when scanned pages are faint, pages are missing, or a statement repeats headers and carried-forward labels across page breaks.

The same discipline applies to cleanup instructions. Do not ask the AI to fill gaps, merge similar rows, or silently correct totals. Ask it to report exceptions in a separate array with the affected field, source page, and reason, so the output can be reviewed without losing the original document evidence.
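As an illustration of what one entry in that array might look like, here is a small sketch. The class name ExtractionException and the sample values are made up for the example; the three attributes follow the affected field, source page, and reason named above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExtractionException:
    field: str                  # path to the affected field
    source_page: Optional[int]  # None when the page itself is unknown
    reason: str                 # short, human-readable explanation


# Illustrative values only, not taken from a real statement.
example = ExtractionException(
    field="transactions[14].debit",
    source_page=3,
    reason="amount unreadable in scan",
)
payload = json.dumps({"exceptions": [asdict(example)]}, indent=2)
```

Keeping the reason as free text and the field as a path lets a reviewer jump straight to the affected value without the output having been altered.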

Add Reconciliation Checks Before You Trust the Output

The first check is arithmetic. Opening balance plus total credits minus total debits should equal closing balance within the tolerance you set in the prompt. If the difference is outside that tolerance, the output should not be auto-corrected. It should carry an exception flag with the calculated difference and the fields used in the check.
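The arithmetic check can be rerun outside the model so the validation block is not taken on trust. This sketch assumes the field names from the JSON shape above, non-null opening and closing balances, and debit and credit values stored as non-negative magnitudes; reconcile is a hypothetical helper name. It uses Decimal so that binary float rounding does not create false mismatches.

```python
from decimal import Decimal


def to_decimal(value) -> Decimal:
    # Going through str() avoids carrying binary float noise into the sums.
    return Decimal(str(value))


def total(transactions, field) -> Decimal:
    """Sum one amount field, skipping rows where it is null."""
    return sum(
        (to_decimal(t[field]) for t in transactions if t.get(field) is not None),
        Decimal("0"),
    )


def reconcile(statement, transactions, tolerance="0.01") -> dict:
    """Recompute opening + credits - debits and compare it to closing."""
    opening = to_decimal(statement["opening_balance"])
    closing = to_decimal(statement["closing_balance"])
    expected = opening + total(transactions, "credit") - total(transactions, "debit")
    difference = expected - closing
    return {
        "opening_plus_credits_minus_debits_equals_closing": abs(difference) <= Decimal(tolerance),
        "difference": float(difference),
        "tolerance": float(tolerance),
    }
```

If the returned flag is false, the difference goes into an exception for review; nothing in the extracted rows is adjusted to force the totals to agree.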

Running balances give a second line of defense when the statement includes them. Credits should move the running balance in the expected direction, and debits should move it the other way. If the direction changes unexpectedly, the prompt should tell the AI to flag the row rather than reinterpret the transaction. A bank statement may omit running balances entirely, but missing running balances are different from inconsistent ones.
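A row-by-row version of the same idea is sketched below. It rests on assumptions the statement may not satisfy: rows are in the order the statement lists them from oldest to newest, each running balance is the balance after that row, and debit and credit are non-negative magnitudes. The function name check_running_balances is hypothetical.

```python
from decimal import Decimal


def check_running_balances(opening_balance, transactions, tolerance="0.01") -> list:
    """Flag rows whose printed running balance does not follow from the previous one."""
    flagged = []
    tol = Decimal(tolerance)
    previous = Decimal(str(opening_balance))
    for index, row in enumerate(transactions):
        shown = row.get("running_balance")
        if shown is None:
            # Missing is not inconsistent: skip the row and lose the chain
            # until the next printed balance.
            previous = None
            continue
        shown = Decimal(str(shown))
        if previous is not None:
            credit = Decimal(str(row.get("credit") or 0))
            debit = Decimal(str(row.get("debit") or 0))
            expected = previous + credit - debit
            if abs(expected - shown) > tol:
                flagged.append({
                    "row_index": index,
                    "expected": float(expected),
                    "shown": float(shown),
                    "source_page": row.get("source_page"),
                })
        previous = shown
    return flagged
```

Flagged rows are reported as they were read. The check never rewrites an amount or swaps a debit for a credit to make the chain work.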

The prompt should also ask for document-level warnings: missing pages, duplicate transaction rows, repeated page headers treated as rows, subtotals mistaken for transactions, multi-account statements, unreadable amounts, and mixed currencies. These issues are common enough that a clean-looking extraction without exceptions can be more dangerous than an extraction that admits uncertainty. The same arithmetic and exception discipline also carries into downstream reconciliation tasks, such as how to match Hong Kong card settlement deposits to bank statements, where extracted statement rows become one side of a tiered match.
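Duplicate rows are one warning that is simple to double-check in code. The sketch below groups rows by a key of posted date, description, amounts, and running balance; the choice of key and the name find_duplicate_candidates are assumptions. Two genuine transactions can share a date, description, and amount, so the result is a list of candidates for review, not rows to delete.

```python
from collections import defaultdict


def find_duplicate_candidates(transactions) -> list:
    """Return groups of row indexes that share the same identifying values."""
    groups = defaultdict(list)
    for index, row in enumerate(transactions):
        key = (
            row.get("posted_date"),
            row.get("description"),
            row.get("debit"),
            row.get("credit"),
            row.get("running_balance"),
        )
        groups[key].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]
```

Including the running balance in the key helps when the statement prints one: a row repeated across a page break carries the same balance, while two real purchases of the same amount usually do not.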

Categorization belongs in a separate instruction unless the workflow truly needs it at extraction time. A model can assign categories that look reasonable while masking whether the statement row itself was read correctly. If classification is the next step, use extraction output as the source and then categorize extracted bank transactions for bookkeeping with a category policy that fits the chart of accounts or reporting purpose.

Make the Prompt Repeatable for Real Bank Statement Workflows

A one-off AI test can tolerate manual cleanup. A recurring finance workflow cannot. Once the same extraction task repeats across clients, accounts, months, or batches, the prompt needs stable field names, consistent null handling, reviewable exceptions, and exports that fit the next system or spreadsheet.
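One way to keep field names stable across runs is to generate the prompt text from fixed lists instead of retyping it. This is a minimal sketch under that assumption; build_prompt and the two constants are hypothetical names, and the wording is condensed from the starting prompt earlier in this article.

```python
STATEMENT_FIELDS = [
    "bank_name", "account_holder", "masked_account_number", "account_type",
    "statement_period_start", "statement_period_end",
    "opening_balance", "closing_balance", "currency",
]
TRANSACTION_FIELDS = [
    "posted_date", "transaction_date", "description", "reference_number",
    "debit", "credit", "signed_amount", "running_balance", "source_page",
]


def build_prompt(statement_fields, transaction_fields, tolerance=0.01) -> str:
    """Assemble the extraction prompt from fixed field lists."""
    lines = [
        "Extract structured data from this bank statement.",
        "Return valid JSON with stable field names.",
        "Statement fields: " + ", ".join(statement_fields) + ".",
        "Transaction fields (one object per posted transaction): "
        + ", ".join(transaction_fields) + ".",
        "Format dates as ISO dates. Return money amounts as numbers. "
        "Use null when a field is missing or unreadable.",
        "Check that opening balance plus total credits minus total debits "
        f"equals closing balance within {tolerance}.",
        "Report problems in exceptions instead of silently correcting them.",
    ]
    return "\n".join(lines)


prompt = build_prompt(STATEMENT_FIELDS, TRANSACTION_FIELDS)
```

Because the lists are the single source of field names, the same constants can drive both the prompt and any downstream checks on the returned JSON.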

That is where a prompt-based extraction workflow differs from pasting a statement into a general chat window. In Invoice Data Extraction, users can upload bank statements and other financial documents, describe the fields they want in a natural-language prompt, save and reuse prompts, process batches, and export the result as Excel, CSV, or JSON. For finance teams that need to extract bank statement data with a reusable prompt, the value is not just the first extraction. It is getting the same schema and review process every time the job recurs.

Developers who need to embed the workflow in software have a separate path: bank statement extraction API design for developer workflows. That is a different problem from writing the prompt itself. The prompt still needs the same field model, balance rules, and exception handling whether it is used in a web interface, a spreadsheet workflow, or an API-backed process.

A reusable bank statement data extraction prompt is ready when it names the statement fields, names the transaction fields, disambiguates opening and closing balances from current and available balances, uses explicit nulls, keeps source-page references, and reports reconciliation exceptions instead of hiding them.

