Chinese Fapiao to Excel: Extract Tax Invoice Data

Convert Chinese fapiao and e-fapiao to Excel with tax fields, line items, source-file references, and a verification-aware workflow.

Topics: Invoice Data Extraction, Excel, China, fapiao, e-fapiao, Chinese VAT invoices

Chinese fapiao to Excel conversion is not just Chinese OCR. The useful output is a spreadsheet where each received tax invoice becomes structured data: buyer and seller details, invoice number, issue date, line items, tax rate, tax amount, totals, remarks, issuer, and a reference back to the source file.

That distinction matters because a fapiao is a tax-controlled invoice document. For AP and bookkeeping work, the goal is to extract fapiao data into columns that support review and posting, while keeping official verification as a separate step through the relevant Chinese tax channel when compliance requires it. The Excel file should help the finance team work faster; it should not pretend to be the tax authority.

For fully digitalized electronic invoices, the State Taxation Administration's Announcement No. 11 of 2024 says digital e-fapiao have the same legal validity as paper invoices and lists the basic face fields: invoice name, invoice number, issue date, buyer and seller information, item name, quantity, unit price, amount, tax rate or levy rate, tax amount, totals, remarks, and issuer. That official field list is a useful basis for deciding which columns your fapiao-to-Excel schema should preserve.

This is why a good workflow for turning mainland China invoices into spreadsheets separates three jobs:

  • Extraction: turn the visible invoice data into structured Excel, CSV, or JSON fields.
  • Verification: check the invoice through the tax digital account or official VAT invoice verification process when your policy requires it.
  • Archiving: retain the original PDF, image, OFD/XML artifact, or verification record alongside the spreadsheet row.

English-speaking regional teams often sit at the awkward middle of this process. The documents may be in Chinese, partly bilingual, or generated from a mainland e-fapiao workflow, while the reviewer, controller, outsourced bookkeeper, or shared-service AP team works in English. A reliable e-fapiao to Excel process has to respect both sides: the Chinese tax fields on the document and the spreadsheet structure the finance team uses for close, reconciliation, and audit follow-up.

Why Fapiao Need More Than Generic Invoice OCR

A fapiao is not just evidence that a supplier charged an amount. In mainland China, it is part of the tax invoice system, so fields such as invoice identity, buyer and seller information, item name, tax rate, tax amount, total, and issuer can all matter during accounting review. A spreadsheet that captures only supplier, date, and total may be enough for some ordinary invoices, but it is too thin for Chinese VAT invoice OCR when the team needs traceability.

Generic OCR usually fails in one of two ways. It may return a block of recognized Chinese text with no finance structure, leaving AP staff to decide which number is the taxable amount, which is tax, and which is the tax-inclusive total. Or it may force the document into a generic invoice template that loses fapiao-specific fields such as invoice code where present, issuer, specification/model, levy rate, or remarks. In both cases the character recognition succeeded, but the finance task was left undone.

The format mix adds another layer. A team might receive scanned paper fapiao, exported PDF e-fapiao, screenshots, JPG or PNG images, or original OFD/XML artifacts from the e-fapiao ecosystem. Those files do not all behave the same operationally. PDF and image copies are often practical for extraction into a spreadsheet, while official source artifacts and verification records should still be preserved for compliance and audit support. Treating every format as if it were the same flat image invites missing fields and weak source traceability.

The distinction is especially important for regional teams that handle both Hong Kong and mainland China documents. A Hong Kong bilingual commercial invoice may present language and layout challenges, but a mainland fapiao carries a different tax-control context. Teams that work across the border need to understand the compliance differences between Hong Kong invoices and mainland China fapiao before they standardize one extraction template for every Chinese-language supplier document.

The Excel Columns to Capture From a Fapiao

The safest fapiao data extraction schema keeps normalized finance fields and source references in the same row. Normalized fields make the spreadsheet usable for AP review, VAT analysis, posting, and reconciliation. Source references let a reviewer trace each extracted value back to the original document when something looks wrong.

Use these column groups as the starting point:

  • Document identity: invoice name, invoice number, invoice code where present, issue date, invoice type, source file name, page number.
  • Buyer: buyer name, buyer tax ID, buyer address or phone where present, buyer bank details where present.
  • Seller: seller name, seller tax ID, seller address or phone where present, seller bank details where present.
  • Line items: item name or description, specification/model, unit, quantity, unit price, amount excluding tax.
  • Tax: tax rate or levy rate, tax amount, tax classification if visible.
  • Totals: total amount excluding tax, total tax amount, tax-inclusive total.
  • Review and audit: remarks, issuer, reviewer status, exception notes, verification status, verification date.
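As a rough illustration, the column groups above could be modeled as two record types, one for the invoice header and one for each line item. This is only a sketch: the class names, field names, and default values such as "not_checked" are assumptions chosen for the example, not an official fapiao schema, and the sample values are made up.

```python
from dataclasses import dataclass, field, asdict
from decimal import Decimal
from typing import List, Optional


@dataclass
class FapiaoLine:
    """One goods or services line on the invoice (hypothetical field names)."""
    item_name: str
    amount_excl_tax: Decimal
    tax_rate: Decimal                 # tax rate or levy rate as a fraction, e.g. 0.06
    tax_amount: Decimal
    specification: Optional[str] = None
    unit: Optional[str] = None
    quantity: Optional[Decimal] = None
    unit_price: Optional[Decimal] = None


@dataclass
class FapiaoRecord:
    """Invoice-level fields plus source references and review columns."""
    invoice_number: str
    issue_date: str                   # ISO date string, e.g. "2025-03-14"
    buyer_name: str
    buyer_tax_id: str
    seller_name: str
    seller_tax_id: str
    total_excl_tax: Decimal
    total_tax: Decimal
    total_incl_tax: Decimal
    source_file: str
    page_number: int = 1
    invoice_code: Optional[str] = None    # only where present on the document
    invoice_type: Optional[str] = None
    remarks: Optional[str] = None
    issuer: Optional[str] = None
    reviewer_status: str = "pending"
    exception_notes: str = ""
    verification_status: str = "not_checked"
    verification_date: Optional[str] = None
    lines: List[FapiaoLine] = field(default_factory=list)


# Made-up sample values, for illustration only.
record = FapiaoRecord(
    invoice_number="25312000000012345678",
    issue_date="2025-03-14",
    buyer_name="Example Buyer Ltd",
    buyer_tax_id="BUYER-TAX-ID",
    seller_name="Example Seller Co",
    seller_tax_id="SELLER-TAX-ID",
    total_excl_tax=Decimal("1000.00"),
    total_tax=Decimal("60.00"),
    total_incl_tax=Decimal("1060.00"),
    source_file="fapiao_0001.pdf",
    lines=[FapiaoLine("Consulting service", Decimal("1000.00"),
                      Decimal("0.06"), Decimal("60.00"))],
)
row = asdict(record)  # nested dict, ready for JSON-style handling
```

Keeping optional fields optional reflects the "where present" wording in the list: a missing invoice code or bank detail stays empty instead of being guessed.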

Line-item capture is the part most likely to break if the workflow treats a fapiao like a receipt. A single invoice may contain multiple goods or services, each with its own quantity, unit price, amount, and tax treatment. If those lines are collapsed into one summary row, the AP team may still be able to book the total, but it loses the detail needed to investigate price variances, tax mismatches, or supplier disputes.

Do not assume one layout will cover every document. Older paper fapiao, exported e-fapiao PDFs, and fully digitalized e-fapiao can expose fields in different positions and with different visual density. A durable workflow for extracting Chinese tax invoice fields should use stable output columns while keeping raw references such as file name, page number, and original invoice number. That lets the spreadsheet remain consistent even when the visible document layout changes.

The same schema can support Excel, CSV, or JSON. Excel is usually best for human AP review because people can filter exceptions, add notes, and reconcile totals. CSV works well for loading into accounting tools or data-cleaning workflows. JSON is better when the extracted fapiao records feed an automation pipeline and need nested line items rather than a flat sheet.
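The difference between the nested and flat forms can be sketched in a few lines. The example below assumes a JSON-style record with a "lines" list (a hypothetical key name) and repeats the invoice-level fields on every line-item row so the result fits a flat CSV or Excel sheet. It uses only the Python standard library.

```python
import csv
import io


def flatten_record(record):
    """Turn one nested invoice dict into one flat row per line item."""
    header = {key: value for key, value in record.items() if key != "lines"}
    lines = record.get("lines") or [{}]   # keep invoices that have no line detail
    return [
        {**header, "line_no": number, **line}
        for number, line in enumerate(lines, start=1)
    ]


def rows_to_csv(rows):
    """Serialize flat rows to CSV text, using the union of keys as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample values, for illustration only.
nested = {
    "invoice_number": "25312000000012345678",
    "source_file": "fapiao_0001.pdf",
    "lines": [
        {"item_name": "Item A", "amount_excl_tax": "1000.00", "tax_amount": "130.00"},
        {"item_name": "Item B", "amount_excl_tax": "200.00", "tax_amount": "12.00"},
    ],
}
flat_rows = flatten_record(nested)
csv_text = rows_to_csv(flat_rows)
```

Repeating the invoice number and source file on each row costs some redundancy, but it keeps every line traceable after the sheet is filtered or sorted.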

A Practical Extraction Workflow for PDF and Image Fapiao

Start by deciding which files are extraction inputs and which files are compliance records. For many finance teams, the practical extraction set will be PDF, JPG, and PNG copies of the fapiao, because those formats can be uploaded and converted into spreadsheet data. If the supplier also provides OFD/XML artifacts, preserve them with the invoice record. Do not discard the source artifact just because the review team works from an exported PDF.

A practical workflow looks like this:

  1. Gather the fapiao files and keep the original file names intact.
  2. Preserve original OFD/XML or official download artifacts outside the extraction step if your team receives them.
  3. Upload supported PDF, JPG, or PNG copies for extraction.
  4. Prompt for the exact schema: invoice identity, buyer, seller, line items, tax fields, totals, remarks, issuer, source file name, and review status.
  5. Export the result as Excel for human review, CSV for system import, or JSON for automation.
  6. Review exceptions against the original document before posting or archiving.

Invoice Data Extraction fits the extraction part of that workflow: users upload PDF, JPG, or PNG files, describe the fields they need in a natural-language prompt, and download XLSX, CSV, or JSON output. For a fapiao batch, the prompt can ask for invoice number, issue date, buyer and seller tax IDs, item descriptions, specification/model, quantity, unit price, amount, tax rate, tax amount, tax-inclusive total, remarks, issuer, source file, and page number. That is the practical role of invoice data extraction to Excel: turning received documents into consistent rows and columns for finance review.
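One way to keep that instruction consistent across batches is to generate the prompt text from a single column list. The sketch below is only an illustration: the wording of the prompt, the function name, and the column labels are assumptions, and it produces plain text to paste into the upload step rather than calling any product API.

```python
# Hypothetical column list; adjust to the schema your team actually reviews.
FAPIAO_COLUMNS = [
    "invoice number", "issue date",
    "buyer name", "buyer tax ID", "seller name", "seller tax ID",
    "item description", "specification/model", "quantity", "unit price",
    "amount excluding tax", "tax rate or levy rate", "tax amount",
    "tax-inclusive total", "remarks", "issuer",
    "source file name", "page number",
]


def build_prompt(columns, one_row_per_line_item=True):
    """Compose a reusable natural-language extraction instruction."""
    row_rule = (
        "Create one row per line item and repeat the invoice-level fields on each row."
        if one_row_per_line_item
        else "Create one row per invoice."
    )
    return (
        "These files are Chinese fapiao. Extract the following columns: "
        + ", ".join(columns)
        + ". "
        + row_rule
        + " Leave a cell empty if the field is not visible on the document."
    )


prompt = build_prompt(FAPIAO_COLUMNS)
```

Storing the column list in one place means a schema change, such as adding a verification column, updates every future batch prompt at once.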

This prompt-based approach is useful because fapiao are not always shaped like a generic supplier invoice. A fixed template might miss a levy rate, merge line items, or ignore a remarks field that matters later. A prompt lets the finance team specify the spreadsheet schema directly, then reuse that instruction across batches. The product also supports large mixed-format batches of up to 6,000 files and major language scripts including East Asian scripts, which matters when a close process includes Chinese-language documents from several suppliers.

For a broader version of the same workflow, the underlying steps are similar to how teams convert PDF invoices to Excel: define the fields, extract into a structured file, review exceptions, and keep source references. The difference with fapiao is that the field list needs to be tax-aware from the beginning, and official verification remains outside the spreadsheet extraction tool.

Verification, Review, and Archiving After Extraction

The Excel output should make review faster, but it should not become the only record. Treat the spreadsheet as a working dataset for AP, bookkeeping, reconciliation, and exception management. When the organization needs official confirmation, verification belongs in the relevant Chinese tax digital account or national VAT invoice verification channel, not in the OCR result.

The first review pass should focus on the fields that affect posting and tax treatment. Compare the invoice number, issue date, buyer and seller details, tax rate, tax amount, and tax-inclusive total against the source file. If the invoice has multiple line items, check that each row carries the right description, quantity, unit price, amount, and tax amount. Any row with uncertainty should get a review status or exception note rather than being silently treated as final.
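Some of this first pass is arithmetic and can be pre-screened before a person opens the source file. The sketch below assumes each invoice is a dict with Decimal amounts and a "lines" list, using hypothetical key names, and flags rows where the numbers do not agree within one cent. The tolerance and the rounding rule are assumptions; a flag means "look at the document", not "the invoice is wrong".

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def review_flags(invoice, tolerance=CENT):
    """Return a list of arithmetic inconsistencies for one extracted invoice."""
    flags = []
    lines = invoice.get("lines", [])

    # Each line: amount excluding tax x rate should be close to the tax amount.
    for number, line in enumerate(lines, start=1):
        expected = (line["amount_excl_tax"] * line["tax_rate"]).quantize(
            CENT, rounding=ROUND_HALF_UP
        )
        if abs(expected - line["tax_amount"]) > tolerance:
            flags.append(f"line {number}: tax amount does not match amount x rate")

    # Line items should add up to the invoice-level totals.
    if lines:
        line_amounts = sum(line["amount_excl_tax"] for line in lines)
        line_taxes = sum(line["tax_amount"] for line in lines)
        if abs(line_amounts - invoice["total_excl_tax"]) > tolerance:
            flags.append("totals: line amounts do not sum to total excluding tax")
        if abs(line_taxes - invoice["total_tax"]) > tolerance:
            flags.append("totals: line taxes do not sum to total tax")

    # Excluding-tax total plus tax should equal the tax-inclusive total.
    combined = invoice["total_excl_tax"] + invoice["total_tax"]
    if abs(combined - invoice["total_incl_tax"]) > tolerance:
        flags.append(
            "totals: excluding-tax total plus tax does not equal tax-inclusive total"
        )
    return flags


# Made-up sample values, for illustration only.
clean = {
    "total_excl_tax": Decimal("1200.00"),
    "total_tax": Decimal("142.00"),
    "total_incl_tax": Decimal("1342.00"),
    "lines": [
        {"amount_excl_tax": Decimal("1000.00"), "tax_rate": Decimal("0.13"),
         "tax_amount": Decimal("130.00")},
        {"amount_excl_tax": Decimal("200.00"), "tax_rate": Decimal("0.06"),
         "tax_amount": Decimal("12.00")},
    ],
}
suspect = {**clean, "total_incl_tax": Decimal("1340.00")}
```

Here `review_flags(clean)` returns an empty list, while `review_flags(suspect)` returns one flag about the tax-inclusive total. The flags can be written straight into the exception notes column.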

A second pass should reconcile extracted totals against the supplier statement, ERP entry, or payment request. This catches problems that field-by-field review can miss, such as a skipped page, a duplicate source file, or a line-item split that changed the apparent total. For Chinese VAT invoice OCR, the most expensive error is often not a single mistranscribed character; it is a spreadsheet that looks complete while missing the source context needed to investigate the number.
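The second pass can be sketched the same way. The example below assumes one invoice-level row per fapiao and a ledger lookup keyed by invoice number. Both structures and their key names are hypothetical stand-ins for whatever the ERP export or payment request actually provides.

```python
from collections import Counter
from decimal import Decimal


def find_duplicates(rows):
    """Invoice numbers that appear more than once in the extracted batch."""
    counts = Counter(row["invoice_number"] for row in rows)
    return sorted(number for number, count in counts.items() if count > 1)


def reconcile(rows, ledger, tolerance=Decimal("0.01")):
    """Compare extracted tax-inclusive totals with ledger amounts."""
    extracted = {row["invoice_number"]: row["total_incl_tax"] for row in rows}
    issues = {}
    for number in sorted(set(extracted) | set(ledger)):
        if number not in ledger:
            issues[number] = "missing from ledger"
        elif number not in extracted:
            issues[number] = "missing from extraction"
        elif abs(extracted[number] - ledger[number]) > tolerance:
            issues[number] = "total mismatch"
    return issues


# Made-up sample values, for illustration only.
batch = [
    {"invoice_number": "INV-001", "total_incl_tax": Decimal("1130.00")},
    {"invoice_number": "INV-002", "total_incl_tax": Decimal("212.00")},
    {"invoice_number": "INV-002", "total_incl_tax": Decimal("212.00")},
    {"invoice_number": "INV-003", "total_incl_tax": Decimal("530.00")},
]
erp = {
    "INV-001": Decimal("1130.00"),
    "INV-002": Decimal("221.00"),
    "INV-004": Decimal("90.00"),
}
duplicates = find_duplicates(batch)
issues = reconcile(batch, erp)
```

With this sample data, INV-002 is reported as a duplicate and a total mismatch, INV-003 as missing from the ledger, and INV-004 as missing from the extraction, which is the skipped-page or missing-file case described above.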

Archiving should connect the extracted row to the source record. Keep the original PDF or image, any OFD/XML artifact the supplier provided, any official download or verification evidence, and the exported spreadsheet together under a naming convention the finance team can search later. The spreadsheet columns for source file name, page number, verification status, and exception notes are small additions, but they make the audit trail much easier to reconstruct.
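A naming convention is easier to follow when it is generated rather than typed. The sketch below shows one possible convention, built from issue date, seller tax ID, and invoice number and grouped by month. The pattern, the folder layout, and the suffixes are all assumptions for illustration, not a required or official format.

```python
import re
from pathlib import PurePosixPath


def archive_stem(issue_date, seller_tax_id, invoice_number):
    """Build a searchable file stem; strips characters unsafe in file names."""
    parts = [issue_date, seller_tax_id, invoice_number]
    cleaned = [re.sub(r"[^0-9A-Za-z-]", "", part) for part in parts]
    return "_".join(cleaned)


def archive_paths(root, stem, suffixes):
    """Place every artifact for one invoice under a year-month folder."""
    year_month = stem[:7]  # relies on an ISO issue date such as 2025-03-14
    return [
        str(PurePosixPath(root) / year_month / f"{stem}{suffix}")
        for suffix in suffixes
    ]


# Made-up sample values, for illustration only.
stem = archive_stem("2025-03-14", "SELLER-TAX-ID", "25312000000012345678")
paths = archive_paths(
    "archive/fapiao",
    stem,
    [".pdf", ".ofd", ".xml", "_verification.png", "_extract.xlsx"],
)
```

Because every artifact shares one stem, a reviewer who has only the spreadsheet row can find the source PDF, the OFD/XML artifact, and the verification evidence with a single search.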

Data handling also matters when fapiao contain supplier, buyer, and tax information. Invoice Data Extraction secures data with HTTPS/TLS in transit and AES-256 at rest. Uploaded source documents and processing logs are automatically deleted within 24 hours of processing, while generated spreadsheet outputs are retained for 90 days for re-download unless the user deletes them earlier from the dashboard.

When a Dedicated Fapiao Workflow Is Worth It

A dedicated fapiao-to-Excel workflow is worth building when the documents recur. One or two supplier invoices can be keyed manually if the risk is low. Monthly mainland China supplier batches, regional AP close pressure, audit requests, and repeated tax-field review need something more consistent than ad hoc copying from a PDF.

The case is stronger when the finance team handles mixed document sets. A Hong Kong invoice, a mainland fapiao, a customs invoice, and a bilingual supplier statement may all arrive in the same AP inbox, but they do not need the same extraction schema. If your team already has a process for converting Hong Kong bilingual invoices to Excel, use that as an adjacent workflow, not as a substitute for mainland fapiao fields.

The best starting point is a representative batch rather than a perfect master template. Pick a group of real fapiao that includes paper scans, e-fapiao PDFs, and the suppliers most likely to appear again. Extract the fields into the schema above, review exceptions with AP or bookkeeping staff, and adjust the prompt or column list where the spreadsheet misses a decision-critical field.

Only scale the workflow after that review. The repeatable version should define the columns, preserve source-file references, flag exceptions, keep verification separate, and archive the original documents with the extracted rows. That is the difference between fapiao data extraction that merely produces a spreadsheet and a finance workflow that can survive month-end review.
