Bordereau Data Extraction: Automating Insurance BDX Processing

Guide to extracting structured data from insurance bordereaux. Covers BDX types, key fields, format challenges, and how AI extraction automates processing.

A bordereau (plural: bordereaux, abbreviated BDX) is a periodic report sent by an MGA, coverholder, or ceding company to a reinsurer or carrier, detailing policies written or claims incurred during a reporting period. The four main types are premium bordereaux, claims/loss bordereaux, combined bordereaux, and reinsurance technical accounts. Automating bordereau data extraction eliminates the manual cleansing required when each counterparty sends data in different formats, column layouts, and code sets — a problem that scales directly with the number of delegated authority relationships you manage.

Each type carries distinct data elements that downstream systems depend on (see the schema sketch after this list):

  • Premium bordereau: Lists policies bound during the reporting period. Key fields include policy number, insured name, inception and expiry dates, gross written premium (GWP), broker and MGA commission, applicable taxes, and net premium due to the carrier or reinsurer.
  • Claims/loss bordereau: Details claims reported or developed since the last submission. You'll find claim reference numbers, date of loss, paid amounts (indemnity and expense), outstanding reserves, claim status codes, and insured or claimant identifiers.
  • Combined/aggregate bordereau: Reconciles premium and claims data into a single document for a given program or treaty, giving both sides a unified view of how a book is performing.
  • Reinsurance technical account (statement of account): The formal settlement document. It nets premiums against losses, IBNR reserves, ceding commissions, brokerage, management expenses, and reinstatement premiums to arrive at a balance due between cedant and reinsurer.
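
To make those field lists concrete, here is one way the two core row types might look once extracted into a canonical structure. This is an illustrative sketch, not a market standard; the field names and types are assumptions:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

@dataclass
class PremiumRow:
    """One policy line from a premium bordereau (illustrative schema)."""
    umr: str                       # Unique Market Reference / policy number
    insured_name: str
    inception: date
    expiry: date
    gross_written_premium: Decimal
    commission: Decimal
    taxes: Decimal
    net_premium_due: Decimal
    currency: str                  # ISO 4217 code, e.g. "USD"

@dataclass
class ClaimRow:
    """One claim line from a claims/loss bordereau (illustrative schema)."""
    ucr: str                       # Unique Claims Reference
    umr: str                       # links back to the premium record
    date_of_loss: date
    paid_indemnity: Decimal
    paid_expense: Decimal
    outstanding_reserve: Decimal
    status: str                    # e.g. "open", "closed", "reopened"
```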

The volume of these documents is substantial and growing. Delegated authority arrangements — where MGAs and coverholders bind risk on behalf of carriers — now account for over 40% of Lloyd's total premium volume, and the proportion continues to increase as carriers lean on coverholders for distribution and underwriting in niche segments. More delegated programs mean more counterparties, more reporting periods, and more bordereaux flowing into your operations team every month.

This is where insurance bordereau processing becomes an operational bottleneck. Every counterparty sends bordereaux in different formats, layouts, and coding conventions, and the extraction and normalization challenge compounds with every new MGA or coverholder relationship you onboard. At scale, bordereaux management becomes less about insurance expertise and more about wrangling inconsistent data into a canonical format your systems can consume.


Why Bordereau Processing Breaks Down at Scale

The core problem is format inconsistency at every level. Every MGA, coverholder, and broker submits bordereaux in a different layout. Column headers vary. Date formats shift between DD/MM/YYYY, MM/DD/YYYY, and ISO conventions depending on the counterparty's home market. Currency fields might use symbols, ISO codes, or nothing at all. Internal code sets for lines of business, peril types, and coverage classes are bespoke to each program.
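
The date ambiguity alone can corrupt a bordereau silently. A minimal sketch of the problem using pandas; the per-counterparty convention lookup is a hypothetical illustration of how a team might track each program's home market:

```python
import pandas as pd

# The same raw string resolves to two different dates depending on
# the counterparty's home-market convention.
raw = "04/07/2024"
print(pd.to_datetime(raw, dayfirst=True).date())   # 2024-07-04 (DD/MM/YYYY)
print(pd.to_datetime(raw, dayfirst=False).date())  # 2024-04-07 (MM/DD/YYYY)

# Hypothetical per-counterparty convention map, maintained at onboarding
DAYFIRST = {"mga_london": True, "mga_newyork": False}

def parse_bdx_date(value: str, counterparty: str) -> pd.Timestamp:
    return pd.to_datetime(value, dayfirst=DAYFIRST[counterparty])
```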

Documents arrive as PDFs exported from legacy policy admin systems, multi-tab Excel workbooks with embedded macros, flat CSVs with inconsistent delimiters, and occasionally password-protected spreadsheets where the password itself lives in a separate email thread. Carriers and MGAs sometimes change their formats mid-program without advance notice, silently breaking whatever manual ingestion process was in place. A column gets renamed, a new sub-tab appears, or a previously numeric field starts containing text qualifiers.

There is no universal bordereau format standard. Unlike payment files (which converge around ISO 20022) or trade confirmations (which follow FIX or FpML schemas), MGA bordereau reporting operates without a mandated document structure. Program guidelines from carriers and Lloyd's of London managing agents often run tens of pages specifying what data to report — required fields, validation rules, acceptable code values — but rarely prescribe how the document itself should be formatted. The closest historical effort at bordereau format standardization was RETACC, a UN/EDIFACT message type designed for reinsurance technical accounts. It never achieved broad adoption. The industry moved on without an equivalent of the structured formats that other financial domains take for granted.

This gap lands squarely on operations teams. Staff manually cleanse, validate, and remap data from dozens of distinct layouts into internal bordereaux management systems or data warehouses. When an insurer manages 20 delegated authority relationships, the mapping effort is manageable. At 50 or 100+ relationships — common for large carriers and Lloyd's syndicates — it becomes the operational bottleneck. Analysts spend more hours reformatting data than actually analyzing exposure, premium adequacy, or loss trends. Organizations that have explored automating financial document processing workflows in other areas often find bordereaux to be the last and hardest category to tackle, precisely because of this format diversity.

The delay compounds the damage. Manual processing introduces days or weeks of lag between when a coverholder binds a risk and when the carrier sees that data in a usable form. Insurers end up operating on stale exposure information: by the time a bordereau is cleansed and loaded, the underlying risk picture has already shifted. Pricing decisions rely on outdated loss ratios. Reserve estimates miss recent large losses. Regulatory reporting — particularly for Solvency II and Lloyd's performance management data — draws from data that is already weeks behind the market.

This is not a niche complaint. Deloitte's ceded reinsurance survey found that 71% of respondents said late data leads to manual activity, workarounds, and inaccuracies in their operations. The downstream effects touch every function that depends on bordereau data being timely and accurate.

Then there is error propagation. Manual rekeying between spreadsheets — copying policy references, transposing premium figures, converting currencies by hand — creates entry points for mistakes that travel silently through the data pipeline. A wrong class-of-business code in the source bordereau gets carried into the carrier's underwriting system. A transposed loss reserve figure flows into quarterly reporting. A currency mismatch between the bordereau and the treaty terms distorts premium reconciliation by thousands. These errors often surface only during downstream reconciliation or audit, at which point tracing the root cause back through months of manually processed files is its own project. The direct consequences hit premium reconciliation accuracy, loss reserving integrity, and compliance reporting reliability.


Data Fields That Matter: Premium vs Claims Bordereaux

The difference between a useful extraction workflow and a broken one comes down to knowing exactly which fields to target and why each one matters operationally. Premium and claims bordereaux carry distinct data structures, and each field serves a specific function in reinsurance accounting, reserving, and settlement.

Premium Bordereau: Key Extraction Fields

Unique Market Reference (UMR) or policy number acts as the primary key linking each row to the underlying risk. Every downstream process, from reconciliation to loss ratio calculation, depends on this identifier being extracted accurately and consistently.

Insured name and risk description identify who and what is covered under the policy. These fields are often free-text, making them harder to normalize but essential for exposure analysis and audit trails.

Inception and expiry dates determine the policy period. These dates drive earned premium calculations, so even a single transposed digit can distort financial reporting for an entire quarter.

Gross written premium (GWP) represents the total premium before any deductions. This is the top-line figure from which all other premium-related calculations flow.

Commission rate and amount capture the intermediary's compensation. Accurate extraction here is essential for reconciliation against commission statements. Teams already extracting data from insurance commission statements will recognize that matching commission figures across documents is one of the most error-prone manual tasks in delegated authority accounting.

Brokerage is the broker's fee, often carried as a separate line item from commission. Some bordereaux combine the two; others split them across multiple columns. Your extraction logic needs to handle both patterns.

Tax amounts vary by jurisdiction and must be extracted as discrete fields rather than lumped into a single deduction. Regulatory compliance requires granular tax reporting, and mixing tax types during extraction creates problems that surface months later during audits.

Net premium due is the settlement amount after deducting commission, brokerage, and tax from gross premium. This is what actually changes hands between counterparties, so it serves as the final reconciliation checkpoint.
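
The settlement arithmetic is simple, which is what makes it a reliable automated cross-check later in the workflow. A minimal worked sketch, assuming the deduction fields extract as clean numbers:

```python
from decimal import Decimal

def net_premium_due(gwp: Decimal, commission: Decimal,
                    brokerage: Decimal, tax: Decimal) -> Decimal:
    """Net premium = gross written premium less all deductions."""
    return gwp - commission - brokerage - tax

# Worked example: 100,000 GWP, 20% commission, 2.5% brokerage, 5,000 tax
gwp = Decimal("100000.00")
net = net_premium_due(gwp,
                      gwp * Decimal("0.20"),    # 20,000 commission
                      gwp * Decimal("0.025"),   # 2,500 brokerage
                      Decimal("5000.00"))       # tax
print(net)  # 72500.00000, the figure to reconcile against the stated net
```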

Currency and exchange rate are critical for multi-currency reinsurance programs where premiums arrive in USD, GBP, EUR, and other currencies within the same bordereau. Missing or misread exchange rate fields force manual lookups that slow settlement.

Risk location or territory codes feed directly into exposure aggregation and catastrophe modeling. A Florida windstorm risk miscoded as a Georgia risk changes the entire nat-cat exposure picture for a reinsurer's portfolio.

Claims/Loss Bordereau: Key Extraction Fields

Unique Claims Reference (UCR) or claim number is the primary identifier for each claim event. This reference must be extracted with zero tolerance for error, as it links every financial movement on a claim back to a single record.

Policy reference (UMR) connects the claim to the corresponding premium bordereau record. This cross-reference is what enables loss ratio calculation at the policy level and is the most common point of reconciliation failure when extraction quality is poor.

Date of loss records when the insured event occurred. This field drives treaty allocation for quota share and excess-of-loss programs, where the timing of a loss determines which reinsurance contract responds.

Date reported captures when the claim was first notified to the insurer. The gap between date of loss and date reported is the foundation of Incurred But Not Reported (IBNR) calculations. Actuarial teams rely on accurate reporting lag data to set reserves, so extraction errors here have direct financial consequences for the balance sheet.
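
A sketch of the reporting-lag calculation that sits underneath IBNR work, assuming both dates extracted cleanly:

```python
from datetime import date

def reporting_lag_days(date_of_loss: date, date_reported: date) -> int:
    """Lag between occurrence and notification, a raw input to IBNR work."""
    return (date_reported - date_of_loss).days

print(reporting_lag_days(date(2024, 1, 15), date(2024, 3, 2)))  # 47
```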

Paid amounts, ideally split between indemnity and expenses, reflect what has actually been disbursed. Many bordereaux separate these; others combine them into a single paid column. The distinction matters because expense ratios and indemnity ratios serve different analytical purposes in reinsurance accounting.

Outstanding reserves represent the estimated future cost still expected on the claim. These figures change with every bordereau submission as claims develop, making accurate extraction across reporting periods essential for tracking reserve movements.

Incurred amount (paid plus outstanding) shows the total claim exposure at a point in time. This figure feeds directly into reinsurance technical account statements and drives ceded loss calculations.

Claim status (open, closed, reopened) is essential for reserve management. A claim marked as closed should carry zero outstanding reserves. When extraction misreads status codes, it creates phantom reserve balances that distort portfolio-level reporting.
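
The incurred identity and the closed-status rule both translate directly into row-level consistency checks. A minimal sketch, with field names assumed to match a canonical extraction schema:

```python
def claim_row_issues(paid: float, outstanding: float,
                     incurred: float, status: str) -> list[str]:
    """Flag internal inconsistencies on a single claims bordereau row."""
    issues = []
    if abs((paid + outstanding) - incurred) > 0.01:
        issues.append("incurred != paid + outstanding")
    if status.strip().lower() == "closed" and outstanding != 0:
        issues.append("closed claim carries non-zero outstanding reserve")
    return issues

print(claim_row_issues(50_000.0, 10_000.0, 60_000.0, "Closed"))
# ['closed claim carries non-zero outstanding reserve']
```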

Claimant name and description of loss provide narrative context. While less structured than numeric fields, these text fields are necessary for claims review, large-loss reporting, and audit documentation.

Why Field-Level Accuracy Has Financial Consequences

In reinsurance accounting, a mismatched reference number can break the reconciliation between premium and claims bordereaux entirely. When IBNR calculations depend on accurate paid and outstanding amounts linked to correct policy references, extraction errors do not just create data quality issues. They have direct financial consequences for reserving, settlement, and regulatory reporting.

A single misread UMR means a claim cannot be matched to its originating policy. An incorrect outstanding reserve figure feeds through to actuarial models and ultimately to the balance sheet.

The Format Variation Problem

The same conceptual field appears under different column headers depending on which MGA, broker, or coverholder produced the bordereau. "Premium" vs. "GWP" vs. "Gross Written Premium" all mean the same thing. "Sum Insured" vs. "Total Insured Value" vs. "TSI" refer to identical data. "Claim No" vs. "UCR" vs. "Loss Reference" all point to the same unique claim identifier.

This naming inconsistency is a primary reason template-based extraction fails at scale. Building a fixed mapping for one counterparty's format does nothing when the next counterparty uses entirely different headers, column ordering, and data formatting conventions. Multiply that across dozens or hundreds of delegated authority relationships, and the maintenance burden of template approaches becomes unsustainable.
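
The maintenance burden is easy to see in code. A hand-built synonym map like the hypothetical one below has to grow with every counterparty and silently breaks whenever a header changes:

```python
# Hand-maintained synonym map: every new counterparty adds entries,
# and every unannounced format change invalidates some of them.
HEADER_SYNONYMS = {
    "premium": "gross_written_premium",
    "gwp": "gross_written_premium",
    "gross written premium": "gross_written_premium",
    "sum insured": "total_sum_insured",
    "total insured value": "total_sum_insured",
    "tsi": "total_sum_insured",
    "claim no": "ucr",
    "ucr": "ucr",
    "loss reference": "ucr",
}

def canonical_header(raw: str) -> str | None:
    """Return the canonical field name, or None for an unmapped header."""
    return HEADER_SYNONYMS.get(raw.strip().lower())
```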


How AI Extraction Solves Bordereau Format Chaos

Template-based extraction tools assume every document follows the same layout. That assumption fails immediately with bordereaux. When MGA #1 sends a PDF with "Gross Written Premium" in column F and MGA #2 sends an Excel file with "GWP" in column D, a rigid template either breaks or requires a dedicated configuration for each counterparty. Multiply that across dozens of MGAs and brokers, and template maintenance becomes its own full-time job.

AI-powered extraction takes a fundamentally different approach: it analyzes document structure rather than memorizing fixed column positions. The extraction model reads headers, evaluates the data types in each column, and identifies contextual relationships between fields. An "Inception Date" column containing date-formatted values next to a "UMR" column containing reference strings gets recognized correctly regardless of where those columns sit in the spreadsheet or whether the document is a native PDF, a scanned printout, or an Excel export saved as an image.

Disambiguating Column Headers with Natural Language Prompts

The naming variation problem is where AI extraction earns its value. "GWP," "Gross Premium," "Premium Amount," "Written Premium," and "Original Premium" can all refer to gross written premium depending on which underwriter produced the bordereau. Similarly, "Policy Ref," "UMR," "Unique Market Reference," and "Contract ID" may all point to the same identifier.

AI extraction resolves these ambiguities by mapping headers to canonical field names based on surrounding context, the data patterns in each column, and the overall document structure. A column labeled "Rate" containing percentage values adjacent to premium and commission columns gets interpreted differently than a "Rate" column in an exchange rate table.

You control this mapping through natural language extraction prompts that define your target fields and handling rules. A realistic bordereau extraction prompt looks like this:

Extract UMR, insured name, inception date, expiry date, gross written premium, commission rate, commission amount, net premium. Format all dates as YYYY-MM-DD. If commission is shown as a percentage, calculate the commission amount from GWP. Map any column referencing "policy reference," "unique market reference," or "contract ID" to the UMR field.

That single prompt handles the header disambiguation problem across multiple bordereau formats. You save it once in a prompt library and reuse it each reporting period, refining the instructions as new edge cases surface.

Multi-Currency Normalization

International reinsurance programs generate bordereaux in multiple currencies, and the way currency information appears varies widely. Some documents include a dedicated "Currency" column with ISO codes. Others use currency-specific amount columns ("Premium (USD)," "Premium (GBP)"). Still others include separate exchange rate fields for settlement currency conversion.

AI extraction identifies these different currency conventions and normalizes them into a consistent output structure. You get standardized currency code and amount columns regardless of whether the source document embedded currency in the header, in a separate column, or in a footnote. For programs spanning USD, GBP, and EUR bordereaux, the extracted output aligns all amounts to the same column structure so downstream reconciliation doesn't require manual reformatting.
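
As a rough illustration of what that normalization amounts to, here is a sketch handling one of the conventions above, currency-specific amount columns; the column-name pattern is an assumption:

```python
import re
import pandas as pd

def normalize_currency_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse columns like 'Premium (USD)' into amount + currency."""
    frames = []
    for col in df.columns:
        match = re.match(r"Premium \((?P<ccy>[A-Z]{3})\)", col)
        if match:
            part = df[[col]].dropna().rename(columns={col: "amount"})
            part["currency"] = match.group("ccy")
            frames.append(part)
    return pd.concat(frames, ignore_index=True)

raw = pd.DataFrame({"Premium (USD)": [1000.0, None],
                    "Premium (GBP)": [None, 750.0]})
print(normalize_currency_columns(raw))
#    amount currency
# 0  1000.0      USD
# 1   750.0      GBP
```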

Batch Processing Across Mixed Formats

Each quarterly or monthly reporting cycle means handling a stack of documents from multiple counterparties, and those documents rarely arrive in the same format. Some MGAs submit PDF bordereau reports. Others send Excel workbooks. Occasionally, a Lloyd's coverholder sends a scanned image of a printed bordereau that was never digitized.

Rather than routing each format through a separate processing pipeline, you can upload an entire batch of mixed-format bordereau documents — PDFs, images, spreadsheets — and extract bordereau data into structured spreadsheets in a single job. Batches of up to 6,000 files process in parallel, producing unified structured output in Excel, CSV, or JSON. Every row in the output includes a source file reference, so you can trace any extracted value back to its originating document and page.

The practical workflow is three steps: upload the batch of bordereau files, apply your saved extraction prompt (or write a new one), and download the consolidated output. What previously required days of manual reformatting and copy-pasting across dozens of spreadsheets compresses into minutes of processing time. The extracted output lands in a single structured file ready for ingestion into your reinsurance accounting or bordereaux management system.
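
For the spreadsheet slice of a batch, the consolidation step itself is straightforward to sketch in pandas; PDFs and scans are the part that needs the extraction step first. The folder layout and column handling here are assumptions:

```python
from pathlib import Path
import pandas as pd

def consolidate_bdx_folder(folder: str) -> pd.DataFrame:
    """Stack spreadsheet bordereaux into one frame, tagging every row
    with its originating file so values stay traceable."""
    frames = []
    for path in sorted(Path(folder).iterdir()):
        suffix = path.suffix.lower()
        if suffix in {".xlsx", ".xls"}:
            df = pd.read_excel(path)
        elif suffix == ".csv":
            df = pd.read_csv(path)
        else:
            continue  # PDFs and scans go through extraction first
        df["source_file"] = path.name
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```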

No extraction method is infallible, and reinsurance accounting demands verification. The extracted output is the starting point, not the final answer. What makes automated extraction valuable is not that it eliminates review, but that it transforms review from "rebuild this data from scratch" into "validate this structured output against known rules."


From Raw Extract to Reconciled Data

Extraction produces structured data. What you do with that data next determines whether your bordereau workflow actually scales or just moves the bottleneck downstream.

The post-extraction workflow has four stages: validation, cross-bordereau reconciliation, multi-MGA normalization, and loading into downstream systems. Each stage compounds the value of the one before it, and each becomes dramatically faster when the extraction output is consistent and machine-readable from the start.

Programmatic Validation Rules

Once bordereau data lands in a structured format, you can apply validation rules programmatically rather than relying on analysts to spot-check rows in Excel. The difference is coverage: manual review catches what the reviewer happens to look at, while automated validation checks every row against every rule.

Practical validation checks that run immediately on extracted data:

  • Net premium arithmetic — verify that net premium equals gross premium minus commission minus tax for each row, flagging any line where the calculation does not balance
  • Claim-to-premium reference matching — check that claim reference numbers correspond to entries in the premium bordereau for the same program
  • Loss date range validation — flag entries where reported loss dates fall outside the inception-to-expiry window of the associated policy period
  • Duplicate and gap detection — identify duplicate submission numbers, repeated claim references, or missing sequential reference numbers that suggest dropped records
  • Currency code verification — validate that currency codes on each entry match the expected values for the program and territory

These checks take seconds to execute against thousands of rows. When they surface exceptions, your team investigates specific flagged items rather than scanning entire spreadsheets looking for problems they may or may not find.
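
A minimal pandas sketch of two of these checks, net premium arithmetic and duplicate detection, assuming columns already normalized to canonical names:

```python
import pandas as pd

def premium_bdx_exceptions(prem: pd.DataFrame,
                           tol: float = 0.01) -> pd.DataFrame:
    """Return only the rows that fail a check, tagged with the reason."""
    flagged = []

    # Net premium arithmetic: net = gross - commission - tax
    expected = prem["gwp"] - prem["commission"] - prem["tax"]
    bad_net = prem[(prem["net_premium"] - expected).abs() > tol].copy()
    bad_net["issue"] = "net premium does not balance"
    flagged.append(bad_net)

    # Repeated policy references suggest a duplicated submission
    dupes = prem[prem.duplicated("umr", keep=False)].copy()
    dupes["issue"] = "duplicate UMR"
    flagged.append(dupes)

    return pd.concat(flagged, ignore_index=True)
```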

Cross-Bordereau Reconciliation

The most operationally significant validation happens between bordereaux, not within them. Bordereaux reconciliation across premium and claims reporting for the same program is where data quality issues surface and where manual processes consume the most analyst time.

Premium records must tie to claims records through shared policy references, typically the Unique Market Reference (UMR). The reconciliation logic is straightforward: every claim should trace back to a premium record, reported premium volumes should align with claims exposure for each program period, and net positions should be internally consistent.

Discrepancies between reported premium volumes and claims exposure indicate one of three things: data quality problems in the source bordereau, reporting gaps where submissions were missed or delayed, or legitimate timing differences that need documentation. All three require investigation, but the nature of the investigation differs.

When extraction produces consistently structured data with reliable reference fields, this reconciliation runs automatically. You define the matching keys (UMR, policy reference, program identifier), specify tolerance thresholds, and the system flags mismatches. What previously required an analyst spending days in pivot tables becomes a validation report generated in minutes.
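
A sketch of that matching logic, assuming both extracts share a canonical umr column and that pandas serves as the reconciliation layer:

```python
import pandas as pd

def reconcile_claims_to_premium(claims: pd.DataFrame,
                                prem: pd.DataFrame) -> pd.DataFrame:
    """Left-join claims to premium records on UMR and flag orphans."""
    merged = claims.merge(prem[["umr"]].drop_duplicates(), on="umr",
                          how="left", indicator=True)
    merged["orphan_claim"] = merged["_merge"] == "left_only"
    return merged.drop(columns="_merge")

# Rows where orphan_claim is True have no matching premium record:
# a data quality problem, a missed submission, or a timing difference
# that needs documenting.
```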

Multi-MGA Normalization

A single carrier or reinsurer might receive bordereaux from dozens of MGAs and coverholders, each submitting in their own format with their own field naming conventions, code sets, and structural quirks. Before you can aggregate or analyze this data, it has to conform to a single schema.

Normalization means (see the code sketch after this list):

  • Field name mapping — translating each MGA's column headers ("GWP," "Gross Written Premium," "Premium_Gross") to your canonical field names
  • Date format standardization — converting DD/MM/YYYY, MM-DD-YYYY, YYYY.MM.DD, and other variants into a single consistent representation
  • Currency normalization — resolving "USD," "US$," "Dollars," and blank-but-implied currency fields into standard ISO 4217 codes
  • Territory and line-of-business coding — applying consistent classification codes when MGAs use different naming conventions for the same territories or business classes

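Header renaming and date parsing were sketched earlier; the last two steps reduce to lookup tables. The map entries below are illustrative assumptions:

```python
import pandas as pd

# Illustrative maps; real versions grow with every counterparty
CCY_MAP = {"USD": "USD", "US$": "USD", "DOLLARS": "USD",
           "GBP": "GBP", "STERLING": "GBP"}
LOB_MAP = {"COMMERCIAL PROPERTY": "PROP_COM",
           "PROPERTY - COMMERCIAL": "PROP_COM"}

def normalize_codes(df: pd.DataFrame, default_ccy: str) -> pd.DataFrame:
    out = df.copy()
    # Blank-but-implied currency falls back to the program default
    out["currency"] = (out["currency"].str.strip().str.upper()
                       .map(CCY_MAP).fillna(default_ccy))
    out["lob_code"] = out["line_of_business"].str.upper().map(LOB_MAP)
    return out
```
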
Prompt-based AI extraction can handle much of this normalization at the point of extraction rather than requiring a separate post-processing layer. The extraction instructions define the target schema, and the AI maps each source document's fields into that schema during the extraction pass itself. Adding a new MGA means adjusting extraction prompts, not building another custom ETL pipeline.

Loading Into Downstream Systems

Validated, reconciled, and normalized data flows into reinsurance accounting platforms, actuarial reserving models, regulatory reporting tools, and enterprise data warehouses. The structured output formats from extraction — Excel, CSV, JSON — align directly with standard import mechanisms across these systems. When extraction normalizes everything into a consistent structure, you maintain one integration path per target system rather than one per MGA per system, eliminating the custom transformation scripts that each counterparty's unique format previously required.
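
Once everything is in one structure, the export side is deliberately boring. A sketch, assuming a normalized pandas frame (to_excel requires the openpyxl package):

```python
import pandas as pd

# Hypothetical single-row normalized frame for illustration
normalized = pd.DataFrame({"umr": ["B0001ABC123"], "gwp": [100_000.0],
                           "currency": ["USD"]})

# One normalized structure feeds every downstream import path
normalized.to_excel("bdx_export.xlsx", index=False)       # accounting platform
normalized.to_csv("bdx_export.csv", index=False)          # warehouse load
normalized.to_json("bdx_export.json", orient="records")   # API-based tooling
```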

This is where the ROI on bordereau extraction compounds. The fifteenth MGA you onboard costs a fraction of what the first one did, and the data arrives faster, cleaner, and in a structure your downstream systems already expect.
