Bordereau Data Extraction: Automating Insurance BDX Processing

Guide to extracting structured data from insurance bordereaux. Covers BDX types, key fields, format challenges, and how AI extraction automates processing.

A bordereau (plural: bordereaux, abbreviated BDX) is a periodic report sent by an MGA, coverholder, or ceding company to a reinsurer or carrier, detailing policies written or claims incurred during a reporting period. The four main types are premium bordereaux, claims/loss bordereaux, combined bordereaux, and reinsurance technical accounts. Automating bordereau data extraction eliminates the manual cleansing required when each counterparty sends data in different formats, column layouts, and code sets — a problem that scales directly with the number of delegated authority relationships you manage.

Each type carries distinct data elements that downstream systems depend on (see the schema sketch after this list):

  • Premium bordereau: Lists policies bound during the reporting period. Key fields include policy number, insured name, inception and expiry dates, gross written premium (GWP), broker and MGA commission, applicable taxes, and net premium due to the carrier or reinsurer.
  • Claims/loss bordereau: Details claims reported or developed since the last submission. You'll find claim reference numbers, date of loss, paid amounts (indemnity and expense), outstanding reserves, claim status codes, and insured or claimant identifiers.
  • Combined/aggregate bordereau: Reconciles premium and claims data into a single document for a given program or treaty, giving both sides a unified view of how a book is performing.
  • Reinsurance technical account (statement of account): The formal settlement document. It nets premiums against losses, IBNR reserves, ceding commissions, brokerage, management expenses, and reinstatement premiums to arrive at a balance due between cedant and reinsurer.
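
To make those field lists concrete, here is one way the two core row types might look once extracted into a canonical structure. This is an illustrative sketch, not a market standard; the field names and types are assumptions:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

@dataclass
class PremiumRow:
    """One policy line from a premium bordereau (illustrative schema)."""
    umr: str                       # Unique Market Reference / policy number
    insured_name: str
    inception: date
    expiry: date
    gross_written_premium: Decimal
    commission: Decimal
    taxes: Decimal
    net_premium_due: Decimal
    currency: str                  # ISO 4217 code, e.g. "USD"

@dataclass
class ClaimRow:
    """One claim line from a claims/loss bordereau (illustrative schema)."""
    ucr: str                       # Unique Claims Reference
    umr: str                       # links back to the premium record
    date_of_loss: date
    paid_indemnity: Decimal
    paid_expense: Decimal
    outstanding_reserve: Decimal
    status: str                    # e.g. "open", "closed", "reopened"
```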

The volume of these documents is substantial and growing. Delegated authority arrangements — where MGAs and coverholders bind risk on behalf of carriers — now account for over 40% of Lloyd's total premium volume, and the proportion continues to increase as carriers lean on coverholders for distribution and underwriting in niche segments. More delegated programs mean more counterparties, more reporting periods, and more bordereaux flowing into your operations team every month.

This is where insurance bordereau processing becomes an operational bottleneck. Every counterparty sends bordereaux in different formats, layouts, and coding conventions, and the extraction and normalization challenge compounds with every new MGA or coverholder relationship you onboard. At scale, bordereaux management becomes less about insurance expertise and more about wrangling inconsistent data into a canonical format your systems can consume.


Why Bordereau Processing Breaks Down at Scale

The core problem is format inconsistency at every level. Every MGA, coverholder, and broker submits bordereaux in a different layout. Column headers vary. Date formats shift between DD/MM/YYYY, MM/DD/YYYY, and ISO conventions depending on the counterparty's home market. Currency fields might use symbols, ISO codes, or nothing at all. Internal code sets for lines of business, peril types, and coverage classes are bespoke to each program.
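
The date ambiguity alone can corrupt a bordereau silently. A minimal sketch of the problem using pandas; the per-counterparty convention lookup is a hypothetical illustration of how a team might track each program's home market:

```python
import pandas as pd

# The same raw string resolves to two different dates depending on
# the counterparty's home-market convention.
raw = "04/07/2024"
print(pd.to_datetime(raw, dayfirst=True).date())   # 2024-07-04 (DD/MM/YYYY)
print(pd.to_datetime(raw, dayfirst=False).date())  # 2024-04-07 (MM/DD/YYYY)

# Hypothetical per-counterparty convention map, maintained at onboarding
DAYFIRST = {"mga_london": True, "mga_newyork": False}

def parse_bdx_date(value: str, counterparty: str) -> pd.Timestamp:
    return pd.to_datetime(value, dayfirst=DAYFIRST[counterparty])
```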

Documents arrive as PDFs exported from legacy policy admin systems, multi-tab Excel workbooks with embedded macros, flat CSVs with inconsistent delimiters, and occasionally password-protected spreadsheets where the password itself lives in a separate email thread. Carriers and MGAs sometimes change their formats mid-program without advance notice, silently breaking whatever manual ingestion process was in place. A column gets renamed, a new sub-tab appears, or a previously numeric field starts containing text qualifiers.

There is no universal bordereau format standard. Unlike payment files (which converge around ISO 20022) or trade confirmations (which follow FIX or FpML schemas), MGA bordereau reporting operates without a mandated document structure. Program guidelines from carriers and Lloyd's of London managing agents often run tens of pages specifying what data to report — required fields, validation rules, acceptable code values — but rarely prescribe how the document itself should be formatted. The closest historical effort at bordereau format standardization was RETACC, a UN/EDIFACT message type designed for reinsurance technical accounts. It never achieved broad adoption. The industry moved on without an equivalent of the structured formats that other financial domains take for granted.

This gap lands squarely on operations teams. Staff manually cleanse, validate, and remap data from dozens of distinct layouts into internal bordereaux management systems or data warehouses. When an insurer manages 20 delegated authority relationships, the mapping effort is manageable. At 50 or 100+ relationships — common for large carriers and Lloyd's syndicates — it becomes the operational bottleneck. Analysts spend more hours reformatting data than actually analyzing exposure, premium adequacy, or loss trends. Organizations that have explored automating financial document processing workflows in other areas often find bordereaux to be the last and hardest category to tackle, precisely because of this format diversity.

The delay compounds the damage. Manual processing introduces days or weeks of lag between when a coverholder binds a risk and when the carrier sees that data in a usable form. Insurers end up operating on stale exposure information: by the time a bordereau is cleansed and loaded, the underlying risk picture has already shifted. Pricing decisions rely on outdated loss ratios. Reserve estimates miss recent large losses. Regulatory reporting — particularly for Solvency II and Lloyd's performance management data — draws from data that is already weeks behind the market.

This is not a niche complaint. Deloitte's ceded reinsurance survey found that 71% of respondents said late data leads to manual activity, workarounds, and inaccuracies in their operations. The downstream effects touch every function that depends on bordereau data being timely and accurate.

Then there is error propagation. Manual rekeying between spreadsheets — copying policy references, transposing premium figures, converting currencies by hand — creates entry points for mistakes that travel silently through the data pipeline. A wrong class-of-business code in the source bordereau gets carried into the carrier's underwriting system. A transposed loss reserve figure flows into quarterly reporting. A currency mismatch between the bordereau and the treaty terms distorts premium reconciliation by thousands. These errors often surface only during downstream reconciliation or audit, at which point tracing the root cause back through months of manually processed files is its own project. The direct consequences hit premium reconciliation accuracy, loss reserving integrity, and compliance reporting reliability.


Data Fields That Matter: Premium vs Claims Bordereaux

The difference between a useful extraction workflow and a broken one comes down to knowing exactly which fields to target and why each one matters operationally. Premium and claims bordereaux carry distinct data structures, and each field serves a specific function in reinsurance accounting, reserving, and settlement.

Premium Bordereau: Key Extraction Fields

Unique Market Reference (UMR) or policy number acts as the primary key linking each row to the underlying risk. Every downstream process, from reconciliation to loss ratio calculation, depends on this identifier being extracted accurately and consistently.

Insured name and risk description identify who and what is covered under the policy. These fields are often free-text, making them harder to normalize but essential for exposure analysis and audit trails.

Inception and expiry dates determine the policy period. These dates drive earned premium calculations, so even a single transposed digit can distort financial reporting for an entire quarter.

Gross written premium (GWP) represents the total premium before any deductions. This is the top-line figure from which all other premium-related calculations flow.

Commission rate and amount capture the intermediary's compensation. Accurate extraction here is essential for reconciliation against commission statements. Teams already extracting data from insurance commission statements will recognize that matching commission figures across documents is one of the most error-prone manual tasks in delegated authority accounting.

Brokerage is the broker's fee, often carried as a separate line item from commission. Some bordereaux combine the two; others split them across multiple columns. Your extraction logic needs to handle both patterns.

Tax amounts vary by jurisdiction and must be extracted as discrete fields rather than lumped into a single deduction. Regulatory compliance requires granular tax reporting, and mixing tax types during extraction creates problems that surface months later during audits.

Net premium due is the settlement amount after deducting commission, brokerage, and tax from gross premium. This is what actually changes hands between counterparties, so it serves as the final reconciliation checkpoint.
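
The settlement arithmetic is simple, which is what makes it a reliable automated cross-check later in the workflow. A minimal worked sketch, assuming the deduction fields extract as clean numbers:

```python
from decimal import Decimal

def net_premium_due(gwp: Decimal, commission: Decimal,
                    brokerage: Decimal, tax: Decimal) -> Decimal:
    """Net premium = gross written premium less all deductions."""
    return gwp - commission - brokerage - tax

# Worked example: 100,000 GWP, 20% commission, 2.5% brokerage, 5,000 tax
gwp = Decimal("100000.00")
net = net_premium_due(gwp,
                      gwp * Decimal("0.20"),    # 20,000 commission
                      gwp * Decimal("0.025"),   # 2,500 brokerage
                      Decimal("5000.00"))       # tax
print(net)  # 72500.00000, the figure to reconcile against the stated net
```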

Currency and exchange rate are critical for multi-currency reinsurance programs where premiums arrive in USD, GBP, EUR, and other currencies within the same bordereau. Missing or misread exchange rate fields force manual lookups that slow settlement.

Risk location or territory codes feed directly into exposure aggregation and catastrophe modeling. A Florida windstorm risk miscoded as a Georgia risk changes the entire nat-cat exposure picture for a reinsurer's portfolio.

Claims/Loss Bordereau: Key Extraction Fields

Unique Claims Reference (UCR) or claim number is the primary identifier for each claim event. This reference must be extracted with zero tolerance for error, as it links every financial movement on a claim back to a single record.

Policy reference (UMR) connects the claim to the corresponding premium bordereau record. This cross-reference is what enables loss ratio calculation at the policy level and is the most common point of reconciliation failure when extraction quality is poor.

Date of loss records when the insured event occurred. This field drives treaty allocation for quota share and excess-of-loss programs, where the timing of a loss determines which reinsurance contract responds.

Date reported captures when the claim was first notified to the insurer. The gap between date of loss and date reported is the foundation of Incurred But Not Reported (IBNR) calculations. Actuarial teams rely on accurate reporting lag data to set reserves, so extraction errors here have direct financial consequences for the balance sheet.
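
A sketch of the reporting-lag calculation that sits underneath IBNR work, assuming both dates extracted cleanly:

```python
from datetime import date

def reporting_lag_days(date_of_loss: date, date_reported: date) -> int:
    """Lag between occurrence and notification, a raw input to IBNR work."""
    return (date_reported - date_of_loss).days

print(reporting_lag_days(date(2024, 1, 15), date(2024, 3, 2)))  # 47
```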

Paid amounts, ideally split between indemnity and expenses, reflect what has actually been disbursed. Many bordereaux separate these; others combine them into a single paid column. The distinction matters because expense ratios and indemnity ratios serve different analytical purposes in reinsurance accounting.

Outstanding reserves represent the estimated future cost still expected on the claim. These figures change with every bordereau submission as claims develop, making accurate extraction across reporting periods essential for tracking reserve movements.

Incurred amount (paid plus outstanding) shows the total claim exposure at a point in time. This figure feeds directly into reinsurance technical account statements and drives ceded loss calculations.

Claim status (open, closed, reopened) is essential for reserve management. A claim marked as closed should carry zero outstanding reserves. When extraction misreads status codes, it creates phantom reserve balances that distort portfolio-level reporting.
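
The incurred identity and the closed-status rule both translate directly into row-level consistency checks. A minimal sketch, with field names assumed to match a canonical extraction schema:

```python
def claim_row_issues(paid: float, outstanding: float,
                     incurred: float, status: str) -> list[str]:
    """Flag internal inconsistencies on a single claims bordereau row."""
    issues = []
    if abs((paid + outstanding) - incurred) > 0.01:
        issues.append("incurred != paid + outstanding")
    if status.strip().lower() == "closed" and outstanding != 0:
        issues.append("closed claim carries non-zero outstanding reserve")
    return issues

print(claim_row_issues(50_000.0, 10_000.0, 60_000.0, "Closed"))
# ['closed claim carries non-zero outstanding reserve']
```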

Claimant name and description of loss provide narrative context. While less structured than numeric fields, these text fields are necessary for claims review, large-loss reporting, and audit documentation.

Why Field-Level Accuracy Has Financial Consequences

In reinsurance accounting, a mismatched reference number can break the reconciliation between premium and claims bordereaux entirely. When IBNR calculations depend on accurate paid and outstanding amounts linked to correct policy references, extraction errors do not just create data quality issues. They have direct financial consequences for reserving, settlement, and regulatory reporting.

A single misread UMR means a claim cannot be matched to its originating policy. An incorrect outstanding reserve figure feeds through to actuarial models and ultimately to the balance sheet.

The Format Variation Problem

The same conceptual field appears under different column headers depending on which MGA, broker, or coverholder produced the bordereau. "Premium" vs. "GWP" vs. "Gross Written Premium" all mean the same thing. "Sum Insured" vs. "Total Insured Value" vs. "TSI" refer to identical data. "Claim No" vs. "UCR" vs. "Loss Reference" all point to the same unique claim identifier.

This naming inconsistency is a primary reason template-based extraction fails at scale. Building a fixed mapping for one counterparty's format does nothing when the next counterparty uses entirely different headers, column ordering, and data formatting conventions. Multiply that across dozens or hundreds of delegated authority relationships, and the maintenance burden of template approaches becomes unsustainable.
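
The maintenance burden is easy to see in code. A hand-built synonym map like the hypothetical one below has to grow with every counterparty and silently breaks whenever a header changes:

```python
# Hand-maintained synonym map: every new counterparty adds entries,
# and every unannounced format change invalidates some of them.
HEADER_SYNONYMS = {
    "premium": "gross_written_premium",
    "gwp": "gross_written_premium",
    "gross written premium": "gross_written_premium",
    "sum insured": "total_sum_insured",
    "total insured value": "total_sum_insured",
    "tsi": "total_sum_insured",
    "claim no": "ucr",
    "ucr": "ucr",
    "loss reference": "ucr",
}

def canonical_header(raw: str) -> str | None:
    """Return the canonical field name, or None for an unmapped header."""
    return HEADER_SYNONYMS.get(raw.strip().lower())
```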


How AI Extraction Solves Bordereau Format Chaos

Template-based extraction tools assume every document follows the same layout. That assumption fails immediately with bordereaux. When MGA #1 sends a PDF with "Gross Written Premium" in column F and MGA #2 sends an Excel file with "GWP" in column D, a rigid template either breaks or requires a dedicated configuration for each counterparty. Multiply that across dozens of MGAs and brokers, and template maintenance becomes its own full-time job.

AI-powered extraction takes a fundamentally different approach: it analyzes document structure rather than memorizing fixed column positions. The extraction model reads headers, evaluates the data types in each column, and identifies contextual relationships between fields. An "Inception Date" column containing date-formatted values next to a "UMR" column containing reference strings gets recognized correctly regardless of where those columns sit in the spreadsheet or whether the document is a native PDF, a scanned printout, or an Excel export saved as an image.

Disambiguating Column Headers with Natural Language Prompts

The naming variation problem is where AI extraction earns its value. "GWP," "Gross Premium," "Premium Amount," "Written Premium," and "Original Premium" can all refer to gross written premium depending on which underwriter produced the bordereau. Similarly, "Policy Ref," "UMR," "Unique Market Reference," and "Contract ID" may all point to the same identifier.

AI extraction resolves these ambiguities by mapping headers to canonical field names based on surrounding context, the data patterns in each column, and the overall document structure. A column labeled "Rate" containing percentage values adjacent to premium and commission columns gets interpreted differently than a "Rate" column in an exchange rate table.

You control this mapping through natural language extraction prompts that define your target fields and handling rules. A realistic bordereau extraction prompt looks like this:

Extract UMR, insured name, inception date, expiry date, gross written premium, commission rate, commission amount, net premium. Format all dates as YYYY-MM-DD. If commission is shown as a percentage, calculate the commission amount from GWP. Map any column referencing "policy reference," "unique market reference," or "contract ID" to the UMR field.

That single prompt handles the header disambiguation problem across multiple bordereau formats. You save it once in a prompt library and reuse it each reporting period, refining the instructions as new edge cases surface.

Multi-Currency Normalization

International reinsurance programs generate bordereaux in multiple currencies, and the way currency information appears varies widely. Some documents include a dedicated "Currency" column with ISO codes. Others use currency-specific amount columns ("Premium (USD)," "Premium (GBP)"). Still others include separate exchange rate fields for settlement currency conversion.

AI extraction identifies these different currency conventions and normalizes them into a consistent output structure. You get standardized currency code and amount columns regardless of whether the source document embedded currency in the header, in a separate column, or in a footnote. For programs spanning USD, GBP, and EUR bordereaux, the extracted output aligns all amounts to the same column structure so downstream reconciliation doesn't require manual reformatting.
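
As a rough illustration of what that normalization amounts to, here is a sketch handling one of the conventions above, currency-specific amount columns; the column-name pattern is an assumption:

```python
import re
import pandas as pd

def normalize_currency_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse columns like 'Premium (USD)' into amount + currency."""
    frames = []
    for col in df.columns:
        match = re.match(r"Premium \((?P<ccy>[A-Z]{3})\)", col)
        if match:
            part = df[[col]].dropna().rename(columns={col: "amount"})
            part["currency"] = match.group("ccy")
            frames.append(part)
    return pd.concat(frames, ignore_index=True)

raw = pd.DataFrame({"Premium (USD)": [1000.0, None],
                    "Premium (GBP)": [None, 750.0]})
print(normalize_currency_columns(raw))
#    amount currency
# 0  1000.0      USD
# 1   750.0      GBP
```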

Batch Processing Across Mixed Formats

Each quarterly or monthly reporting cycle means handling a stack of documents from multiple counterparties, and those documents rarely arrive in the same format. Some MGAs submit PDF bordereau reports. Others send Excel workbooks. Occasionally, a Lloyd's coverholder sends a scanned image of a printed bordereau that was never digitized.

Rather than routing each format through a separate processing pipeline, you can upload an entire batch of mixed-format bordereau documents — PDFs, images, spreadsheets — and extract bordereau data into structured spreadsheets in a single job. Batches of up to 6,000 files process in parallel, producing unified structured output in Excel, CSV, or JSON. Every row in the output includes a source file reference, so you can trace any extracted value back to its originating document and page.

The practical workflow is three steps: upload the batch of bordereau files, apply your saved extraction prompt (or write a new one), and download the consolidated output. What previously required days of manual reformatting and copy-pasting across dozens of spreadsheets compresses into minutes of processing time. The extracted output lands in a single structured file ready for ingestion into your reinsurance accounting or bordereaux management system.
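
For the spreadsheet slice of a batch, the consolidation step itself is straightforward to sketch in pandas; PDFs and scans are the part that needs the extraction step first. The folder layout and column handling here are assumptions:

```python
from pathlib import Path
import pandas as pd

def consolidate_bdx_folder(folder: str) -> pd.DataFrame:
    """Stack spreadsheet bordereaux into one frame, tagging every row
    with its originating file so values stay traceable."""
    frames = []
    for path in sorted(Path(folder).iterdir()):
        suffix = path.suffix.lower()
        if suffix in {".xlsx", ".xls"}:
            df = pd.read_excel(path)
        elif suffix == ".csv":
            df = pd.read_csv(path)
        else:
            continue  # PDFs and scans go through extraction first
        df["source_file"] = path.name
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```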

No extraction method is infallible, and reinsurance accounting demands verification. The extracted output is the starting point, not the final answer. What makes automated extraction valuable is not that it eliminates review, but that it transforms review from "rebuild this data from scratch" into "validate this structured output against known rules."


From Raw Extract to Reconciled Data

Extraction produces structured data. What you do with that data next determines whether your bordereau workflow actually scales or just moves the bottleneck downstream.

The post-extraction workflow has four stages: validation, cross-bordereau reconciliation, multi-MGA normalization, and loading into downstream systems. Each stage compounds the value of the one before it, and each becomes dramatically faster when the extraction output is consistent and machine-readable from the start.

Programmatic Validation Rules

Once bordereau data lands in a structured format, you can apply validation rules programmatically rather than relying on analysts to spot-check rows in Excel. The difference is coverage: manual review catches what the reviewer happens to look at, while automated validation checks every row against every rule.

Practical validation checks that run immediately on extracted data:

  • Net premium arithmetic — verify that net premium equals gross premium minus commission minus tax for each row, flagging any line where the calculation does not balance
  • Claim-to-premium reference matching — check that claim reference numbers correspond to entries in the premium bordereau for the same program
  • Loss date range validation — flag entries where reported loss dates fall outside the inception-to-expiry window of the associated policy period
  • Duplicate and gap detection — identify duplicate submission numbers, repeated claim references, or missing sequential reference numbers that suggest dropped records
  • Currency code verification — validate that currency codes on each entry match the expected values for the program and territory

These checks take seconds to execute against thousands of rows. When they surface exceptions, your team investigates specific flagged items rather than scanning entire spreadsheets looking for problems they may or may not find.
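
A minimal pandas sketch of two of these checks, net premium arithmetic and duplicate detection, assuming columns already normalized to canonical names:

```python
import pandas as pd

def premium_bdx_exceptions(prem: pd.DataFrame,
                           tol: float = 0.01) -> pd.DataFrame:
    """Return only the rows that fail a check, tagged with the reason."""
    flagged = []

    # Net premium arithmetic: net = gross - commission - tax
    expected = prem["gwp"] - prem["commission"] - prem["tax"]
    bad_net = prem[(prem["net_premium"] - expected).abs() > tol].copy()
    bad_net["issue"] = "net premium does not balance"
    flagged.append(bad_net)

    # Repeated policy references suggest a duplicated submission
    dupes = prem[prem.duplicated("umr", keep=False)].copy()
    dupes["issue"] = "duplicate UMR"
    flagged.append(dupes)

    return pd.concat(flagged, ignore_index=True)
```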

Cross-Bordereau Reconciliation

The most operationally significant validation happens between bordereaux, not within them. Bordereaux reconciliation across premium and claims reporting for the same program is where data quality issues surface and where manual processes consume the most analyst time.

Premium records must tie to claims records through shared policy references, typically the Unique Market Reference (UMR). The reconciliation logic is straightforward: every claim should trace back to a premium record, reported premium volumes should align with claims exposure for each program period, and net positions should be internally consistent.

Discrepancies between reported premium volumes and claims exposure indicate one of three things: data quality problems in the source bordereau, reporting gaps where submissions were missed or delayed, or legitimate timing differences that need documentation. All three require investigation, but the nature of the investigation differs.

When extraction produces consistently structured data with reliable reference fields, this reconciliation runs automatically. You define the matching keys (UMR, policy reference, program identifier), specify tolerance thresholds, and the system flags mismatches. What previously required an analyst spending days in pivot tables becomes a validation report generated in minutes.
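
A sketch of that matching logic, assuming both extracts share a canonical umr column and that pandas serves as the reconciliation layer:

```python
import pandas as pd

def reconcile_claims_to_premium(claims: pd.DataFrame,
                                prem: pd.DataFrame) -> pd.DataFrame:
    """Left-join claims to premium records on UMR and flag orphans."""
    merged = claims.merge(prem[["umr"]].drop_duplicates(), on="umr",
                          how="left", indicator=True)
    merged["orphan_claim"] = merged["_merge"] == "left_only"
    return merged.drop(columns="_merge")

# Rows where orphan_claim is True have no matching premium record:
# a data quality problem, a missed submission, or a timing difference
# that needs documenting.
```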

Multi-MGA Normalization

A single carrier or reinsurer might receive bordereaux from dozens of MGAs and coverholders, each submitting in their own format with their own field naming conventions, code sets, and structural quirks. Before you can aggregate or analyze this data, it has to conform to a single schema.

Normalization means (see the code sketch after this list):

  • Field name mapping — translating each MGA's column headers ("GWP," "Gross Written Premium," "Premium_Gross") to your canonical field names
  • Date format standardization — converting DD/MM/YYYY, MM-DD-YYYY, YYYY.MM.DD, and other variants into a single consistent representation
  • Currency normalization — resolving "USD," "US$," "Dollars," and blank-but-implied currency fields into standard ISO 4217 codes
  • Territory and line-of-business coding — applying consistent classification codes when MGAs use different naming conventions for the same territories or business classes

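Header renaming and date parsing were sketched earlier; the last two steps reduce to lookup tables. The map entries below are illustrative assumptions:

```python
import pandas as pd

# Illustrative maps; real versions grow with every counterparty
CCY_MAP = {"USD": "USD", "US$": "USD", "DOLLARS": "USD",
           "GBP": "GBP", "STERLING": "GBP"}
LOB_MAP = {"COMMERCIAL PROPERTY": "PROP_COM",
           "PROPERTY - COMMERCIAL": "PROP_COM"}

def normalize_codes(df: pd.DataFrame, default_ccy: str) -> pd.DataFrame:
    out = df.copy()
    # Blank-but-implied currency falls back to the program default
    out["currency"] = (out["currency"].str.strip().str.upper()
                       .map(CCY_MAP).fillna(default_ccy))
    out["lob_code"] = out["line_of_business"].str.upper().map(LOB_MAP)
    return out
```
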
Prompt-based AI extraction can handle much of this normalization at the point of extraction rather than requiring a separate post-processing layer. The extraction instructions define the target schema, and the AI maps each source document's fields into that schema during the extraction pass itself. Adding a new MGA means adjusting extraction prompts, not building another custom ETL pipeline.

Loading Into Downstream Systems

Validated, reconciled, and normalized data flows into reinsurance accounting platforms, actuarial reserving models, regulatory reporting tools, and enterprise data warehouses. The structured output formats from extraction — Excel, CSV, JSON — align directly with standard import mechanisms across these systems. When extraction normalizes everything into a consistent structure, you maintain one integration path per target system rather than one per MGA per system, eliminating the custom transformation scripts that each counterparty's unique format previously required.
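
Once everything is in one structure, the export side is deliberately boring. A sketch, assuming a normalized pandas frame (to_excel requires the openpyxl package):

```python
import pandas as pd

# Hypothetical single-row normalized frame for illustration
normalized = pd.DataFrame({"umr": ["B0001ABC123"], "gwp": [100_000.0],
                           "currency": ["USD"]})

# One normalized structure feeds every downstream import path
normalized.to_excel("bdx_export.xlsx", index=False)       # accounting platform
normalized.to_csv("bdx_export.csv", index=False)          # warehouse load
normalized.to_json("bdx_export.json", orient="records")   # API-based tooling
```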

This is where the ROI on bordereau extraction compounds. The fifteenth MGA you onboard costs a fraction of what the first one did, and the data arrives faster, cleaner, and in a structure your downstream systems already expect.
