Shipping Manifest Data Extraction: Fields and Workflow

Shipping manifest data extraction turns manifest PDFs, scans, and carrier exports into structured rows. Each row carries the shipment reference (bill of lading, manifest, or voyage number), the parties (shipper or exporter, consignee, carrier), the relevant dates and times, package counts and types, gross mass and shipping marks, the goods description and any commodity codes, and the container and seal IDs. Once those fields are in rows, the data feeds four downstream uses that logistics and AP teams run: freight invoice cross-checking, quantity and weight reconciliation, customs and AP workpaper construction, and exception flagging before billing or payment.

A note on scope, because the same words point at several different things on the search results page. This article is about extracting structured data from manifests an importer, freight forwarder, 3PL, or AP team already holds. It is not about generating a carrier close-out manifest in a shipping platform, not a customs trade-data feed, and not a template for authoring a new manifest. The reader is someone with manifest documents in hand who needs the data in Excel, CSV, JSON, an ERP, or a reconciliation workbook.

The rest of the article walks the field map first — manifest fields grouped by what an AP, customs, or operations team actually does with each group — then the four downstream uses by name, then the extraction approach that runs across carriers without a per-layout template, and finally the honest scope of what the workflow does and does not replace.

The manifest fields that drive reconciliation

Vendor product pages tend to list manifest fields as a flat menu: consignee, weights, dimensions, BOL number, customs info. That listing is technically complete and operationally useless. Grouping the same fields by the work they do changes that. Each group then has an explicit downstream owner and a check the data supports, and the categories below are the ones an AP, customs, or operations team actually builds workpapers and audits around.

Shipment references and supporting document numbers. Bill of lading number (master and house where both apply), manifest number, booking reference, voyage or flight number, and any supporting customs entry references. These tie one manifest row to invoices, BOLs, packing lists, and customs entries; without them no cross-document check is possible, and a missing BOL number on a manifest is itself an exception.
Parties. Shipper or exporter, consignee, notify party, carrier (with SCAC code where present on U.S. inbound shipments), and the agent or person lodging the manifest. Parties drive vendor and consignee matching, broker assignment, and routing of the document into the correct approval queue. A manifest carrying a consignee that does not match the importer of record on the entry is a routing exception before it is anything else.
Dates and times. Departure and arrival dates, loading and unloading times, and any port-call timestamps recorded on the manifest. These feed demurrage and detention calculations, free-time tracking, and arrival-versus-billing-date checks on the freight invoice. The timeline is where most ocean freight audit recoveries live.
Packaging and weight. Package count, package type (cartons, pallets, drums, IBCs, bulk), gross mass, net mass where given, and shipping marks. This is the line a quantity-and-weight reconciliation runs against the BOL and packing list. It is also the line a weight-based freight charge gets validated against, and the line that surfaces dimensional-weight applied to non-volumetric cargo.
Goods description and codes. Cargo description in plain text, commodity codes or HS codes where present, and any hazardous material indicators. The description and codes feed customs entry preparation; they also let the team confirm the description on the commercial invoice points at the same goods the manifest names. Hazmat indicators route to the safety and stowage check before they route anywhere financial.
Transport equipment. Container numbers, seal numbers, and equipment-type indicators (20ft, 40ft, 40HC, reefer, tank). Equipment data feeds integrity checks at receipt — seal number on the manifest against the seal on the box at the gate — and anchors the shipment row in the workpaper that ties the manifest to the BOL, the commercial invoice, and the customs entry.

The U.S. regulatory text codifies most of this field set as a formal requirement on the inbound side. The inward cargo manifest data requirements in 19 CFR § 4.7a oblige an inward cargo manifest to carry specific data elements for every shipment, including the bill of lading number, shipper and consignee details, cargo description and weight, container and seal numbers, and the carrier SCAC code. The regulation is a useful anchor because the fields it names are the same fields AP and customs teams reconcile against in practice: the formal requirement and the operational requirement converge on roughly the same list.

Real manifests vary by carrier and country: a U.S. inward cargo manifest, a UK customs goods manifest, an IATA-format air waybill manifest, and a 3PL warehouse receiving manifest each carry different field subsets and different field labels. Even so, every reconciliation-relevant data point lands in one of the six categories above, which is why the field map is more durable than any one carrier's PDF layout.
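As a rough illustration, the six categories can be written down as a single row type. This is a minimal sketch: the class name, field names, units, and the choice of required fields are assumptions made for the example, not a standard schema or any carrier's format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ManifestRow:
    """One shipment-level row. All field names here are illustrative."""
    # Shipment references
    bol_number: Optional[str] = None
    manifest_number: Optional[str] = None
    voyage_or_flight: Optional[str] = None
    # Parties
    shipper: Optional[str] = None
    consignee: Optional[str] = None
    carrier_scac: Optional[str] = None
    # Dates
    departure_date: Optional[date] = None
    arrival_date: Optional[date] = None
    # Packaging and weight
    package_count: Optional[int] = None
    package_type: Optional[str] = None
    gross_mass_kg: Optional[float] = None
    # Goods description and codes
    goods_description: Optional[str] = None
    commodity_codes: list[str] = field(default_factory=list)
    # Transport equipment
    container_numbers: list[str] = field(default_factory=list)
    seal_numbers: list[str] = field(default_factory=list)


def missing_required(row: ManifestRow, required=("bol_number", "consignee")) -> list[str]:
    """Return the names of required fields that are empty on this row."""
    return [name for name in required if not getattr(row, name)]
```

Keeping every field optional reflects that different manifest types carry different subsets; the `missing_required` helper is where a team would state which gaps count as exceptions.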

Use 1: Cross-checking the freight invoice

A freight invoice arrives carrying ocean or air freight charges, fuel surcharges, terminal handling, documentation fees, and a list of accessorials. Without a structured manifest row, each charge has to be checked against the original PDF by eye, which is why most freight audits are skipped or sampled rather than run line-by-line. With a structured manifest row joined to a structured invoice row, the audit becomes a column comparison.

The cross-checks the manifest enables are concrete, and each one corresponds to a charge category on the invoice.

The reference match is the entry point. The manifest BOL number ties the invoice to the shipment; an invoice arriving without a BOL number that resolves to a known manifest row should not pay until the carrier supplies the reference. This is the freight equivalent of an AP invoice without a PO number.

Weight-based charges are where the highest-yield audit findings live. Manifest gross mass validates weight-based freight charges and any reweigh adjustments the carrier has applied. A discrepancy between the invoiced weight and the manifest weight is the most common audit finding on ocean and air freight, and the recovery is usually straightforward to argue once both numbers sit in the same row.

The container and equipment check catches per-container and equipment-tier rate errors. A 40ft contract rate billed against a 20ft container the manifest names is a recovery; a per-container rate billed for a container that does not appear on the manifest at all is an exception that holds the invoice.

Demurrage, detention, and free-time accessorials hinge on the loading and discharge dates the manifest carries. Demurrage billed for days that fall before the manifest discharge date, or for days still inside the free-time window counted from it, is flagged as an exception. On ocean lanes, this single check tends to produce the largest individual recoveries.
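The date check can be sketched in a few lines, assuming demurrage only becomes chargeable once free time has run out. The function name is hypothetical, and the free-time convention used here (calendar days, counted from the discharge date itself) is an assumption; actual contracts often count working days or start the clock on a different event.

```python
from datetime import date, timedelta


def disputed_demurrage_days(discharge: date, free_days: int, billed_days: list[date]) -> list[date]:
    """Return billed days that fall before discharge or inside free time."""
    # Assumption: free time covers `free_days` calendar days starting on the
    # discharge date, so the first chargeable day is discharge + free_days.
    first_chargeable = discharge + timedelta(days=free_days)
    return sorted(d for d in billed_days if d < first_chargeable)
```

For a discharge on 1 March with five free days, billed days of 4 to 8 March would return 4 and 5 March as disputed, since under this convention charging starts on 6 March.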

The routing check closes the audit. Origin and destination port and the carrier SCAC on the manifest validate the routing the invoice charges against. A routing mismatch usually means the carrier has applied the wrong contract rate, which is correctable but only if the manifest data is in a row to compare.

The structural point is that the manifest is the upstream document for the freight invoice in the same way the purchase order is the upstream document for an AP invoice. The audit logic is the same: extract both, join on the reference, compare the fields that drive the charge. The companion piece on freight invoice data extraction for the audit step walks the invoice side of that join in detail.
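The join-and-compare logic might look like the sketch below. It assumes both documents have already been extracted into plain dictionaries; the key names, the container mapping (container number to equipment size), and the 1% default weight tolerance are all illustrative choices rather than values from any contract or tool.

```python
def audit_invoice(invoice: dict, manifests_by_bol: dict, weight_tol_pct: float = 1.0) -> list[str]:
    """Compare one freight invoice row against its manifest row."""
    # Reference match: without a manifest row nothing else can be checked.
    manifest = manifests_by_bol.get(invoice.get("bol_number"))
    if manifest is None:
        return ["reference: no manifest row for this BOL number, hold payment"]

    findings = []

    # Weight-based charges: billed weight against manifest gross mass.
    billed, declared = invoice["billed_weight_kg"], manifest["gross_mass_kg"]
    if abs(billed - declared) / declared * 100 > weight_tol_pct:
        findings.append(f"weight: billed {billed} kg vs manifest {declared} kg")

    # Equipment: every billed container must exist and match its size tier.
    for number, billed_size in invoice["containers"].items():
        manifest_size = manifest["containers"].get(number)
        if manifest_size is None:
            findings.append(f"equipment: {number} not on manifest")
        elif manifest_size != billed_size:
            findings.append(f"equipment: {number} billed as {billed_size}, manifest says {manifest_size}")

    # Routing: carrier code on the invoice against the manifest.
    if invoice["carrier_scac"] != manifest["carrier_scac"]:
        findings.append("routing: carrier SCAC differs from manifest")

    return findings
```

An empty list means the invoice passed every check in this sketch; any entry is a reason to hold the line for review.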

Use 2: Reconciling shipment quantities and weights

An arriving shipment carries three quantitative documents — the manifest, the BOL, and the packing list. They should agree on package count and gross mass. In practice they often don't, and the work is telling a tolerable rounding gap apart from a real discrepancy that needs a hold on goods receipt.

Package count is the first comparison. Manifest package count goes against the BOL "number of packages" field and the packing list carton or package count. Equal counts pass; unequal counts trigger an exception that has to resolve before the goods receipt posts. The most common cause is a consolidation done at the origin warehouse that the manifest captured before the packing list was reissued, but the only way to catch that is to put the three numbers in the same row.

Gross mass is the second comparison. Manifest gross mass goes against BOL gross weight and packing list total weight. Tolerance is usually applied as a percentage band — half a percent on full container loads is common, with a wider band on LCL or air freight where reweighs and consolidations are routine. Differences inside the band tend to be rounding from kilo-to-pound conversions or scaled re-weighs at intermediate hubs; differences above the band are real and route to the carrier or the shipper for resolution.

Shipping marks are the line-level check that flags whether the cartons received match what the packing list says were loaded. Manifest shipping marks compared to packing list marks per carton catch substitutions, mis-labelled cartons, and the case where a packing list has been reissued without updating the marks on the goods themselves.

Container seal integrity sits slightly outside the count-and-weight comparison but lives in the same reconciliation. Manifest seal number against the seal on the container at receipt confirms the box has not been opened in transit. A broken seal or a seal-number mismatch is an exception that routes to security and customs review before the goods can be released to the receiving floor.

The workflow shift here is the one that makes the reconciliation tolerable at volume. When manifest, BOL, and packing list all live as PDFs, the comparison runs by visually checking three documents against each other, which is slow and lossy. When all three are extracted into a single shipment-level row, the comparison becomes a column-against-column computation and the exception is computed rather than eyeballed. The companion articles on bill of lading automation for shipment data capture and packing slip data extraction to Excel for line-level checks cover the BOL and packing list sides of the same join.
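A minimal sketch of that column-against-column computation follows, assuming each document has been reduced to a dictionary with `package_count` and `gross_mass_kg` keys (both names are assumptions). The 0.5% default mirrors the full-container-load band mentioned above and would be widened for LCL or air freight.

```python
def reconcile_shipment(manifest: dict, bol: dict, packing_list: dict,
                       mass_tol_pct: float = 0.5) -> list[str]:
    """Compare package count and gross mass across the three documents."""
    exceptions = []

    # Package count: all three documents must agree exactly.
    counts = {manifest["package_count"], bol["package_count"], packing_list["package_count"]}
    if len(counts) > 1:
        exceptions.append("package count mismatch across documents")

    # Gross mass: each document is compared to the manifest within a band.
    reference = manifest["gross_mass_kg"]
    for name, doc in (("BOL", bol), ("packing list", packing_list)):
        diff_pct = abs(doc["gross_mass_kg"] - reference) / reference * 100
        if diff_pct > mass_tol_pct:
            exceptions.append(f"gross mass: {name} differs from manifest by {diff_pct:.2f}%")

    return exceptions
```

Using the manifest as the reference value is a design choice for the sketch; a team that trusts the BOL weight more could swap the reference without changing the structure.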

Use 3: Building the customs and AP workpaper

A customs entry workpaper or an AP shipment file is a single row per shipment that pulls fields from several documents — the manifest, the commercial invoice, the BOL, the packing list, and any customs declarations. The manifest is the document that anchors the row, because it carries the shipment-level identifiers (BOL number, container number, voyage or flight number) that everything else hangs from. Pull the manifest first and the rest of the documents have an obvious place to land.

The mapping practitioners actually build looks like this:

Identifier columns — BOL number, manifest number, voyage or flight number, container and seal numbers — come straight from the manifest. These are the columns the workpaper sorts and joins on.
Party columns — shipper, consignee, carrier (with SCAC where present), notify party — come from the manifest and are validated against the BOL. The manifest is the more reliable source on the carrier and notify-party fields, which the BOL sometimes leaves implicit.
Date columns — departure, arrival, loading, discharge — come from the manifest and are used on the workpaper to compute transit time and any demurrage or detention exposure that has accrued by the time the entry is filed.
Goods columns — description, package count, gross mass, commodity code where present — come from the manifest and are validated against the commercial invoice and packing list. Disagreement on description or commodity code is a flag that a reclassification or an entry amendment may be needed.
Cross-document reference columns — invoice number from the commercial invoice, customs entry number from the broker file, PO number where the importer uses one — are joined to the manifest row by BOL number. The workpaper is finished when every reference column is populated and every disagreement has either been resolved or noted for follow-up.
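The mapping above can be sketched as a function that starts from the manifest and attaches the other documents. Everything here is illustrative: the function name, the dictionary keys, and the `open_items` column are assumptions, and a real workpaper would carry many more columns.

```python
from typing import Optional


def build_workpaper_row(manifest: dict,
                        commercial_invoice: Optional[dict] = None,
                        broker_entry: Optional[dict] = None) -> dict:
    """Assemble one shipment-level row anchored on the manifest."""
    row = {
        # Identifier, party, and goods columns come straight from the manifest.
        "bol_number": manifest["bol_number"],
        "container_numbers": manifest.get("container_numbers", []),
        "consignee": manifest.get("consignee"),
        "commodity_code": manifest.get("commodity_code"),
        # Cross-document references are filled in when the document is present.
        "invoice_number": commercial_invoice.get("invoice_number") if commercial_invoice else None,
        "entry_number": broker_entry.get("entry_number") if broker_entry else None,
    }

    # Unpopulated reference columns stay open until the document arrives.
    open_items = [col for col in ("invoice_number", "entry_number") if row[col] is None]

    # A disagreement on commodity code is noted rather than silently overwritten.
    if commercial_invoice and commercial_invoice.get("commodity_code") != row["commodity_code"]:
        open_items.append("commodity code differs between manifest and commercial invoice")

    row["open_items"] = open_items
    row["complete"] = not open_items
    return row
```

The row counts as finished only when `open_items` is empty, which matches the rule that every reference column is populated and every disagreement is resolved or noted.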

The customs context is worth stating directly. Manifest data feeds customs entry preparation but is not the customs entry itself. A licensed broker or self-filing importer prepares and lodges the entry against the relevant authority's system; the manifest extraction supports the broker's working file and the importer's own file copy of the entry. Brokers reconciling their own broker invoices against the entry support similarly use the manifest as part of the document set, and the article on customs broker invoice and entry document processing covers that reconciliation in more depth.

Use 4: Flagging exceptions before billing or payment

Most of the cost of a freight or logistics dispute comes from catching the issue after the invoice has posted. Exceptions found before posting are corrected by amendment; exceptions found after posting need credit notes, accrual reversals, and sometimes write-offs that never fully recover. The manifest is one of the cheapest places in the document chain to put the gate.

The exception types that the manifest catches are well-defined enough to be encoded as rules.

A missing reference exception fires when a freight invoice arrives without a BOL number that ties to a known manifest row. The invoice cannot be matched and should not pay until the carrier supplies the reference. This sounds trivial; in practice it is the single most common reason aged-but-unmatched freight invoices accumulate.

An inconsistent weight exception fires when manifest gross mass and invoice billed weight diverge by more than the tolerance band the team has set. The same exception surfaces overweight reweighs that the carrier has applied without notification, dimensional weight that has been billed against non-volumetric cargo, and the occasional duplicate billing that has been issued on a different weight basis to obscure the duplication.

An unmatched container or equipment exception fires when the invoice charges per-container rates for containers that do not appear on the manifest, or when it charges 40ft equipment-tier rates against containers the manifest records as 20ft. Either case holds the line until the discrepancy is reconciled.

A date-window exception is the highest-value rule on most ocean lanes. Demurrage billed for days before the manifest discharge date, or for days still inside free time counted from it, is flagged. Detention billed for days outside the equipment release window is flagged the same way. These rules tend to recover more than the cumulative weight-discrepancy findings on a busy import program.

A document-set exception fires when the shipment arrives at the AP gate without a manifest at all, or with a manifest that is missing a load-bearing field — most often the consignee, occasionally the container number on a multi-container booking. The exception in that case is the missing document or field, not a charge dispute, and it routes to operations to chase the carrier or shipper before payment is released.

The structural point is that fields buried in PDFs cannot drive a rule; the same fields in a row can. Exception flagging is where the structured extraction earns its keep, because every rule above is trivial to write once the manifest data is in columns and untenable to enforce when it is not. The shipment lifecycle closes with the proof of delivery, and exceptions that only fully resolve at receipt — short shipments, damaged cartons, refused goods — reconcile against the manifest only once the POD is in the workpaper too. The article on proof of delivery data extraction for the closeout step covers that final document.
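One way to encode such rules is as small functions run in sequence before payment is released. The sketch below covers only the missing-reference and document-set rules; the rule names, dictionary keys, and the choice of load-bearing fields are assumptions for illustration.

```python
from typing import Callable, Optional

# A rule takes the invoice row and its manifest row (or None) and returns
# a reason string when it fires, otherwise None.
Rule = Callable[[dict, Optional[dict]], Optional[str]]


def missing_reference(invoice: dict, manifest: Optional[dict]) -> Optional[str]:
    if manifest is None:
        return f"missing reference: no manifest for BOL {invoice.get('bol_number')!r}"
    return None


def document_set(invoice: dict, manifest: Optional[dict]) -> Optional[str]:
    if manifest is None:
        return None  # already reported by missing_reference
    missing = [f for f in ("consignee", "container_numbers") if not manifest.get(f)]
    return f"document set: manifest missing {', '.join(missing)}" if missing else None


RULES: list[Rule] = [missing_reference, document_set]


def gate_invoice(invoice: dict, manifests_by_bol: dict) -> tuple[str, list[str]]:
    """Return ('hold', reasons) if any rule fires, else ('release', [])."""
    manifest = manifests_by_bol.get(invoice.get("bol_number"))
    reasons = [r for rule in RULES if (r := rule(invoice, manifest)) is not None]
    return ("hold", reasons) if reasons else ("release", [])
```

Weight, equipment, and date-window rules would slot into the same `RULES` list, which keeps each rule independently testable.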

How the extraction works without per-carrier templates

Manifest layouts vary by carrier, by country, by mode (ocean, air, truck, rail), and by 3PL system. A single importer can receive U.S. inward cargo manifests, UK customs goods manifests, IATA-format air waybill manifests, and warehouse receiving manifests from a 3PL in the same week. A template-based shipping manifest parser or a layout-trained cargo manifest OCR tool needs a configured layout per source, and that configuration breaks every time a carrier reissues its PDF, a new origin enters the lane, or a 3PL changes systems. The configuration cost compounds; the maintenance cost compounds harder.

The alternative is to describe the field map by name in a natural-language prompt and let the model read each document. Instead of mapping field positions, the prompt asks for the fields directly: BOL number, voyage or flight number, shipper, consignee, carrier, package count and type, gross mass, container and seal numbers, goods description, and any commodity codes, with one row per manifest. The same prompt runs across every carrier's layout because the system reads the document, not a position map. That is what it now means to automate data capture from shipping manifests at any scale that includes more than one source.

In practice the workflow has three steps:

Batch upload. Manifest PDFs and scans go in as a batch. That can be a single multi-page PDF carrying a long manifest, a folder of mixed manifests from multiple carriers, or a packet from a 3PL that contains manifests, BOLs, packing lists, and a commercial invoice in the same envelope. Native PDFs and scanned PDFs are handled the same way.
Single prompt. One natural-language instruction names the field map and the output structure: Excel, CSV, or JSON, one row per manifest, with line-level rows where consolidated cargo needs them. The same prompt is reused on the next batch.
Structured output. Rows land directly in the format the workpaper or audit step needs. Each row references the source file and page number, which is what makes a reviewer's spot-check against the original PDF a one-click action rather than a search.
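The three steps can be illustrated from the caller's side. This sketch is not the API of any particular extraction product: the prompt wording, the JSON shape, and the `source_file` and `page` key names are assumptions chosen to show how a reusable prompt and a source-reference check might fit together.

```python
import json

# Hypothetical field map, named the way the prompt would name it.
FIELDS = [
    "BOL number", "voyage or flight number", "shipper", "consignee", "carrier",
    "package count and type", "gross mass", "container and seal numbers",
    "goods description", "commodity codes",
]


def build_prompt(fields: list[str], output_format: str = "JSON") -> str:
    """Compose one reusable natural-language instruction from the field map."""
    return (
        "Extract the following fields from each manifest: "
        + ", ".join(fields)
        + f". Return {output_format} with one row per manifest."
    )


def load_rows(raw_json: str) -> list[dict]:
    """Parse extracted rows and insist that each one cites its source."""
    rows = json.loads(raw_json)
    for index, row in enumerate(rows):
        if not row.get("source_file") or row.get("page") is None:
            raise ValueError(f"row {index} lacks a source file or page reference")
    return rows
```

Rejecting rows without a source reference at load time keeps the reviewer's spot-check possible for every row that reaches the workpaper.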

Manifest extraction usually runs alongside invoice and BOL extraction in the same workpaper build. A team handling a logistics packet — manifest, BOL, packing list, commercial invoice, customs declaration — runs the full set in one pass and joins on shipment reference rather than running each document type as a separate job. We build Invoice Data Extraction as an AI extraction tool for shipping documents that supports this workflow: the same prompt-driven interface handles batches of up to 6,000 mixed-format files in a single job, single PDFs of up to 5,000 pages, and outputs Excel, CSV, or JSON with the source-file and page reference on every row. There is no per-carrier template to configure and no layout-per-source maintenance, which is what makes running the field map across every shipment in a lane tractable instead of a sampled audit.

What this workflow does not replace

Three short statements of scope, because search results for this topic tend to imply more than the workflow delivers.

The workflow extracts manifest data and supports reconciliation, audit, and exception flagging. It is not a substitute for licensed customs filing. A customs broker or self-filing importer prepares and lodges the entry against the relevant authority's system; the extraction supports their working file rather than performing the filing.

The extracted data supports compliance workpapers but does not constitute a compliance ruling. Tariff classification decisions, duty calculations, and the legal accuracy of an entry remain the responsibility of the broker or the importer's compliance team.

The workflow handles documents the user already holds. It does not pull manifests from carrier portals, AMS or ACE systems, or trade-data feeds. Connecting those sources is a separate integration question that lives in the carrier and broker systems, not in the extraction step.

What the workflow does contribute is a structured field map grouped by reconciliation use, four named downstream uses with concrete checks against each one, and an extraction pass that runs across carrier and country variations without a per-layout template.
