A US aggregate hauler invoice is typically a multi-page PDF. Page 1 carries the invoice header — supplier, invoice number, period, customer, totals. Pages 2 through N carry a dense table where each row represents one truck-load, and that table is where the data lives. Per-row, the columns you are trying to extract are: BOL number, scale-ticket number, date, tractor ID or truck number, origin (quarry, pit, or plant), destination (jobsite or drop site), material, tons net, rate per ton, line haul charge, and per-ticket accessorial charges. Construction AP teams extract aggregate hauler invoice to Excel for two downstream tasks the data feeds directly: reconciling field-issued scale tickets against the back-office invoice line numbers the supplier put on the bill, and allocating haulage across job numbers or cost codes before posting.
That column inventory is worth walking once before anything else, because every later step in the article assumes you know what each field carries. The BOL or scale-ticket number is the row identifier and the seed of every reconciliation against your own records. The date is the haul date — sometimes paired with a load time on the underlying ticket. The tractor ID is how the producer tracks the truck, and on some invoices the driver name appears alongside it. Origin and destination define the lane, and the lane plus the material is what determines the contracted rate. The material field carries the producer's nomenclature: limerock, crushed stone, FDOT #57, ASTM C33 sand, fill dirt, base course, riprap. Tons net is the legally weighed figure from a state-certified scale and the quantity that prices the line. Rate per ton multiplies into the line haul charge. Accessorial charges — fuel surcharge, demurrage, standby, toll pass-through, environmental fee — sit alongside the line haul, and on most aggregate invoices they appear at the per-ticket level rather than rolled up at the invoice tail.
One detail behind tons net determines what cross-references you can later run. The underlying scale ticket carries three weights: tare (empty truck), gross (loaded truck), and net (gross minus tare). The tons net figure that reaches the invoice line comes from that subtraction, performed on a state-certified scale. If your project file retains the scale tickets behind the invoice, the tare/gross/net trio is what lets you cross-reference an invoice line back to its source document and confirm the contractual quantity.
The reason any of this matters at scale: crushed stone was the leading nonfuel mineral commodity by value in the US in 2024, accounting for 24% of the total value of US mine production. Aggregate purchasing is industrial activity, not a niche workflow, and any contractor or industrial buyer at mid-market scale or above is processing these invoices on a routine cycle. The rest of this article walks the route from that PDF in your inbox to a working AP spreadsheet — covering the field-versus-back-office ticket-ID reconciliation problem, the three viable extraction routes, the job-allocation schemas that survive multi-jobsite invoices, and the freight audit the same spreadsheet already feeds — and sits alongside the broader invoice data extraction for the construction industry coverage on this blog.
Field Scale Tickets vs Back-Office Invoice Line Numbers
Field-issued scale-ticket numbers and back-office invoice line IDs frequently don't match. The invoice line in front of the AP team often carries a back-office sequence number, while the scale-house log or dispatcher log on the contractor's side cites the field ticket the driver was handed at the pit. The reconciliation looks like it should be one column compared against another, and it isn't.
The reason is mechanical. Producers commonly renumber tickets in the back office when manual ticket pads are still in use at remote pits, or when tickets are re-keyed from paper into the producer's billing system before invoicing. Some producers run a separate invoice sequence per customer or per period regardless of the field ticket. None of this is malicious — it is back-office workflow — but it produces two visible patterns on the AP side: orphans (field tickets the contractor has scale-house evidence for but cannot find a matching invoice line) and apparent duplicates (back-office sequence numbers in tight ranges that look like they should pair to specific field tickets but don't).
The spreadsheet schema has to be built for this from the start. Preserve both numbers as separate columns — call them something like field_ticket_no (or bol_no if the document carries BOL terminology) and invoice_line_no (or invoice_seq for the back-office sequence). Then add a third column for reconciliation outcome — reconciliation_status works as well as anything — that holds values like matched, orphan_field, orphan_invoice, or duplicate. The AP team works that status column ticket by ticket against the scale-house log or the dispatcher log, and the column becomes the audit trail of what was reconciled and what was held back.
There is a secondary reconciliation surface worth designing the schema for in advance. Even when ticket IDs match, the tons net on the invoice can differ from the tons recorded at the scale-house — sometimes by rounding, sometimes by a re-weigh, sometimes because the producer's scale and the contractor's scale don't agree to the pound. When the contractor has their own scale-house records to compare against, hold both quantities side by side as tons_net_invoice and tons_net_scalehouse, and add a delta column. Most reconciliations don't need this; the ones that do need it built in from the first extraction, not bolted on after the fact.
All of this feeds the formal AP control above it. 3-way matching — invoice against contract or PO, against the scale ticket — is the standard control aggregate AP runs, and the column structure described here is what feeds it. The reconciliation status is the matching evidence; the orphans and duplicates are what gets held before payment.
Three Routes from PDF to Spreadsheet, and What Each One Actually Handles
There are three viable routes from a hauler invoice PDF to a working AP spreadsheet, and the right choice depends mostly on volume and on how many producers you buy from.
Route 1: Manual copy-paste from PDF. A small contractor receiving one aggregate invoice a month from a single quarry, ten or fewer tickets per invoice, can copy-paste the table into Excel and clean it up by hand in a few minutes. The fit is real at low volume. The failure modes are equally real and arrive earlier than people expect. Copy-paste from a multi-page PDF table corrupts column alignment whenever the table breaks across pages — and on a hauler invoice the table always breaks across pages. Accessorial line items, which often appear as sub-rows or as separate columns the eye expects to be there, get dropped. By the time the invoice carries 30 or more tickets, or batches start arriving weekly, the manual route is consuming an AP clerk for an afternoon to produce a spreadsheet someone else will have to spot-check anyway.
Route 2: Generic OCR or template-based extraction tool. A contractor who buys 90% of their aggregate from one producer with a stable invoice template can build the template once, point new invoices at it, and accept the output for months. Inside that fit, route 2 is genuinely serviceable. Outside it, the route breaks supplier by supplier. The moment a contractor receives invoices from multiple producers — different table layouts, different column orderings, some with a combined BOL/ticket column and some with the two split, some with the accessorial column on the right and some at the line tail — the template breaks per supplier and someone has to either rebuild or accept fallout. Maintaining ten templates for ten producers is a maintenance liability, not a workflow, and the cost of that maintenance compounds as the producer rotates their invoice template (which they do, occasionally, without warning).
Route 3: Prompt-driven AI extraction. A mid-market GC, civil contractor, or industrial buyer routinely receiving aggregate invoices from Vulcan, Martin Marietta, CRH, a regional limerock producer, and a couple of independent haulers can describe the per-ticket schema once in a natural-language prompt and apply that prompt across the whole intake — every supplier, every layout, one consolidated per-ticket spreadsheet. This is where AI-powered aggregate hauler invoice extraction lands as the concrete answer to route 2's per-supplier brittleness rather than as a route-1 replacement: the prompt describes the columns the schema needs to hold (BOL, scale ticket, field ticket vs invoice line, date, tractor, origin, destination, material, tons net, rate per ton, line haul, the accessorial columns), and the same prompt converts aggregate hauler invoice PDF to Excel whether the file is a 12-ticket sand and gravel invoice to spreadsheet from a regional yard or a 400-line monthly statement from a national producer.
The product underneath that mention is purpose-built for this shape. The extraction interface is a single prompt field with a file upload area, with no templates to configure and no per-supplier setup; the same prompt runs against a folder of mixed-supplier invoices, and the output is a structured Excel, CSV, or JSON file. Batch capacity is up to 6,000 files per session with single PDFs up to 5,000 pages, which covers the routine multi-supplier intake described above without anyone having to think about volume. Every row in the output carries a reference back to the source file and page number, so the reconciliation work in the prior section — orphans, duplicates, tons-net deltas — can be cross-checked against the original invoice without leaving the spreadsheet.
The same multi-supplier-into-one-schema pattern shows up elsewhere in the contractor's AP stack — most directly in heavy equipment rental invoice extraction to Excel, where the column shape differs but the prompt-driven approach to cross-supplier consolidation is identical.
Allocating Haul Charges Across Jobs and Cost Codes
A single hauler invoice routinely contains deliveries across multiple jobsites. The destination column on the invoice is the natural allocation key — the producer knows where they delivered to — but the contractor's accounting system posts by job number and cost code, not by destination name. So the aggregate hauler invoice job cost allocation step lives in the translation between the two, and there are two schemas that survive the work.
Schema A: destination-to-job-number lookup. When the contractor maintains a stable map from jobsite name (or jobsite address) to job number, the spreadsheet adds a lookup column. The destination text from the invoice feeds the lookup; the lookup returns the job number; the job number feeds a cost-code column. The mapping table sits beside the extraction sheet or in a maintained reference workbook, and once it covers the contractor's active jobs the allocation is deterministic. Worked single-jobsite example: a sand-and-gravel invoice with 28 tickets, all destined for "I-95 Widening Phase 3", maps to job number 2026-117. Every line allocates to the same job and the same Division 32 base-course phase code, and the AP clerk's review collapses to a sanity check on totals.
Schema B: manual job-code column AP fills before posting. When destination names on the invoice are unreliable — drivers free-type destination text on field tickets, the same jobsite shows up under three different names across a month, the producer aliases destinations to a regional dispatch point rather than the actual drop — the lookup approach breaks. The schema then carries a job_number and a cost_code column that AP fills ticket by ticket, with the dispatcher's daily run sheets as the source of truth for which truck went to which job on which date. Worked multi-jobsite example: a 42-ticket invoice spans four active jobs, and six of those tickets carry a destination text the contractor doesn't recognize until the dispatcher's log shows the truck was rerouted mid-shift to a job the producer didn't know about. Those six tickets get reassigned manually before the spreadsheet feeds the GL.
The cost-code home for aggregate haulage sits where most contractors expect it. CSI Division 31 Earthwork covers site-prep aggregate, fill, and structural fill. CSI Division 32 Exterior Improvements covers paving aggregate, sub-base, and base course. Most contractors maintain a more granular phase code beneath the division — sub-base, base, ABC for aggregate base course, fines for cushion or topping — and the spreadsheet usually holds whichever level of granularity the chart of accounts requires for that job.
The same allocation logic applies elsewhere on a contractor's AP cycle. Splitting one rental invoice across multiple construction jobs uses the same destination-to-job lookup and manual-job-code patterns described here, and the same accounting cycle often runs both allocations through the same spreadsheet logic in the same week.
Feeding the Freight Audit from the Same Spreadsheet
The per-ticket spreadsheet built for reconciliation and job allocation is already the feed for the freight audit. No additional extraction work is needed; three audit lanes draw from the same data the AP team has on screen, and a freight audit aggregate hauler invoice routine on a contractor of any meaningful size runs all three on the same review cycle.
Lane 1: rate per ton against contracted rate. The contractor's master agreement with each producer specifies rate per ton by material type and by lane — the origin-destination pair priced by mileage and corridor. The audit check compares the rate_per_ton value on every invoice line against the contracted rate for that material and that lane, and the comparison is a join against the rate schedule on the back end of the spreadsheet or workbook. Variances flag exceptions. Sometimes a fuel surcharge has been built into the line rate instead of broken out as accessorial, which inflates the rate per ton against the contracted figure and obscures the real surcharge. Sometimes the wrong lane rate has been applied — a hauler quoted the short-haul rate for a job that's actually on the long-haul corridor. Sometimes a contracted rate escalation hasn't been honored. A rate_variance column, or a simple boolean flag, is what the controller scans first.
Lane 2: tons net against internal scale-house records. Contractors who run their own scale at the jobsite, or who retain the producer's scale tickets in the project file, can compare the tons_net on the invoice against scale-house records ticket by ticket. A small variance is sometimes legitimate — the producer's certified scale is the contractual figure and a few hundred pounds of difference shows up between any two scales — but a pattern of variances on one producer's tickets signals either a measurement issue at the pit scale or a misallocation where tickets from one job are being billed to another.
Lane 3: trucks dispatched against internal dispatcher logs. The tractor_id column on the invoice can be checked against the contractor's dispatcher log of which trucks were dispatched to which jobsites on which dates. Ticket IDs that don't show up in the dispatcher log are the visible flags: maybe a truck arrived without authorization, maybe a ticket was misallocated to the wrong job, maybe the invoice carries a phantom delivery that needs to be challenged.
Dedicated freight-cost-reconciliation modules exist in the construction ERP world — Trimble Spectrum and Viewpoint Vista both ship one as part of their core financials — and their existence is evidence the workflow is industry-standard rather than a niche curiosity. For contractors who are on those platforms the audit is automated against the same data set. For contractors who are not on those platforms, the spreadsheet route described here covers most of the same lanes without the platform.
Accessorial Columns the Schema Must Hold Separately
Accessorials each get their own column. Rolling them into the line haul total destroys the audit lane and obscures patterns that matter to the contractor's bottom line — and on most aggregate hauler invoices the accessorials are reported per ticket, which means each one is already line-resolved on the source document.
Fuel surcharge. Computed per ticket from a published index, most commonly the US DOE diesel fuel price index, applied on a fuel-surcharge schedule embedded in the master agreement. The schedule sets a base diesel price and a step function above and below it. Keep fuel surcharge as its own column so the contractor can validate the surcharge calculation against the DOE index for the haul period and against the contracted schedule. When the surcharge is folded into the line rate, that validation is gone.
Demurrage and wait time. Charged when a truck is held at the jobsite beyond a contracted free time, typically 30 to 60 minutes per drop depending on the contract. Each demurrage charge belongs in its own column entry because each one is potentially disputable on its own facts — the jobsite was congested on Tuesday because a concrete pour ran long; the gate keeper was new and slow on Friday; the producer's driver waited because the project's flagger sent them to the wrong stockpile. A contractor with congested jobsite traffic may decide to absorb some demurrage, but the decision should be made on a pattern review against the dispatcher log, not by accepting the lump sum at the invoice tail.
Standby. Distinct from demurrage in some contracts: standby is scheduled idle time billed when a truck is held for a planned but delayed pour or placement, often pre-negotiated by the hour. It belongs in its own column so it can be allocated against the project phase that caused the delay rather than smeared across the whole job's haulage cost.
Toll pass-through. Tolls incurred on the haul route, passed through at cost. The column lets AP verify pass-through totals against the producer's toll receipts when patterns look off — and "look off" usually means a sudden jump on a corridor that didn't change.
Environmental fee. A flat per-load or per-ticket fee that varies by producer and sometimes by material type. Some contractors challenge environmental fees on the basis of contract language; many accept them; either decision needs visibility, and visibility is what the dedicated column provides.
The operating principle is one line: any per-ticket charge whose calculation logic differs from the base line haul gets its own column.
A Material Column for Spec Verification
The scale ticket and the invoice line name the material delivered using producer-specific or industry-standard nomenclature — FDOT #57 stone, ASTM C33 sand, limerock base LBR 100, riprap, crushed concrete fines, base course, sub-base, road base. When the project specification requires a specific gradation or classification, the AP clerk or the cost engineer reads the material column on the spreadsheet to verify spec compliance before payment goes out. That single column is the difference between a paid invoice that holds up under a later quality audit and one that doesn't.
The schema rule is to keep the material column as free text and not normalize it. Normalizing destroys evidence. The producer's exact wording is the contractual claim about what was delivered — that nomenclature, that gradation designation, weighed on a state-certified scale, recorded on a ticket the contractor's project file retains. Forcing the value into a controlled vocabulary on extraction would erase the source-of-truth quality of the original text. Whatever cleanup the contractor wants to do downstream is fine; the extracted column itself preserves the producer's words.
The verification chain is short and worth walking. The project specification — drawn from the contract documents, the engineer's drawings, or the state DOT standard for the work type — names the required material. The scale ticket's material column carries the producer's claim of what was delivered against that requirement. The project file retains the ticket as the per-delivery evidence the contractor relies on for later audit, retesting, or dispute. The AP team's column extraction preserves that evidence in spreadsheet form for the cost engineer's review, and mismatches surface immediately. A project spec that requires FDOT #57 coarse aggregate is not satisfied by a ticket that says #67 stone, and the spreadsheet review catches that before the invoice is approved.
The two adjacent pieces of evidence the cost engineer relies on sit on the same ticket. The producer's certified scale weight is the legally decisive tons net figure. The named material classification is the contractual claim about what those tons were. Together they form the per-delivery proof of compliance the project file is keeping in case anyone asks later — and they reach the cost engineer through the AP team's spreadsheet, not a separate retrieval workflow.
The same free-text-material-column logic generalizes across the building-materials supplier-invoice family. US lumberyard supplier invoice line-item extraction carries the same shape for dimensional lumber and engineered wood products, where the product description column holds species, grade, and treatment information that should not be normalized away on extraction either.
Extract invoice data to Excel with natural language prompts
Upload your invoices, describe what you need in plain language, and download clean, structured spreadsheets. No templates, no complex configuration.
Related Articles
Explore adjacent guides and reference articles on this topic.
Ready-Mix Concrete Delivery Tickets to Excel
Extract ready-mix invoices and delivery tickets into Excel for yardage reconciliation, fee audit, and construction job-cost allocation.
US Lumberyard Supplier Invoice to Excel for Contractors
Extract line items from Builders FirstSource, 84 Lumber, US LBM, and other US lumberyards into Excel with UOM, sales tax, and job-cost coding handled.
myABCsupply Statement and Invoices to Excel for Roofers
Turn myABCsupply statements and invoices into Excel rows for roofing job costing, QuickBooks, and CRM review.