An Applied Behavior Analysis (ABA) therapy invoice — the monthly billing document a practice sends for a child's treatment — carries a shape that generic invoice extraction tends to collapse. It lists CPT codes 97151, 97152, 97153, 97155, and 97156 billed in 15-minute units, with a modifier stack (HM, HN, HO, HA, and state-specific U-codes such as U1 through U5), a rendering-provider credential (RBT, BCaBA, or BCBA), a place-of-service code (11 for office, 12 for home, 03 for school), and an authorization number with approved and used unit totals attached.
ABA therapy invoice extraction to Excel is worth doing well because that structure is the whole point: the authorization number is the spine, and every session line inherits client and authorization context that a flat OCR pass discards. Structured extraction captures the hierarchy by repeating header-level client and authorization fields onto each session row, producing a workbook that supports monthly bookkeeping, authorization-unit tracking, and payer reconciliation.
This article is for the bookkeeper. Not the clinician deciding which code applies to a session, not the practice owner evaluating practice-management software, not the billing department composing the invoice to send. The reader here has already received an ABA therapy invoice as a PDF, or a CentralReach-style export as a CSV, and needs it as analysis-ready rows — the input is a document, the output is a spreadsheet, and the work in between is decomposing the hierarchy without losing the authorization-spine context that makes every downstream job possible.
Why ABA Invoices Carry an Authorization-Spine Structure
An ABA therapy invoice is not organized like a generic supplier invoice. It is organized around a client-and-authorization header, with every line item — each individual session, or each 15-minute unit within a session — inheriting the header's client identifier, payer, authorization number, authorization date window, and approved-unit context. The authorization number is the spine of the record. Strip it off the line items and nothing downstream works: the units on each row have no approved pool to reconcile against, the payer has no care plan to tie back to, and the monthly units-used total becomes an untraceable sum.
A generic supplier invoice — a materials bill from an office vendor, a utility charge, a consulting invoice — typically has one payer, one provider, one date range, and flat line items that all share the same header context implicitly. An ABA invoice sits closer to a nested record than a flat ledger, which puts it in different territory from the general healthcare accounts payable automation context a bookkeeper handles for hospital or clinic supplier bills. Many mid-sized practices invoice several children in a single document, each child under their own authorization, each authorization carrying its own approved-unit pool per CPT code, and each session line carrying its own rendering provider, modifier stack, and place-of-service code. What looks like one invoice is really a collection of authorization-scoped billing blocks stitched together, and the bookkeeper's reconciliation treats each block as a distinct ledger.
When the hierarchy gets flattened incorrectly, specific things break. Session rows detached from their authorization number cannot be reconciled against approved units — the pivot that should roll units by authorization simply cannot group them. Sessions stripped of place-of-service cannot be separated into in-home billing (POS 12), office-based billing (POS 11), or school-based billing (POS 03), which matters because payers often rate each setting differently and school-based sessions may fall under separate Medicaid or educational-agency rules. Credential markers lost at the line level — when the extraction captures "RBT" once in a header block but not on each session row — make it impossible to separate technician-delivered hours from BCBA-delivered supervision, which is the filter behind payer-mix analysis and supervision-ratio reporting. Each of those downstream jobs depends on the header context reaching every line.
The ABA Extraction Schema, Field by Field
The schema below names what to pull off an ABA therapy invoice and why each field matters downstream. Treat it as a reference — the complete list a bookkeeper points to when configuring the extraction, briefing whoever does the extraction, or auditing rows the extraction produced.
Client and header identifiers. Capture the client name or client code (many practices replace the legal name with a client code on external-facing invoices for HIPAA reasons), the date range the invoice covers, the practice name, the practice tax ID, the practice NPI (this is the Type 2 organizational NPI, not an individual clinician's Type 1), and the practice taxonomy code. The tax ID anchors payer payments to the right vendor in QuickBooks or Xero. The organizational NPI confirms the billing entity on payer remittance advice, which is how the reconciliation from payer payment back to invoice actually closes.
Date and time of service. Capture date of service (DOS), session start time, session end time, and the derived session duration. Session duration is not cosmetic: it reconciles to the billed 15-minute unit count at four units per hour, and that arithmetic is the first verification anchor the bookkeeper uses to catch extraction errors or billing discrepancies before they reach the GL.
CPT codes. Five codes dominate ABA therapy billing: 97151 (behavior identification assessment), 97152 (behavior identification supporting assessment), 97153 (adaptive behavior treatment by protocol, delivered by a technician), 97155 (adaptive behavior treatment with protocol modification by a qualified health care professional), and 97156 (family adaptive behavior treatment guidance). Capture the code on every session row, in its own column, so the bookkeeper's reports can group and filter on it. This article does not teach these codes; the ABA Coding Coalition and commercial payer guides own that territory. It names them so the schema makes sense and so the pair 97153 and 97155 (technician-delivered treatment and QHP-delivered protocol modification) can be filtered cleanly. Those two codes typically carry most of the billed-unit volume in a given month and are the pair most bookkeepers report on first.
Modifier stack. The modifier column carries HM (a technician below bachelor's degree), HN (bachelor's degree), HO (master's degree), HA (child and adolescent program), and the state-specific U-code layer — U1 through U5, where each state Medicaid program defines what those codes mean locally. Modifiers change the billed rate, sometimes materially, so they belong in their own column rather than concatenated onto the CPT field. When the modifier reaches the spreadsheet as discrete data, the payer rate table joins cleanly and rate variance surfaces as a filterable column rather than buried in a combined string.
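As a rough illustration of keeping modifiers discrete, the sketch below splits a combined string into a CPT column and fixed modifier columns. It assumes the source prints code and modifiers together in a form like "97153-HM-U1"; that format, the function name `split_code_and_modifiers`, and the four-column limit are all assumptions made for the example, and real invoices vary.

```python
import re

def split_code_and_modifiers(raw: str, max_modifiers: int = 4) -> dict:
    """Split a combined string such as '97153-HM-U1' into discrete columns."""
    # Tokens may be separated by hyphens, commas, colons, or whitespace.
    tokens = [t.upper() for t in re.split(r"[\s,:\-]+", raw.strip()) if t]
    if not tokens or not re.fullmatch(r"\d{5}", tokens[0]):
        raise ValueError(f"no 5-digit CPT code at start of {raw!r}")
    modifiers = tokens[1:]
    if len(modifiers) > max_modifiers:
        raise ValueError(f"more than {max_modifiers} modifiers in {raw!r}")
    row = {"cpt_code": tokens[0]}
    # Pad with empty strings so every row has the same modifier columns.
    for i in range(max_modifiers):
        row[f"modifier_{i + 1}"] = modifiers[i] if i < len(modifiers) else ""
    return row

example = split_code_and_modifiers("97153-HM-U1")
```

With the modifiers in their own columns, a rate table keyed on code plus modifier can be joined without string parsing at report time.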
Fifteen-minute billing unit. The fundamental unit for most ABA CPT codes is 15 minutes, and the bookkeeper's session-duration-to-billed-units math rests on that fact. Some payers apply an 8-minute threshold rule: a partial unit rounds up to a full 15-minute unit when at least 8 minutes of the quarter-hour elapsed, and rounds down otherwise. Capture both the billed unit count and the session duration so the bookkeeper can reproduce the rounding decision when the payer's authorization-usage report and the practice's invoice disagree.
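The duration-to-units arithmetic can be sketched as follows. This is a minimal example that assumes same-day sessions with 24-hour "HH:MM" times and applies the 8-minute threshold to the final partial unit only; the function names are hypothetical, and a given payer's rounding policy may differ, so treat the threshold as a parameter rather than a fixed rule.

```python
from datetime import datetime

def session_minutes(start: str, end: str) -> int:
    """Minutes between two same-day 'HH:MM' times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    minutes = int(delta.total_seconds() // 60)
    if minutes < 0:
        raise ValueError("end time is before start time")
    return minutes

def expected_units(minutes: int, threshold: int = 8) -> int:
    """Full 15-minute units, plus one more if the remainder meets the threshold."""
    full_units, remainder = divmod(minutes, 15)
    return full_units + (1 if remainder >= threshold else 0)

# A 15:00 to 15:53 session is 53 minutes: three full units plus an
# 8-minute remainder, which rounds up to a fourth unit under this rule.
units = expected_units(session_minutes("15:00", "15:53"))
```

Comparing `expected_units` against the billed unit count on each row gives the first verification check described above.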
Rendering-provider identity. Capture the rendering provider's name, individual NPI (Type 1), taxonomy code, and credential abbreviation on each session row. The credential column does real work: filtering RBT hours against BCaBA and BCBA hours is how a bookkeeper produces the supervision-ratio report and the payer-mix-by-credential report each month. That filter only works if the credential lives at the line level, since different sessions on the same invoice can carry different rendering providers at different credential tiers.
The credential column earns its place in part because the credential hierarchy is operational, not decorative. Per the BACB's Registered Behavior Technician credential definition, Registered Behavior Technicians practice under the direction and close supervision of an RBT Supervisor or RBT Requirements Coordinator who is responsible for their work. That supervision relationship is what the supervision-ratio report measures — BCBA-delivered hours as a proportion of RBT-delivered hours, rolled by client and by month — and the report is impossible to produce if the credential marker is stripped at extraction time. Normalizing credential abbreviations across a practice's clinicians matters for the same reason; inconsistent labels (RBT vs BT vs RBT-Trainee for the same tier) make the downstream filter unreliable.
Place-of-service code. POS values of 11 (office), 12 (home), and 03 (school) dominate ABA billing and often differ across sessions within the same authorization. Rate differentials tied to POS are common — many commercial payers pay in-home at one rate and office-based at another — and school-based sessions may fall under separate Medicaid or educational-agency rules that flow through a different payer code entirely. The POS column is how the bookkeeper slices the month's billing into setting-specific buckets for the practice owner's review.
Authorization and payer identifiers. Capture the payer name, member ID, group number, authorization number, authorization start date, authorization end date, units approved per code per authorization period (or per month, depending on how the authorization is written), and the running units-used total as reported on the invoice. The authorization number ties every session row back to the payer's approved care plan; the approved-units and used-units pair is the reconciliation target the bookkeeper produces month over month, flagging authorizations that are drawing down faster than the approved pool can sustain and authorizations that have already overbilled.
Monetary fields. Capture rate per unit, number of units billed, line total, session total (where sessions span multiple billing lines), and the invoice total. Two verification anchors fall out of this naturally: rate times units should equal the line total on every row, and the sum of line totals should equal the invoice total. When either check fails after extraction, the row that breaks the reconciliation is the row to audit first — a far more efficient first pass than scanning the whole invoice for transcription errors.
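Both verification anchors are simple enough to script. The sketch below assumes each extracted row is a dictionary with `rate`, `units`, and `line_total` keys (hypothetical names), with money held as strings and compared through `Decimal` to avoid floating-point drift; the one-cent tolerance is also an assumption.

```python
from decimal import Decimal

def check_totals(rows, invoice_total, tolerance=Decimal("0.01")):
    """Return (indexes of rows where rate * units != line_total, invoice_ok)."""
    bad_rows = []
    for i, row in enumerate(rows):
        expected = Decimal(row["rate"]) * row["units"]
        if abs(expected - Decimal(row["line_total"])) > tolerance:
            bad_rows.append(i)
    summed = sum((Decimal(r["line_total"]) for r in rows), Decimal("0"))
    invoice_ok = abs(summed - Decimal(invoice_total)) <= tolerance
    return bad_rows, invoice_ok

# Illustrative rows: the second line total does not match rate * units.
rows = [
    {"rate": "12.50", "units": 8, "line_total": "100.00"},
    {"rate": "20.00", "units": 4, "line_total": "85.00"},
]
bad_rows, invoice_ok = check_totals(rows, "185.00")
```

Here the line totals still sum to the invoice total, so only the row-level check points to the line that needs auditing.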
Inheriting Header Context Onto Each Session Row: The Extraction Prompt Pattern
The extraction shape that serves the bookkeeper's workbook is one row per session — or one row per 15-minute-unit billing line, when reconciliation needs that granularity — with header-level fields repeated on every row. Client identifier, payer, authorization number, authorization date window, and practice identifiers all duplicate across every line belonging to that authorization block, so each row carries its own full context. The row is the unit of reconciliation: a pivot grouped by authorization number and CPT code needs the authorization number present on every line it is counting, not only in a header block above the table, because the pivot does not read the header.
That shape is instruction-led rather than template-led. An extraction prompt for an ABA invoice names the header-level fields to capture once per client-and-authorization block (client code, payer name, member ID, authorization number, authorization start and end dates, units approved per code), the line-level fields to capture per session (date of service, session start and end times, CPT code, modifier stack, rendering provider name and credential, place-of-service, units billed, rate per unit, line total), and the inheritance rule that repeats the header block onto every session row it parents. The prompt adds column-ordering directives (for example, authorization fields first, session detail next, monetary fields last) and formatting rules (authorization start and end dates as YYYY-MM-DD, units as integers, rate and line total as currency-typed columns with two decimals). A prompt written this way produces the same workbook shape across invoices from different practices, because the shape is specified rather than inferred.
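One way to keep such a prompt consistent from month to month is to assemble it from the field lists rather than retyping it. The sketch below is illustrative only: the field names, the wording of each instruction, and the `build_prompt` helper are assumptions for this example, not a required syntax for any particular extraction tool.

```python
HEADER_FIELDS = [
    "client_code", "payer_name", "member_id", "authorization_number",
    "authorization_start", "authorization_end", "units_approved_per_code",
]
LINE_FIELDS = [
    "date_of_service", "session_start", "session_end", "cpt_code",
    "modifiers", "rendering_provider_name", "credential",
    "place_of_service", "units_billed", "rate_per_unit", "line_total",
]

def build_prompt() -> str:
    """Assemble the extraction instructions as plain text, one rule per line."""
    return "\n".join([
        "Extract one row per session line.",
        "Capture once per client-and-authorization block: "
        + ", ".join(HEADER_FIELDS) + ".",
        "Capture per session: " + ", ".join(LINE_FIELDS) + ".",
        "Repeat the block's header fields on every session row it contains.",
        "Order columns: authorization fields, then session detail, then monetary fields.",
        "Format dates as YYYY-MM-DD, units as integers, "
        "rate and line total with two decimals.",
    ])

prompt = build_prompt()
```

Keeping the field lists in one place means a schema change, such as adding a payer-state column, updates the prompt and any downstream column checks together.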
The contrast with generic OCR or a template-based extractor is sharp. A template extractor trained on invoice shape captures line items and a header total, but collapses the header-per-authorization-block structure because the template assumes one header per document. Generic OCR produces a flat row list with the CPT codes and units intact but the authorization-spine relationship erased. Either output still has to be reconstituted by hand before the bookkeeper can reconcile, which is where the multi-hour monthly job comes from in the first place.
A multi-client invoice is the case that makes or breaks the prompt pattern. When one document carries sessions for several children under several authorizations, the inheritance rule extends: each child-and-authorization block becomes its own header, and each child's session rows inherit their own authorization context. In the output, two rows for two different children on the same day under different authorizations carry different authorization numbers and different client codes, and the pivot that rolls units by authorization isolates them correctly.
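The inheritance rule amounts to a small flattening step. The sketch below assumes the invoice has already been parsed into nested blocks, each with header fields and a `sessions` list; that intermediate structure, the field names, and the sample values are hypothetical.

```python
def flatten_blocks(blocks):
    """Repeat each block's header fields on every session row it contains."""
    rows = []
    for block in blocks:
        header = {k: v for k, v in block.items() if k != "sessions"}
        for session in block["sessions"]:
            rows.append({**header, **session})
    return rows

# Two children on one invoice, each under a separate authorization.
blocks = [
    {"client_code": "C-101", "authorization_number": "AUTH-001",
     "sessions": [
         {"dos": "2024-03-04", "cpt_code": "97153", "units": 8},
         {"dos": "2024-03-05", "cpt_code": "97155", "units": 4},
     ]},
    {"client_code": "C-202", "authorization_number": "AUTH-002",
     "sessions": [
         {"dos": "2024-03-04", "cpt_code": "97153", "units": 12},
     ]},
]
rows = flatten_blocks(blocks)
```

Each of the three resulting rows carries its own client code and authorization number, so a pivot on authorization number separates the two children even though they share a date of service.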
This is the extraction shape AI-powered invoice data extraction is designed around — a single prompt field plus a file upload area, using the same interaction pattern a bookkeeper already knows from ChatGPT or Claude, with no templates to configure or rules engine to wire up. The prompt itself specifies column naming so the extracted workbook matches the bookkeeper's schema, output structure so header identifiers repeat on every line, date standardization so authorization windows are sortable and filterable, and conditional logic for document variants (a cancelled session rebilled in a later month, a credit line against an earlier invoice). The underlying capability is prompt-driven instruction applied across batch volume, which is what makes the same prompt produce the same workbook across ten invoices or ten thousand — the reliability a bookkeeper's monthly workflow depends on, but that general-purpose AI chat tools do not deliver consistently at volume.
Reconciling Session Notes and Billed Units: Where Real Invoices Break
A clean extraction is the easy part. The harder work is what happens when the extracted rows meet a session-note export or a payer's authorization-usage report and the three sources disagree. The variance patterns below are the ones that show up on real monthly reconciliations, and each one has a specific extraction decision that keeps the discrepancy visible in the workbook rather than buried in it.
Split sessions. A session that starts with one technician and finishes with another, or one that pauses across two billing windows, often appears as two line items on the invoice but one entry in the session note. The extraction schema should capture both lines with the same session identifier (or at minimum the same DOS, client, and start-of-window anchor), different rendering providers, different modifier stacks, and different unit totals. When the bookkeeper groups by session identifier and sums units, the two lines roll back up to the one session the note describes, and the variance column compares that total against the session-note duration cleanly.
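Rolling split lines back up to one session is a group-and-sum. The sketch below assumes each row carries `client_code`, `dos`, `session_id`, and `units` (hypothetical column names); where the invoice has no session identifier, the start-of-window anchor mentioned above would take the place of `session_id` in the key.

```python
from collections import defaultdict

def units_by_session(rows):
    """Sum billed units across all lines that belong to the same session."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["client_code"], r["dos"], r["session_id"])] += r["units"]
    return dict(totals)

# One session split between two technicians appears as two invoice lines.
rows = [
    {"client_code": "C-101", "dos": "2024-03-04", "session_id": "S-1",
     "provider": "Tech A", "units": 6},
    {"client_code": "C-101", "dos": "2024-03-04", "session_id": "S-1",
     "provider": "Tech B", "units": 2},
    {"client_code": "C-101", "dos": "2024-03-05", "session_id": "S-2",
     "provider": "Tech A", "units": 8},
]
totals = units_by_session(rows)
```

The per-line provider and modifier detail stays in the row-level workbook; only the comparison against the session note uses the rolled-up total.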
Cancelled-then-rebilled sessions. Sessions flagged as cancelled in one month's authorization-usage report sometimes reappear as billed units in the following month — the payer's record caught the cancellation, but the practice's billing system submitted the corrected charge in a later window. The extraction should preserve the original date of service, not the submission date, on every row. Otherwise month-over-month reconciliation double-counts: the original month already saw the cancellation, and the submission month now counts the rebilled units a second time.
Mid-month modifier changes. When a technician's credential level changes during a billing month — an RBT completing BCaBA certification, for example — the modifier stack on their later sessions changes and the rate differs from their earlier sessions. Modifier and rate must live at the line level. A header-level modifier field applied to the whole invoice silently averages away the change, and the line total column stops reconciling to the per-unit rate math. Keeping both columns per-row preserves the audit trail and lets the bookkeeper explain the rate shift to the practice owner without reopening the source PDF.
Inconsistent credential abbreviations. One practice labels its technicians "RBT," another uses "BT," a third uses "RBT-Trainee" for what is effectively the same tier. The extraction prompt should normalize these to a short controlled vocabulary — a credential enum the prompt declares explicitly, mapping each source label to a canonical value — so the downstream filter for RBT-delivered hours works regardless of which practice the invoice came from. Without the normalization, a cross-practice report silently undercounts hours under every non-canonical label.
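A controlled vocabulary of this kind can be written down as a lookup table. The mapping below is a sketch: treating "BT" and "RBT-Trainee" as the RBT tier follows the example in the paragraph above, but whether that grouping is right for a given practice is the bookkeeper's and practice owner's call, and the `UNMAPPED` sentinel is an assumption added so unknown labels surface instead of disappearing.

```python
# Source label (upper-cased, spaces as hyphens) -> canonical credential tier.
CREDENTIAL_MAP = {
    "RBT": "RBT",
    "BT": "RBT",
    "RBT-TRAINEE": "RBT",
    "BCABA": "BCaBA",
    "BCBA": "BCBA",
}

def normalize_credential(label: str) -> str:
    """Map a source credential label to its canonical value."""
    key = label.strip().upper().replace(" ", "-")
    # Unknown labels are flagged rather than guessed at.
    return CREDENTIAL_MAP.get(key, "UNMAPPED")

labels = ["RBT", "bt", "RBT Trainee", "BCaBA", "BCBA", "LPC"]
normalized = [normalize_credential(x) for x in labels]
```

Filtering the workbook for `UNMAPPED` after each extraction shows which new labels need a decision before the credential reports run.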
State-specific modifier variance. The same service can carry different U-code modifiers in different state Medicaid programs. A bookkeeper supporting a practice that bills across state lines needs a payer-state field on every row and an extraction prompt that interprets the modifier stack with the payer state in mind, rather than assuming a single state's U-code definitions. The article is not the place to teach the U-code map — every state Medicaid manual owns that territory — but the extraction schema needs to preserve enough state context that the map can be applied correctly downstream.
Missing authorization carryover between months. When an authorization spans several billing months, the units-used running total on the practice's invoice does not always start from the prior month's close; some practice-management systems reset the counter per billing cycle, some carry it forward, and the inconsistency shows up as a cumulative-total mismatch. The extraction captures whatever the invoice shows (practice-reported units-used), and the bookkeeper maintains a separate cumulative column computed from the extracted rows themselves. Discrepancies between the two columns become the month's reconciliation signal.
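The separate cumulative column can be computed directly from the extracted rows. This sketch assumes each row carries the invoice's own running total in a `reported_units_used` field and that dates are ISO strings, which sort chronologically; both the field names and the sample figures are hypothetical.

```python
from collections import defaultdict

def add_cumulative_units(rows):
    """Add a computed running total per authorization and code, plus the gap
    between the invoice-reported total and the computed one."""
    running = defaultdict(int)
    out = []
    for r in sorted(rows, key=lambda r: r["dos"]):
        key = (r["authorization_number"], r["cpt_code"])
        running[key] += r["units"]
        out.append({
            **r,
            "computed_units_used": running[key],
            "carryover_gap": r["reported_units_used"] - running[key],
        })
    return out

# The practice system reset its counter in February.
rows = [
    {"authorization_number": "AUTH-001", "cpt_code": "97153",
     "dos": "2024-01-10", "units": 8, "reported_units_used": 8},
    {"authorization_number": "AUTH-001", "cpt_code": "97153",
     "dos": "2024-01-24", "units": 8, "reported_units_used": 16},
    {"authorization_number": "AUTH-001", "cpt_code": "97153",
     "dos": "2024-02-07", "units": 8, "reported_units_used": 8},
]
tracked = add_cumulative_units(rows)
```

A non-zero `carryover_gap` marks the first row where the invoice's counter and the workbook's own count diverge.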
Session-note counts against billed-unit counts. When a session-log export from the practice-management platform is extracted alongside the invoice, the bookkeeper can cross-check the session-note time blocks against the billed units and surface the variance as its own column. The two sources should agree within minutes — more variance than that points to a session-note that was updated after the invoice generated, a billing correction that has not yet synced to the notes, or a technician who ran overtime the authorization did not cover. Any of those is a conversation with the practice owner, and the variance column is the evidence.
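The variance column can be sketched as a comparison of billed minutes against noted minutes. The example assumes one billed row per session (split sessions already rolled up), a lookup of note durations keyed by a hypothetical `session_id`, and a 7-minute tolerance chosen to match the largest gap the 8-minute rounding rule can produce; the right tolerance is a judgment call for each practice.

```python
def add_variance(rows, note_minutes, tolerance_minutes=7):
    """Compare billed minutes (units * 15) with session-note minutes."""
    out = []
    for r in rows:
        noted = note_minutes.get(r["session_id"])
        billed = r["units"] * 15
        variance = None if noted is None else billed - noted
        out.append({
            **r,
            "billed_minutes": billed,
            "variance_minutes": variance,
            # A missing note or an out-of-tolerance gap both need a look.
            "needs_review": variance is None or abs(variance) > tolerance_minutes,
        })
    return out

rows = [
    {"session_id": "S-1", "units": 8},   # 120 billed minutes
    {"session_id": "S-2", "units": 8},   # 120 billed minutes
    {"session_id": "S-3", "units": 4},   # no matching note
]
note_minutes = {"S-1": 118, "S-2": 95}
checked = add_variance(rows, note_minutes)
```

Rows with `needs_review` set are the ones to raise with the practice owner, with the variance figure as supporting evidence.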
Cross-setting work brings its own wrinkles. When ABA services are delivered in schools (POS 03), the bookkeeper's reconciliation often intersects with educational-services invoicing patterns, which share the student-level-detail and multi-payer structure; invoice extraction for educational-services providers covers those adjacent patterns for readers whose practice straddles the clinical and school-based setting.
Working From Practice-Management Platform Exports
Some bookkeepers never see the practice's PM platform and work only from the invoice PDF the practice emails them. Others have dashboard access and can pull exports directly. The workflow accommodates both. Every major ABA practice-management platform offers some form of CSV or Excel export covering sessions, authorizations, and billing detail — the field set is roughly the same as the invoice schema, with different column names and different structural choices. The bookkeeper's job when working from an export is to normalize it into the same canonical workbook shape the invoice extraction produces.
CentralReach is the PM platform this article's readers are most likely to encounter, and its Standard Export is the most common non-PDF input: a CSV covering sessions, authorizations, and billing detail that a bookkeeper with dashboard access can pull directly. Motivity is the next most common; its session information exports cover DOS-level detail, with linked billing data in a separate file that has to be joined back on session ID. Raven Health, Artemis ABA, Noteable, Measure PM, RethinkBH (via its BillAI module), and Hi Rasmus each offer comparable CSV exports covering sessions, authorizations, and billing detail, with platform-specific column names and occasionally nested fields that have to be parsed out before the data behaves like a flat workbook.
The normalization job is where the extraction prompt earns its keep on platform exports, not just on PDFs. PM exports arrive with platform-specific column names (some call the authorization field "auth_id," others "authorization_number," others "approval_ref"), nested data structures (modifier stacks sometimes appear as JSON blobs in a single CSV cell, session-note text sometimes embeds billing codes), and inconsistent date formats across platforms. A prompt applied to the export renames columns to the bookkeeper's canonical schema, flattens nested fields into discrete columns (parsing modifier stacks into their own columns, for example), and standardizes date and currency formats so the output matches the PDF-extraction workbook exactly. The same downstream reports then run across both input types without branching logic.
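For readers who prefer to see the normalization spelled out, the sketch below does the same three things in code: renames columns, flattens a JSON modifier cell, and standardizes the date. The authorization aliases come from the paragraph above; `service_date`, the two accepted date formats (US month-first is assumed), and the function names are hypothetical and would need adjusting per platform.

```python
import json
from datetime import datetime

COLUMN_ALIASES = {
    "auth_id": "authorization_number",
    "approval_ref": "authorization_number",
    "service_date": "dos",
}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed formats, month first

def to_iso_date(value: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

def normalize_export_row(raw: dict, max_modifiers: int = 4) -> dict:
    """Rename columns, standardize the date, and split modifiers into columns."""
    row = {}
    for name, value in raw.items():
        key = name.strip().lower()
        row[COLUMN_ALIASES.get(key, key)] = value
    row["dos"] = to_iso_date(row["dos"])
    cell = row.pop("modifiers", "")
    # Modifier cells may hold a JSON array or a space-separated string.
    mods = json.loads(cell) if cell.strip().startswith("[") else cell.split()
    for i in range(max_modifiers):
        row[f"modifier_{i + 1}"] = mods[i] if i < len(mods) else ""
    return row

normalized = normalize_export_row(
    {"auth_id": "AUTH-001", "service_date": "03/05/2024", "modifiers": '["HM", "U1"]'}
)
```

Columns without an alias pass through under their lower-cased original names, so nothing from the export is silently dropped.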
For bookkeepers who work only from PDFs, none of this changes the extraction workflow; the PM-platform path is an alternative input, not a replacement. Most external bookkeepers have exactly the access the practice owner granted them, which is often the monthly invoice email and nothing else. The canonical schema is the same either way, which is the point: the spreadsheet does not know whether its rows came from a parsed PDF or a normalized CSV, and the reconciliation reports do not have to.
Authorization-Unit Tracking, Credential-Filtered Hours, and QuickBooks Class Mapping
A clean extraction workbook feeds three concurrent monthly jobs. These are what the bookkeeper actually delivers to the practice owner — the reason the extraction work is worth doing and the measure of whether the schema and the prompt pattern were right.
Authorization-unit tracking. The first job is a pivot that rolls units used by CPT code, by authorization number, by month, and compares the total against units approved. Produced in Excel or Google Sheets on top of the extraction workbook, the pivot flags two conditions the practice owner needs to see: authorizations burning through their approved pool faster than the remaining authorization window can support, and authorizations that have already overbilled. Both conditions are actionable — the first triggers a reauthorization request to the payer, the second triggers a credit note or a billing correction — and neither is visible without the extracted data at row-per-session granularity with the authorization number present on every row.
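The same roll-up can be expressed outside a pivot table. The sketch below sums units by authorization and CPT code, then flags overbilling and a pace problem; the pace test assumes units should be consumed evenly across the authorization window, which is a simplification, and the field names and sample numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date

def authorization_usage(rows, approvals, as_of):
    """Compare units used with units approved per (authorization, CPT code)."""
    used = defaultdict(int)
    for r in rows:
        used[(r["authorization_number"], r["cpt_code"])] += r["units"]

    report = []
    for key in sorted(approvals):
        a = approvals[key]
        units_used = used.get(key, 0)
        window_days = max((a["end"] - a["start"]).days, 1)
        elapsed_days = min(max((as_of - a["start"]).days, 0), window_days)
        used_share = units_used / a["units"] if a["units"] else 0.0
        report.append({
            "authorization_number": key[0],
            "cpt_code": key[1],
            "approved": a["units"],
            "used": units_used,
            "remaining": a["units"] - units_used,
            "overbilled": units_used > a["units"],
            # Straight-line assumption: share used vs. share of window elapsed.
            "ahead_of_pace": used_share > elapsed_days / window_days,
        })
    return report

approvals = {("AUTH-001", "97153"): {
    "units": 100, "start": date(2024, 1, 1), "end": date(2024, 7, 1)}}
rows = [{"authorization_number": "AUTH-001", "cpt_code": "97153", "units": 60}]
report = authorization_usage(rows, approvals, as_of=date(2024, 4, 1))
```

In this example 60 percent of the units are used at the halfway point of the window, so the authorization is flagged as ahead of pace but not overbilled. Filtering the rows by month before calling the function gives the month-by-month view.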
Credential-filtered hour reporting. The second job is a roll-up of hours by credential tier — RBT versus BCaBA versus BCBA — grouped by client and by month. That report feeds two downstream reads. Payer-mix analysis by credential tier compares commercial-payer revenue against Medicaid revenue at each credential level, which matters because payer rates and credential mix together determine contribution margin per client. Supervision-ratio compliance compares BCBA-delivered supervision hours against RBT-delivered direct hours; commercial and Medicaid payers both expect supervision to hold at specific ratios, and falling below triggers denial or audit risk. The report only works because the credential column reached the spreadsheet at the line level rather than being lost in a header block, which is the payoff for the extraction decision the schema section insisted on.
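A minimal version of the credential roll-up and the supervision ratio might look like the sketch below. It converts units to hours at four units per hour and divides BCBA hours by RBT hours per client and month, as the paragraph above describes; which codes actually count as supervision, and what ratio a payer expects, are payer-specific and not modeled here. Field and function names are hypothetical.

```python
from collections import defaultdict

def hours_by_credential(rows):
    """Hours per (client, month, credential), at 0.25 hours per billed unit."""
    hours = defaultdict(float)
    for r in rows:
        month = r["dos"][:7]  # 'YYYY-MM' from an ISO date string
        hours[(r["client_code"], month, r["credential"])] += r["units"] * 0.25
    return dict(hours)

def supervision_ratio(hours, client_code, month):
    """BCBA hours as a proportion of RBT hours; None when there are no RBT hours."""
    rbt = hours.get((client_code, month, "RBT"), 0.0)
    bcba = hours.get((client_code, month, "BCBA"), 0.0)
    return None if rbt == 0 else bcba / rbt

rows = [
    {"client_code": "C-101", "dos": "2024-03-04", "credential": "RBT", "units": 48},
    {"client_code": "C-101", "dos": "2024-03-11", "credential": "RBT", "units": 32},
    {"client_code": "C-101", "dos": "2024-03-12", "credential": "BCBA", "units": 8},
]
hours = hours_by_credential(rows)
ratio = supervision_ratio(hours, "C-101", "2024-03")
```

The calculation depends on the credential column being normalized first; an invoice that labels the same tier three different ways would split the RBT hours across three keys.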
QuickBooks and Xero mapping. The third job is moving the extracted rows into the GL. The structural mapping is consistent across practices: each client becomes a QuickBooks customer (or a sub-customer under a parent payer), each payer becomes a class in QuickBooks or a tracking category in Xero, each CPT code becomes an item on the invoice or sales receipt, and each extracted row becomes a single line on a sales receipt (cash basis) or an invoice (accrual basis) depending on the practice's accounting setup. The extraction workbook is the staging file the import runs from — whether through a direct Excel import, a CSV upload, or an IIF file for older QuickBooks Desktop deployments, the file's shape is what determines how clean the GL ends up.
Credit-note and cancellation handling in the GL. Cancelled-then-rebilled sessions are the case where preserving the original DOS on every row pays off materially in the GL. If the schema kept that original date rather than flattening to the submission date, the bookkeeper posts the reversing entry against the correct month and the corrected charge against the correct month, and the month-end reports reflect the practice's actual service delivery rather than an artifact of when the billing system submitted the correction. Flattening everything into the submission month is mechanically easier and materially wrong. The pattern is common enough in practices using template-driven extraction that a bookkeeper switching to session-level extraction often surfaces several months of misposted reversals within the first reconciliation cycle. Once the extracted data has been normalized into a workbook, the general invoice reconciliation workflow applies downstream; the ABA schema feeds directly into that pattern, specialized for this vertical's field set.