Brokerage 1099 composite extraction for tax preparers converts a consolidated brokerage tax statement into reviewable Form 8949-ready data. The main burden is usually the 1099-B sale table: each row can carry proceeds, cost or other basis, acquisition date, sale date, holding period, wash-sale adjustment, and covered or noncovered status. If those fields are lost, flattened, or grouped incorrectly, the return-prep problem has not been solved.
Tax preparers usually reach for extraction when a direct brokerage import is unavailable, incomplete, hard to review, or not worth the authorization friction for the client. A Schwab, Fidelity, Morgan Stanley, Edward Jones, Robinhood, Vanguard, E-Trade, Merrill Lynch, UBS, Raymond James, LPL, or Interactive Brokers composite can be a compact statement for one investor or a long PDF with hundreds of sale lines for an active trader. The extraction task is to turn that PDF into a workpaper the preparer can inspect before data moves into Lacerte, ProConnect, ProSeries, Drake, CCH Axcess Tax, UltraTax CS, GoSystem Tax RS, or an intermediate CSV, TXF, DAT, or spreadsheet workflow.
The destination matters because a consolidated 1099 is not one form. It is a package. The same PDF may include 1099-DIV income, 1099-INT income, 1099-B proceeds and basis, 1099-OID amounts, occasional 1099-MISC items, Section 1256 contract totals, supplemental lot history, and year-end summary pages. Some items feed Schedule B, some feed Schedule D and Form 8949, some belong on Form 6781, and some exist mostly as review support.
The useful workflow is therefore not "OCR the PDF" in the abstract. It is a controlled workpaper process: identify the statement sections, extract the fields that drive the return, preserve the grouping that affects Form 8949, flag exceptions for review, tie extracted totals back to the broker summary, and only then import or key the return data.
Start With the Composite Statement Map, Not the Import File
A composite brokerage statement should be mapped before it is extracted. The cover pages and year-end summary usually show totals by form type and box. Those pages are valuable as control totals, but they are not always detailed enough to serve as the entry source. Their job is to tell the reviewer whether the extracted detail reconciles back to the broker's statement.
The 1099-DIV section carries dividend income, qualified dividend amounts, capital gain distributions, withholding, Section 199A dividends, foreign tax paid, and country detail when the broker provides it. The 1099-INT section can carry taxable interest, Treasury interest, tax-exempt interest, withholding, market discount, bond premium, and foreign tax. The 1099-B section is usually the longest and most operationally fragile because it contains sale lots or transaction lines. OID, MISC, Section 1256, and supplemental transaction pages may appear after that, depending on the account activity.
This is where composite work differs from a general tax document OCR workflow for CPA firms. A form-level OCR workflow can identify W-2s, 1099-NECs, W-9s, and simple 1099s across a client batch. A composite 1099 requires section awareness inside one source PDF. The extraction needs to know when a table is summary support, when it is the authoritative sale detail, and when a later supplemental page explains basis adjustments or wash-sale matching.
Brokerage layout variation adds another layer. Schwab, Fidelity, Morgan Stanley, Edward Jones, Robinhood, and other issuers do not use identical section order, table headings, continuation-page labels, or subtotal placement. A preparer does not need a brokerage-by-brokerage catalogue, but the workpaper should preserve enough source context to answer review questions: which section the row came from, what page it came from, and which statement total it should tie to.
That source context is especially important when a statement has more than one account, multiple taxpayer ownership labels, nominee detail, or a corrected version issued after the first PDF was saved. The extraction file should carry account number or masked account identifier when present, statement year, statement version, section heading, source page, and row description. Those fields may not be imported into the tax software, but they make the preparer's review defensible when the same client sends two similar brokerage PDFs in the same season.
Treat 1099-B Sale Lines as the Core Extraction Problem
The 1099-B table is where most composite-statement extraction projects succeed or fail. A clean output table should preserve, at minimum, the description of property, CUSIP when present, quantity, acquisition date, sale date, proceeds, cost or other basis, accrued market discount, wash-sale loss disallowed, short-term or long-term indicator, covered or noncovered status, adjustment codes, and source page. For many clients, each sale lot needs to remain its own reviewable row.
The reason is not merely clerical. For each covered-security sale reported on Form 1099-B, brokers must report acquisition date, short-term or long-term gain or loss status, cost or other basis, accrued market discount, and wash-sale loss disallowed, according to the IRS Form 1099-B instructions. Those are the same details a preparer reviews before Form 8949 and Schedule D treatment. If an extraction process captures proceeds but drops Box 1e basis, Box 1f market discount, Box 1g wash-sale loss disallowed, or the holding-period indicator, the output may look complete while leaving the reviewer with manual reconstruction.
Active-trader clients make the weakness obvious. A small investor statement may have a handful of sales, but an active trader, concentrated equity investor, or high-net-worth client with multiple managed accounts can produce hundreds of sale rows in one composite. Manual keying turns into a bottleneck, and spot-checking becomes harder because every row looks similar until an exception appears.
The extraction file should therefore be built for preparer review, not just data capture. A wash-sale flag should stay attached to the affected sale line. A noncovered security should not be blended into the same group as covered basis reported to the IRS. Market discount, missing dates, partial basis, and adjustment codes should appear as explicit fields or review notes. The goal is a transaction table that lets the preparer see what the broker reported and where professional judgment still has to be applied.
Preserve the Grouping Before Anything Reaches Form 8949
An extracted 1099-B table is not ready just because every row has proceeds and basis. The rows need to carry the grouping that a reviewer uses before Form 8949 entry: short-term or long-term, basis reported or not reported to the IRS, and whether the transaction falls outside ordinary broker-reported sale detail.
For practical review, short-term covered transactions generally need to be separable from short-term noncovered transactions, and long-term covered transactions need to be separable from long-term noncovered transactions. Preparers commonly think of these as the Form 8949 checkbox families. Boxes A, B, and C are the short-term categories: basis reported to the IRS, basis not reported to the IRS, and sales not reported on a Form 1099-B. Boxes D, E, and F are the long-term equivalents. The extraction should not force the preparer to infer that grouping later from scattered notes or table headings.
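The grouping can be carried as an explicit column with a small lookup. The sketch below is illustrative only: the function name `form_8949_box` and its three inputs are assumptions made for this example, not fields defined by any broker or tax package, and it covers only the six checkbox families named above.

```python
def form_8949_box(holding_period: str, on_1099b: bool, basis_reported: bool) -> str:
    """Map one sale row to a Form 8949 checkbox letter (A-F).

    holding_period: "short" or "long" (hypothetical labels for this sketch)
    on_1099b:       True if the sale was reported on a Form 1099-B
    basis_reported: True if the broker reported basis to the IRS (covered)
    """
    if holding_period not in ("short", "long"):
        raise ValueError(f"unknown holding period: {holding_period!r}")
    short = holding_period == "short"
    if not on_1099b:
        return "C" if short else "F"
    if basis_reported:
        return "A" if short else "D"
    return "B" if short else "E"


print(form_8949_box("short", True, True))   # A
print(form_8949_box("long", True, False))   # E
```

A row with an unrecognized holding period raises an error instead of defaulting to a box, so the gap surfaces in review rather than in the return.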
Noncovered and missing-basis rows deserve special treatment. They may reflect older holdings, transferred-in lots, debt instruments, options, incomplete broker records, or basis that the client must support from another source. A good workpaper flags those rows for review instead of quietly placing them beside clean covered-basis rows. Suspect acquisition dates, blank basis fields, obvious placeholder values, and broker footnotes should also survive extraction because they tell the reviewer where not to trust automation.
Wash-sale disallowance is another row-level field, not a footnote to summarize away. If Box 1g appears for a sale, the amount and any related W code need to remain attached to that transaction. The same discipline applies to market discount and other adjustment codes that affect how the row is reviewed.
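A short worked example shows why the Box 1g amount has to travel with its row. The sketch assumes the usual Form 8949 arithmetic, where gain or loss is proceeds minus basis plus the adjustment amount. The figures are made up and the function name is hypothetical.

```python
from decimal import Decimal


def reported_gain_or_loss(proceeds: Decimal, basis: Decimal,
                          adjustment: Decimal = Decimal("0")) -> Decimal:
    """Proceeds minus basis, combined with the adjustment amount."""
    return proceeds - basis + adjustment


# Illustrative row: a 500.00 loss with 200.00 disallowed as a wash sale (code W).
proceeds = Decimal("1000.00")
basis = Decimal("1500.00")
wash_sale_disallowed = Decimal("200.00")

without_adjustment = reported_gain_or_loss(proceeds, basis)
with_adjustment = reported_gain_or_loss(proceeds, basis, wash_sale_disallowed)

print(without_adjustment)  # -500.00
print(with_adjustment)     # -300.00
```

If the extraction drops the wash-sale column, the row still looks complete, but the loss is overstated by the disallowed amount.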
Section 1256 contracts should be separated from ordinary stock and fund sale rows. Brokerage composites may report regulated futures contracts, foreign currency contracts, or Section 1256 option contracts in a distinct 1099-B area, but those amounts are not handled like a routine covered equity sale list. They need their own table or tab so the preparer can route them to the appropriate Form 6781 review workflow.
Some firms use summary reporting with an attached statement where permitted by their return-prep approach and client facts. Extraction still matters in that workflow because the attached detail, reviewer workpaper, and summary totals need to reconcile.
That is why the grouping fields should be explicit columns instead of inferred labels in a note. A reviewer should be able to filter the extracted table by holding period, basis status, adjustment code, and exception flag before deciding whether the data belongs in detailed entry, summary entry, attachment support, or a separate review queue.
Do Not Ignore the Smaller Composite Sections
The 1099-B pages usually consume the most staff time, but the smaller sections still create return inputs and tie-out controls. A composite extraction file should keep them structured rather than treating them as narrative pages around the sale table.
For 1099-DIV, preparers usually need ordinary dividends, qualified dividends, capital gain distributions, federal income tax withheld, Section 199A dividends, investment expenses when present, foreign tax paid, and foreign country detail. Foreign tax is especially easy to under-capture because a summary box may show the amount while later pages provide country support. If the statement includes country-level detail, the extraction should preserve it for the preparer reviewing Form 1116 or the election to claim the credit without Form 1116 that is available in simpler cases.
For 1099-INT, the useful fields include taxable interest, early withdrawal penalty, Treasury interest, federal withholding, foreign tax paid, foreign country, tax-exempt interest, specified private activity bond interest, market discount, and bond premium. These amounts often feed Schedule B and state adjustment review, so they should tie back to the composite summary even when they are not the primary pain point.
OID and MISC sections are less common but still need attention. A 1099-OID area can include original issue discount, acquisition premium, Treasury OID, bond premium, tax-exempt OID, and state withholding details. A composite may also include occasional 1099-MISC items such as substitute payments. These sections do not need to dominate the workpaper, but they should not be missed because the extraction process was trained only on sale rows.
This is where composite extraction overlaps with broader 1099 form data extraction, but the operating problem is different. A preparer is not just identifying a form type in a batch. They are preserving several form sections inside one brokerage PDF and reconciling those sections to the statement summary before the return is finalized.
Choose the Extraction Route Based on Review Risk
Manual entry still has a place. If a client has a short composite statement with a few sale rows, no noncovered basis, no wash-sale activity, and no unusual income sections, typing the relevant fields may be faster than building an import process. The risk changes when the statement is long, the table spans many pages, or the reviewer needs to trace exceptions back to the PDF.
Direct brokerage import is often the first route to test when the firm's tax software supports the broker and the client can complete the authorization flow. It can save time for mainstream brokerages and straightforward accounts. Its weakness is reviewer visibility. The preparer may not see a clean source-to-row workpaper, and the import can be less useful when the client has a corrected statement, a smaller custodian, transferred lots, missing basis, or multiple brokerages with inconsistent data availability.
The import also has to match the firm's review process. Some preparers are comfortable importing directly into the return and reviewing inside the tax software. Others want a spreadsheet workpaper first, especially when staff prepare and managers review. If the imported data cannot be reconciled to the PDF without re-opening the original statement line by line, the apparent time savings may move work from preparation to review.
Specialized conversion tools can be useful when the problem is narrowly defined: convert 1099-B sale detail into a Form 8949 attachment or import file. That may be enough for many active-trader returns. It is less complete when the firm wants one workpaper covering DIV, INT, OID, Section 1256, summary tie-outs, source pages, custom review flags, and the firm's own column structure.
AI extraction from the source PDF fits the middle ground: the firm wants structured data but also wants to review the data before return entry. Invoice Data Extraction is one route for this kind of financial document data extraction: users upload financial documents, describe the fields and output structure they need in a prompt, and download structured Excel, CSV, or JSON. For a composite brokerage statement, that means the prompt can ask for separate tables for 1099-B sale rows, dividend and interest boxes, OID items, Section 1256 amounts, summary tie-outs, source page references, and exception flags.
The routing question is not which method is most automated. It is which method gives the preparer enough control for the client's facts. A direct import that cannot be reviewed may be weaker than a spreadsheet that exposes every exception. A manual process may be safer for a ten-line statement and indefensible for a thousand-line one.
Design the Output Table for Tax-Prep Import and Review
The most useful output is not a single flat OCR export. It is a workbook or data package organized the way a preparer reviews the composite. For 1099-B, the core table should usually be one row per sale lot or transaction line, with columns for security description, CUSIP when present, quantity, acquisition date, sale date, proceeds, cost or other basis, holding period, covered status, Form 8949 group, adjustment code, wash-sale amount, market discount, source page, and review flag.
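One way to pin down that column list is to declare it as a typed record, so every extracted row has the same shape. This is a minimal sketch: the class name `SaleRow` and every field name are hypothetical choices for illustration, and a firm would rename or extend them to match its own workpaper.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class SaleRow:
    """One 1099-B sale lot or transaction line (hypothetical schema)."""
    description: str
    cusip: Optional[str]             # None when the statement omits it
    quantity: Optional[Decimal]
    date_acquired: Optional[date]    # None when blank or shown as "Various"
    date_sold: date
    proceeds: Decimal
    cost_basis: Optional[Decimal]    # None when the broker reports no basis
    holding_period: str              # "short" or "long"
    covered: bool                    # basis reported to the IRS
    form_8949_group: str             # e.g. "A" through "F"
    adjustment_code: str = ""
    wash_sale_disallowed: Decimal = Decimal("0")
    market_discount: Decimal = Decimal("0")
    source_page: Optional[int] = None
    review_flag: str = ""


# Column order for the review workbook, derived from the record itself.
REVIEW_COLUMNS = [f.name for f in fields(SaleRow)]

example = SaleRow(
    description="100 SH EXAMPLE CORP", cusip=None, quantity=Decimal("100"),
    date_acquired=date(2023, 3, 1), date_sold=date(2024, 5, 10),
    proceeds=Decimal("1000.00"), cost_basis=Decimal("1500.00"),
    holding_period="long", covered=True, form_8949_group="D",
    adjustment_code="W", wash_sale_disallowed=Decimal("200.00"),
    source_page=14,
)
print(REVIEW_COLUMNS)
```

Optional fields are deliberate: a missing basis or acquisition date stays empty in the record instead of being coerced to zero or a placeholder date.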
Separate tabs or tables keep the rest of the composite from being buried. A DIV tab can hold ordinary dividends, qualified dividends, capital gain distributions, withholding, Section 199A dividends, and foreign tax detail. An INT tab can hold interest categories, Treasury interest, tax-exempt interest, withholding, market discount, and bond premium. OID, MISC, Section 1256, and summary tie-out tabs let the reviewer compare extracted detail to the statement totals without hunting through the PDF.
Source page references are not cosmetic. They let a reviewer move from a suspicious row to the original statement quickly. Exception flags serve the same purpose. A blank basis field, noncovered status, wash-sale amount, market discount code, missing date, corrected-statement notation, or Section 1256 line should be visible before the data enters the return.
Import format should be planned after the review schema, not before it. Lacerte, ProConnect, ProSeries, Drake, CCH Axcess Tax, UltraTax CS, GoSystem Tax RS, CSV, TXF, and DAT workflows all reward clean column design, but the firm should preserve its review columns even if a final import file requires a narrower field set. The reviewer workpaper can be richer than the upload file.
In practice, that often means producing two outputs from the same extraction. The first is the review workbook, with every source and exception column the firm wants to see. The second is the import-shaped file, stripped down to the columns the destination system accepts. For a Form 8949 import, that narrower file may need only description, acquired date, sold date, proceeds, basis, adjustment code, adjustment amount, and classification fields. The review workbook can retain account identifiers, PDF page references, broker section names, original column labels, subtotal tie-outs, preparer notes, and exception status.
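The two-output idea can be sketched with Python's standard `csv` module. The column names, the `review_status` field, and the "cleared" value are assumptions for illustration; the real import layout depends on the destination software and is not specified here.

```python
import csv
import io

# Hypothetical narrow column set for an import-shaped file.
IMPORT_COLUMNS = [
    "description", "date_acquired", "date_sold", "proceeds",
    "cost_basis", "adjustment_code", "adjustment_amount", "form_8949_box",
]


def write_import_file(review_rows, out) -> int:
    """Write only reviewer-cleared rows, keeping only the import columns."""
    # extrasaction="ignore" drops review-only keys such as source_page.
    writer = csv.DictWriter(out, fieldnames=IMPORT_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in review_rows:
        if row.get("review_status") == "cleared":
            writer.writerow(row)
            written += 1
    return written


review_rows = [
    {"description": "100 SH EXAMPLE CORP", "date_acquired": "2023-03-01",
     "date_sold": "2024-05-10", "proceeds": "1000.00", "cost_basis": "1500.00",
     "adjustment_code": "W", "adjustment_amount": "200.00", "form_8949_box": "D",
     "source_page": 14, "review_status": "cleared"},
    {"description": "50 SH SAMPLE FUND", "date_acquired": "", "date_sold": "2024-06-02",
     "proceeds": "750.00", "cost_basis": "", "adjustment_code": "",
     "adjustment_amount": "", "form_8949_box": "E",
     "source_page": 15, "review_status": "open"},
]

buffer = io.StringIO()
count = write_import_file(review_rows, buffer)
print(count)
print(buffer.getvalue())
```

The review rows are never modified, so the richer workbook remains the record of what was checked, and the import file is always a derived view of it.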
This separation prevents a common failure: designing the workpaper around the limitations of the import format. Tax software imports are built for entry. Reviewers need evidence. The extraction process should satisfy both by preserving the richer data first and deriving the import file from it only after the data has been checked.
This mirrors the logic of a real estate K-1 consolidation workpaper: the firm is not only extracting fields, it is standardizing investor-tax data into a form reviewers can reconcile, annotate, and route into the return.
Review Controls Before Importing or Keying the Return
Review should start with totals, not individual exceptions. Compare extracted proceeds, basis, dividends, interest, federal withholding, foreign tax, OID, and Section 1256 totals to the composite summary where the broker provides one. If the summary page shows different totals from the extracted tabs, resolve that before staff import rows or start keying data.
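A tie-out of this kind reduces to comparing two sets of labeled totals. The following is a minimal sketch with made-up figures. The labels, the function name `tie_out`, and the zero default tolerance are assumptions, and `Decimal` is used so that cents compare exactly.

```python
from decimal import Decimal


def tie_out(extracted_totals, summary_totals, tolerance=Decimal("0.00")):
    """Return {label: extracted minus summary} for every total that disagrees."""
    differences = {}
    for label, summary_amount in summary_totals.items():
        # A section the extraction missed entirely counts as zero extracted.
        extracted_amount = extracted_totals.get(label, Decimal("0"))
        difference = extracted_amount - summary_amount
        if abs(difference) > tolerance:
            differences[label] = difference
    return differences


sale_rows = [
    {"proceeds": Decimal("1000.00"), "cost_basis": Decimal("1500.00")},
    {"proceeds": Decimal("2500.50"), "cost_basis": Decimal("2000.00")},
]
extracted = {
    "1099-B proceeds": sum((r["proceeds"] for r in sale_rows), Decimal("0")),
    "1099-B basis": sum((r["cost_basis"] for r in sale_rows), Decimal("0")),
    "1099-DIV ordinary dividends": Decimal("312.40"),
}
summary = {
    "1099-B proceeds": Decimal("3500.50"),
    "1099-B basis": Decimal("3500.00"),
    "1099-DIV ordinary dividends": Decimal("321.40"),
}

breaks = tie_out(extracted, summary)
print(breaks)  # only the dividend total differs, by -9.00
```

An empty result means every summary total was matched. Anything else is a list of sections to resolve before rows are imported or keyed.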
Then move to row-level review. Missing acquisition dates, missing basis, noncovered rows, wash-sale disallowance, market discount, unusual adjustment codes, negative proceeds, zero-basis entries, duplicate rows, and split transactions should be visible in an exception report or flagged column. The preparer does not need every ordinary covered sale elevated for review, but the workpaper should make the unusual items difficult to miss.
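Several of those row-level checks are mechanical enough to express as rules. The sketch below covers a subset of the list. The flag names and row keys are hypothetical, and the duplicate check only nominates rows for a human to look at, since identical lots can legitimately appear twice.

```python
from decimal import Decimal
from typing import Dict, List

ZERO = Decimal("0")


def exception_flags(row: Dict) -> List[str]:
    """Return review flags for one sale row (empty list means routine)."""
    flags = []
    if row.get("date_acquired") is None:
        flags.append("missing_acquisition_date")
    basis = row.get("cost_basis")
    if basis is None:
        flags.append("missing_basis")
    elif basis == ZERO:
        flags.append("zero_basis")
    if not row.get("covered", False):
        flags.append("noncovered")
    if row.get("wash_sale_disallowed", ZERO) != ZERO:
        flags.append("wash_sale")
    if row.get("market_discount", ZERO) != ZERO:
        flags.append("market_discount")
    if row.get("proceeds", ZERO) < ZERO:
        flags.append("negative_proceeds")
    return flags


def possible_duplicates(rows: List[Dict]) -> List[int]:
    """Indexes of rows that repeat an earlier row on the key fields."""
    seen = set()
    repeats = []
    for index, row in enumerate(rows):
        key = (row.get("description"), row.get("date_acquired"),
               row.get("date_sold"), row.get("proceeds"), row.get("cost_basis"))
        if key in seen:
            repeats.append(index)
        else:
            seen.add(key)
    return repeats


clean = {"description": "100 SH EXAMPLE CORP", "date_acquired": "2023-03-01",
         "date_sold": "2024-05-10", "proceeds": Decimal("1000.00"),
         "cost_basis": Decimal("900.00"), "covered": True}
problem = {"description": "50 SH SAMPLE FUND", "date_acquired": None,
           "date_sold": "2024-06-02", "proceeds": Decimal("750.00"),
           "cost_basis": None, "covered": False}
rows = [clean, problem, dict(clean)]

for index, row in enumerate(rows):
    print(index, exception_flags(row))
print(possible_duplicates(rows))
```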
Foreign tax detail deserves a separate check. If 1099-DIV Box 7 or 1099-INT Box 6 foreign tax appears in the statement, confirm whether the composite provides country detail and whether the extracted data preserved it. A summary amount without country support may be enough for some client facts and not enough for others, so the extraction should give the preparer the source detail that exists rather than making that decision invisibly.
Corrected statements create another control point. Brokerage composites can be corrected after the original PDF is received, especially when basis, reclassification, or income details change. The firm should retain version identifiers, receipt dates, file names, and review notes so staff do not mix original and corrected data in the same return file.
A practical control sequence is to hold the import until three questions are answered. First, do the extracted section totals agree to the broker's summary pages or explainable subtotals? Second, are all exception rows assigned to a reviewer, including noncovered basis, wash sales, Section 1256 activity, foreign tax detail, and corrected-statement differences? Third, does the import file reconcile back to the reviewed workbook after any rows are filtered, grouped, or remapped for the destination software?
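The three questions can be written as a single gate that returns the reasons an import is still on hold. This is a sketch under assumptions: the function name, the `reviewer` key on exception rows, and the use of one total to stand in for the workbook-to-import reconciliation are all illustrative simplifications.

```python
from decimal import Decimal


def import_blockers(tie_out_differences, exception_rows,
                    workbook_total, import_total):
    """Return the reasons the import should wait (empty list means proceed)."""
    blockers = []
    # 1. Section totals must agree to the broker summary.
    if tie_out_differences:
        blockers.append("section totals do not agree to the broker summary")
    # 2. Every exception row needs an assigned reviewer.
    if any(not row.get("reviewer") for row in exception_rows):
        blockers.append("exception rows without an assigned reviewer")
    # 3. The import file must reconcile back to the reviewed workbook.
    if workbook_total != import_total:
        blockers.append("import file does not reconcile to the reviewed workbook")
    return blockers


exceptions = [
    {"flag": "noncovered", "reviewer": "manager-1"},
    {"flag": "wash_sale", "reviewer": ""},
]
held = import_blockers({}, exceptions, Decimal("3500.50"), Decimal("3500.50"))
print(held)  # one blocker: the wash-sale row has no reviewer

exceptions[1]["reviewer"] = "manager-1"
ready = import_blockers({}, exceptions, Decimal("3500.50"), Decimal("3500.50"))
print(ready)  # []
```

Returning reasons instead of a bare yes or no keeps the hold explainable to whoever picks up the file next.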
The review standard is simple: extraction should expose uncertainty. A good workpaper makes it clear which rows tied out cleanly, which rows need basis support, which items are outside the routine 1099-B flow, and which totals changed after a corrected statement. The preparer still decides the tax treatment.
A Practical Routing Rule for Tax Season
Route composite statements by review risk. Manual entry is reasonable for short, low-exception statements where the preparer can see every sale line and tie the totals quickly. Direct brokerage import is useful when the broker, client authorization, and tax software mapping are reliable enough for the reviewer to trust the result. Specialized conversion can make sense when the only real need is a Form 8949-ready 1099-B file.
AI extraction is strongest when the source PDF itself needs to become the workpaper: long composites, multiple brokerage layouts, active-trader sale tables, noncovered basis, wash-sale amounts, foreign tax detail, Section 1256 activity, corrected statements, or firm-specific review columns. In those cases, the value is not just avoiding manual typing. It is getting the statement into structured, auditable data before the return is touched.
That distinction matters during peak workload. A tax-season process that treats every composite statement the same will either overbuild simple jobs or under-control risky ones. A better rule is to send clean short statements through the fastest reliable path, send broker-supported accounts through direct import when review visibility is adequate, and send long or exception-heavy PDFs through extraction that produces a reviewer-ready workpaper. The same routing discipline applies when firms are managing catch-up bookkeeping for tax-season clients alongside brokerage-document review.
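The routing rule can be written down as a small decision function so that intake staff apply it consistently. Everything here is an assumption for illustration: the route labels, the inputs, and especially the ten-row cutoff for a "short" statement, which each firm would set from its own experience.

```python
def route_statement(sale_rows: int,
                    exception_heavy: bool,
                    direct_import_reliable: bool,
                    review_visibility_ok: bool,
                    only_need_8949_file: bool = False,
                    short_statement_max_rows: int = 10) -> str:
    """Pick a processing route for one composite statement (illustrative)."""
    # Short, clean statements: typing is the fastest reliable path.
    if sale_rows <= short_statement_max_rows and not exception_heavy:
        return "manual_entry"
    # Exception-heavy PDFs need a reviewer-ready workpaper.
    if exception_heavy:
        return "pdf_extraction"
    # Broker-supported accounts, but only when the result can be reviewed.
    if direct_import_reliable and review_visibility_ok:
        return "direct_import"
    # Narrow need: a Form 8949-ready 1099-B file and nothing else.
    if only_need_8949_file:
        return "specialized_conversion"
    return "pdf_extraction"


print(route_statement(6, False, False, False))    # manual_entry
print(route_statement(400, True, True, True))     # pdf_extraction
print(route_statement(400, False, True, True))    # direct_import
```

The order of the checks encodes the priority: exception risk overrides the convenience of a direct import, even when the import is available.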
Brokerage 1099 composite extraction for tax preparers is at its best when it preserves the original statement's tax-relevant structure: sale rows stay tied to Form 8949 grouping, income boxes stay tied to summary totals, exceptions stay visible, and preparers get structured data before they make return decisions.