Categorize Expenses from Invoices & Receipts in Excel

Use one reusable AI prompt to categorize a batch of invoices and receipts into your own expense categories, cost centers, and projects in a spreadsheet.

Published
Updated
Reading Time
14 min
Topics:
Invoice Data ExtractionExcelexpense categorizationcost center allocationchart of accounts

To categorize expenses from invoices and receipts, extract them to a spreadsheet and add an expense-category column in the same step. You describe your categories once in plain language, for example: classify each line as Office Supplies, Software, Travel, or Utilities. Every line is then tagged as it is pulled from the document, so the data lands already classified rather than waiting for you to sort it by hand afterward.

Save that prompt and the categories stay identical from one month to the next, which is what lets a SUMIFS formula or a PivotTable total spend by category without splitting the same category across three slightly different labels. The same approach extends past expense type: one prompt can assign cost centers, projects, or chart-of-accounts codes consistently across a batch of hundreds or thousands of documents, not a dozen at a time.

This is the part the usual advice skips. Search for how to categorize expenses and you get tutorials that start from a single clean sheet of already-typed transactions and teach manual dropdowns, IF formulas, or VLOOKUP keyword lookups. That assumes the hard part is already done. In practice the expenses arrive as a pile of PDFs and photographed receipts, and the question is how to get every one of them categorized at the source. Once you convert your PDF invoices to an Excel spreadsheet, you can have the categories applied in that same extraction rather than as a second manual stage.

That shift answers the question buried in most of these searches: can a tool actually classify the data, not just read it off the page? It can. Tools that extract and categorize invoice data automatically take your category definitions as part of the extraction instructions and return a populated category column alongside the invoice number, date, and totals.


Define Your Own Expense Categories Instead of Accepting a Fixed List

A receipt-scanner app sorts spend into the categories it ships with. That is fine until your books, your budget, or your client's chart of categories uses different names, a different level of detail, or buckets the scanner does not have. The advantage of describing the categories yourself is that the classification matches the scheme you already work in, not one imposed on you. You list the exact categories you want, and each line item or receipt is sorted into one of them.

Sorting expenses into a defined, finite set of categories is the standard practice, not a constraint the AI invents. For reference, the standard expense categories on IRS Schedule C sort a sole proprietor's deductible business expenses into roughly twenty categories on numbered lines, including advertising, car and truck expenses, contract labor, supplies, travel, and utilities. That is the same shape every categorization scheme takes: a closed list of named buckets that every expense maps into. Your list does not have to be the IRS one, but it should be a deliberate, fixed set in the same way.

A realistic working set for a small business might be COGS, Travel and Entertainment, Software and Subscriptions, Utilities, Professional Services, and Office Supplies. Those become the allowed values in the prompt directly: classify each line as one of COGS, Travel and Entertainment, Software and Subscriptions, Utilities, Professional Services, or Office Supplies. The list is the instruction. Add or rename a bucket and you change one line of the prompt, not a configuration screen.

Receipts deserve a word on their own. To auto-categorize receipts into expense categories is harder than invoices because a receipt often carries a thin, abbreviated description and a merchant name rather than tidy line items. The same defined list still applies, but the AI leans more on the merchant and the totals than on a clean description. That is worth knowing when you decide how granular to make the categories for a receipt-heavy batch.

Granularity is your call and depends on the downstream use. Broad buckets work for a budget owner watching a handful of spend lines. Finer categories suit a bookkeeper who needs the detail to post or report. Define the list at the level the downstream report needs, because that is the level every total will be built on.

Categorize by Rules: Vendor and Keyword Mapping

Letting the model read a description and pick a category works well for the varied middle of your data. But a lot of spend is predictable: the same suppliers recur every month, and certain words in a description reliably point to one category. For that predictable part you want a rule, not a judgment call, so the result is the same every time and you can explain why a line landed where it did.

The most direct rule is vendor to category. State the mapping and every invoice from that supplier is assigned a fixed category regardless of what the line descriptions say: anything from Adobe or Atlassian maps to Software and Subscriptions, anything from your landlord maps to Rent. This is rule-based invoice categorization by vendor, and it is the rule most worth writing first because vendors are the strongest signal you have and the easiest to enumerate.

Keyword rules cover the cases where the vendor varies but the description does not. If a line description contains "freight" or "shipping," assign Logistics; if it contains "subscription" or "license," assign Software and Subscriptions. You are matching on the text of the line rather than the supplier, which catches the same expense type across many one-off vendors.

Conditional logic handles the suppliers a flat rule cannot. A vendor like Amazon sells both hardware and software, so a single vendor-to-category map would be wrong half the time; a conditional fixes it: if the line is hardware, classify it as Office Supplies; if it is a software licence, classify it as Software and Subscriptions. These compose with the rules above: a vendor map for the suppliers you know, keyword rules for recognizable descriptions, and a conditional for the mixed vendors a flat map gets wrong. The rules give you consistency and auditability on the high-volume, predictable lines, while the AI's own reading handles the long tail of one-off or ambiguous descriptions that no rule anticipated.

What makes this practical is that all of it is written in plain language inside the same prompt as the extraction instructions. There is no separate rules engine to configure and no nested IF formula to maintain. This business logic goes straight into the prompt: Invoice Data Extraction takes hints, defaults, fallbacks, and conditionals as part of the instructions, so "anything from this vendor is always this category, and if the description contains X use category Y" sits right next to the fields you are extracting. The rules and the extraction run in one pass.

Assign Cost Centers, Departments, and Projects to Each Line

An expense category answers what was bought. It does not answer where the cost belongs. A hospitality group needs spend split across its outlets, a nonprofit needs a bulk ad-platform receipt decomposed into campaign budget centers, and a construction firm needs every supplier invoice tied to the job it was for. These are allocation questions, and the same prompt-driven method handles them by adding more columns.

To assign cost centers to invoice line items, add a cost-center or department column and define the allowed centers in the prompt the same way you defined the category list. You can let the AI infer the center from the content, or map it deterministically: a vendor or a description keyword maps to a fixed department, exactly the rule pattern from the previous section applied to a different column. Hospitality teams that code restaurant supplier invoices by department are doing this at the line level so produce, beverage, and cleaning supplies each route to the right department even when they appear on one invoice.

To allocate invoice costs to projects in a spreadsheet, add a project or job column so each line is tagged to the job it belongs to. This is the job-costing case, and it is where construction and trades businesses live: a single lumber invoice may cover three sites, and each line needs the right job code for the cost to land against the right project budget. The mechanics of how trades businesses allocate construction supplier invoices to jobs and cost codes are worked through separately, but the categorization step is the same one described here. The dimension does not have to be a job site: a fleet manager facing one consolidated leasing invoice can break a fleet leasing invoice into per-vehicle Excel rows, carrying each plate, driver, and cost centre down its own line so the lease, maintenance, and fuel charges land against the right vehicle.

The reason to treat these together is that they are not separate jobs. Expense category, cost center, and project are each just another instruction in the prompt, so a single extraction pass can return all three columns at once. One invoice line comes back tagged with its expense type, its department, and its job, because you asked for all three in the same set of instructions. Whether your dimensions are outlets, campaigns, sites, or jobs, the method does not change, only the column names and allowed values do.

Map Line Items to a Chart of Accounts (and How This Differs from GL Coding)

If you keep a chart of accounts, you can have lines pre-coded to GL account codes in the same pass. To map invoice line items to a chart of accounts, provide the account codes and their names in the prompt and instruct the AI to assign each line its code, the same mechanism that assigns an expense category. "6010 Office Supplies, 6200 Software, 7400 Travel" goes into the prompt as the allowed set, and each line comes back with the code attached.

The payoff is that the lines arrive analysis-ready or import-ready. Spend grouped by account code is immediately usable for reporting, and a spreadsheet of coded lines is the format most accounting packages expect for an import. Because one saved prompt applies the same codes to every document, the coding stays consistent across the whole batch rather than drifting as a person tires partway through a stack.

It is worth being precise about what this is and is not. This is spreadsheet-native classification for analysis and reporting: you are enriching extracted data with account codes so you can slice and total it. It is not the act of posting invoices to the general ledger inside an accounts payable or ERP workflow, with the approvals, postings, and audit trail that entails. If your goal is the AP posting step rather than the analysis, the way to code invoices to the general ledger for AP posting is its own process; the method here gets you coded data in a spreadsheet, which is a different and earlier point in the workflow.

Consistency matters here as much as it does for categories, and arguably more. A mistyped or drifting account code is worse than a missing one, because it quietly rolls spend into the wrong account and the total still looks plausible. Pinning the codes to a fixed list in the prompt is what keeps a downstream rollup honest.


Keep Categories Consistent and Trustworthy Across the Whole Batch

Getting a category column is easy. Getting one you can build a report on, month after month, is the part that actually decides whether this method is worth using. Three things make the output trustworthy: stable naming, a way to check and correct it, and a prompt you can rerun unchanged.

Consistent naming is the foundation. If "Travel," "Travel & Entertainment," and "T&E" all appear in the same column, a PivotTable treats them as three categories and your travel total fragments into three partial figures. The fix is to fix the list: state the allowed categories as a closed set in the prompt so the same spend always lands under the exact same label. A SUMIFS across a clean, closed category list returns one correct total; the same formula across a drifting list returns nonsense that looks fine until someone checks it.

Checking is the second piece, and it is cheaper when each row carries a reference back to its source. Invoice Data Extraction puts the source file name and page number on every row, so spot-checking a sample means opening the cited document rather than hunting for which invoice a row came from. When you find a misclassification, correct the prompt, not the cell: sharpen the definition of the category that was confused, or add a vendor or keyword rule for the case that tripped it. Editing the output fixes one run; editing the prompt fixes every run after it.

Reuse is what makes period-over-period comparison valid. A saved prompt reproduces the identical category set every time it runs, so March's spend and April's spend are sorted by the same rules and are genuinely comparable. This is the sharpest difference from pasting batches into a chat session, where each session starts cold and the category boundaries drift between runs. The prompt library exists for exactly this: save the prompt once, apply it to next month's batch, and the categories are guaranteed to match. If you have a representative batch but no prompt yet, the tool can analyze the documents and suggest a starting prompt you refine and save. The structuring choices that go alongside categorizing, such as whether to flatten line items into one row per invoice or keep invoice-level rows, are saved the same way, so the whole shape of the output stays fixed.

Ambiguous lines need a deliberate home. Some descriptions genuinely will not map cleanly to any category, and the wrong move is to let the AI force them somewhere plausible. Give it an explicit fallback, an Uncategorized or Other bucket, for anything it cannot place with confidence. That turns ambiguity into a short review queue you can scan, instead of silent misfilings buried in otherwise-clean data.

All of this has to hold at scale. The point is not categorizing a dozen rows by hand but batch categorizing supplier invoices by spend category across jobs that can run to thousands of documents, up to 6,000 files in a single batch. One saved prompt applies identically to every document in that batch, which is the only way consistency survives contact with real volume, where no human is going to keep three category labels straight across ten thousand lines.

From Categorized Lines to Spend Analysis

A clean category column is not the destination; it is what makes the next step trivial. Once every row carries a consistent category, totalling spend by category is a single SUMIFS or a PivotTable dropped over the column, and it works precisely because the categories are a closed, consistent set. The work you did defining and pinning the categories is what lets that total be correct on the first try rather than after a round of cleanup.

The same extracted data supports the other spreadsheet operations finance teams run after extraction. Totalling by category is one; splitting a gross figure is another, and if your invoices arrive tax-inclusive you may also need to split each gross invoice total into net and VAT before the numbers are usable. Some of that arithmetic can happen during extraction rather than after it, since the tool can compute and validate calculated fields like net-plus-VAT-equals-total on the way into the sheet. These are siblings of the categorization step, different operations on the same spreadsheet of extracted lines.

One distinction is worth drawing so you apply the right step. Classifying expenses into your own categories is not the same as classifying the document type, sorting invoices from receipts from delivery notes, which happens before extraction so the right data gets pulled in the first place. Nor is it the general-ledger posting that lives inside an AP workflow. This method sits between them: the data is extracted, and you are enriching it with the categories, centers, projects, or codes your analysis needs.

What you have at the end is a spreadsheet where every line is tagged into your own scheme, the tagging is consistent enough to total, and the prompt that produced it is saved to run again next month on the same terms.

Extract invoice data to Excel with natural language prompts

Upload your invoices, describe what you need in plain language, and download clean, structured spreadsheets. No templates, no complex configuration.

Exceptional accuracy on financial documents
1–8 seconds per page with parallel processing
50 free pages every month — no subscription
Any document layout, language, or scan quality
Native Excel types — numbers, dates, currencies
Files encrypted and auto-deleted within 24 hours
Continue Reading