A clean extraction of a stockist tax invoice gives you one row per line item, and each row carries everything you need to post it: product, pack, HSN, batch number, expiry date (as a real Excel date, so the sheet sorts for FEFO), MRP, PTR, quantity including any free or scheme units, rate, and the CGST and SGST split. The invoice number repeats on every row so each line stands on its own. That single sheet is what turns a pharmacy purchase invoice to Excel from an afternoon of keying into a download you check and post.
Two things about these documents make them predictable. Batch number and expiry are mandatory on Indian drug labelling, so they appear on every stockist tax invoice without exception, which means your sheet can rely on them being there. And the tax is now lighter than most older guides say: after the September 2025 GST revision, most medicines are charged at 5%, some goods sit at 18%, and several life-saving drugs are at nil. Getting that rate right per line is the difference between a clean input tax credit claim and a mismatch you chase later.
Be clear-eyed about what is actually new here, because the SERP is not empty. Vertical suites like Marg ERP and MedLens do read received invoices now, pulling item, batch, expiry, MRP and GST through their own OCR. The distinction is not that nobody else can extract the data. It is that their output lands inside their software, mapped to their masters, and yours does not have to. A vendor-neutral spreadsheet you own feeds whatever you already run, with no migration into a new billing system to get your own purchase data back.
That ownership is the whole point. The output is a plain Excel or CSV that goes straight into your purchase entry, your input tax credit working, and your stock and expiry register. If you would rather skip the manual route entirely, you can extract pharmacy stockist invoices to Excel automatically from a stack of supplier PDFs and keep the file as your own.
The Fields That Make a Stockist Invoice Different
A stockist invoice is not a generic supplier bill with a few extra columns. It carries a pricing chain and compliance fields that a normal purchase invoice never touches, and capturing them correctly is what separates a chemist purchase bill to Excel with HSN and GST from a sheet that looks complete but cannot be posted. This is also where the general workflow for extracting Indian purchase invoices to Excel stops being enough on its own and the pharma-specific fields take over.
Start with the pricing chain, because it confuses people who assume one price per line. Three prices ride on a medicine, and they are linked. PTR, the Price to Retailer, is what the stockist charges the chemist: PTR = MRP × (1 − retailer margin%). PTS, the Price to Stockist, is what the stockist paid above them: PTS = PTR × (1 − stockist margin%). The invoice usually shows PTR or PTS as the transacted rate, while the MRP printed on the strip is the ceiling the patient can be charged. India's Maximum Retail Price (MRP) rules make MRP the highest price that may legally be charged to a consumer, and it must be printed on all retail packaging. Both numbers belong in the sheet as separate columns: the PTR is your cost, the MRP is your selling ceiling, and the gap between them is your margin.
Batch number and expiry date are not optional formatting choices by the stockist. They are mandatory on Indian drug labelling under the Drugs and Cosmetics Rules, which is precisely why they appear on every line of every stockist tax invoice. Their job in your register is rotation. Expiry drives FEFO, first-expiry-first-out, so the stock that lapses soonest moves first. That only works if expiry is captured as a native Excel date rather than text, because a column of real dates sorts and filters, and a column of dd/mm strings does not.
HSN tells you the GST bucket. Pharmaceuticals sit in chapter 30: 3004 covers packaged medicaments put up in measured doses or retail packs, 3003 covers bulk or unpackaged formulations, and 3006 covers specified pharmaceutical goods. The rate has moved, and older guides are now wrong on this point. Following the September 2025 revision, the change brought in by the 56th GST Council and effective 22 September 2025, most medicines are taxed at 5%, some goods remain at 18%, and a set of life-saving drugs are at nil. Capturing HSN and the per-line CGST and SGST together lets you confirm the rate applied to each item rather than assuming a single rate across the bill.
None of these fields is the stockist being thorough by choice. Batch, expiry, HSN, quantity, rate and the tax split are part of what a valid GST tax invoice must contain under Rule 46, so a compliant stockist invoice already holds every field your purchase register needs. The extraction is reading data that is required to be there, not inventing structure that is missing.
The Extraction Prompt That Names Every Pharma Column
The prompt is the configuration. There is no template to map and no rules engine to wire up. You describe the columns you want and the rules each one follows in plain language, and that description is what runs against every bill. Here is a working prompt to convert a pharma distributor invoice to Excel under Indian GST; adapt the column names to match how your accountant posts:
Extract one row per line item from these stockist tax invoices. Repeat the Invoice Number and Invoice Date on every line. Use these columns in order: Invoice Number, Invoice Date, Product (item description as printed), Pack (such as 10x10 or 1x200ml), HSN, Batch Number, Expiry, MRP, PTR, Quantity (billed units only), Free/Scheme Quantity, Rate, CGST Amount, SGST Amount, Line Total.
Capture Expiry as a native Excel date; if the bill prints the month and year only, set it to the last day of that month. Keep MRP, PTR, rate and every tax and total field as numbers, not text. Put any free or scheme units in the Free/Scheme Quantity column only, never added into Quantity. Give one row to each line item, not one row per invoice. Skip summary pages, remittance advice and any page that is not line items.
A few of these rules carry more weight than the rest. Expiry as a native Excel date is the one people get wrong most often, because a stockist may print it as 09/27 or 09/2027, and only a real date will sort for FEFO. Repeating the invoice number on every line keeps each row self-contained, so you can filter, post or query a single line without losing which bill it came from. The separate free or scheme quantity column matters enough to have its own section below. The short version: merging free units into billed quantity quietly corrupts both your stock count and your effective cost.
Because the prompt is just a description, you write it once and reuse it. Save it, and the next month's stockist bills run against the same rules without re-typing anything, which is what keeps a recurring purchase entry consistent from one supplier to the next.
This is the kind of instruction the extraction engine is built to follow. Invoice Data Extraction takes exactly this sort of detailed, field-level prompt: it does per-line-item extraction with the invoice number repeated on each row, lets you name the output columns yourself rather than accepting fixed headers, and writes native Excel types so dates land as dates and amounts as numbers ready for formulas and pivots. You can also add a classification column, for instance a flag that marks a row as a scheme line, and keep the finished prompt in a saved library so the whole team posts to the same layout. The result is a plain Excel or CSV you own, not a record locked inside a vendor's masters.
Handling Real Stockist Bills: Dense Layouts, Handwriting, and Multi-Invoice PDFs
The prompt above assumes a clean bill. Real stockist paperwork is not clean, and that is usually why people give up on extracting it and go back to manual entry. Three problems come up again and again, and each has a concrete answer.
The first is density. A stockist invoice packs product, pack, HSN, batch, expiry, MRP, PTR, quantity, scheme units and the CGST and SGST split into a narrow multi-column grid, often with two or three line items squeezed into the vertical space a normal invoice gives one. The extraction reads the relationship between the columns, what sits under the Batch header, what sits under Expiry, rather than relying on fixed pixel positions, so a tight grid from one stockist and a roomier one from another both resolve to the same set of fields.
The second is handwriting. Plenty of vouchers are part-printed and part-written: a computer-generated bill with a batch or expiry corrected by hand, a fully handwritten kacha voucher, or a bill you only have as a mobile-phone photo. These go through alongside the clean PDFs. Where a handwritten note overrides the printed line, you can instruct the prompt to prefer the handwriting, so a pen correction to an expiry date wins over the typed value beneath it. The same approach carries to other document-heavy verticals under Indian GST; it is how you would extract Indian construction supplier invoices and e-way bills from equally awkward site paperwork.
The third is the multi-invoice PDF. One stockist often sends a month as a single file: ten or twenty separate invoices concatenated, with summary pages, remittance advice and cover sheets mixed in. The workflow separates each invoice inside the PDF, filters out the pages that are not line items, and keeps every extracted row tied to its own invoice number, so a single scan of a pharmacy purchase bill to Tally-ready Excel does not blur two suppliers' bills into one block. Run a whole month of stockist files in one batch against the saved prompt, and the messy input becomes one ordered sheet.
Scheme and Free-Goods Lines: Why 10+1 Needs Its Own Column
Scheme offers run through pharma distribution constantly, and they break a purchase register that is not built for them. A 10+1 line means the stockist bills ten units and supplies an eleventh free. You receive eleven, you pay for ten, and the free unit carries no separate charge on the invoice.
That free unit changes two numbers. Your effective per-unit cost drops, because the same money now buys eleven instead of ten, and your stock on hand rises by the full eleven. If the free quantity is folded into the billed quantity, both numbers go wrong: the sheet either overstates what you paid per unit or understates how many you actually hold, and a FEFO register that miscounts units is not much use. Free-of-cost supplies also carry their own treatment for input tax credit, distinct from the billed line, so the free quantity has to stay visible rather than disappearing into a single quantity figure.
The fix is the dedicated Free/Scheme Quantity column from the prompt, holding billed units and free units apart on every line. With the two separated, your effective cost calculates correctly, your stock count reflects the full eleven, and the ITC position on the billed portion stays clean. Where it helps your posting, add a simple flag column that marks the row as a scheme line, so a quick filter shows every free-goods entry in the month at a glance.
From Excel to Purchase Entry, ITC, and an Audit Trail
The sheet only earns its keep when it lands in your books. Because each row already holds batch, expiry, HSN, rate and the tax split, it maps straight onto a purchase voucher or a goods receipt note. Medical store purchase entry from invoice stacks turns into import and check rather than re-keying every strip by hand: you load the file, scan it against the originals, and post. The same data feeds Tally or Marg directly, and if you run TallyPrime you can automate purchase voucher entry into TallyPrime with invoice OCR from the same extracted sheet.
The per-line tax is what makes the input tax credit work. With CGST and SGST captured against each HSN, you have the detail to claim ITC and to reconcile input tax credit against your GSTR-2B line by line. Pharma purchasing runs at high volume and thin margin, so an ITC mismatch left unspotted across a month of stockist bills is real money, and matching at the line level is where it gets caught.
This is a financial workflow with money and compliance attached, so the data has to be checkable, not just convenient. Every extracted row carries a reference back to its source file and page number, so any figure on the sheet can be traced to the exact stockist invoice it came from. When an auditor questions an ITC claim, or a supplier disputes a batch or a rate, you go straight to the page instead of hunting through a folder of PDFs.
The extraction does the reading and the typing; your books, your register and your ITC working stay exactly where they are.
Extract invoice data to Excel with natural language prompts
Upload your invoices, describe what you need in plain language, and download clean, structured spreadsheets. No templates, no complex configuration.
Related Articles
Explore adjacent guides and reference articles on this topic.
Extract IT Consulting Invoices in India with TDS 194J
Convert Indian IT consulting invoices into per-consultant rows with GSTIN, PAN, hours, rates, and TDS 194J review fields for AP reconciliation.
Extract IT Hardware Purchase Invoices to Excel in India
Extract Indian IT hardware supplier invoices to Excel with asset-register fields, GST splits, HSN, and 194Q flags preserved per line for AP review.
Amazon, Flipkart & Meesho Invoices to GSTR-1
Build a GSTR-1-ready Excel workflow for Amazon MTR, Flipkart Sales Reports, Meesho TCS reports, and marketplace tax invoice PDFs.