A UK P60 data extractor reads the statutory fields from end-of-year P60 certificates — pay in this employment, total pay for the year, tax deducted, employee National Insurance contributions by category letter, statutory payments, student and postgraduate loan deductions, the employee's NINO, and the employer's PAYE reference in NNN/AAAAAAAA form — and writes them into spreadsheet rows, one row per employee per tax year. Output is a working Excel, CSV, or JSON file rather than an OCR text dump, which is the difference between something a finance team can use and something they then have to retype.
Because every P60 follows the HMRC-defined field set across tax years and across payroll software providers — including the substitute layouts each provider prints — a single extraction prompt covers any year and any provider. That is what turns P60 extraction into a stable workflow rather than a per-document handcraft. The same prompt that handles last year's P60s from Sage will handle this year's P60s from Xero, BrightPay, or Iris.
This article is for practitioners who handle P60s for other people: the accountant working through a stack of clients' year-end certificates, the bookkeeper rebuilding a new client's payroll history, the payroll bureau cross-checking issued P60s against its own RTI submissions, the verification team confirming prior-year income across a book of borrowers. Three downstream uses recur across this audience, and they're the jobs the rest of the article is built around: reconciling P60s against FPS year-end submissions, onboarding clients with prior-year payroll history, and preparing self-assessment workbooks at year-end. If you've landed here looking for help reading your own P60 — what the boxes mean, how to check your tax — that's a separate audience with its own answers, and this isn't the article for it.
One point up front about evaluating the tools that do this work: the criteria that apply to any payroll-document extractor apply here too, namely accuracy across providers' substitute layouts, batch handling at the volumes a real client book throws at you, and an output structure you can actually pour into the next workbook. Those criteria are covered in more depth in our guide on what to look for in payroll OCR software; the rest of this article is specifically about what makes P60 extraction its own workflow on top of those general criteria: the prompt patterns that fit the document, the workflows the output plugs into, and the validation pass that finishes the job.
How P60 extraction differs from payslip and P45 extraction
A generic payroll-PDF-to-Excel prompt will produce something for a P60. The question is whether what it produces is what the downstream workflow actually needs, and on three operational axes the answer is no often enough that P60 deserves its own treatment.
The first axis is temporal shape. A P60 is an annual snapshot — one document per employee, per employer, per tax year, summarising the whole year. A payslip is a per-pay-period record, with a fresh document each month or week and a YTD column that creeps up across the year. A P45 is a leaver record dated to a single event — the employee's last day with that employer. Each shape implies a different extraction target. The natural output for P60 is one row per employee per tax year; for payslips it's one row per period (or per period per employee); for P45 it's one row per leaving event. Treating all three as "payroll documents" and stitching them into the same shape is how generic prompts produce sheets that need cleanup before they're useful.
The second axis is the validation surface, which differs in a way that matters even when the field names look similar. On a P60 the cross-check is annual: does total pay for the year match the FPS submitted at year-end, do the NI category letters cover the correct portions of the year for an employee whose letter changed, do the statutory payment "if any" rows reflect what actually happened during the year? On a payslip the cross-check is per-period: gross-to-net working, and statutory deductions running correctly into the YTD figure. On a P45 the cross-check is leaving-event integrity: the leaving date, the YTD-to-leaving figures, the tax code on departure, and the continuity into the new employer's starter record. These are three different sets of checks, even though all three documents carry similar-looking fields.
Downstream flow is the third axis, and probably the most consequential for tooling decisions. P60 extraction feeds year-end reconciliation against FPS submissions, audit support, prior-year migration when a client moves bookkeepers, and self-assessment workbooks during the personal-tax window. Payslip extraction feeds routine monthly payroll handling, per-period spend analysis, and ongoing reporting cycles — most of which is more usefully covered by the broader payroll PDF to Excel extraction workflow and its sibling country pages, like the operational walkthrough on extracting Irish payslips to spreadsheets. P45 extraction feeds RTI leaver processing at the outgoing employer and starter-record completion at the incoming one — a different operational loop again.
The practical consequence is that P60 needs its own prompt shape and its own validation logic. The right output structure (row per employee per tax year with the year-end fields prioritised), the right cross-checks (annual totals, category-letter coverage, statutory-payment nullability), and the right downstream framing (year-end and prior-year work rather than monthly payroll work) are not what a generic payroll prompt is trying to produce.
The statutory P60 field set, and what each field is for downstream
The reason a single prompt can cover any tax year and any payroll-software provider is that the field set is fixed. HMRC's specification for substitute P60 forms sets out the requirements employers and payroll software providers must follow when designing substitute P60 layouts, defining the field set every P60 layout for tax years 2025 to 2026 and 2026 to 2027 must carry. In practice that means the columns in the spreadsheet you're building can stay constant across providers; the prompt is portable.
What's worth doing is mapping each statutory field to the downstream use it actually feeds, because that mapping is what tells you which columns are load-bearing for your particular workflow and which are reference-only. Vendor pages list the names; the operational picture is what each one is for once it's a column in your sheet.
Pay in this employment and total pay for the year feed self-assessment cross-checks (does the figure on the P60 match what the client declared), lender verification (prior-year income evidence at firm scale), and prior-year reconciliation when onboarding a client whose previous bookkeeper has handed over a bundle of historical P60s. Where an employee held multiple employments in the year, the "in this employment" figure is what each P60 stands for individually; "total pay for the year" reconciles across all of them.
Tax deducted is the column that reconciles to FPS year-end totals. Discrepancies between the P60 figure and the FPS submission are the discrepancies a payroll bureau wants to surface before HMRC raises them, and the same column drops directly into a self-assessment workbook as the year's tax-paid line.
Employee NI contributions broken down by category letter is where the P60 carries more structure than people sometimes notice. An employee whose category letter changed mid-year — for example moving from category A to category C on reaching state pension age, or moving onto a different category on becoming a deferred contributor — will show contributions split across two letter rows on the same P60. Multi-employment NI reconciliation depends on the breakdown, and category-letter audits across a client book are how a bureau spots payroll software miscoding.
Employer NI contributions per category support employer-cost reconciliation: the column that lets a bureau or in-house payroll function tie out total employer NI for the year against the books, and that lets an accountant pull true cost-of-employment figures into a year-end summary.
Statutory payments — Statutory Maternity Pay (SMP), Statutory Paternity Pay (SPP), Shared Parental Pay (ShPP), Statutory Adoption Pay (SAP), and Statutory Parental Bereavement Pay (SPBP) — feed two checks. For the employer, the totals tie back to the recoverable amounts claimed against PAYE during the year. For the employee record, the figures evidence the leave periods that occurred. The form prints "if any" rows for these fields, which matters for the prompt section ahead — null and zero mean different things here.
Student loan deductions and postgraduate loan deductions with plan code reconcile to the employee's known loan status. A plan-code mismatch — for example deductions taken under Plan 2 when the borrower is on Plan 1 — is either an extraction error or a payroll-coding error worth flagging back to the employer. Either way the column needs to exist in the sheet.
Employee NINO is the identity column that distinguishes employees with similar names, especially in larger client books, and the cross-reference back to the payroll master record. It's also the column the lender or verification workflow keys on.
Employer PAYE reference, in the NNN/AAAAAAAA form, is the HMRC submission cross-reference. For an accountant carrying multiple clients, it's the column that anchors a P60 to the correct employer entity in the books. Provider layouts sometimes print it with spacing variations or without the slash; the prompt section covers normalising it back to the canonical shape.
Tax year anchors the row to a filing period. For self-assessment work it's the column that matches a P60 to the right tax year on the personal-tax return. For prior-year migration it's the dimension the client's payroll history is sliced on.
The NI category letters that appear in the breakdown sit within a defined set: A, B, C, F, H, I, J, L, M, S, V, X, Z. Knowing the set matters because it lets you treat the category-letter column as a constrained categorical field rather than free text — anything outside the set is an extraction or coding signal, not a legitimate value.
Readers who want depth on the document side rather than the extraction side — what each box looks like on the printed form, what each entry means for the employee — will find that covered in our field-by-field guide to reading a UK P60. The mapping above is the operator's view: each field as a column, each column with a downstream job.
Four UK B2B workflows P60 extraction plugs into
The vendor-page line about "accountants, HR, and lenders" is not wrong, just abstract. Below are four specific workflows that account for most P60-extraction work inside UK practice and bureau settings. Each one has its own rhythm in the year, its own batch shape, and its own output structure, and that is what the prompt has to be tuned for, not a generic "extract P60 data" instruction.
The accountant pulling clients' P60s into a self-assessment workbook in the March-to-April window. This is the busiest period for the workflow. A practice serving individual clients receives P60s alongside other year-end documentation, often for clients with multiple employments and sometimes mid-year employer changes. The workbook needs total pay for the year and tax deducted as the headline figures, with NINO and employer PAYE reference for cross-reference and audit trail. Output structure: one row per employee per employer per tax year, so a client with two concurrent employments produces two rows for the year. The self-assessment-relevant fields anchor the left of the sheet, with the rest of the statutory fields preserved for any follow-on questions HMRC raises. A practice running this workflow typically processes a few hundred P60s through the window, which is bulk P60 processing for accountants in the operational sense — repeatable, time-pressured, and unforgiving of cleanup.
The bookkeeper onboarding a new client and ingesting prior-year P60s into the historical record. This is small-batch but high-stakes work: the goal is to reconstruct the client's payroll history before opening the books, so that future reconciliations have something to reference back to. A typical onboarding might involve P60s covering several recent tax years for each employee on the client's payroll. Output structure: one row per employee per tax year, with all statutory fields preserved — student loan plan codes, statutory payment fields, NI category letters, the lot — because what looks like reference detail today becomes the source data for a question two years out. P60 extraction for client onboarding is one of the cleanest cases for using the same prompt across years, since the field set has been stable and provider variation is the only real source of layout drift.
The payroll bureau reconciling issued P60s against FPS year-end submissions. A bureau that has been issuing P60s on behalf of employer clients knows that discrepancies between the P60s and the underlying FPS year-end data are the discrepancies HMRC will raise if they aren't caught first. The reconciliation runs employer by employer, all employees in scope. Output structure: one row per employee per tax year, with the reconciliation columns — total pay, tax deducted, employer NI per category — prioritised so the variance check runs cleanly in Excel against the bureau's own FPS extract. This is P60 to spreadsheet for reconciliation in its most operational form: the spreadsheet exists to be diffed against another spreadsheet, and the column alignment is what makes the diff meaningful. This is also the workflow where a P60 data extraction tool earns its keep most visibly — the prompt names the columns, the batch processes the bureau's full employer-client scope, and the rows come out aligned to the FPS extract on the other side of the variance check.
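The variance check in the bureau workflow can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the column names (`paye_ref`, `nino`, `tax_year`, `total_pay`, `tax_deducted`) and the `variances` function are hypothetical, and both inputs are assumed to be lists of dicts already loaded from the extracted P60 sheet and the bureau's FPS extract, with amounts as `Decimal`.

```python
from decimal import Decimal

# Hypothetical column names; align them with your own sheet headers.
COMPARE_COLUMNS = ("total_pay", "tax_deducted")


def row_key(row):
    """Identify a row by employer, employee, and tax year."""
    return (row["paye_ref"], row["nino"], row["tax_year"])


def variances(p60_rows, fps_rows, tolerance=Decimal("0.00")):
    """List (key, column, difference) for every P60/FPS mismatch."""
    fps_by_key = {row_key(r): r for r in fps_rows}
    seen = set()
    found = []
    for row in p60_rows:
        key = row_key(row)
        seen.add(key)
        other = fps_by_key.get(key)
        if other is None:
            found.append((key, "missing_in_fps", None))
            continue
        for col in COMPARE_COLUMNS:
            diff = row[col] - other[col]
            if abs(diff) > tolerance:
                found.append((key, col, diff))
    # Rows present in the FPS extract but with no matching P60.
    for key in fps_by_key:
        if key not in seen:
            found.append((key, "missing_in_p60", None))
    return found
```

Keying on employer reference, NINO, and tax year together is a design choice that keeps the diff meaningful when one person appears under more than one employer in the same batch.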
Lender or employment verification at firm scale. Mortgage providers, larger landlords, employment-screening firms, and immigration-support practices all carry workflows that involve confirming prior-year income evidence across a book of borrowers, tenants, candidates, or applicants. The extraction here serves a checking workflow rather than an accounting one — the operator is verifying figures against an applicant-supplied total, not opening the data into ongoing books. Output structure: one row per individual per tax year, with name, NINO, employer PAYE reference, and total pay for the year as the verification fields. The other statutory fields stay in the output as reference but aren't the columns the verification check keys on.
P60-specific prompt patterns for clean, reusable extraction
The prompt is the configuration. There is no template UI to set up, no field-mapping wizard, no per-provider profile to build — the operator names the columns the workflow needs and the extractor produces them. That is the path that converts a P60 PDF to Excel directly, without an intermediate CSV-cleanup step. Below is the shape that produces a clean sheet across providers and across tax years.
The spine of the prompt is a structured field-set: the columns the spreadsheet should carry, named directly, in the order the workflow needs them. For a general-purpose P60 extraction the column list maps onto the statutory fields covered above — pay in this employment, total pay for the year, tax deducted, employee NI by category letter, employer NI per category, statutory payments by type, student and postgraduate loan deductions with plan code, NINO, employer PAYE reference, and tax year. Naming each one in the prompt makes it the literal column header in the output, which is what keeps the sheet directly usable in the next workbook rather than needing renaming after the fact.
For batch and multi-client work, the next directive in the prompt is the row rule: one row per employee per tax year, repeating the employer fields where multiple employees belong to the same employer. That single rule resolves a multi-employee P60 file into a single sheet, and a multi-client batch into a single sheet with the employer columns telling you which client each row belongs to. The same prompt covers a multi-client run because the field set is fixed across providers — there is no "rebuild the prompt for this client's payroll software" step.
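As a rough sketch of how the field set and the row rule combine into one reusable prompt, the snippet below assembles the text from a column list. The wording of the prompt, the `P60_COLUMNS` list, and the `build_p60_prompt` helper are all illustrative assumptions; the column names simply mirror the statutory fields listed above.

```python
# Column names mirror the statutory fields named in this article.
P60_COLUMNS = [
    "Tax year",
    "Employee NINO",
    "Employer PAYE reference",
    "Pay in this employment",
    "Total pay for the year",
    "Tax deducted",
    "Employee NI by category letter",
    "Employer NI per category",
    "Statutory payments by type",
    "Student loan deductions with plan code",
    "Postgraduate loan deductions",
]


def build_p60_prompt(columns=P60_COLUMNS):
    """Assemble an extraction prompt: named columns first, then the row rule."""
    lines = ["Extract these columns from each P60, using the names as headers:"]
    lines += [f"- {name}" for name in columns]
    lines.append(
        "Output one row per employee per tax year, repeating the employer "
        "fields on every row that belongs to the same employer."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_p60_prompt())
```

Because the column list is data rather than prose, reordering it for a particular workflow (for example, moving the reconciliation columns to the left) does not require rewriting the rest of the prompt.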
A handful of P60-specific normalisations are worth writing into the prompt explicitly:
- NI category letter as a categorical column constrained to the standard set: A, B, C, F, H, I, J, L, M, S, V, X, Z. The instruction is that any value extracted outside this set should surface as-is rather than be silently coerced, so out-of-set values become a data-quality flag rather than vanishing into a default.
- Employer PAYE reference normalised to the NNN/AAAAAAAA shape regardless of how the form printed it (some substitute layouts use spacing variants or omit the slash). Cross-references against HMRC submissions are direct when the column is canonical, and noisy when it isn't.
- NINO in its standard format: two letters, six digits, suffix letter. Where a substitute layout shows a masked NINO (partial value redacted), the prompt should preserve the masked form as it appears rather than guess at the missing digits. Masked NINOs are a real artefact of some payroll software output and need to round-trip honestly.
- Statutory payment fields treated as nullable rather than zero-by-default. The P60's "if any" rows distinguish "did not apply this year" from "applied and was zero", and that distinction matters downstream — particularly for employer recovery reconciliation, where null and zero get treated differently. The prompt should leave these fields blank when the source row is blank, not coerce to zero.
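The same normalisations can also be applied, or double-checked, after extraction. The sketch below shows one way to do that in Python; the function names are hypothetical, and it assumes the PAYE reference is three digits followed by one to ten alphanumeric characters, and that amounts arrive as strings.

```python
import re
from decimal import Decimal
from typing import Optional


def normalise_paye_ref(raw: str) -> str:
    """Return the reference as NNN/REFERENCE; leave unrecognised values as-is."""
    compact = re.sub(r"[\s/]+", "", raw).upper()
    match = re.fullmatch(r"(\d{3})([A-Z0-9]{1,10})", compact)
    if match is None:
        return raw.strip()  # surface the oddity instead of hiding it
    return f"{match.group(1)}/{match.group(2)}"


def normalise_nino(raw: str) -> str:
    """Drop spacing and upper-case; masked characters are kept untouched."""
    return re.sub(r"\s+", "", raw).upper()


def parse_optional_amount(cell: Optional[str]) -> Optional[Decimal]:
    """Blank stays None (did not apply); a printed zero stays Decimal zero."""
    if cell is None or cell.strip() == "":
        return None
    return Decimal(cell.replace("£", "").replace(",", "").strip())
```

Returning the original value when a PAYE reference cannot be recognised follows the same principle as the category-letter rule: an out-of-shape value should reach the validation pass rather than be coerced into something plausible.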
For teams running this extraction inside their own systems rather than through the web app — a payroll bureau pushing year-end P60s into its reconciliation pipeline, an onboarding workflow that ingests historical P60s as part of new-client setup, a verification platform that checks income evidence as part of an automated decision — the prompt-and-batch shape is the same; only the integration surface changes. That path is covered in the API for multi-document financial extraction.
Post-extraction validation checklist for P60 data
Even at high extraction accuracy, the operator owes the downstream workflow a sanity pass. That's not a comment on any specific tool — it's what mature data work looks like. A reconciliation, a self-assessment, an onboarding ingest, or a verification check is only as trustworthy as the inputs the operator signed off, and a focused checklist takes minutes against a structured sheet that would take hours against a stack of PDFs.
The checks below are P60-specific and run column by column in Excel. None of them require auditing every field on every row; they're shape checks and outlier checks designed to surface the rows worth eyeballing.
- NINO format. The standard NINO shape is two letters, six digits, one suffix letter (A, B, C, or D). A regular-expression or formula check on the NINO column flags any row whose value doesn't match. Prefix letters that are never issued (D, F, I, Q, U, or V in either of the first two positions, plus O as the second character) catch additional misreads where the shape is right but the issuance rules say the value can't exist.
- Employer PAYE reference shape. The reference should match NNN/AAAAAAAA: three digits, slash, up to ten alphanumeric characters. Rows where this column is empty, mistyped, or carries spacing variants the prompt didn't normalise away are the rows to inspect against the source P60.
- NI category letter membership. The column should contain only values from the standard set (A, B, C, F, H, I, J, L, M, S, V, X, Z). Anything outside the set is either a category-letter coding error in the source payroll or a misread from extraction; both are worth flagging.
- Total pay × tax-code plausibility. A coarse proportionality check that tax deducted is roughly consistent with total pay for the year given the tax code on file catches outliers — rows where the figures don't sit within a reasonable band for that code. This is not an exact recalculation, just a screen for rows that warrant a closer look against payroll detail.
- Statutory payments null versus zero. Confirm the "if any" rows on the P60 (SMP, SPP, ShPP, SAP, SPBP) are preserved as blank where the form was blank, and as zero only where the form actually printed zero. Downstream reconciliations against employer recovery claims depend on the distinction — a blank that gets coerced to zero produces phantom recovery rows in the reconciliation.
- Student loan plan code. Where a row carries a student-loan deduction, the plan code should match the borrower's known plan. A Plan 2 deduction on a Plan 1 borrower (or a postgraduate-loan deduction where none is expected) is a flag worth raising — either an extraction error or, more usefully, a payroll-coding error to take back to the employer.
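For teams that prefer a script to spreadsheet formulas, the shape checks above might look something like the following. It is a sketch under stated assumptions: the row keys (`nino`, `paye_ref`, `ni_categories`, `loan_plan`, `expected_loan_plan`) are hypothetical, the NINO pattern excludes D, F, I, Q, U, and V from both prefix positions (my reading of the issuance rules), and the category set is the one listed in this article.

```python
import re

# Two prefix letters, six digits, suffix A-D. D, F, I, Q, U, V are excluded
# from both prefix positions and O from the second (assumed reading).
NINO_RE = re.compile(r"(?![DFIQUV])[A-Z](?![DFIQUVO])[A-Z]\d{6}[A-D]")

# Three digits, slash, one to ten alphanumeric characters.
PAYE_REF_RE = re.compile(r"\d{3}/[A-Z0-9]{1,10}")

# The category-letter set as listed in this article.
NI_CATEGORIES = frozenset("ABCFHIJLMSVXZ")


def shape_flags(row: dict) -> list:
    """Return the names of the checks this row fails (empty list if none)."""
    flags = []
    if not NINO_RE.fullmatch(row.get("nino") or ""):
        flags.append("nino_format")
    if not PAYE_REF_RE.fullmatch(row.get("paye_ref") or ""):
        flags.append("paye_ref_shape")
    for letter in row.get("ni_categories") or []:
        if letter not in NI_CATEGORIES:
            flags.append(f"ni_category:{letter}")
    actual = row.get("loan_plan")
    expected = row.get("expected_loan_plan")
    if actual and expected and actual != expected:
        flags.append("loan_plan_mismatch")
    return flags
```

A masked NINO will fail the format check here, which is intentional in this sketch: it puts the row on the list to eyeball rather than silently passing it.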
What makes this checklist tractable is that the underlying data is already structured. Each row carries a reference back to its source file and page, so any flagged row is a click away from the original P60 — which is the difference between validation that finishes inside a morning and validation that drowns in PDF-hunting. Manual transcription would never sustain column-level checks like these at firm scale, and that's the operational point: extraction is what makes the validation pass possible, not just faster.
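The total pay × tax-code plausibility screen from the checklist is the one check that benefits most from being written down explicitly. The sketch below is deliberately coarse and rests on loud assumptions: it only handles plain suffix-L codes (optionally prefixed S or C), it approximates the tax-free allowance as the code's number multiplied by ten, and the `min_rate`, `max_rate`, and `slack` defaults are illustrative placeholders, not HMRC rates. Every other code is returned as unsupported so a person looks at it.

```python
import re
from decimal import Decimal

# Plain cumulative codes such as 1257L, S1257L, C1257L (assumption: anything
# else, including BR, D0, K codes, and W1/M1 variants, is out of scope).
SUFFIX_L_CODE = re.compile(r"[SC]?(\d+)L")


def plausibility_flag(
    total_pay: Decimal,
    tax_deducted: Decimal,
    tax_code: str,
    min_rate: Decimal = Decimal("0.15"),  # placeholder lower band
    max_rate: Decimal = Decimal("0.50"),  # placeholder upper band
    slack: Decimal = Decimal("50"),       # placeholder rounding allowance
) -> str:
    """Return 'ok', 'tax_low', 'tax_high', or 'unsupported_code'."""
    match = SUFFIX_L_CODE.fullmatch(tax_code.strip().upper())
    if match is None:
        return "unsupported_code"
    allowance = Decimal(match.group(1)) * 10
    taxable = max(total_pay - allowance, Decimal("0"))
    if tax_deducted < taxable * min_rate - slack:
        return "tax_low"
    if tax_deducted > taxable * max_rate + slack:
        return "tax_high"
    return "ok"
```

The band is wide on purpose. The aim is to surface rows where the figures cannot plausibly belong together, not to recalculate PAYE.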