A UK P60 data extractor reads the statutory fields from end-of-year P60 certificates — pay in this employment, total pay for the year, tax deducted, employee National Insurance contributions by category letter, statutory payments, student and postgraduate loan deductions, the employee's NINO, and the employer's PAYE reference in NNN/AAAAAAAA form — and writes them into spreadsheet rows, one row per employee per tax year. Output is a working Excel, CSV, or JSON file rather than an OCR text dump, which is the difference between something a finance team can use and something they then have to retype.
Because every P60 follows the HMRC-defined field set across tax years and across payroll software providers — including the substitute layouts each provider prints — a single extraction prompt covers any year and any provider. That is what turns P60 extraction into a stable workflow rather than a per-document handcraft. The same prompt that handles last year's P60s from Sage will handle this year's P60s from Xero, BrightPay, or Iris.
Four downstream jobs drive most P60 extraction work: preparing self-assessment workbooks at year-end, onboarding clients with prior-year payroll history, reconciling P60s against FPS year-end submissions, and verifying prior-year income for lenders or employment checks. The general criteria for any payroll-document extractor (accuracy across providers' substitute layouts, batch handling, usable output structure) are covered in our guide on what to look for in payroll OCR software. The rest of this article covers what makes P60 extraction its own workflow on top of those general criteria.
How P60 extraction differs from payslip and P45 extraction
A P60 is an annual snapshot — one document per employee per employer per tax year — so the natural output is one row per employee per tax year. Payslips are per-period records (one row per period); P45s are leaving-event records (one row per leaving event). Treating all three as undifferentiated "payroll documents" is how generic prompts produce sheets that need cleanup before they're useful.
The practical consequence is that P60 extraction needs its own prompt shape and its own validation logic: annual cross-checks against FPS year-end submissions, category-letter coverage, statutory-payment nullability. Routine monthly payroll work belongs to the broader payroll PDF to Excel extraction workflow and its sibling country pages, like the operational walkthrough on extracting Irish payslips to spreadsheets.
The statutory P60 field set, and what each field is for downstream
The reason a single prompt can cover any tax year and any payroll-software provider is that the field set is fixed. HMRC's specification for substitute P60 forms defines the field set every P60 layout for tax years 2025 to 2026 and 2026 to 2027 must carry. Each statutory field maps to a downstream column job:
- Pay in this employment and total pay for the year — self-assessment cross-checks, lender verification, prior-year reconciliation. The "in this employment" figure stands per P60; "total pay for the year" reconciles across multiple employments.
- Tax deducted — reconciles to FPS year-end totals; drops directly into a self-assessment workbook as the tax-paid line.
- Employee NI contributions by category letter — multi-employment NI reconciliation. An employee whose letter changed mid-year (e.g. A to C on reaching state pension age) shows split rows on the same P60; the breakdown is what makes the reconciliation work.
- Employer NI contributions per category — employer-cost reconciliation and true cost-of-employment figures.
- Statutory payments (SMP, SPP, ShPP, SAP, SPBP) — tie back to recoverable amounts claimed against PAYE and evidence the leave periods that occurred. The form's "if any" rows distinguish "did not apply" from "applied and was zero", which matters for the prompt section ahead.
- Student loan deductions and postgraduate loan deductions with plan code — reconcile to known loan status. A plan-code mismatch flags either an extraction error or a payroll-coding error.
- Employee NINO — identity column and verification key.
- Employer PAYE reference in NNN/AAAAAAAA form: HMRC submission cross-reference; anchors each P60 to the correct employer entity. Provider layouts sometimes print spacing variants or omit the slash; the prompt normalises back to canonical shape.
- Tax year: anchors the row to a filing period.
The NI category letters sit within a defined set: A, B, C, F, H, I, J, L, M, S, V, X, Z. Treat the column as a constrained categorical field — anything outside the set is an extraction or coding signal.
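As a rough illustration of how the field set above could map onto a typed row, here is a minimal Python sketch. The class names, attribute names, and the tax-year string format are all assumptions made for the example, not part of any HMRC specification or product schema. The category-letter set is copied from the list in this article and is best treated as configuration.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional

# Category-letter set as listed in this article; treat it as configuration.
NI_CATEGORY_LETTERS = frozenset("ABCFHIJLMSVXZ")


@dataclass
class NICategoryLine:
    """One NI line on a P60; a mid-year letter change yields two lines."""
    letter: str
    employee_ni: Decimal
    employer_ni: Optional[Decimal] = None


@dataclass
class P60Row:
    """Hypothetical shape for one row: one employee, one employer, one tax year."""
    tax_year: str                      # e.g. "2025-26" (format is an assumption)
    employee_nino: str
    employer_paye_ref: str
    pay_in_this_employment: Decimal
    total_pay_for_year: Decimal
    tax_deducted: Decimal
    ni_lines: list[NICategoryLine] = field(default_factory=list)
    # None means the "if any" row was blank; Decimal("0") means zero was printed.
    smp: Optional[Decimal] = None
    spp: Optional[Decimal] = None
    shpp: Optional[Decimal] = None
    sap: Optional[Decimal] = None
    spbp: Optional[Decimal] = None
    student_loan_deductions: Optional[Decimal] = None
    student_loan_plan: Optional[str] = None
    postgraduate_loan_deductions: Optional[Decimal] = None

    def out_of_set_letters(self) -> list[str]:
        """Return category letters that fall outside the configured set."""
        return [line.letter for line in self.ni_lines
                if line.letter not in NI_CATEGORY_LETTERS]
```

Keeping the statutory-payment attributes as `Optional` is what lets a blank row stay distinct from a printed zero further down the pipeline.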
Depth on the document side — what each box looks like on the printed form, what each entry means for the employee — is covered in our field-by-field guide to reading a UK P60.
Four UK B2B workflows P60 extraction plugs into
Each of the four workflows below has its own rhythm in the year, its own batch shape, and its own output emphasis. The base output rule — one row per employee per tax year — holds across all four.
Self-assessment workbook (March-to-April window). A practice serving individual clients receives P60s alongside other year-end documentation, often for clients with multiple employments. Headline columns: total pay for the year, tax deducted, NINO, employer PAYE reference. A client with two concurrent employments produces two rows for the year. This is the highest-volume window and is where the P60 data extraction tool earns its keep against manual transcription.
Client onboarding (prior-year P60 ingest). A bookkeeper rebuilding a new client's payroll history. Small-batch but high-stakes: all statutory fields preserved (student loan plan codes, statutory payment fields, NI category letters), because what looks like reference detail today becomes source data for a question two years out.
Bureau reconciliation against FPS year-end submissions. Runs employer by employer, all employees in scope. The reconciliation columns — total pay, tax deducted, employer NI per category — sit on the left so the variance check runs cleanly against the bureau's own FPS extract. The spreadsheet exists to be diffed against another spreadsheet, and column alignment is what makes the diff meaningful.
Lender or employment verification at scale. Mortgage providers, larger landlords, employment-screening firms, and immigration-support practices confirming prior-year income across a book of borrowers, tenants, candidates, or applicants. Verification fields: name, NINO, employer PAYE reference, total pay for the year. Other statutory fields stay in the output as reference.
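For the bureau reconciliation workflow described above, the diff against an FPS extract can be sketched in a few lines of Python. This is an illustrative sketch only: the dictionary keys (`employer_paye_ref`, `employee_nino`, `tax_year`, `total_pay_for_year`, `tax_deducted`) are hypothetical column names, and the shape of a real FPS extract depends on the bureau's payroll software.

```python
from decimal import Decimal


def reconcile(p60_rows, fps_rows,
              columns=("total_pay_for_year", "tax_deducted"),
              tolerance=Decimal("0.00")):
    """Compare extracted P60 rows against an FPS year-end extract.

    Rows are dicts matched on (employer_paye_ref, employee_nino, tax_year).
    Returns a list of (key, column, p60_value, fps_value) variances. A row
    missing on one side is reported with column "<row>" and None on that side.
    """
    def key(row):
        return (row["employer_paye_ref"], row["employee_nino"], row["tax_year"])

    p60 = {key(r): r for r in p60_rows}
    fps = {key(r): r for r in fps_rows}
    variances = []
    for k in sorted(p60.keys() | fps.keys()):
        a, b = p60.get(k), fps.get(k)
        if a is None or b is None:
            variances.append((k, "<row>",
                              "present" if a is not None else None,
                              "present" if b is not None else None))
            continue
        for col in columns:
            # Decimal avoids float rounding noise in money comparisons.
            if abs(Decimal(str(a[col])) - Decimal(str(b[col]))) > tolerance:
                variances.append((k, col, a[col], b[col]))
    return variances
```

An empty result means every matched row agrees on the chosen columns within the tolerance; anything else is a short list of rows to open against the source P60.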
P60-specific prompt patterns for clean, reusable extraction
The prompt is the configuration. There is no template UI to set up, no field-mapping wizard, no per-provider profile to build — the operator names the columns the workflow needs and the extractor produces them. That is the path that converts a P60 PDF to Excel directly, without an intermediate CSV-cleanup step. Below is the shape that produces a clean sheet across providers and across tax years.
The spine of the prompt is a structured field-set: the columns the spreadsheet should carry, named directly, in the order the workflow needs them. For a general-purpose P60 extraction the column list maps onto the statutory fields covered above — pay in this employment, total pay for the year, tax deducted, employee NI by category letter, employer NI per category, statutory payments by type, student and postgraduate loan deductions with plan code, NINO, employer PAYE reference, and tax year. Naming each one in the prompt makes it the literal column header in the output, which is what keeps the sheet directly usable in the next workbook rather than needing renaming after the fact.
For batch and multi-client work, the next directive in the prompt is the row rule: one row per employee per tax year, repeating the employer fields where multiple employees belong to the same employer. That single rule resolves a multi-employee P60 file into a single sheet, and a multi-client batch into a single sheet with the employer columns telling you which client each row belongs to. The same prompt covers a multi-client run because the field set is fixed across providers — there is no "rebuild the prompt for this client's payroll software" step.
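The column list plus the row rule can be pictured as a small prompt template. The sketch below is one possible way to assemble it in Python; the wording of the prompt, the function name `build_p60_prompt`, and the exact header strings are assumptions for illustration, not the product's required syntax.

```python
# Column headers in the order the workflow needs them (illustrative wording).
P60_COLUMNS = [
    "Tax year",
    "Employee NINO",
    "Employer PAYE reference",
    "Pay in this employment",
    "Total pay for the year",
    "Tax deducted",
    "Employee NI by category letter",
    "Employer NI per category",
    "Statutory payments by type (SMP, SPP, ShPP, SAP, SPBP)",
    "Student loan deductions and plan code",
    "Postgraduate loan deductions",
]


def build_p60_prompt(columns=P60_COLUMNS):
    """Assemble an extraction prompt: named columns first, then the row rule."""
    lines = ["Extract the following columns from each P60, "
             "using these exact headers:"]
    lines += [f"{i}. {name}" for i, name in enumerate(columns, start=1)]
    lines.append(
        "Output one row per employee per tax year, repeating the employer "
        "fields on every row that belongs to the same employer."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_p60_prompt())
```

Reordering `P60_COLUMNS` is enough to move the reconciliation or verification columns to the left for a particular workflow; the row rule stays the same.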
A handful of P60-specific normalisations are worth writing into the prompt explicitly:
- NI category letter as a categorical column constrained to the standard set: A, B, C, F, H, I, J, L, M, S, V, X, Z. The instruction is that any value extracted outside this set should surface as-is rather than be silently coerced, so out-of-set values become a data-quality flag rather than vanishing into a default.
- Employer PAYE reference normalised to the NNN/AAAAAAAA shape regardless of how the form printed it (some substitute layouts use spacing variants or omit the slash). Cross-references against HMRC submissions are direct when the column is canonical, and noisy when it isn't.
- NINO in its standard format: two letters, six digits, suffix letter. Where a substitute layout shows a masked NINO (partial value redacted), the prompt should preserve the masked form as it appears rather than guess at the missing digits. Masked NINOs are a real artefact of some payroll software output and need to round-trip honestly.
- Statutory payment fields treated as nullable rather than zero-by-default. The P60's "if any" rows distinguish "did not apply this year" from "applied and was zero", and that distinction matters downstream — particularly for employer recovery reconciliation, where null and zero get treated differently. The prompt should leave these fields blank when the source row is blank, not coerce to zero.
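Teams that post-process the extracted sheet in their own code may want the same normalisations as a safety net. The following Python sketch shows one way to do two of them: canonicalising the PAYE reference and keeping blank statutory-payment cells as null. Both function names are hypothetical, and the assumption that the part after the slash is one to ten alphanumeric characters follows the shape check described later in this article.

```python
import re
from decimal import Decimal
from typing import Optional


def normalise_paye_ref(raw: str) -> str:
    """Return 'NNN/REF' for spacing or missing-slash variants.

    Values that do not start with three digits are returned unchanged,
    so they surface for review instead of being silently rewritten.
    """
    compact = re.sub(r"\s+", "", raw).upper()
    match = re.fullmatch(r"(\d{3})/?([A-Z0-9]{1,10})", compact)
    if not match:
        return raw
    return f"{match.group(1)}/{match.group(2)}"


def parse_statutory_amount(cell: Optional[str]) -> Optional[Decimal]:
    """Blank stays None (did not apply); a printed figure becomes a Decimal.

    A printed "0.00" is returned as Decimal("0.00"), not None, which keeps
    "applied and was zero" distinct from "did not apply".
    """
    if cell is None or not cell.strip():
        return None
    return Decimal(cell.replace(",", "").replace("£", "").strip())
```

For example, `normalise_paye_ref("123 / ab456")` and `normalise_paye_ref("123AB456")` both yield `"123/AB456"`, while `parse_statutory_amount("")` yields `None`.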
For teams running this extraction inside their own systems rather than through the web app — a payroll bureau pushing year-end P60s into its reconciliation pipeline, an onboarding workflow that ingests historical P60s as part of new-client setup, a verification platform that checks income evidence as part of an automated decision — the prompt-and-batch shape is the same; only the integration surface changes. That path is covered in the API for multi-document financial extraction.
Post-extraction validation checklist for P60 data
Even at high extraction accuracy, the operator owes the downstream workflow a sanity pass. A reconciliation, a self-assessment, an onboarding ingest, or a verification check is only as trustworthy as the inputs the operator signed off, and a focused checklist takes minutes against a structured sheet that would take hours against a stack of PDFs.
The checks below are P60-specific and run column by column in Excel. None of them require auditing every field on every row; they're shape checks and outlier checks designed to surface the rows worth eyeballing.
- NINO format. The standard NINO shape is two letters, six digits, one suffix letter (A, B, C, or D). A regular-expression or formula check on the NINO column flags any row whose value doesn't match. Letters that are never used in the prefix (D, F, I, Q, U, and V in either position, plus O as the second letter) catch additional misreads where the shape is right but the issuance rules say the value can't exist.
- Employer PAYE reference shape. The reference should match NNN/AAAAAAAA: three digits, a slash, then up to ten alphanumeric characters. Rows where this column is empty, mistyped, or carries spacing variants the prompt didn't normalise away are the rows to inspect against the source P60.
- NI category letter membership. The column should contain only values from the standard set (A, B, C, F, H, I, J, L, M, S, V, X, Z). Anything outside the set is either a category-letter coding error in the source payroll or a misread from extraction; both are worth flagging.
- Total pay × tax-code plausibility. A coarse proportionality check that tax deducted is roughly consistent with total pay for the year given the tax code on file catches outliers — rows where the figures don't sit within a reasonable band for that code. This is not an exact recalculation, just a screen for rows that warrant a closer look against payroll detail.
- Statutory payments null versus zero. Confirm the "if any" rows on the P60 (SMP, SPP, ShPP, SAP, SPBP) are preserved as blank where the form was blank, and as zero only where the form actually printed zero. Downstream reconciliations against employer recovery claims depend on the distinction — a blank that gets coerced to zero produces phantom recovery rows in the reconciliation.
- Student loan plan code. Where a row carries a student-loan deduction, the plan code should match the borrower's known plan. A Plan 2 deduction on a Plan 1 borrower (or a postgraduate-loan deduction where none is expected) is a flag worth raising — either an extraction error or, more usefully, a payroll-coding error to take back to the employer.
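The first three shape checks in the list above can also be scripted outside Excel. The Python sketch below is a minimal version under stated assumptions: the row is a dict with hypothetical keys `employee_nino`, `employer_paye_ref`, and `ni_category_letters`; the NINO pattern excludes D, F, I, Q, U, and V from both prefix positions and O from the second, which is our reading of HMRC's published format rules; and the category-letter set is the one listed in this article.

```python
import re

# Two prefix letters (with never-issued letters excluded), six digits, suffix A-D.
NINO_RE = re.compile(r"^[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]\d{6}[A-D]$")

# Three digits, a slash, then one to ten uppercase alphanumeric characters.
PAYE_REF_RE = re.compile(r"^\d{3}/[A-Z0-9]{1,10}$")

# Category-letter set as listed in this article; treat it as configuration.
NI_CATEGORY_LETTERS = frozenset("ABCFHIJLMSVXZ")


def shape_flags(row: dict) -> list[str]:
    """Return the names of the shape checks this row fails (empty if none)."""
    flags = []
    if not NINO_RE.match(row.get("employee_nino") or ""):
        flags.append("nino_format")
    if not PAYE_REF_RE.match(row.get("employer_paye_ref") or ""):
        flags.append("paye_ref_shape")
    for letter in row.get("ni_category_letters") or []:
        if letter not in NI_CATEGORY_LETTERS:
            flags.append(f"ni_category_out_of_set:{letter}")
    return flags
```

A masked NINO preserved from the source will fail `nino_format` by design: the flag marks the row for a human look, it does not mean the extraction was wrong.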
What makes this checklist tractable is that the underlying data is already structured. Each row carries a reference back to its source file and page, so any flagged row is a click away from the original P60 — which is the difference between validation that finishes inside a morning and validation that drowns in PDF-hunting. Manual transcription would never sustain column-level checks like these at firm scale, and that's the operational point: extraction is what makes the validation pass possible, not just faster.
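To show how a flagged row stays tied to its source, here is a small Python sketch of an outlier screen that reports the file and page alongside each reason. Everything named here is an assumption for the example: the keys `source_file` and `source_page`, the `known_plans` lookup keyed by NINO, and especially the default `rate_band`, which is a placeholder and not derived from any tax code. A real screen would set the band per tax code.

```python
from decimal import Decimal


def screen_rows(rows, rate_band=(Decimal("0.00"), Decimal("0.45")),
                known_plans=None):
    """Return (source_file, source_page, reason) for rows worth checking by hand.

    rate_band is a coarse, caller-supplied band for tax deducted / total pay.
    known_plans maps NINO -> expected student loan plan code.
    """
    known_plans = known_plans or {}
    flagged = []
    for row in rows:
        where = (row["source_file"], row["source_page"])
        pay = Decimal(str(row["total_pay_for_year"]))
        tax = Decimal(str(row["tax_deducted"]))
        if pay > 0:
            rate = tax / pay
            if not (rate_band[0] <= rate <= rate_band[1]):
                flagged.append(
                    (*where, f"effective tax rate {float(rate):.1%} outside band"))
        expected = known_plans.get(row["employee_nino"])
        plan = row.get("student_loan_plan")
        # Only compare when both the extracted and the expected plan are known.
        if plan and expected and plan != expected:
            flagged.append((*where, f"plan code {plan}, expected {expected}"))
    return flagged
```

This is a screen, not a recalculation: it only narrows the sheet down to the rows that deserve a look at the original P60.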