You already have the payslips. What you need is the data inside them, arranged so it ties out to SARS. To extract South African payslips to Excel, you upload the payslip PDFs and prompt for the statutory fields as named columns — employee, pay period, gross, PAYE, UIF (employee and employer), SDL, ETI and net — with one row per employee per pay period. The result is a spreadsheet built for reconciliation, not a reformatted payslip.
That structure earns its keep the moment you open your working papers. Correctly typed currency and date columns let you total each month against its EMP201 and validate the year's IRP5/IT3(a) certificates when you run the EMP501. And because every row carries a reference back to its source file and page, you keep an audit trail from any figure in the spreadsheet to the original payslip it came from.
The distinction between extracting payslips and generating them is the whole reason this is harder than it first looks. Search for a way to turn South African payslips into a spreadsheet and most of what comes back is the opposite tool: payslip generators, blank Excel templates, and the V5- and V7-style payslip programs that calculate PAYE, UIF and SDL from the current SARS tax tables and print a payslip. Those serve an employer who needs to produce a payslip. They do nothing for the bookkeeper who already has a stack of them and needs the numbers out and arranged.
That bookkeeper's job is extraction: turning South African payslips into structured Excel data, one row per employee per period, ready to add up. It is the same underlying work as our generic payroll PDF to Excel extraction; what changes for South Africa is the specific column set and the SARS reconciliation it has to feed. PAYE, UIF, SDL and ETI sitting in their own correctly typed columns, mapped to the EMP201 and EMP501 cycle, are what make a South-African-specific extraction worth doing rather than a generic export you then have to rebuild by hand.
The statutory columns your payslip spreadsheet needs
The spine of the spreadsheet is a fixed set of named columns, one for each figure you have to account for to SARS:
- Employee — name and employee number, so the year-to-date totals roll up to the right IRP5/IT3(a).
- Pay period — the month or pay date the payslip covers.
- Gross — total earnings before deductions.
- PAYE — employees' tax withheld.
- UIF employee — the 1% withheld from the employee.
- UIF employer — the matching 1% the employer contributes. Keep the two halves apart; you declare the combined contribution, but the split is what lets the spreadsheet tie back to each payslip line.
- SDL — the 1% skills development levy the employer pays on the payroll.
- ETI — the Employment Tax Incentive claimed for qualifying employees, which reduces the PAYE you actually pay over.
- Net — what the employee was paid.
Pull PAYE, UIF, SDL and ETI off the payslips into these discrete columns and the working papers more or less build themselves. Merge them into a block of payslip text and you are back to re-keying.
The grain matters as much as the columns: one row per employee per pay period. That single rule is what makes the sheet reconcile in two directions. Sum a month's PAYE, UIF, SDL and ETI columns and you have the figures for that month's EMP201. Filter to one employee across the tax year and the same rows roll up to the per-employee totals behind their IRP5/IT3(a). A sheet with one row per employee, or one row per payslip with everything concatenated, can do neither cleanly.
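As a rough illustration of why the grain matters, the sketch below sums the same rows in both directions: by month for the EMP201 figures and by employee for the certificate totals. The sample figures, the snake_case column names and the `totals_by` helper are invented for the example, not output from any particular tool.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

STATUTORY = ("paye", "uif_employee", "uif_employer", "sdl", "eti")


def make_row(employee_number, pay_period, paye, uif, sdl, eti):
    """Build one hypothetical row: one employee, one pay period."""
    return {
        "employee_number": employee_number,
        "pay_period": pay_period,
        "paye": Decimal(paye),
        "uif_employee": Decimal(uif),
        "uif_employer": Decimal(uif),  # employer matches the employee's 1%
        "sdl": Decimal(sdl),
        "eti": Decimal(eti),
    }


# Invented sample data: two employees over two monthly pay periods.
ROWS = [
    make_row("E001", date(2024, 3, 25), "4200.00", "177.12", "300.00", "0.00"),
    make_row("E002", date(2024, 3, 25), "150.00", "60.00", "60.00", "750.00"),
    make_row("E001", date(2024, 4, 25), "4200.00", "177.12", "300.00", "0.00"),
    make_row("E002", date(2024, 4, 25), "150.00", "60.00", "60.00", "750.00"),
]


def totals_by(rows, key_fn):
    """Sum every statutory column, grouped by whatever key_fn returns."""
    totals = defaultdict(lambda: {col: Decimal("0.00") for col in STATUTORY})
    for row in rows:
        bucket = totals[key_fn(row)]
        for col in STATUTORY:
            bucket[col] += row[col]
    return dict(totals)


# Direction 1: one total per month, the figures behind each EMP201.
monthly = totals_by(ROWS, lambda r: (r["pay_period"].year, r["pay_period"].month))

# Direction 2: one total per employee, the figures behind each IRP5/IT3(a).
per_employee = totals_by(ROWS, lambda r: r["employee_number"])
```

Both views come from the same four rows; only the grouping key changes, which is exactly what a one-row-per-employee sheet cannot offer.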
For any of this to total, the fields have to land as separate, correctly typed columns rather than as text lifted off the page. A currency value stored as text will not sum, and a pay period stored inconsistently will not pivot. The extraction has to produce real numeric and date columns from the outset, which is what the later sections deal with.
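To make the typing problem concrete, here is a minimal sketch of what "text lifted off the page" needs before it will total. The `parse_rand` and `parse_pay_period` helpers and the formats they accept are assumptions chosen for illustration; real payslips may need more cases.

```python
import re
from datetime import date, datetime
from decimal import Decimal


def parse_rand(text: str) -> Decimal:
    """Turn a printed rand amount such as 'R 12 345,67' into a Decimal."""
    cleaned = text.strip()
    # Assumption: negatives are printed with a leading minus or in brackets.
    negative = cleaned.startswith("-") or (
        cleaned.startswith("(") and cleaned.endswith(")")
    )
    cleaned = re.sub(r"[^\d.,]", "", cleaned)  # drop 'R', spaces, brackets
    # A final '.' or ',' followed by exactly two digits is the decimal mark.
    match = re.fullmatch(r"(.*?)[.,](\d{2})", cleaned)
    whole, cents = (match.group(1), match.group(2)) if match else (cleaned, "00")
    whole = re.sub(r"[.,]", "", whole)  # remaining separators are thousands
    if not whole.isdigit():
        raise ValueError(f"not a currency amount: {text!r}")
    value = Decimal(f"{whole}.{cents}")
    return -value if negative else value


# Assumed date layouts; extend the tuple for other payslip formats.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %Y", "%b %Y")


def parse_pay_period(text: str) -> date:
    """Standardise a printed pay period as a date (month-only maps to day 1)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised pay period: {text!r}")


# Text values cannot be summed; parsed values can.
month_paye = sum(parse_rand(v) for v in ["R 4 200,00", "R150.00"])
```

Using `Decimal` rather than `float` keeps the cents exact, which matters when the goal is to tie out to the rand.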
Normalising payslips from Sage Pastel, SimplePay, PaySpace and Sage VIP
No two South African payroll systems lay a payslip out the same way. Sage Pastel Payroll, SimplePay, PaySpace and Sage VIP / Sage 300 People each put the figures in different places, label them differently, and group earnings and deductions to their own conventions. A bookkeeper in practice rarely sees just one of these. Take on a handful of clients and you are reconciling a Sage Pastel Payroll payslip for one, a SimplePay payslip for the next, and a PaySpace export for a third, every month.
Handled one provider at a time, that variation is where the hours go. The point of working from a spreadsheet is to collapse it. A single extraction across a mixed batch maps each provider's layout onto the same named columns from the previous section, so a Sage Pastel Payroll payslip and a SimplePay payslip land as the same nine columns regardless of where each system printed gross, PAYE or UIF. Converting South African payroll PDFs to Excel stops being per-provider work and becomes one pass over the whole stack.
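The idea of mapping different layouts onto one column set can be sketched as a simple alias lookup. The label variants below are invented for illustration and are not taken from any named payroll system; this is also not a description of how the extraction tool works internally, only of the end result it aims for.

```python
import re

CANONICAL = (
    "employee", "pay_period", "gross", "paye",
    "uif_employee", "uif_employer", "sdl", "eti", "net",
)

# Hypothetical label variants. Real payslips need labels observed on the page.
ALIASES = {
    "employee": {"employee", "employee name"},
    "pay_period": {"pay period", "period", "pay date"},
    "gross": {"gross", "gross pay", "total earnings"},
    "paye": {"paye", "tax", "employees tax"},
    "uif_employee": {"uif", "uif employee"},
    "uif_employer": {"uif employer", "uif company contribution"},
    "sdl": {"sdl", "skills development levy"},
    "eti": {"eti", "employment tax incentive"},
    "net": {"net", "net pay", "nett pay"},
}


def label_key(label: str) -> str:
    """Lower-case a label and collapse punctuation and spacing."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", label.lower()).split())


LOOKUP = {alias: col for col, aliases in ALIASES.items() for alias in aliases}


def normalise(raw: dict) -> tuple[dict, dict]:
    """Map one payslip's labelled values onto the canonical columns.

    Returns the canonical row plus anything that could not be mapped,
    so an unknown label is surfaced rather than silently dropped.
    """
    row = {col: None for col in CANONICAL}
    unmapped = {}
    for label, value in raw.items():
        col = LOOKUP.get(label_key(label))
        if col is None:
            unmapped[label] = value
        else:
            row[col] = value
    return row, unmapped


layout_a = normalise({"Gross Pay": "30000.00", "PAYE": "4200.00",
                      "UIF - Employee": "177.12", "Nett Pay": "25622.88"})
layout_b = normalise({"Total Earnings": "6000.00", "Employees' Tax": "150.00",
                      "UIF": "60.00", "Net Pay": "5790.00", "Canteen": "50.00"})
```

Returning the unmapped labels separately is a deliberate choice: a value that lands nowhere is easier to catch than one that lands in the wrong column.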
This is where the underlying tool matters. Invoice Data Extraction is built to take large, mixed-format batches (up to 6,000 files in a single job) and return consistent structured output across every document, so a folder of payslips from four different payroll systems comes back as one uniform sheet rather than four shapes you then have to align. It reads native PDFs, scans and phone photos alike, which covers the reality that some payslips arrive as clean exports and others as a forwarded image or a scan. That goes beyond plain OCR of South African payslips: rather than just turning the image into text, it identifies the fields on each payslip and places them in the right column, so an unusual layout does not quietly push a value into the wrong place.
The pattern is not unique to South Africa. It is the same approach behind the payslip-to-Excel workflow built for Irish payslips; what is South-African-specific is the statutory column set and the provider formats it has to absorb.
Feeding the spreadsheet into your EMP201, EMP501 and IRP5/IT3(a) reconciliation
The reason to get the spreadsheet right is what sits downstream of it. The SARS employer reconciliation is a three-way tie-out, and the extracted payslip data is the source figure for all three legs.
Each month, the PAYE, UIF, SDL and ETI columns summed for that period give you the figures you declare on the EMP201, which in turn drive the payment you make over to SARS once ETI has reduced the PAYE portion. Twice a year those monthly returns have to reconcile against the employee certificates. The interim EMP501 covers the six months to the end of August and is filed around September and October; the annual EMP501 runs for the full tax year, 1 March to the end of February, and is filed between 1 April and 31 May. The spreadsheet's year-to-date per-employee totals are exactly what populate those working papers, which is the point of EMP501 reconciliation from payslips rather than from memory or a manual tally.
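The monthly arithmetic and the period logic can be sketched as follows. The function names are hypothetical, and two simplifications are assumed and worth checking against current SARS guidance: that ETI offsets only the PAYE portion and never takes it below zero, and that a tax year is labelled by the calendar year in which it ends.

```python
from datetime import date
from decimal import Decimal


def emp201_payable(paye, uif_employee, uif_employer, sdl, eti):
    """Combine one month's column totals into the amount paid over.

    Assumption: ETI reduces PAYE only, and any excess is carried
    forward rather than refunded within the month.
    """
    eti_used = min(eti, paye)
    paye_after_eti = paye - eti_used
    uif = uif_employee + uif_employer
    return {
        "paye_after_eti": paye_after_eti,
        "uif": uif,
        "sdl": sdl,
        "eti_carried_forward": eti - eti_used,
        "total": paye_after_eti + uif + sdl,
    }


def tax_year(pay_date: date) -> int:
    """Tax year runs 1 March to end of February, named by its ending year."""
    return pay_date.year + 1 if pay_date.month >= 3 else pay_date.year


def in_interim_period(pay_date: date) -> bool:
    """True for pay dates in March to August, the interim EMP501 window."""
    return 3 <= pay_date.month <= 8


march = emp201_payable(
    paye=Decimal("4350.00"),
    uif_employee=Decimal("237.12"),
    uif_employer=Decimal("237.12"),
    sdl=Decimal("360.00"),
    eti=Decimal("750.00"),
)
```

With these invented March totals, PAYE of 4350.00 less ETI of 750.00 leaves 3600.00, and adding UIF of 474.24 and SDL of 360.00 gives a payment of 4434.24.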
The third leg is the certificates themselves. Each employee's IRP5/IT3(a) has to agree with the EMP201s submitted and the payments made. This is where the one-row-per-employee-per-period grain does its work: filter the sheet to one employee and the rows sum straight to the annual figures the certificate should carry. According to SARS's employer reconciliation process, the values of tax on the IRP5/IT3(a) certificates, the total value of the EMP201 returns, and the actual payments made to SARS must all balance. A spreadsheet whose columns total cleanly is what lets you prove that balance instead of chasing a few rand across three documents.
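A minimal sketch of that three-way check, assuming you already have the three totals to hand. The `tie_out` name, its arguments and the zero default tolerance are illustrative choices, not part of any SARS specification.

```python
from decimal import Decimal


def tie_out(certificates_total, emp201_total, payments_total,
            tolerance=Decimal("0.00")):
    """Compare the three legs of the employer reconciliation.

    Returns whether they balance within the tolerance, plus the two
    differences so a shortfall can be located rather than just detected.
    """
    differences = {
        "certificates_vs_emp201": certificates_total - emp201_total,
        "emp201_vs_payments": emp201_total - payments_total,
    }
    balanced = all(abs(diff) <= tolerance for diff in differences.values())
    return balanced, differences


# Invented totals: the certificates agree with the returns,
# but the payments made are R2.50 short.
balanced, differences = tie_out(
    certificates_total=Decimal("52200.00"),
    emp201_total=Decimal("52200.00"),
    payments_total=Decimal("52197.50"),
)
```

Reporting each difference separately shows which leg is out, so the few rand you are chasing can be pinned to either the certificates or the payments.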
The reconciliation and the certificates are submitted to SARS through e@syFile, so the extracted figures are ultimately what you are reconciling before that submission. The job has a clear parallel on the VAT side, where the same working-paper discipline lets you prepare a VAT201 working file from supplier invoices; the payroll version simply starts from payslips and ends at the EMP501.
Keeping a SARS audit trail with clean, correctly typed columns
A reconciliation is only as defensible as your ability to trace a figure back to where it came from. When SARS or a reviewer queries a number, you need to point from the cell in your working papers to the payslip it was taken off. That is why every row in the output carries a reference to its source file and page: any total in the EMP501 working file traces back through the spreadsheet to the exact payslip and page behind it. No payslip generator or blank template can offer that, because they were never holding the source documents in the first place.
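The trace from a total back to its payslips can be sketched like this. The `source_file` and `source_page` column names, the file name and the figures are hypothetical stand-ins for whatever reference columns your output carries.

```python
from datetime import date
from decimal import Decimal

# Invented rows, each carrying a reference back to its source document.
ROWS = [
    {"employee_number": "E001", "pay_period": date(2024, 3, 25),
     "paye": Decimal("4200.00"),
     "source_file": "march_payslips.pdf", "source_page": 1},
    {"employee_number": "E002", "pay_period": date(2024, 3, 25),
     "paye": Decimal("150.00"),
     "source_file": "march_payslips.pdf", "source_page": 2},
    {"employee_number": "E001", "pay_period": date(2024, 4, 25),
     "paye": Decimal("4200.00"),
     "source_file": "april_payslips.pdf", "source_page": 1},
]


def trace(rows, column, year, month):
    """Return a month's total for one column and the payslips behind it."""
    contributing = [
        r for r in rows
        if (r["pay_period"].year, r["pay_period"].month) == (year, month)
    ]
    total = sum((r[column] for r in contributing), Decimal("0.00"))
    sources = [
        (r["source_file"], r["source_page"], r[column]) for r in contributing
    ]
    return total, sources


march_paye, march_sources = trace(ROWS, "paye", 2024, 3)
```

Here the March PAYE total of 4350.00 resolves to pages 1 and 2 of the same file, which is the answer a reviewer is asking for when they query the figure.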
Defensibility is one half; the columns also have to add up without being rebuilt. The figures come out as 2-decimal currency, dates come out standardised, and the numeric and date fields are written as native Excel types rather than text. So your SUM across a month's PAYE, your pivot of UIF by employee, and the formulas that tie the sheet to each EMP201 work directly against the columns, with no re-typing step before the data is usable.
Extraction notes close the loop on judgement calls. Where a field was ambiguous, the notes record how it was read, so a reviewer can see the assumption behind a value instead of guessing at it. Taken together, the source reference, the correct typing and the notes are what separate a reconciliation-grade spreadsheet from a rough export that still needs an afternoon of cleaning before it will total against anything.
Setting up the extraction: prompting for South African payslip data
There are no templates to configure and no rules engine to set up. You upload the payslips and describe what you need in a single prompt. The most reliable prompts state the goal first, then the columns and the rules, because telling the extraction what the data is for helps it handle the odd layout sensibly. For a pay run that has to feed the EMP201/EMP501 cycle, that looks like:
I'm preparing payroll data for our EMP201 and EMP501 working papers. Extract one row per employee per pay period with these columns: Employee Name, Employee Number, Pay Period, Gross, PAYE, UIF Employee, UIF Employer, SDL, ETI, Net. Format all currency to 2 decimal places and standardise the pay period as a date.
That is the whole configuration. Save a prompt like it to your prompt library and every subsequent month is a matter of uploading the new payslips and applying the saved prompt, so the columns and formatting come out identical across pay periods and across clients.
Once the prompt is set, running a batch is the next step: drop in the month's payslips, or a whole tax year's, and they are converted automatically into the reconciliation-grade sheet the earlier sections described. The same approach extends across the other South African documents that land on a bookkeeper's desk, including when you need to extract South African municipal bills to Excel for cost reporting.