SAP Ariba Receipt Scanning & Invoice OCR Capture Features

SAP Ariba Invoicing and SAP Ariba Central Invoice Management capture supplier invoices from PDFs, JPEG/PNG/TIFF images, inbound email, central uploads, and APIs through built-in Document Information Extraction. Invoices arriving via Peppol or SAP Business Network skip OCR by design, because the document reaches Ariba already as machine-readable structured data rather than as a scanned page that needs to be read.

That answers the practical capture question, but the phrase "receipt scanning" carries three meanings inside the SAP estate — only one of which Ariba is built for.

What "Receipt Scanning" Actually Means in SAP Ariba

Buyer-side supplier invoice capture. This is the one most readers actually want. A supplier sends an invoice as a PDF attached to an email, or the AP team uploads a batch of supplier PDFs and images, and Ariba reads the documents through Document Information Extraction. Header fields and line items are pulled from the file, the original document is attached to a draft supplier invoice, and AP staff review and post from the draft. The rest of this article addresses this workflow.

Employee expense receipts. A taxi, a hotel, a meal, a one-off office-supplies purchase. These belong to SAP Concur, not Ariba. Concur's mobile app captures receipts through the camera, ExpenseIt extracts data from the photographed receipt, and email forwarding (typically to a dedicated Concur receipts address, from a verified sender address) lets travellers send receipts into their expense profile from any inbox. If the question driving the search is really about employee receipts, the right home is Concur Invoice Capture Processing for employee expense and receipt workflows rather than anything in Ariba.

Goods-receipt confirmation against purchase orders. Procurement language uses "receipt" to mean the PO receipt — the confirmation that the goods or services on a purchase order have actually been received, against which an invoice can later be matched. This is its own Ariba feature inside the procurement workflow, and there is no document being scanned. No OCR runs. Calling it "receipt scanning" is a vocabulary collision rather than a feature gap.

The reason this disambiguation matters in practice is that the three problems sit in different SAP products with different licensing, different admins, and different vendor stories. A team trying to solve employee receipt capture inside Ariba will hit a wall, because the platform is not built for it. A team looking for goods-receipt OCR is asking for something that does not exist as a feature, because goods receipts are workflow events rather than documents. Only the first job — supplier invoices arriving as PDFs, images, or email — maps onto Ariba's Document Information Extraction pipeline.

That also answers the "SAP Ariba receipt scanning app" search variant directly. There is no consumer-style mobile receipt scanning app inside Ariba. The phone-app workflow people associate with the phrase — open the app, photograph a receipt, classify it, expense it — lives in Concur Mobile and ExpenseIt. Ariba's invoice capture is a buyer-side ingestion pipeline that runs against PDFs and images coming from suppliers, not a scanner in a finance person's pocket.

Channels and File Formats Ariba Captures Supplier Invoices Through

Once the search is narrowed to buyer-side supplier invoice capture, the practical question for most AP teams becomes a fit check: do my channels and my file formats fit Ariba's capture pipeline, or do I need a layer outside it? It helps to fix the vocabulary first (invoice data capture software, at the AP layer, reads header fields and line items from a supplier document and presents them for review) and then walk the matrix.

Supported file formats. Ariba's default extraction service accepts PDFs (both native and scanned), JPEG, PNG, and TIFF. There is one structured-XML special case for the German market, but the practitioner answer is "PDF and the three image formats." A supplier sending a PDF, or an AP team scanning a paper invoice into a TIFF, sits inside the supported set. A supplier sending a Word document, an HTML email body without an attachment, or a structured EDIFACT file does not — those need conversion or a different ingestion path.
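The format check can be sketched as a small pre-flight filter over a batch of file names. This is a minimal illustration rather than anything Ariba ships: the function names and the extension list are assumptions derived from the formats named in this section, the Germany-only XML case is left out, and a production check would inspect file content rather than trusting the extension.

```python
from pathlib import Path

# Assumption: extensions corresponding to the formats named above
# (PDF, JPEG, PNG, TIFF). The Germany-only XML case is deliberately omitted.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def fits_default_extraction(filename: str) -> bool:
    """Return True if the file name carries an extension in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_batch(filenames):
    """Separate a batch into files ready to upload and files needing conversion."""
    ready, needs_conversion = [], []
    for name in filenames:
        if fits_default_extraction(name):
            ready.append(name)
        else:
            needs_conversion.append(name)
    return ready, needs_conversion


if __name__ == "__main__":
    batch = ["inv-001.PDF", "scan.tiff", "invoice.docx", "order.edi"]
    ready, needs_conversion = split_batch(batch)
    print("ready:", ready)
    print("needs conversion or another path:", needs_conversion)
```

Running a filter like this before a central upload separates the files that fit the default extraction service from the ones that need conversion or a different ingestion path.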

Inbound channels. Three channels feed the same Document Information Extraction service:

Upload Supplier Invoices Centrally. A web screen inside Ariba where AP staff or designated users upload single files or batches. The screen accepts the formats listed above and creates a draft invoice for each file.
Supplier Invoices API. The programmatic equivalent. The same files post through an API for systems that need to push invoices into Ariba from another tool — a shared mailbox automation, a document management system, an extraction pipeline upstream.
Inbound supplier email. An email channel that suppliers can send invoice attachments to. Email-borne PDFs and images flow into the same extraction service and produce the same kind of draft invoice as a centrally uploaded file.

The point worth holding on to is that the channel does not change the extraction. Whether a PDF arrives by email, by upload, or by API, it lands in Document Information Extraction and produces a draft supplier invoice. PDF invoice capture and email invoice capture are not separate features — both channels feed the same engine.

The asynchronous processing model. Extraction does not happen in front of the uploader. The file is attached to a draft supplier invoice and the user moves on; the content extraction service processes the document in the background. Once extraction completes, the draft becomes editable with header and line-item fields populated where confidence was high enough. AP staff then review the draft, fill in any blanks, and post the invoice. The implication for workflow design is that timing of capture does not equal timing of postable data — invoices land as drafts first, then become workable.
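The draft-first lifecycle can be modelled as a small state machine. The sketch below is a mental model only: the class, status names, and method names are hypothetical and do not correspond to Ariba objects, and the rule that every blank must be filled before posting is a simplifying assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class DraftStatus(Enum):
    AWAITING_EXTRACTION = "awaiting_extraction"  # file attached, nothing read yet
    READY_FOR_REVIEW = "ready_for_review"        # extraction finished, draft editable
    POSTED = "posted"                            # AP has reviewed and posted


@dataclass
class DraftInvoice:
    """Hypothetical stand-in for a draft supplier invoice."""
    source_file: str
    status: DraftStatus = DraftStatus.AWAITING_EXTRACTION
    fields: Dict[str, Optional[str]] = field(default_factory=dict)

    def extraction_completed(self, extracted: Dict[str, Optional[str]]) -> None:
        """Background extraction has finished; the draft becomes editable."""
        if self.status is not DraftStatus.AWAITING_EXTRACTION:
            raise ValueError("extraction already completed for this draft")
        self.fields = dict(extracted)
        self.status = DraftStatus.READY_FOR_REVIEW

    def post(self) -> None:
        """Post the invoice; assumed here to require that no field is blank."""
        if self.status is not DraftStatus.READY_FOR_REVIEW:
            raise ValueError("draft is not ready for review yet")
        blanks = [name for name, value in self.fields.items() if value is None]
        if blanks:
            raise ValueError(f"fill in blank fields first: {blanks}")
        self.status = DraftStatus.POSTED


if __name__ == "__main__":
    draft = DraftInvoice("supplier-invoice.pdf")
    draft.extraction_completed({"invoice_number": "INV-1", "currency": None})
    draft.fields["currency"] = "EUR"  # AP fills the blank during review
    draft.post()
    print(draft.status)
```

The gap between `AWAITING_EXTRACTION` and `READY_FOR_REVIEW` is the part that matters for workflow design: the capture time and the time the data becomes workable are two separate events.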

Channel	Accepted formats	What lands in Ariba
Upload Supplier Invoices Centrally	PDF, JPEG, PNG, TIFF (plus Germany-only XML)	File attached to a draft supplier invoice; extraction runs asynchronously
Supplier Invoices API	PDF, JPEG, PNG, TIFF (plus Germany-only XML)	Same — programmatic equivalent of the upload screen
Inbound supplier email	PDF, JPEG, PNG, TIFF attachments	Same — the email's attachments become draft invoices
SAP Business Network or Peppol	Structured invoice payload (cXML, EN 16931–aligned)	Goes straight to a populated draft invoice; OCR is skipped — see the structured-invoice section

Across channels the engine is Document Information Extraction — partner content may call it multi-AI OCR or the content extraction service, but in current Ariba Invoicing it is the same service whatever the channel.

What Document Information Extraction Does, and Where Its Reliability Stops

SAP's marketing language — embedded multi-AI OCR, Joule-supported, machine-learning-driven — does not answer the question that actually matters when an AP team is sizing this up. The useful answer is which fields, at what confidence, with what fallback.

The default field set is a header-and-lines subset, not the whole invoice. The default extraction service reads a restricted set of header fields and line-item fields rather than every conceivable data point on a supplier invoice. Header fields — supplier identifier, invoice number, date, currency, net/tax/gross totals — sit in the default coverage. Line-item data — description, quantity, unit price, line total — sits there too. What does not always sit there is the long tail: project codes, GL hints, custom segmentation fields, supplier-specific reference numbers, anything an internal AP form has added over the years. Teams whose review workflow leans on those fields will end up keying or deriving them, and that is the gap to know about up front.
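One way to size that gap before a rollout is to compare the fields the internal review form depends on against the default coverage. The sketch below assumes illustrative field names that paraphrase the coverage described above; they are not Ariba's actual field identifiers, and the real list should come from the tenant's own documentation.

```python
# Assumption: these names paraphrase the default coverage described above.
# They are illustrative labels, not Ariba field identifiers.
DEFAULT_HEADER_FIELDS = {
    "supplier_id", "invoice_number", "invoice_date",
    "currency", "net_total", "tax_total", "gross_total",
}
DEFAULT_LINE_FIELDS = {"description", "quantity", "unit_price", "line_total"}


def coverage_gap(required_fields):
    """Return the required fields that fall outside the default coverage, sorted."""
    covered = DEFAULT_HEADER_FIELDS | DEFAULT_LINE_FIELDS
    return sorted(set(required_fields) - covered)


if __name__ == "__main__":
    # Hypothetical fields an internal AP review form relies on.
    review_form = ["invoice_number", "gross_total", "project_code", "gl_hint", "quantity"]
    print("fields to key or derive:", coverage_gap(review_form))
```

Whatever comes back from a comparison like this is the list of fields the team should plan to key, derive from master data, or cover with a custom extraction service.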

The extraction stack is orchestrated, not single-engine. Inside next-generation SAP Ariba Invoicing, Document Information Extraction is a content extraction service that selects between multiple LLMs — Claude Sonnet, Gemini Flash, and SAP-hosted GPT — through the Manage Document Information Extraction Templates app, with Joule-supported self-learning that adapts field mappings as it sees more invoices from a given supplier. What sits behind those labels is not a fixed engine reading characters off a page — it is a model-orchestration layer choosing how a particular document gets read. From an AP team's perspective, this matters because it means accuracy on a difficult supplier layout can improve over time without any internal effort, and because the templates app is where an admin can intervene if a supplier's invoices are systematically being mis-read.

The reliability boundary worth knowing. The feature that distinguishes Ariba's extraction from cheaper OCR pipelines, and that almost no partner content surfaces, is what happens to low-confidence values. They are not written through to the draft invoice. The field is left blank for AP staff to fill in during review, rather than populated with an uncertain value and a confidence flag. The practical consequence is that the AP review queue contains more visible blanks and fewer silently-wrong values. For a controller who has chased down a wrong invoice number that looked confident on screen, this is a useful posture; for a team optimising for raw automation rate, it means the headline "X% of fields auto-filled" number lands lower than the underlying extraction quality might suggest.
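The blank-instead-of-guess behaviour, and its effect on the headline auto-fill number, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `ExtractedValue` type, the confidence scale, and the 0.8 cut-off are hypothetical, and the actual threshold the service applies is not stated in this article.

```python
from typing import Dict, NamedTuple, Optional


class ExtractedValue(NamedTuple):
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical extractor


# Assumption: an illustrative cut-off, not a documented Ariba setting.
CONFIDENCE_THRESHOLD = 0.8


def to_draft_fields(
    extracted: Dict[str, ExtractedValue],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Dict[str, Optional[str]]:
    """Keep confident values; leave the rest blank (None) for AP review."""
    return {
        name: (item.value if item.confidence >= threshold else None)
        for name, item in extracted.items()
    }


def auto_fill_rate(draft: Dict[str, Optional[str]]) -> float:
    """Share of fields that were populated rather than left blank."""
    if not draft:
        return 0.0
    return sum(value is not None for value in draft.values()) / len(draft)


if __name__ == "__main__":
    extracted = {
        "invoice_number": ExtractedValue("INV-1", 0.55),  # uncertain: left blank
        "invoice_date": ExtractedValue("2024-01-05", 0.91),
        "currency": ExtractedValue("EUR", 0.99),
        "gross_total": ExtractedValue("119.00", 0.93),
    }
    draft = to_draft_fields(extracted)
    print(draft)
    print("auto-fill rate:", auto_fill_rate(draft))
```

In this toy case the extractor produced a value for every field, yet the draft shows one blank. That is the trade described above: a lower visible fill rate in exchange for fewer confidently wrong values in the review queue.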

The escape hatch when defaults are not enough. Custom OCR or information-extraction services can be integrated alongside the default service for teams whose field set is wider than the default coverage, or whose accuracy bar on specific suppliers is higher than the default service hits. This is the slot partner integrators sell into — bolt-on extraction tuned for a particular industry, supplier population, or country-specific tax layout, plugged in through the same templates app. A team with a small set of stubborn supplier formats often gets further with a targeted custom service than with general tuning of the default.

Which Ariba Product Variant You're On

A surprising amount of confusion in Ariba feature questions comes down to product identity. "We have Ariba" can mean any of three things, and the OCR posture is different for each. Before applying the channel matrix and reliability boundary above to your own environment, it is worth checking which variant the company is actually running.

Ariba Buying & Invoicing (legacy). The longer-running buyer-side product, predating the current content extraction service. Many legacy tenants rely on partner-built or bolt-on OCR — a third-party vendor reads the PDF and feeds Ariba — rather than embedded extraction. The multi-LLM specifics in this article will not apply on these tenants; channel and format expectations may.

Next-generation SAP Ariba Invoicing. The current product, and the one this article describes. Embedded multi-AI OCR through Document Information Extraction runs by default on supported formats, the Manage Document Information Extraction Templates app is visible to admins, and asynchronous draft invoices are the standard ingestion model.

SAP Ariba Central Invoice Management (CIM). The inbound processing module — file uploads, draft creation, extraction orchestration. A tenant running next-generation Ariba Invoicing typically has CIM as its inbound layer; references to SAP Ariba Central Invoice Management OCR in SAP documentation point at this same module.

A fast self-check. Three signals usually settle which variant is in play without a call to an SAP rep:

Is the Manage Document Information Extraction Templates app visible to administrators? If yes, the tenant is running the current content extraction service.
Is the Upload Supplier Invoices Centrally screen available to AP users? If yes, CIM is the inbound module.
How does the Supplier Invoices API surface in tenant documentation? Modern tenants expose the API as a current-generation endpoint; legacy tenants either do not, or expose only older invoice integration endpoints.

A tenant that answers yes to all three is on next-generation Ariba Invoicing with CIM, and the channel matrix and reliability boundary above apply directly. A tenant that answers no to all three is on legacy Ariba Buying & Invoicing or an older variant, and the OCR posture is whatever the partner solution wraps around it.
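The three-signal check reduces to a small decision function. This is a sketch of the reasoning above, not an SAP diagnostic: the function and parameter names are hypothetical, and the handling of mixed answers is an assumption, since the check as described only covers the all-yes and all-no cases.

```python
def classify_tenant(
    templates_app_visible: bool,
    central_upload_available: bool,
    current_gen_api: bool,
) -> str:
    """Map the three self-check answers to a likely product variant."""
    signals = (templates_app_visible, central_upload_available, current_gen_api)
    if all(signals):
        return "next-generation Ariba Invoicing with CIM"
    if not any(signals):
        return "legacy Ariba Buying & Invoicing or an older variant"
    # Assumption: mixed answers are treated as inconclusive.
    return "mixed signals: confirm with the tenant administrator"


if __name__ == "__main__":
    print(classify_tenant(True, True, True))
    print(classify_tenant(False, False, False))
    print(classify_tenant(True, False, True))
```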

One adjacent point: SAP Business One is a separate SAP product with its own capture path — readers running SAP B1 should consult SAP Business One's separate Document Information Extraction setup instead, because the Ariba feature map does not transfer.

Why Peppol and SAP Business Network Invoices Skip OCR

Not every invoice arriving in Ariba flows through Document Information Extraction, and the reason matters. When an invoice reaches Ariba via Peppol or SAP Business Network, OCR is skipped — header fields and line items are read directly from the structured payload because the document arrives already machine-readable. This is by design, not a feature gap, and it is usually the cleanest outcome for the AP team on the receiving end.

The specification this rests on is well defined and externally documented. The Peppol BIS Billing 3.0 specification, the format Peppol-routed invoices arrive in, is built on EN 16931, the European semantic data model for an electronic invoice, so the document reaches Ariba already as structured invoice data rather than as a PDF or image. The supplier's accounting system serialises the invoice into the EN 16931 model; Peppol's network routes it to the buyer's access point; SAP Business Network or the buyer's Ariba tenant deserialises the structured data straight into the invoice record. At no point is there a page of pixels that needs to be read.

The Business Network side works on the same principle. SAP Business Network exchanges machine-readable invoice payloads — historically cXML between connected trading partners, increasingly EN 16931–aligned formats where Peppol routing is in play — between supplier and buyer. The transport changes, the underlying logic does not: the invoice does not need OCR because it does not arrive as an unstructured document.

For an AP team, the practical consequence is straightforward. Invoices that arrive via Peppol or Business Network land directly as draft supplier invoices with header and line-item data already populated, and there are no draft-blank fields from low-confidence extraction because no extraction was needed. The review queue for a structured invoice looks different from the review queue for a scanned PDF — fewer blanks, fewer ambiguous fields, more confidence in the totals. PDF and image invoices travel the OCR path described in the prior sections; structured invoices travel a different path that ends in the same place, the draft invoice ready for AP review.
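To make "read directly from the structured payload" concrete, the sketch below parses a heavily trimmed invoice fragment with Python's standard library. It assumes the UBL syntax commonly used for Peppol BIS Billing 3.0; the sample values are made up, the fragment is far from a complete or valid BIS invoice, and the code says nothing about how Ariba itself deserialises the payload.

```python
import xml.etree.ElementTree as ET

# UBL 2.1 namespaces for the basic and aggregate components.
NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}

# Made-up sample data; trimmed to four header values for illustration.
SAMPLE_INVOICE = """<Invoice
    xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
  <cbc:ID>INV-0001</cbc:ID>
  <cbc:IssueDate>2024-01-05</cbc:IssueDate>
  <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
  <cac:LegalMonetaryTotal>
    <cbc:PayableAmount currencyID="EUR">119.00</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>"""


def read_header(xml_text: str) -> dict:
    """Read header values straight from the elements; nothing is inferred."""
    root = ET.fromstring(xml_text)
    return {
        "invoice_number": root.findtext("cbc:ID", namespaces=NS),
        "issue_date": root.findtext("cbc:IssueDate", namespaces=NS),
        "currency": root.findtext("cbc:DocumentCurrencyCode", namespaces=NS),
        "payable_amount": root.findtext(
            "cac:LegalMonetaryTotal/cbc:PayableAmount", namespaces=NS
        ),
    }


if __name__ == "__main__":
    print(read_header(SAMPLE_INVOICE))
```

Each value is a lookup by element name, with no confidence score and no threshold involved, which is why structured invoices do not produce the low-confidence blanks that scanned documents can.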

This is also why telling a supplier "send your invoices to our Peppol address" is the more durable answer to a capture question than tuning OCR templates for that supplier's PDF. OCR works against an inherently lossy starting point — a rendered document — while structured exchange skips the loss. For a supplier already on Peppol or Business Network, the technically correct integration choice is to use the structured channel and let Ariba handle PDFs only for suppliers who have no structured option.

Beyond Ariba: Adjacent Capture Options

Teams with a one-off batch (a backlog from an acquisition, a vendor-spend analysis, sample validation before committing to a CIM rollout) often find that AI-powered invoice data extraction returns clean Excel, CSV, or JSON output without an Ariba commitment.

For readers whose evaluation extends past Ariba into the wider SAP estate, the broader SAP invoice scanning landscape covers the cross-platform OCR question — S/4HANA, OpenText VIM, partner-built capture, and the trade-offs between them. For readers comparing across accounting and ERP platforms rather than within SAP, invoice OCR integration with SAP, QuickBooks, and Xero covers the multi-platform comparison.
