Retrieve Invoice Data
Extract structured invoice data from PDF and XML files. This endpoint is the inverse of /v1/generate, which makes roundtrip workflows possible.
Endpoint
POST https://api.thelawin.dev/v1/retrieve
Purpose
Use this endpoint to:
- Extract invoice data from ZUGFeRD/Factur-X PDFs as JSON
- Parse standalone XML (CII, UBL, Peppol, FatturaPA)
- Validate incoming invoices against the EN 16931 standard
- Enable roundtrip workflows: generate → retrieve → modify → regenerate
Roundtrip Capability
The returned JSON uses the same format as the /v1/generate input. You can modify the extracted data and generate a new invoice in a different format!
Supported Input Formats
| Format | Input Type | XML Type | Region / Standard |
|---|---|---|---|
| ZUGFeRD 2.x | PDF/A-3 | CII | Germany |
| Factur-X 1.0 | PDF/A-3 | CII | France |
| XRechnung 3.0 | PDF/A-3 or XML | UBL | German B2G |
| Peppol BIS 3.0 | XML | UBL | EU/UK/AU/SG/NZ |
| FatturaPA 1.2 | XML | FatturaPA | Italy (SDI) |
| UBL 2.1 | XML | UBL | OASIS Standard |
| CII | XML | CII | UN/CEFACT |
Input Methods
The endpoint supports three ways to send data:
1. JSON with Base64 (Recommended)
bash
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key" \
  -d '{
    "data_base64": "JVBERi0xLjcK...",
    "content_type": "application/pdf",
    "include_source_xml": true
  }'
2. Multipart Form-Data (File Upload)
bash
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "X-API-Key: your_api_key" \
  -F "file=@invoice.pdf;type=application/pdf" \
  -F "include_source_xml=true"
3. Raw Binary Body
bash
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Content-Type: application/xml" \
  -H "X-API-Key: your_api_key" \
  --data-binary @invoice.xml
Headers
| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | application/json, multipart/form-data, application/pdf, application/xml, or text/xml |
| X-API-Key | Recommended | Your API key for quota tracking |
| X-Include-Source-Xml | No | true to include raw XML in response (for raw binary uploads) |
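As a rough illustration of the header table above, the following sketch assembles the headers for a raw binary upload. The helper name `raw_upload_headers` and the idea of choosing the content type from the file suffix are assumptions made for this example; only the header names and content types come from the table.

```python
from pathlib import Path
from typing import Dict, Optional

# Content types from the header table, keyed by file suffix.
# The suffix-based lookup is an assumption made for this sketch.
CONTENT_TYPES = {".pdf": "application/pdf", ".xml": "application/xml"}


def raw_upload_headers(filename: str,
                       api_key: Optional[str] = None,
                       include_source_xml: bool = False) -> Dict[str, str]:
    """Build request headers for a raw binary upload (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    headers = {"Content-Type": CONTENT_TYPES[suffix]}
    if api_key:
        # The API key is listed as recommended, not required.
        headers["X-API-Key"] = api_key
    if include_source_xml:
        # Raw uploads have no JSON body, so the flag travels as a header.
        headers["X-Include-Source-Xml"] = "true"
    return headers
```

The returned dictionary can be passed as the `headers` argument of an HTTP client together with the file bytes as the request body.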
Request Body (JSON)
| Field | Type | Required | Description |
|---|---|---|---|
| data_base64 | string | Yes | Base64-encoded PDF or XML |
| content_type | string | No | MIME type (application/pdf, application/xml). Auto-detected if omitted. |
| include_source_xml | boolean | No | Include raw XML in response (default: false) |
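The request body described above can be assembled from raw file bytes along these lines. This is a minimal sketch: the function name `build_retrieve_payload` and the magic-byte check used to pick `content_type` are assumptions for illustration, and the field is simply left out when the bytes are not recognized so that the documented auto-detection applies.

```python
import base64
from typing import Any, Dict

# UTF-8 byte order mark plus ASCII whitespace, stripped before sniffing XML.
_LEADING_NOISE = b"\xef\xbb\xbf \t\r\n"


def build_retrieve_payload(data: bytes,
                           include_source_xml: bool = False) -> Dict[str, Any]:
    """Build the JSON body for /v1/retrieve from file bytes (hypothetical helper)."""
    payload: Dict[str, Any] = {
        "data_base64": base64.b64encode(data).decode("ascii"),
        "include_source_xml": include_source_xml,
    }
    if data.startswith(b"%PDF"):
        # PDF files begin with the "%PDF" signature.
        payload["content_type"] = "application/pdf"
    elif data.lstrip(_LEADING_NOISE).startswith(b"<"):
        # Anything that starts with "<" is treated as XML in this sketch.
        payload["content_type"] = "application/xml"
    # Otherwise content_type is omitted and left to server-side detection.
    return payload
```

Serializing the result with `json.dumps` gives a body in the same shape as the cURL example for the JSON input method.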
Response
Success (200)
json
{
  "valid": true,
  "format": {
    "detected_format": "zugferd",
    "profile": "EN16931",
    "version": "2.3",
    "xml_type": "CII",
    "has_pdf": true
  },
  "invoice": {
    "number": "RE-2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-15",
    "currency": "EUR",
    "seller": {
      "name": "Acme GmbH",
      "street": "Musterstr. 1",
      "city": "Berlin",
      "postal_code": "10115",
      "country": "DE",
      "vat_id": "DE123456789"
    },
    "buyer": {
      "name": "Customer AG",
      "street": "Kundenweg 5",
      "city": "München",
      "postal_code": "80331",
      "country": "DE"
    },
    "items": [{
      "description": "Consulting Services",
      "quantity": 10.0,
      "unit": "HUR",
      "unit_price": 150.00,
      "vat_rate": 19.0
    }],
    "payment": {
      "iban": "DE89370400440532013000",
      "bic": "COBADEFFXXX"
    }
  },
  "source_xml_base64": "PD94bWwgdmVyc2lvbj0i...",
  "transaction_id": "tx_abc123xyz",
  "errors": [],
  "warnings": [],
  "locale": "de"
}
Roundtrip Ready!
The invoice object uses the exact same schema as the input for /v1/generate. You can modify it and generate a new invoice!
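A small sketch of the modify step, assuming the `{format, invoice}` request shape used in the roundtrip example on this page. The helper name `build_generate_payload` and its `buyer_updates` parameter are hypothetical; the invoice is deep-copied so the retrieved data stays unchanged.

```python
import copy
from typing import Any, Dict, Optional


def build_generate_payload(retrieve_result: Dict[str, Any],
                           target_format: str,
                           buyer_updates: Optional[Dict[str, Any]] = None
                           ) -> Dict[str, Any]:
    """Turn a /v1/retrieve result into a /v1/generate body (hypothetical helper)."""
    if not retrieve_result.get("valid") or retrieve_result.get("invoice") is None:
        raise ValueError("retrieve result has no usable invoice")
    # Work on a copy so the original response can still be inspected later.
    invoice = copy.deepcopy(retrieve_result["invoice"])
    if buyer_updates:
        invoice.setdefault("buyer", {}).update(buyer_updates)
    return {"format": target_format, "invoice": invoice}
```

For example, `build_generate_payload(result, "peppol", {"peppol_id": "0088:5060012349998"})` would produce a body matching the Peppol roundtrip example.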
Response Fields
| Field | Type | Description |
|---|---|---|
| valid | boolean | true if parsing succeeded without critical errors |
| format.detected_format | string | Detected format: zugferd, facturx, xrechnung, peppol, fatturapa, ubl, cii |
| format.profile | string | Profile (e.g., EN16931, EXTENDED, BASIC) |
| format.version | string | Version (e.g., 2.3, 1.0) |
| format.xml_type | string | XML schema type: CII, UBL, FATTURAPA |
| format.has_pdf | boolean | true if input was a PDF with embedded XML |
| invoice | object | Extracted invoice data (same schema as /v1/generate input) |
| source_xml_base64 | string | Raw XML (if include_source_xml=true) |
| transaction_id | string | Unique transaction ID for tracing |
| errors | array | Critical parsing/validation errors |
| warnings | array | Non-critical warnings |
| locale | string | Locale used for messages |
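To show how these fields might be consumed, the sketch below decodes `source_xml_base64` and builds a one-line description from the `format` object. Both helper names are hypothetical, and decoding the XML as UTF-8 is an assumption, since the encoding is not stated here.

```python
import base64
from typing import Any, Dict, Optional


def decode_source_xml(result: Dict[str, Any]) -> Optional[str]:
    """Return the raw XML as text, or None if it was not requested."""
    encoded = result.get("source_xml_base64")
    if not encoded:
        return None
    # Assumption: the embedded XML is UTF-8 encoded.
    return base64.b64decode(encoded).decode("utf-8")


def describe_format(result: Dict[str, Any]) -> str:
    """Summarize the detected format, tolerating missing optional fields."""
    fmt = result.get("format") or {}
    label = fmt.get("detected_format", "unknown")
    if fmt.get("version"):
        label = f"{label} {fmt['version']}"
    details = [v for v in (fmt.get("profile"), fmt.get("xml_type")) if v]
    return f"{label} ({', '.join(details)})" if details else label
```

Applied to the success response above, `describe_format` would give a string such as `zugferd 2.3 (EN16931, CII)`; an error response that carries only `detected_format` reduces to that single value.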
Error Response (422)
json
{
  "valid": false,
  "format": {
    "detected_format": "unknown"
  },
  "invoice": null,
  "transaction_id": "tx_def456",
  "errors": [
    {
      "code": "UNKNOWN_XML_FORMAT",
      "message": "XML-Format nicht erkannt (erwartet: CII, UBL oder FatturaPA)",
      "severity": "error"
    }
  ],
  "warnings": []
}
Error Codes
| Code | Description |
|---|---|
| INVALID_BASE64 | Base64 decoding failed |
| UNSUPPORTED_FORMAT | File format not recognized |
| NO_XML_FOUND | PDF contains no embedded XML |
| UNKNOWN_XML_FORMAT | XML format not CII, UBL, or FatturaPA |
| MISSING_INVOICE_NUMBER | Required field not found in XML |
| SCHEMA_ERROR | XML does not conform to schema (XSD) |
| SCHEMATRON_ERROR | Business rule validation failed |
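One way a client might act on these codes is to separate problems with the uploaded file from problems with the invoice content. The split into "input" and "content" groups below is an interpretation made for this sketch, not something the API defines, and `group_errors` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Grouping chosen for this sketch: the file could not be read at all ...
INPUT_ERRORS = {"INVALID_BASE64", "UNSUPPORTED_FORMAT",
                "NO_XML_FOUND", "UNKNOWN_XML_FORMAT"}
# ... versus the file was read but the invoice failed validation.
CONTENT_ERRORS = {"MISSING_INVOICE_NUMBER", "SCHEMA_ERROR", "SCHEMATRON_ERROR"}


def group_errors(result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Sort the entries of result['errors'] into input/content/other buckets."""
    groups: Dict[str, List[str]] = {"input": [], "content": [], "other": []}
    for error in result.get("errors", []):
        code = error.get("code", "")
        if code in INPUT_ERRORS:
            key = "input"
        elif code in CONTENT_ERRORS:
            key = "content"
        else:
            key = "other"  # codes not listed in the table above
        groups[key].append(f"[{code}] {error.get('message', '')}")
    return groups
```

With the 422 example above, the single `UNKNOWN_XML_FORMAT` entry ends up in the `input` bucket, which suggests re-checking the uploaded file rather than the invoice data.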
Validation
The endpoint validates each document against the official rule set for its format:
| Format | Validation Method | Source |
|---|---|---|
| ZUGFeRD/Factur-X | EN 16931 + ZUGFeRD Schematron | Mustangproject |
| XRechnung | EN 16931 + KoSIT Schematron | itplr-kosit |
| Peppol BIS 3.0 | EN 16931 + Peppol Schematron | OpenPeppol |
| FatturaPA | XSD Schema | Agenzia delle Entrate |
Examples
Roundtrip: ZUGFeRD → Peppol
javascript
// 1. Retrieve data from the ZUGFeRD PDF
// (zugferdPdfBase64 is assumed to hold the Base64-encoded PDF)
const retrieveResponse = await fetch('https://api.thelawin.dev/v1/retrieve', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your_api_key'
  },
  body: JSON.stringify({
    data_base64: zugferdPdfBase64,
    content_type: 'application/pdf'
  })
});
const { valid, invoice, errors } = await retrieveResponse.json();
// Stop early if nothing could be extracted (invoice is null in that case)
if (!valid || !invoice) {
  throw new Error(`Retrieve failed: ${JSON.stringify(errors)}`);
}
// 2. Modify and regenerate as Peppol
const generateResponse = await fetch('https://api.thelawin.dev/v1/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your_api_key'
  },
  body: JSON.stringify({
    format: 'peppol',
    invoice: {
      ...invoice,
      // Add Peppol-specific fields
      buyer: {
        ...invoice.buyer,
        peppol_id: '0088:5060012349998'
      }
    }
  })
});
Validate Incoming Invoice
python
import base64
import requests
# Read the invoice file and Base64-encode it
with open('incoming_invoice.pdf', 'rb') as f:
    pdf_base64 = base64.b64encode(f.read()).decode()
# Send it to the retrieve endpoint for validation
response = requests.post(
    'https://api.thelawin.dev/v1/retrieve',
    headers={'X-API-Key': 'your_api_key'},
    json={
        'data_base64': pdf_base64,
        'content_type': 'application/pdf'
    }
)
result = response.json()
if result['valid']:
    print(f"✓ Valid {result['format']['detected_format']} invoice")
    print(f"  Number: {result['invoice']['number']}")
    print(f"  From: {result['invoice']['seller']['name']}")
else:
    print("✗ Invalid invoice:")
    for error in result['errors']:
        print(f"  - [{error['code']}] {error['message']}")
File Upload (cURL)
bash
# Upload PDF directly
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Content-Type: application/pdf" \
  -H "X-API-Key: your_api_key" \
  --data-binary @invoice.pdf
# Upload XML with include_source_xml
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Content-Type: application/xml" \
  -H "X-API-Key: your_api_key" \
  -H "X-Include-Source-Xml: true" \
  --data-binary @invoice.xml
Related
- Generate Invoice - Generate PDF/XML from JSON
- Validate Invoice - Pre-validate JSON before generation
- Error Codes - Complete error reference