Retrieve Invoice Data

Extract structured invoice data from PDF or XML files. This endpoint is the reverse of /v1/generate, so the two can be combined into roundtrip workflows.

Endpoint

POST https://api.thelawin.dev/v1/retrieve

Purpose

Use this endpoint to:

  • Extract invoice data from ZUGFeRD/Factur-X PDFs as JSON
  • Parse standalone XML (CII, UBL, Peppol, FatturaPA)
  • Validate incoming invoices against EN 16931 standards
  • Enable roundtrip workflows: generate → retrieve → modify → regenerate

Roundtrip Capability

The returned JSON uses the same format as the /v1/generate input. You can modify the extracted data and generate a new invoice in a different format!

Supported Input Formats

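| Format | Input Type | XML Type | Standards |
| --- | --- | --- | --- |
| ZUGFeRD 2.x | PDF/A-3 | CII | Germany |
| Factur-X 1.0 | PDF/A-3 | CII | France |
| XRechnung 3.0 | PDF/A-3 or XML | UBL | German B2G |
| Peppol BIS 3.0 | XML | UBL | EU/UK/AU/SG/NZ |
| FatturaPA 1.2 | XML | FatturaPA | Italy (SDI) |
| UBL 2.1 | XML | UBL | OASIS Standard |
| CII | XML | CII | UN/CEFACT |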

Input Methods

The endpoint supports three ways to send data:

1. JSON Body (Base64-Encoded)

bash
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key" \
  -d '{
    "data_base64": "JVBERi0xLjcK...",
    "content_type": "application/pdf",
    "include_source_xml": true
  }'
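
The JSON body above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_retrieve_payload` is hypothetical, and only the three field names come from this page. Optional fields are left out when unset so that the documented defaults apply.

```python
import base64
import json


def build_retrieve_payload(data, content_type=None, include_source_xml=False):
    """Build the JSON body for POST /v1/retrieve from raw file bytes.

    Hypothetical helper: only the field names are taken from the docs.
    """
    payload = {"data_base64": base64.b64encode(data).decode("ascii")}
    if content_type is not None:
        # Omitting content_type leaves detection to the server.
        payload["content_type"] = content_type
    if include_source_xml:
        payload["include_source_xml"] = True
    return payload


# The first bytes of a PDF 1.7 file encode to "JVBERi0xLjcK".
body = json.dumps(build_retrieve_payload(b"%PDF-1.7\n", "application/pdf"))
```

The resulting `body` string can be sent with any HTTP client using `Content-Type: application/json`.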

2. Multipart Form-Data (File Upload)

bash
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "X-API-Key: your_api_key" \
  -F "file=@invoice.pdf;type=application/pdf" \
  -F "include_source_xml=true"

3. Raw Binary Body

bash
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Content-Type: application/xml" \
  -H "X-API-Key: your_api_key" \
  --data-binary @invoice.xml

Headers

| Header | Required | Description |
| --- | --- | --- |
| Content-Type | Yes | application/json, multipart/form-data, application/pdf, application/xml, or text/xml |
| X-API-Key | Recommended | Your API key for quota tracking |
| X-Include-Source-Xml | No | true to include raw XML in response (for raw binary uploads) |
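
For raw binary uploads, the headers carry everything the JSON body would otherwise hold. The sketch below shows one way to pick them from a file name. The function `raw_upload_headers` and its extension-to-MIME mapping are assumptions made for illustration; only the header names and values come from the table above.

```python
from pathlib import PurePath

# Assumed mapping from file extension to the Content-Type values listed above.
RAW_CONTENT_TYPES = {".pdf": "application/pdf", ".xml": "application/xml"}


def raw_upload_headers(filename, api_key=None, include_source_xml=False):
    """Return request headers for a raw binary upload (hypothetical helper)."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in RAW_CONTENT_TYPES:
        raise ValueError(f"unsupported file extension: {suffix!r}")
    headers = {"Content-Type": RAW_CONTENT_TYPES[suffix]}
    if api_key:
        headers["X-API-Key"] = api_key
    if include_source_xml:
        # Header equivalent of include_source_xml for raw binary bodies.
        headers["X-Include-Source-Xml"] = "true"
    return headers
```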

Request Body (JSON)

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| data_base64 | string | Yes | Base64-encoded PDF or XML |
| content_type | string | No | MIME type (application/pdf, application/xml). Auto-detected if omitted. |
| include_source_xml | boolean | No | Include raw XML in response (default: false) |

Response

Success (200)

json
{
  "valid": true,
  "format": {
    "detected_format": "zugferd",
    "profile": "EN16931",
    "version": "2.3",
    "xml_type": "CII",
    "has_pdf": true
  },
  "invoice": {
    "number": "RE-2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-15",
    "currency": "EUR",
    "seller": {
      "name": "Acme GmbH",
      "street": "Musterstr. 1",
      "city": "Berlin",
      "postal_code": "10115",
      "country": "DE",
      "vat_id": "DE123456789"
    },
    "buyer": {
      "name": "Customer AG",
      "street": "Kundenweg 5",
      "city": "München",
      "postal_code": "80331",
      "country": "DE"
    },
    "items": [{
      "description": "Consulting Services",
      "quantity": 10.0,
      "unit": "HUR",
      "unit_price": 150.00,
      "vat_rate": 19.0
    }],
    "payment": {
      "iban": "DE89370400440532013000",
      "bic": "COBADEFFXXX"
    }
  },
  "source_xml_base64": "PD94bWwgdmVyc2lvbj0i...",
  "transaction_id": "tx_abc123xyz",
  "errors": [],
  "warnings": [],
  "locale": "de"
}

Roundtrip Ready!

The invoice object uses the exact same schema as the input for /v1/generate. You can modify it and generate a new invoice!
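
As a rough illustration of that roundtrip, the Python sketch below turns a retrieve response into a request body for /v1/generate. The `{"format": ..., "invoice": ...}` shape follows the roundtrip example on this page. The function name `build_generate_body` and the `buyer_overrides` parameter are hypothetical.

```python
import copy


def build_generate_body(retrieve_result, target_format, buyer_overrides=None):
    """Build a /v1/generate body from a /v1/retrieve response (sketch)."""
    invoice = retrieve_result.get("invoice")
    if not retrieve_result.get("valid") or invoice is None:
        raise ValueError("retrieve result contains no usable invoice")
    # Copy so the caller's response object is not modified.
    invoice = copy.deepcopy(invoice)
    if buyer_overrides:
        invoice.setdefault("buyer", {}).update(buyer_overrides)
    return {"format": target_format, "invoice": invoice}


sample = {
    "valid": True,
    "invoice": {"number": "RE-2026-001", "buyer": {"name": "Customer AG"}},
}
body = build_generate_body(sample, "peppol", {"peppol_id": "0088:5060012349998"})
```

Copying the invoice before editing keeps the original retrieve response available for comparison or logging.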

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| valid | boolean | true if parsing succeeded without critical errors |
| format.detected_format | string | Detected format: zugferd, facturx, xrechnung, peppol, fatturapa, ubl, cii |
| format.profile | string | Profile (e.g., EN16931, EXTENDED, BASIC) |
| format.version | string | Version (e.g., 2.3, 1.0) |
| format.xml_type | string | XML schema type: CII, UBL, FATTURAPA |
| format.has_pdf | boolean | true if input was a PDF with embedded XML |
| invoice | object | Extracted invoice data (same schema as /v1/generate input) |
| source_xml_base64 | string | Raw XML (if include_source_xml=true) |
| transaction_id | string | Unique transaction ID for tracing |
| errors | array | Critical parsing/validation errors |
| warnings | array | Non-critical warnings |
| locale | string | Locale used for messages |
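
When `source_xml_base64` is present, it has to be decoded before the XML can be inspected. The sketch below shows one way to do that in Python. It assumes the embedded XML is UTF-8 encoded, which this page does not state, and `decode_source_xml` is a hypothetical helper name.

```python
import base64


def decode_source_xml(result):
    """Return the raw XML as text, or None if it was not requested.

    Assumption: the XML is UTF-8 encoded.
    """
    encoded = result.get("source_xml_base64")
    if not encoded:
        return None
    return base64.b64decode(encoded).decode("utf-8")
```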

Error Response (422)

json
{
  "valid": false,
  "format": {
    "detected_format": "unknown"
  },
  "invoice": null,
  "transaction_id": "tx_def456",
  "errors": [
    {
      "code": "UNKNOWN_XML_FORMAT",
      "message": "XML-Format nicht erkannt (erwartet: CII, UBL oder FatturaPA)",
      "severity": "error"
    }
  ],
  "warnings": []
}

Error Codes

| Code | Description |
| --- | --- |
| INVALID_BASE64 | Base64 decoding failed |
| UNSUPPORTED_FORMAT | File format not recognized |
| NO_XML_FOUND | PDF contains no embedded XML |
| UNKNOWN_XML_FORMAT | XML format not CII, UBL, or FatturaPA |
| MISSING_INVOICE_NUMBER | Required field not found in XML |
| SCHEMA_ERROR | XML does not conform to schema (XSD) |
| SCHEMATRON_ERROR | Business rule validation failed |
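
A client may want to treat "the file could not be read as an e-invoice" differently from "the invoice was read but failed validation". The Python sketch below separates the codes along that line. The grouping is an assumption made for illustration and is not defined by the API. `split_error_codes` is a hypothetical helper.

```python
# Assumed grouping: codes that mean the input itself could not be read.
UNREADABLE_INPUT_CODES = frozenset(
    {"INVALID_BASE64", "UNSUPPORTED_FORMAT", "NO_XML_FOUND", "UNKNOWN_XML_FORMAT"}
)


def split_error_codes(result):
    """Split error codes into (unreadable_input, invoice_content) lists."""
    unreadable, content = [], []
    for entry in result.get("errors") or []:
        code = entry.get("code", "")
        if code in UNREADABLE_INPUT_CODES:
            unreadable.append(code)
        else:
            content.append(code)
    return unreadable, content
```

With the 422 example above, the single `UNKNOWN_XML_FORMAT` error would land in the first list.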

Validation

The endpoint validates documents using official validation methods:

| Format | Validation Method | Source |
| --- | --- | --- |
| ZUGFeRD/Factur-X | EN 16931 + ZUGFeRD Schematron | Mustangproject |
| XRechnung | EN 16931 + KoSIT Schematron | itplr-kosit |
| Peppol BIS 3.0 | EN 16931 + Peppol Schematron | OpenPeppol |
| FatturaPA | XSD Schema | Agenzia delle Entrate |

Examples

Roundtrip: ZUGFeRD → Peppol

javascript
// 1. Retrieve data from ZUGFeRD PDF
const retrieveResponse = await fetch('https://api.thelawin.dev/v1/retrieve', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your_api_key'
  },
  body: JSON.stringify({
    data_base64: zugferdPdfBase64,
    content_type: 'application/pdf'
  })
});

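const { valid, invoice, errors } = await retrieveResponse.json();

// Stop early if extraction failed; invoice is null in that case
if (!valid || !invoice) {
  throw new Error(`Retrieve failed: ${JSON.stringify(errors)}`);
}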

// 2. Modify and regenerate as Peppol
const generateResponse = await fetch('https://api.thelawin.dev/v1/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your_api_key'
  },
  body: JSON.stringify({
    format: 'peppol',
    invoice: {
      ...invoice,
      // Add Peppol-specific fields
      buyer: {
        ...invoice.buyer,
        peppol_id: '0088:5060012349998'
      }
    }
  })
});

Validate Incoming Invoice

python
import requests
import base64

# Read invoice file
with open('incoming_invoice.pdf', 'rb') as f:
    pdf_base64 = base64.b64encode(f.read()).decode()

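# Validate (json= sets Content-Type: application/json automatically)
response = requests.post(
    'https://api.thelawin.dev/v1/retrieve',
    headers={'X-API-Key': 'your_api_key'},
    json={
        'data_base64': pdf_base64,
        'content_type': 'application/pdf'
    },
    timeout=30  # avoid hanging indefinitely on network problems
)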

result = response.json()

if result['valid']:
    print(f"✓ Valid {result['format']['detected_format']} invoice")
    print(f"  Number: {result['invoice']['number']}")
    print(f"  From: {result['invoice']['seller']['name']}")
else:
    print("✗ Invalid invoice:")
    for error in result['errors']:
        print(f"  - [{error['code']}] {error['message']}")

Raw Binary Upload (cURL)

bash
# Upload PDF directly
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Content-Type: application/pdf" \
  -H "X-API-Key: your_api_key" \
  --data-binary @invoice.pdf

# Upload XML with include_source_xml
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Content-Type: application/xml" \
  -H "X-API-Key: your_api_key" \
  -H "X-Include-Source-Xml: true" \
  --data-binary @invoice.xml

ZUGFeRD 2.3 & Factur-X 1.0 compliant