Fine-tune Invoice Extraction

Our platform allows you to fine-tune our invoice extraction so that our jack-of-all-trades model becomes the master-of-your-trades!

With this service, you can customize our generic invoice extraction model for the vendors whose invoices you typically receive, to ensure a near-perfect extraction. The fine-tuned model remains a jack-of-all-trades for the other invoices that you receive only occasionally. Your frequently received invoices are processed with high accuracy, while you still benefit from the generalizability of our generic model.

Fine-tuning our generic invoice extraction model for your specific invoices is straightforward with our wizard:

  • Define the issuers for which you want to fine-tune the model and upload your files accordingly (step 2).
  • Annotate your data: the generic model extracts the entities, and all you have to do is correct any errors (step 3).
  • Train the generic model on your uploaded and corrected data to get your fine-tuned model. You can then view the performance analysis to get a better understanding of your model (step 7).

You can check this blog post to see the steps involved in preparing and training the model.

Once training is complete, this workflow provides you with extractions and ocr. Your workflow identifier is a UUID that is automatically generated during the creation process. You can call the workflow like any other workflow at the processing endpoint, i.e. POST /processing/{your_workflow_identifier}.
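As a minimal sketch of how such a call could be assembled, the Python snippet below builds (but does not send) a POST request to /processing/{your_workflow_identifier}. Only the endpoint path and the existence of include query parameters come from this page. The base URL, the helper name build_processing_request, the repeated include=... encoding, and the raw-bytes request body are assumptions made for illustration; check the API reference for the actual authentication and upload format.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_processing_request(base_url, workflow_id, payload, include=None, headers=None):
    """Build (but do not send) a POST request to /processing/{workflow_id}."""
    # The workflow identifier is a UUID; reject anything that is not well-formed.
    workflow_id = str(uuid.UUID(str(workflow_id)))
    url = f"{base_url.rstrip('/')}/processing/{workflow_id}"
    if include:
        # Assumption: return values are selected with repeated `include` parameters.
        url += "?" + urlencode([("include", value) for value in include])
    # Assumption: the document is sent as the raw request body.
    return Request(url, data=payload, method="POST", headers=headers or {})


# Placeholder values: the host and the UUID are examples, not real endpoints or IDs.
req = build_processing_request(
    "https://api.example.com",
    "123e4567-e89b-12d3-a456-426614174000",
    b"%PDF-1.4 ...",
    include=["extractions", "ocr"],
)
print(req.get_method(), req.full_url)
```

Validating the identifier with uuid.UUID before building the URL catches copy-paste errors locally instead of at the server.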

Supported return values

The return values this workflow provides (extractions and ocr) are automatically included in the response JSON, unless otherwise specified via include query parameters.

Credit cost

A Freemium account allows for up to 100 pages per month. The cost is 10 credits per page and 15 credits per document.

Note

A document is usually a bundle of 10 pages.
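To make the arithmetic concrete, here is a rough Python sketch that estimates the credits for a batch of documents. It assumes that the per-page and per-document charges are added together, which this page does not state explicitly. The function name estimate_credits and its return shape are hypothetical.

```python
CREDITS_PER_PAGE = 10
CREDITS_PER_DOCUMENT = 15
FREEMIUM_PAGES_PER_MONTH = 100


def estimate_credits(pages_per_document):
    """Return (credits, within_freemium_page_limit) for a list of page counts."""
    total_pages = sum(pages_per_document)
    # Assumption: the per-page and per-document charges are summed.
    credits = (CREDITS_PER_PAGE * total_pages
               + CREDITS_PER_DOCUMENT * len(pages_per_document))
    return credits, total_pages <= FREEMIUM_PAGES_PER_MONTH


# One 10-page document and one 3-page document.
credits, within_limit = estimate_credits([10, 3])
print(credits, within_limit)
```

Under this assumption, a single 10-page document would cost 10 × 10 + 15 = 115 credits.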

Extractions formats

The value of the extractions key for this workflow has the following form:

For document_type = invoice

All supported fields
  • schema_version: integer (possible values: [2])
  • document_type: string (possible values: ['invoice'])
  • customer: Customer
    • name: StringExtraction
    • address: StringExtraction
    • address_struct: Address
      • address_line_1: StringExtraction
      • address_line_2: StringExtraction
      • city: StringExtraction
      • zip: StringExtraction
      • country: CountryExtraction
    • vat_id: StringExtraction
    • tax_number: StringExtraction
    • eori_number: StringExtraction
    • customer_number: StringExtraction
    • banking_information: array of BankingInformation
      • validation_problem: boolean (deprecated)
      • note: string (deprecated)
      • confidence: number (deprecated)
      • bbox_refs: array of Reference
        • page_num: integer
        • bbox_id: integer
      • iban: StringExtraction
      • bic: StringExtraction
  • vendor: Vendor
    • name: StringExtraction
    • address: StringExtraction
    • address_struct: Address
      • address_line_1: StringExtraction
      • address_line_2: StringExtraction
      • city: StringExtraction
      • zip: StringExtraction
      • country: CountryExtraction
    • vat_id: StringExtraction
    • tax_number: StringExtraction
    • eori_number: StringExtraction
    • register_id: StringExtraction
    • banking_information: array of BankingInformation
      • validation_problem: boolean (deprecated)
      • note: string (deprecated)
      • confidence: number (deprecated)
      • bbox_refs: array of Reference
        • page_num: integer
        • bbox_id: integer
      • iban: StringExtraction
      • bic: StringExtraction
    • phone: StringExtraction
    • fax: StringExtraction
    • url: StringExtraction
    • e_mail: StringExtraction
  • currency: CurrencyExtraction
  • date: DateExtraction
  • due_date: DateExtraction
  • service_period: one of:
    • DateExtraction
    • Period
      • start_date: DateExtraction
      • end_date: DateExtraction
  • number: StringExtraction
  • order_numbers: array of StringExtraction
  • order_confirmation_numbers: array of StringExtraction
  • delivery_note_numbers: array of StringExtraction
  • payment_methods: array of PaymentMethodExtraction
  • net_amount: FloatExtraction
  • tax_amount: FloatExtraction
  • additional_cost: FloatExtraction
  • gross_amount: FloatExtraction
  • tax_calculation: array of TaxCalculation
    • tax_code: StringExtraction
    • tax_rate: FloatExtraction
    • tax_amount: FloatExtraction
    • net_amount: FloatExtraction
    • gross_amount: FloatExtraction
  • early_payment_benefit: array of EarlyPaymentBenefit
    • early_payment_date: DateExtraction
    • discount_percentage: FloatExtraction
      The percentage is only returned if it is written on the document, i.e. it is not calculated.
    • discount_amount: FloatExtraction (deprecated)
      Use new_amount instead.
    • new_amount: FloatExtraction
  • line_item: array of LineItem
    • pos_id: StringExtraction
    • article_id: StringExtraction
    • ean: StringExtraction
    • description: StringExtraction
    • quantity: FloatExtraction
    • unit_of_measure: StringExtraction
    • service_period: one of:
      • DateExtraction
      • Period
        • start_date: DateExtraction
        • end_date: DateExtraction
    • discount: FloatExtraction
    • additional_cost: FloatExtraction
    • tax_rate: FloatExtraction
    • tax_code: StringExtraction
    • unit_price: FloatExtraction
    • total_price: FloatExtraction
    • order_number: StringExtraction
    • order_confirmation_number: StringExtraction
    • delivery_note_number: StringExtraction
  • due_payable_amount: FloatExtraction
  • discount_amount: FloatExtraction
  • payment_reference: StringExtraction
  • barcodes: array of StringExtraction
  • key_value_pairs: array of KeyValuePair
    • key: StringExtraction
    • value: StringExtraction

Note

For a reference of the structure of each of the extractions objects see Extracted Values. Also, for accessing individual processing results or artifacts, have a look at Fetch Processing Results and Artifacts.

Important

The structure of extractions might contain optional paths. See this and this part of the documentation.
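Because paths in the extractions structure may be absent, client code may want to read nested fields defensively. The Python sketch below shows one way to do that. The helper get_path is hypothetical, and the sample response is invented for illustration: the field names follow the list above, but the inner {"value": ...} shape is only a placeholder for the actual extraction objects described in Extracted Values.

```python
def get_path(extractions, *path, default=None):
    """Follow `path` through nested dicts/lists; return `default` if any step is missing."""
    node = extractions
    for key in path:
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and isinstance(key, int) and 0 <= key < len(node):
            node = node[key]
        else:
            return default
    return default if node is None else node


# Hypothetical response fragment; the inner {"value": ...} shape is a placeholder.
sample = {
    "schema_version": 2,
    "document_type": "invoice",
    "vendor": {"name": {"value": "ACME GmbH"}},
    "line_item": [{"description": {"value": "Widget"}}],
}

vendor_name = get_path(sample, "vendor", "name")
first_item = get_path(sample, "line_item", 0, "description")
customer_city = get_path(sample, "customer", "address_struct", "city")  # path absent
```

Returning a default instead of raising keeps a missing optional path from aborting the processing of an otherwise valid invoice.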