Fine-tune Invoice Extraction#
Our platform allows you to fine-tune our invoice extraction so that our jack-of-all-trades model becomes the master-of-your-trades!
With this service, you can customize our generic invoice extraction model for the vendors whose invoices you receive most often, to achieve near-perfect extraction on them. The fine-tuned model remains a jack-of-all-trades for the other invoices that you receive only occasionally. Your frequently received invoices are therefore processed with high accuracy, while you still benefit from the generalizability of our generic model.
Fine-tuning our generic invoice extraction model for your specific invoices takes three steps in our wizard:
- Define the issuers for which you want to fine-tune the model and upload your files accordingly
- Annotate your data: the generic model extracts entities and all you have to do is correct any errors that may occur
- Train the generic model on your uploaded and corrected data to get your fine-tuned model. You can then view the performance analysis to get a better understanding of your model.
You can check this blog post to see the steps involved in preparing and training the model.
Once training is complete, this workflow will provide you with extractions and ocr. Your workflow identifier is a UUID that is automatically generated during the creation process. You can call the workflow like any other workflow by using this identifier at the processing endpoint, i.e. POST /processing/{your_workflow_identifier}.
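As a minimal sketch of how the endpoint path is assembled, the Python snippet below checks that the workflow identifier is a well-formed UUID and builds the URL for POST /processing/{your_workflow_identifier}. The base URL and the helper name build_processing_url are placeholders introduced for illustration, not part of the platform. Authentication and the file upload are left out because this page does not describe them.

```python
import uuid

# Assumption: placeholder host, replace it with the base URL of your API account.
BASE_URL = "https://api.example.com"


def build_processing_url(workflow_identifier: str, base_url: str = BASE_URL) -> str:
    """Return the URL for POST /processing/{workflow_identifier}.

    Hypothetical helper: only the path layout comes from the documentation.
    """
    # The identifier is a UUID, so parsing it catches truncated or mistyped
    # values (uuid.UUID raises ValueError for a malformed string).
    workflow_id = str(uuid.UUID(workflow_identifier))
    return f"{base_url.rstrip('/')}/processing/{workflow_id}"


if __name__ == "__main__":
    # Example identifier, not a real workflow.
    print(build_processing_url("123e4567-e89b-12d3-a456-426614174000"))
```

Validating the identifier on the client side is optional, but it turns a copy-paste mistake into an immediate error instead of a failed request.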
Supported return values#
The following values are automatically included in the response JSON, unless otherwise specified via include query parameters:
- ocr, see OCR Format
- extractions, see below
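To illustrate the include query parameter, the sketch below appends one include entry per requested return value to a processing URL. Repeating the parameter once per value is an assumption about the encoding, and with_include is a hypothetical helper name, so check the API reference for the exact format before relying on it.

```python
from typing import Sequence
from urllib.parse import urlencode

# Return values named in this section.
SUPPORTED_RETURN_VALUES = ("ocr", "extractions")


def with_include(url: str, include: Sequence[str]) -> str:
    """Append include query parameters to a processing URL (hypothetical helper)."""
    unknown = [name for name in include if name not in SUPPORTED_RETURN_VALUES]
    if unknown:
        raise ValueError(f"unsupported return values: {unknown}")
    if not include:
        # Nothing requested explicitly: leave the URL unchanged.
        return url
    # Assumption: the parameter is repeated, e.g. ?include=ocr&include=extractions.
    return url + "?" + urlencode([("include", name) for name in include])
```

For example, with_include(url, ["extractions"]) would request only the extractions, under the encoding assumption stated above.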
Credit cost#
A Freemium account allows for up to 100 pages per month, where the cost is 10 credits per page, and 15 credits per document.
Note
A document is usually a bundle of 10 pages.
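The credit figures above can be turned into a quick estimate. The sketch below computes the per-page and the per-document component separately. Whether both charges apply to the same upload is not stated here, so the function deliberately does not add them up. The function name and the example numbers are illustrative only.

```python
CREDITS_PER_PAGE = 10           # from the text above
CREDITS_PER_DOCUMENT = 15       # from the text above
FREEMIUM_PAGES_PER_MONTH = 100  # from the text above


def credit_components(pages: int, documents: int) -> dict:
    """Return the per-page and per-document credit components (hypothetical helper).

    Assumption: how the two components combine on an invoice is not
    documented here, so they are reported side by side.
    """
    if pages < 0 or documents < 0:
        raise ValueError("pages and documents must be non-negative")
    return {
        "page_credits": pages * CREDITS_PER_PAGE,
        "document_credits": documents * CREDITS_PER_DOCUMENT,
        "within_freemium_page_limit": pages <= FREEMIUM_PAGES_PER_MONTH,
    }


if __name__ == "__main__":
    # 30 pages spread over 3 documents.
    print(credit_components(30, 3))
```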
Extractions formats#
The value of the extractions key for this workflow has the following form:
For document_type = invoice
All supported fields
- schema_version: integer (possible values: [2])
- document_type: string (possible values: ['invoice'])
- customer: Customer
  - name: StringExtraction
  - address: StringExtraction
  - address_struct: Address
    - address_line_1: StringExtraction
    - address_line_2: StringExtraction
    - city: StringExtraction
    - zip: StringExtraction
    - country: CountryExtraction
  - vat_id: StringExtraction
  - tax_number: StringExtraction
  - eori_number: StringExtraction
  - customer_number: StringExtraction
  - banking_information: array of BankingInformation
    - validation_problem: boolean (deprecated)
    - note: string (deprecated)
    - confidence: number (deprecated)
    - bbox_refs: array of Reference
      - page_num: integer
      - bbox_id: integer
    - iban: StringExtraction
    - bic: StringExtraction
- vendor: Vendor
  - name: StringExtraction
  - address: StringExtraction
  - address_struct: Address
    - address_line_1: StringExtraction
    - address_line_2: StringExtraction
    - city: StringExtraction
    - zip: StringExtraction
    - country: CountryExtraction
  - vat_id: StringExtraction
  - tax_number: StringExtraction
  - eori_number: StringExtraction
  - register_id: StringExtraction
  - banking_information: array of BankingInformation
    - validation_problem: boolean (deprecated)
    - note: string (deprecated)
    - confidence: number (deprecated)
    - bbox_refs: array of Reference
      - page_num: integer
      - bbox_id: integer
    - iban: StringExtraction
    - bic: StringExtraction
  - phone: StringExtraction
  - fax: StringExtraction
  - url: StringExtraction
  - e_mail: StringExtraction
- currency: CurrencyExtraction
- date: DateExtraction
- due_date: DateExtraction
- service_period: one of:
  - DateExtraction
  - Period
    - start_date: DateExtraction
    - end_date: DateExtraction
- number: StringExtraction
- order_numbers: array of StringExtraction
- order_confirmation_numbers: array of StringExtraction
- delivery_note_numbers: array of StringExtraction
- payment_methods: array of PaymentMethodExtraction
- net_amount: FloatExtraction
- tax_amount: FloatExtraction
- additional_cost: FloatExtraction
- gross_amount: FloatExtraction
- tax_calculation: array of TaxCalculation
  - tax_code: StringExtraction
  - tax_rate: FloatExtraction
  - tax_amount: FloatExtraction
  - net_amount: FloatExtraction
  - gross_amount: FloatExtraction
- early_payment_date: DateExtraction
- discount_percentage: FloatExtraction. The percentage is only returned if it is written on the document, i.e. it is not calculated.
- early_payment_benefit: array of EarlyPaymentBenefit
  - discount_amount: FloatExtraction (deprecated). Use new_amount instead.
  - new_amount: FloatExtraction
- line_item: array of LineItem
  - pos_id: StringExtraction
  - article_id: StringExtraction
  - ean: StringExtraction
  - description: StringExtraction
  - quantity: FloatExtraction
  - unit_of_measure: StringExtraction
  - service_period: one of:
    - DateExtraction
    - Period
      - start_date: DateExtraction
      - end_date: DateExtraction
  - discount: FloatExtraction
  - additional_cost: FloatExtraction
  - tax_rate: FloatExtraction
  - tax_code: StringExtraction
  - unit_price: FloatExtraction
  - total_price: FloatExtraction
  - order_number: StringExtraction
  - order_confirmation_number: StringExtraction
  - delivery_note_number: StringExtraction
- number_of_line_items: FloatExtraction
- due_payable_amount: FloatExtraction
- discount_amount: FloatExtraction
- payment_reference: StringExtraction
- barcodes: array of StringExtraction
- key_value_pairs: array of KeyValuePair
  - key: StringExtraction
  - value: StringExtraction
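The field list above can be navigated with a few small helpers. The sketch below checks the two header fields, follows a chain of keys through the nested structure, and tells the two documented shapes of service_period apart. The helper names are hypothetical. Treating missing or null fields as None is an assumption, and the leaf objects in the example payload are placeholders, because the structure of the individual extraction objects is documented elsewhere.

```python
from typing import Any, Optional


def check_header(extractions: dict) -> None:
    """Raise ValueError unless the payload has the documented header values."""
    if extractions.get("schema_version") != 2:
        raise ValueError("expected schema_version 2")
    if extractions.get("document_type") != "invoice":
        raise ValueError("expected document_type 'invoice'")


def get_field(extractions: dict, *path: str) -> Optional[Any]:
    """Follow a chain of keys, returning None once a key is missing or null."""
    node: Any = extractions
    for key in path:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def service_period_kind(service_period: Optional[dict]) -> Optional[str]:
    """Distinguish a Period from a single DateExtraction."""
    if service_period is None:
        return None
    # A Period carries start_date and end_date; anything else is treated
    # as a single DateExtraction.
    if "start_date" in service_period or "end_date" in service_period:
        return "period"
    return "date"


# Hypothetical payload: the leaf objects are placeholders, not real extractions.
example = {
    "schema_version": 2,
    "document_type": "invoice",
    "vendor": {"address_struct": {"city": {"placeholder": "city extraction"}}},
    "service_period": {
        "start_date": {"placeholder": "start"},
        "end_date": {"placeholder": "end"},
    },
    "line_item": [],
}

check_header(example)
print(get_field(example, "vendor", "address_struct", "city"))
print(get_field(example, "customer", "name"))  # absent in this payload
print(service_period_kind(example["service_period"]))
```

Returning None for absent branches keeps calling code free of chained key checks, at the cost of not distinguishing a missing field from an explicit null.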
Note
For a reference of the structure of each of the extractions objects, see Extracted Values.
Also, for accessing individual processing results or artifacts, have a look at Fetch Processing Results and Artifacts.