Train-your-own Classifier#

Train-your-own Classifier lets you build a custom classification API tailored to your needs, without requiring deep knowledge of AI. Typical use cases are input management — sorting and routing incoming mail to the appropriate departments — or picking the right downstream extraction workflow per document type.

When to use this workflow#

Use Train-your-own Classifier when you need to sort documents into your own categories.
Combine it with a downstream extraction workflow (Fine-tune Invoice Extraction, Train-your-own Extraction Model, or GenAI Extraction) to route each document to the right extraction logic based on its class.
If you also need to split multi-document files into individual documents before classifying, use Train-your-own Splitting Model first.
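
The routing idea can be sketched as a simple lookup from predicted class to downstream workflow. This is only an illustration: the class names and workflow identifiers below are hypothetical placeholders, and the function name is not part of any natif.ai client.

```python
from typing import Mapping, Optional

# Hypothetical mapping: class label -> identifier of the extraction workflow to call next.
ROUTES = {
    "invoice": "<invoice-extraction-workflow-id>",
    "delivery_note": "<custom-extraction-workflow-id>",
}


def pick_extraction_workflow(
    document_class: str,
    routes: Mapping[str, str] = ROUTES,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Return the downstream workflow for a class, or the fallback if the class is unmapped."""
    return routes.get(document_class, fallback)
```

Returning a fallback (here `None`) for unmapped classes lets the caller decide whether to skip extraction or send the document to manual handling.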

At a glance#


Output	`classifications`, `ocr`
Annotation	Required (sample documents per class)
Training	Required
Returns confidence per class	Yes
Cost	15 credits/page · 30 credits/document

Creating the workflow#

You can start the workflow creation either by clicking Train Your Own Model + on the API Hub and selecting Custom Classifier, or directly via the wizard.


The wizard guides you through the following steps:

Workflow metadata. Give your classifier a name and optionally a description and thumbnail.
Define labels. Specify the classes you want to sort documents into.
Document specification. Tell us about the kind of documents you will be uploading so we can better steer the model training process. For classifiers, you can additionally specify the relevant pages for classification.

Cropping. When documents are photographed, or multiple small documents are scanned to A4, the files often contain unwanted background around the document. Cropping removes those outer areas and renders each scanned document individually — which can drastically improve your workflow's quality. For digitally born documents, cropping is usually not necessary.

Character set. Pick Latin for English, German, French, Spanish, Slovak, Czech, Polish, Italian, Dutch, Slovenian, Croatian, Portuguese, Finnish, Swedish, Danish, and Norwegian. If your documents contain Japanese characters, pick the Japanese option for best results.

Printed and handwritten text. Choose "only printed text" to ignore handwritten text in your documents. Choose "only handwritten text" to ignore printed text. Use the combined option to consider both.
Create workflow. Some settings can be changed later, others are fixed once the workflow is created.

You can check this blog post to see the steps in detail.

Your workflow identifier is a UUID generated during the creation process. Once training is complete, you can call the workflow like any other at the processing endpoint:

POST /processing/{your_workflow_identifier}
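
A minimal sketch of preparing such a call in Python with only the standard library. The base URL, the `ApiKey` authorization scheme, and the multipart field name `file` are assumptions that this page does not state; check the Documentation tab of your workflow for the actual values.

```python
import uuid
import urllib.request

# Assumed values, not taken from this page. Verify them against your workflow's docs.
BASE_URL = "https://api.natif.ai"
FILE_FIELD = "file"


def processing_url(workflow_id: str, base_url: str = BASE_URL) -> str:
    """Build the processing endpoint URL, rejecting identifiers that are not UUIDs."""
    uuid.UUID(workflow_id)  # raises ValueError if the identifier is malformed
    return f"{base_url.rstrip('/')}/processing/{workflow_id}"


def build_request(
    workflow_id: str, api_key: str, filename: str, content: bytes
) -> urllib.request.Request:
    """Prepare (but do not send) a multipart/form-data POST carrying one document."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; '
        f'filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        processing_url(workflow_id),
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; the sketch stops short of that so it can be inspected without network access.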

Training data#

The number of training samples required depends on the difficulty of the task: harder tasks need more training data to reach good accuracy. As a rule of thumb:

Fewer than 5 classes: upload at least 25 samples per class.
More than 5 classes: upload at least 50 samples per class.
Difficult or complicated tasks: try doubling the dataset.

In general, our classifiers work better with more data.
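
The rule of thumb above can be written as a small helper. This is a sketch, not an official formula: the text does not say which tier exactly 5 classes falls into, so the function assumes the higher requirement in that case.

```python
def min_samples_per_class(num_classes: int, difficult: bool = False) -> int:
    """Suggested minimum number of training samples per class.

    Assumption: exactly 5 classes is treated like "more than 5" (the safer choice).
    """
    if num_classes < 2:
        raise ValueError("a classifier needs at least two classes")
    base = 25 if num_classes < 5 else 50
    # For difficult tasks the text suggests doubling the dataset.
    return base * 2 if difficult else base
```

For example, 8 classes on a difficult task gives 100 samples per class, or 800 samples in total.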


Processing your documents#

You can upload documents directly from the Dashboard tab or from the Uploads tab. The Uploads tab also lists all documents that have been processed through this workflow.

OpenAPI Documentation#

For API usage, see the Documentation tab on the workflow dashboard. It contains OpenAPI documentation customised for your workflow: the response schema already reflects the classes you have defined.


The relevant endpoint is:

POST /processing/{workflow_key}

It processes a document with the trained classifier. The workflow key is the UUID of your workflow (the same workflow identifier generated during creation), which you can find in the URL of the dashboard.

The workflow returns classifications and ocr. See Classifications Format and OCR Format for the response structure.

Code Snippets#

Along with the workflow-specific OpenAPI documentation, you can find code snippets for different programming languages to help you get started with the API.


Confidence and Human-in-the-loop#

Along with the classification result, the API returns a confidence value. The platform also flags cases where the classifier appears unreliable and recommends adding more training data.

Based on our analysis, we recommend human review for classification results with a confidence below 80%. This corresponds to an error rate of around 3% after manual correction. The complexity of your use case strongly influences which threshold suits you best, so we recommend testing on your own data.
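
A minimal sketch of applying such a threshold. It assumes the per-class confidences have already been read from the response into a plain mapping of class label to a value between 0 and 1; the actual response structure is described in Classifications Format, and the function names here are hypothetical.

```python
from typing import Mapping, Tuple

REVIEW_THRESHOLD = 0.80  # the 80% recommendation from the text, as a fraction


def top_class(confidences: Mapping[str, float]) -> Tuple[str, float]:
    """Return the class with the highest confidence and that confidence."""
    label = max(confidences, key=confidences.get)
    return label, confidences[label]


def needs_human_review(
    confidences: Mapping[str, float], threshold: float = REVIEW_THRESHOLD
) -> bool:
    """True if the best class's confidence is below the review threshold."""
    _, confidence = top_class(confidences)
    return confidence < threshold
```

Keeping the threshold as a parameter makes it easy to tune on your own data.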

For details on using confidence values to drive human review and on integrating the verification API, see Verification.

Credit cost#

A Freemium account allows for up to 100 pages per month. Processing costs 15 credits per page and 30 credits per document.

Note

A document is usually a bundle of 10 pages.

Previewing Workflow Updates#

natif.ai is constantly improving the model architectures and baselines for custom workflows, which sometimes requires (beneficial) updates to existing workflows. To avoid interfering with production use of your workflow, natif.ai will inform you about such updates in advance by email and will provide a preview version of the upcoming workflow update for you to try out before the automatic migration.

Please refer to the preview endpoint documentation to learn how to test the upcoming version of your workflow against your production use case and how to provide feedback to us.