The Document Processing API endpoint in Detail
The central API endpoint for all document processing is the POST /processing/{workflow} route.
It accepts documents of various file types and returns all relevant results of the selected workflow in a single response object.
Parameters
Field | Type | Description |
---|---|---|
{workflow} | string | Passed as part of the URL path. It determines the kind of analysis that is performed on the submitted document file. Common choices are ocr, invoice_capturing or accounting_workflow. A resulting URL including the workflow parameter could look like this: https://api.natif.ai/processing/ocr |
file | binary multipart/form-data | The actual file bytes, encoded into a multipart/form-data request body. |
include | string (optional) | Can be passed as a query parameter multiple times (.../processing/ocr?include=<value1>&include=<value2>[...]). Possible values include ocr and extractions, as used in the examples on this page. By default, all available results are included. |
parameters | JSON stringified object (optional) | Some workflows can be configured with additional parameters. These can be passed as a JSON configuration object in this multipart/form-data field. The supported settings for each workflow are specified in the respective workflow documentation. |
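Because {workflow} is a URL path segment while include may repeat in the query string, the request URL can be assembled with Python's standard library. The following is a minimal sketch. The helper name build_processing_url is hypothetical. Note that urllib.parse.urlencode only emits one include=... pair per list element when doseq=True is set.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.natif.ai/processing"


def build_processing_url(workflow, include=None):
    """Build the POST /processing/{workflow} URL (hypothetical helper).

    workflow: path segment, e.g. "ocr" or a custom workflow UUID.
    include:  optional list of result types; each one becomes its own
              include=<value> query parameter.
    """
    # Escape the workflow so it stays a single path segment
    url = f"{BASE_URL}/{quote(workflow, safe='')}"
    if include:
        # doseq=True expands the list into repeated keys instead of
        # encoding the Python list literal as one single value
        url += "?" + urlencode({"include": list(include)}, doseq=True)
    return url


print(build_processing_url("ocr", include=["ocr", "extractions"]))
# → https://api.natif.ai/processing/ocr?include=ocr&include=extractions
```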
Hint
All popular programming languages have at least one common HTTP library that conveniently encapsulates the creation of a well-formed multipart/form-data POST request, so you usually don't need to build one by hand. The implementation examples below illustrate this.
Example of a multipart/form-data HTTP request
```http
POST https://api.natif.ai/processing/invoice_extraction?include=ocr&include=extractions&wait_for=120
accept: application/json
Authorization: ApiKey <API_KEY_SECRET>
Content-Type: multipart/form-data; boundary=WebAppBoundary

--WebAppBoundary
Content-Disposition: form-data; name="file"; filename="invoice.png"
Content-Type: image/png

<FILE_CONTENT_BYTES>
--WebAppBoundary
Content-Disposition: form-data; name="parameters"

{
  "language": "de"
}
--WebAppBoundary--
```
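To make the wire format of the example request concrete, the following sketch builds an equivalent multipart/form-data body with nothing but the standard library. It is for illustration only. The function name encode_multipart is hypothetical. The default part type application/octet-stream is an assumption, so pass the real media type (e.g. image/png) if you know it. A fixed boundary is used only for readability.

```python
import json


def encode_multipart(file_name, file_bytes, parameters,
                     boundary="WebAppBoundary",
                     file_type="application/octet-stream"):
    """Build a multipart/form-data body with a "file" and a "parameters" part.

    Illustration only: real HTTP libraries pick a random boundary that is
    very unlikely to occur inside the file bytes.
    Returns (Content-Type header value, body bytes).
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode("utf-8"),
        f"Content-Type: {file_type}".encode("ascii"),
        b"",  # empty line ends the part headers
        file_bytes,
        delimiter,
        b'Content-Disposition: form-data; name="parameters"',
        b"",
        json.dumps(parameters).encode("utf-8"),  # JSON stringified object
        delimiter + b"--",  # close delimiter ends the body
        b"",
    ]
    # Parts and their header lines are separated by CRLF, as HTTP requires
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```

The two return values could be passed to urllib.request.Request(url, data=body, headers={"Content-Type": content_type, ...}, method="POST"). In practice, the library-based examples below are simpler and safer.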
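Implementation examples
Python (requests):
```python
import json

import requests

# Placeholders: replace with your own values
YOUR_API_KEY_SECRET = "<API_KEY_SECRET>"
WORKFLOW_IDENTIFIER = "invoice_extraction"
FILE_PATH = "invoice.png"

headers = {"Accept": "application/json", "Authorization": "ApiKey " + YOUR_API_KEY_SECRET}
# requests sends list values as repeated keys: ?include=ocr&include=extractions
query_params = {"include": ["ocr", "extractions"]}
# Optional workflow settings (e.g. {"language": "de"}), sent as a JSON string
workflow_config = {}
url = f"https://api.natif.ai/processing/{WORKFLOW_IDENTIFIER}"

with open(FILE_PATH, "rb") as file:
    response = requests.post(url, headers=headers, params=query_params, files={"file": file},
                             data={"parameters": json.dumps(workflow_config)})
```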
cURL:
```shell
curl -X 'POST' \
  'https://api.natif.ai/processing/invoice_extraction?include=ocr&include=extractions&wait_for=120' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H 'Authorization: ApiKey <API_KEY_SECRET>' \
  -F 'file=@FILE_PATH;type=image/png' \
  -F 'parameters={"language": "de"}'
```
Responses
This endpoint may respond with one of the following HTTP status codes:
HTTP status code | Description |
---|---|
201 | Processing request completed. |
202 | Processing request is still in progress. |
401 | Credentials could not be validated. |
402 | Reached a processing limit based on the account's plan. |
403 | Credentials could not be authorized for this processing request. |
404 | No such workflow. |
413 | Uploaded document is too large or has too many pages. |
422 | Validation Error (most likely regarding the request payload). |
429 | Temporary rate limit exceeded or uploaded document too large. |
500 | Processing request failed to complete. |
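Client code usually branches on these status codes right after the request. The following minimal sketch maps the table to one success path (201 or 202) and one exception type for everything else. The names STATUS_MEANING, ProcessingError and check_processing_status are hypothetical. Retry or polling strategies are deliberately left out.

```python
# Error meanings taken from the status code table (hypothetical helper data)
STATUS_MEANING = {
    401: "credentials could not be validated",
    402: "processing limit of the account's plan reached",
    403: "credentials not authorized for this processing request",
    404: "no such workflow",
    413: "document too large or has too many pages",
    422: "validation error, most likely in the request payload",
    429: "temporary rate limit exceeded or document too large",
    500: "processing request failed to complete",
}


class ProcessingError(Exception):
    """Raised for any status code other than 201 or 202."""

    def __init__(self, status_code):
        self.status_code = status_code
        reason = STATUS_MEANING.get(status_code, "unexpected status code")
        super().__init__(f"HTTP {status_code}: {reason}")


def check_processing_status(status_code):
    """Return "completed" (201) or "in_progress" (202); raise otherwise."""
    if status_code == 201:
        return "completed"
    if status_code == 202:
        return "in_progress"
    raise ProcessingError(status_code)
```

For example, check_processing_status(response.status_code) could follow the requests.post(...) call from the implementation examples. How to proceed after an "in_progress" result is described in the advanced response handling section.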
See the section on advanced response handling for details on how to avoid or handle the different response codes.
Response format
In most cases, the POST /processing/{workflow} endpoint responds with HTTP status code 201 - Created.
The response body is then of type application/json and contains:
- Always:
  - processing_id: A unique identifier for the respective processing instance. It can be used to retrieve the same results object again later.
- Depending on the result types supported by the respective workflow:
  - ocr: All detected text segments of the document with bounding boxes, as well as the full text of each document page.
  - extractions: A tree-like object containing all entities and their values from the document. The exact set of supported entities and their tree structure depend on the workflow.
  - further extraction details and references to processing result artifacts
The different result types are further documented in the Response Formats section.
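The always/optional split described above suggests a quick sanity check on a 201 response body. This minimal sketch assumes that processing_id, ocr and extractions are top-level keys of the JSON object. Check that assumption against the Response Formats section. The function name summarize_processing_response is hypothetical.

```python
import json


def summarize_processing_response(body, requested=("ocr", "extractions")):
    """Return (processing_id, missing result types) for a 201 response body.

    Assumes processing_id, ocr and extractions are top-level keys of the
    JSON object. A requested type may be missing because the workflow does
    not support it.
    """
    # Accept raw text/bytes (e.g. response.content) or an already parsed dict
    data = json.loads(body) if isinstance(body, (str, bytes)) else body
    processing_id = data["processing_id"]  # documented as always present
    missing = [name for name in requested if name not in data]
    return processing_id, missing
```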
Hint
If you want to access previous processing results, you can do so using the processing_id returned by the 201 and 202 responses of this endpoint.
See Re-fetching the entire processing output or Fetching individual processing results.
Custom Workflows
The POST /processing/{workflow} endpoint can also be used to process documents with custom, self-trained workflows.
For more information about creating your own custom workflows, please refer to the Create Custom Workflows section.
There you will learn about the UUID workflow identifier that is generated automatically during creation. You can use it to process documents with your custom workflow in the same way as with the prebuilt workflows, i.e. POST /processing/{your_workflow_uuid}.
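Because a custom workflow is addressed by its UUID, a malformed identifier can be caught before any request is sent, instead of surfacing later as a 404 (No such workflow). This is a minimal sketch using Python's uuid module. The helper custom_workflow_url is hypothetical. It normalizes the identifier to the canonical lowercase, hyphenated form, on the assumption that the API accepts that form.

```python
import uuid

BASE_URL = "https://api.natif.ai/processing"


def custom_workflow_url(workflow_uuid):
    """Return the POST /processing/{your_workflow_uuid} URL (hypothetical helper).

    uuid.UUID() raises ValueError for anything that is not a valid UUID,
    e.g. a prebuilt workflow name passed by mistake.
    """
    return f"{BASE_URL}/{uuid.UUID(str(workflow_uuid))}"
```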
Previewing Workflow Updates
natif.ai is constantly improving the model architectures and baselines for self-trained workflows, which sometimes requires (beneficial) updates to existing workflows. To avoid interfering with the production use of your workflow, natif.ai will notify you by email in advance of such updates and will provide a preview version of the upcoming workflow update for you to try out before the automatic migration.
When a preview is available, you can use the POST /processing/{your_workflow_uuid}/preview endpoint to process documents with the preview version of your workflow.
The endpoint follows the same rules as the POST /processing/{your_workflow_uuid} endpoint, but returns results (or references to results) as processed by the preview version of your workflow.
Please use the /preview endpoint to check whether the upcoming version of your workflow is ready for your production use, and send us your feedback.
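One way to review a preview is to process the same document with both endpoints and compare the two extractions objects. The following sketch assumes you already hold both extractions objects as parsed JSON. flatten and diff_extractions are hypothetical helpers. The dotted path notation they produce is an arbitrary choice and is not part of the API.

```python
def flatten(tree, prefix=""):
    """Flatten a nested extractions object into {"a.b[0].c": value} pairs.

    Empty dicts and lists produce no entries in this simple sketch.
    """
    items = {}
    if isinstance(tree, dict):
        for key, value in tree.items():
            items.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(tree, list):
        for index, value in enumerate(tree):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = tree  # leaf value
    return items


def diff_extractions(current, preview):
    """Return {path: (current_value, preview_value)} for every differing leaf.

    A leaf that exists on only one side is reported as None on the other side.
    """
    a, b = flatten(current), flatten(preview)
    return {path: (a.get(path), b.get(path))
            for path in sorted(a.keys() | b.keys())
            if a.get(path) != b.get(path)}
```

An empty result means both versions produced the same leaf values for that document. A non-empty one lists exactly the paths worth mentioning in your feedback.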