
Advanced HTTP response handling#

HTTP response codes#

In most cases the POST /processing/{workflow} request responds with HTTP code 201 - Created. This success code indicates that document processing has completed and that the results are delivered directly in the response body. Other responses are possible, however, and may deserve dedicated handling in your application.

HTTP Code 202 - Accepted#

This HTTP response code indicates that the document processing has been initiated but did not finish early enough to provide the results immediately within the same request. In this case the response body will not contain any processing results, but only the two fields processing_id and url.

These fields have the following meaning:

  • processing_id contains a unique identifier for the document processing instance, which continues to be processed in the background. This identifier can be used to retrieve the finished results object later via the GET /processing/results/{processing_id} endpoint.

  • url contains the URL pointing to the GET /processing/results/{processing_id} endpoint, already including the aforementioned processing_id for convenience.

A call to the GET /processing/results/{processing_id} endpoint behaves analogously to the initial POST /processing/{workflow} request:

  • If the processing again does not finish within the long polling timeout, it returns the same response again with status code 202.
  • If the processing finishes in time, it will return the exact same response as the initial request would have provided for the workflow.

Hint

This endpoint returns the same response format as the POST /processing/{workflow} endpoint, i.e. the relevant processing results packed into a single large JSON object. However, we also provide endpoints for (re-)fetching individual processing results and additional workflow artifacts.

Implementation examples#

import urllib.parse
import requests

headers = {"Accept": "application/json", "Authorization": "ApiKey " + YOUR_API_KEY_SECRET}
params = {}
url = "https://api.natif.ai/processing/{workflow}?{params}".format(
    workflow=WORKFLOW_IDENTIFIER,
    params=urllib.parse.urlencode(params)
)
with open(FILE_PATH, 'rb') as file:
    response = requests.post(url, headers=headers, files={"file": file})

    while response.status_code == 202:  # long polling timed out, results are not ready yet
        processing_id = response.json()['processing_id']
        # Poll the results endpoint described above: GET /processing/results/{processing_id}
        retry_url = "https://api.natif.ai/processing/results/{processing_id}?{params}".format(
            processing_id=processing_id,
            params=urllib.parse.urlencode(params)
        )
        response = requests.get(retry_url, headers=headers)
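
The polling loop can also be bounded so that a client does not wait forever. The following is a minimal sketch, not an official client: poll_results, the Reply tuple and the injected fetch callable are hypothetical names, and it assumes that the url field of a 202 response can be requested as-is.

```python
from typing import Any, Callable, Mapping, Tuple

# (status_code, parsed JSON body): a stand-in for an HTTP response.
Reply = Tuple[int, Mapping[str, Any]]


def poll_results(first: Reply, fetch: Callable[[str], Reply], max_attempts: int = 10) -> Reply:
    """Follow 202 replies via their `url` field until a non-202 reply arrives.

    `fetch` is a hypothetical callable that performs the GET request and
    returns (status_code, json_body). Raises TimeoutError once `max_attempts`
    follow-up requests have all returned 202.
    """
    status, body = first
    attempts = 0
    while status == 202:
        if attempts >= max_attempts:
            raise TimeoutError("processing %s is still pending" % body.get("processing_id"))
        # Assumption: `url` is directly usable for the follow-up GET request.
        status, body = fetch(body["url"])
        attempts += 1
    return status, body
```

With the requests library, fetch could wrap requests.get(url, headers=headers) and return (response.status_code, response.json()). Injecting the callable keeps the retry logic independent of the HTTP client in use.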

HTTP Code 429 - Too Many Requests#

Hint

For details about the default API rate-limits see this page.

If the endpoint returns this HTTP client error code, one of the rate limits specified for the selected workflow has been reached and the current request was therefore temporarily rejected.

In this case the response contains a Retry-After header, which indicates the number of seconds to wait before the next request will be accepted. You may use this information to implement automated request back-offs in your software, similar to the following examples.

Implementation examples#

import time
import urllib.parse
import requests

headers = {"Accept": "application/json", "Authorization": "ApiKey " + YOUR_API_KEY_SECRET}
params = {}
url = "https://api.natif.ai/processing/{workflow}?{params}".format(
    workflow=WORKFLOW_IDENTIFIER,
    params=urllib.parse.urlencode(params)
)
with open(FILE_PATH, 'rb') as file:
    response = requests.post(url, headers=headers, files={"file": file})

    while response.status_code == 429:  # too many requests, i.e. hit a rate limit: back off for a while and retry
        suggested_wait_seconds = int(response.headers.get('Retry-After', 1))
        time.sleep(suggested_wait_seconds)
        file.seek(0)  # rewind: the previous upload already read the file to its end
        response = requests.post(url, headers=headers, files={"file": file})
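
Reading the Retry-After header can be made a little more defensive. The sketch below is an illustration under assumptions: wait_seconds, its default of 1 second and its cap of 60 seconds are hypothetical choices, not values prescribed by the API.

```python
from typing import Mapping


def wait_seconds(headers: Mapping[str, str], default: float = 1.0, cap: float = 60.0) -> float:
    """Return how long to sleep before retrying after a 429 response."""
    raw = headers.get("Retry-After")
    try:
        seconds = float(raw) if raw is not None else default
    except ValueError:
        # Not a plain number (for example an HTTP date): use the default.
        seconds = default
    # Clamp to [0, cap] so that an odd header value cannot stall the client.
    return max(0.0, min(seconds, cap))
```

The result can be passed to time.sleep() in the retry loop. Note that a plain dict lookup is case-sensitive, whereas response.headers from the requests library matches header names case-insensitively.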

HTTP Codes 400, 401 and 403#

Info

If you still use the deprecated token based authentication, please refer to the token documentation.

These codes indicate a problem with the authorization of your request. Please make sure you send a valid Authorization: ApiKey <...> header.

Invalid authentication can also be caused by an API key that has been revoked or automatically expired. This can be checked in the API Keys section of the natif.ai API Hub.

HTTP Code 404 - Not Found#

This error may be caused by a non-existent workflow key in the endpoint path, or simply by a typo in the endpoint base URL.

HTTP Codes 30X#

Redirect codes may occur due to a missing or extra trailing slash / at the end of the endpoint URL. If the HTTP client library of your choice does not follow redirects automatically, please try adding or removing the trailing slash / at the end of the endpoint URL. In any case it makes sense to avoid such redirects in order to reduce the number of HTTP requests issued by your application.
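
Adding or removing the trailing slash can be done without disturbing the query string. This is a small sketch using only the Python standard library. The helper name toggle_trailing_slash and the workflow key my_workflow in the usage note are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def toggle_trailing_slash(url: str) -> str:
    """Add a trailing slash to the URL path if it is missing, else remove it."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/"):
        path = path[:-1]
    else:
        path = path + "/"
    # Only the path changes; scheme, host, query and fragment are preserved.
    return urlunsplit(parts._replace(path=path))
```

For example, toggle_trailing_slash("https://api.natif.ai/processing/my_workflow?a=1") yields "https://api.natif.ai/processing/my_workflow/?a=1", and applying the function again restores the original URL.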

HTTP Code 422 - Validation Error#

At least one of the parameters passed to the endpoint could not successfully be parsed or validated. Please refer to the error details from the application/json response to find out which parameter is affected and how.

HTTP Code 50X - Internal Server Error#

Potential Bug

Shoot! Something went wrong on our side! If this happens regularly or systematically, please get in touch and report a bug, describing the process that led to the error-code, so we can have a look at it.
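
The response codes described in this section can be summarized in a single dispatch function. The following is a sketch only: next_step and the action labels it returns are hypothetical names chosen for illustration, and your application may want to react differently.

```python
def next_step(status_code: int) -> str:
    """Map an HTTP status code to the handling described in this section."""
    if status_code == 201:
        return "done"                  # results are in the response body
    if status_code == 202:
        return "poll"                  # fetch results via processing_id / url
    if status_code == 429:
        return "back_off"              # wait as indicated by Retry-After
    if status_code in (400, 401, 403):
        return "check_auth"            # verify the Authorization header / API key
    if status_code == 404:
        return "check_url"             # workflow key or base URL is wrong
    if status_code == 422:
        return "check_params"          # inspect the validation error details
    if 300 <= status_code < 400:
        return "check_trailing_slash"  # redirect, likely a trailing slash issue
    if 500 <= status_code < 600:
        return "report_bug"            # server-side error
    return "unexpected"
```

Keeping this mapping in one place makes it easier to log unexpected codes and to keep the retry logic for 202 and 429 separate from the handling of permanent errors.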

Partial extractions objects in the response JSON#

Note

The AI models used for document extraction work probabilistically, which means that 100% certainty and correctness of the extracted results cannot be guaranteed. Nonetheless, we take care to get as close to 100% as possible.

Oftentimes entire subtrees of the extractions result object are optional, i.e. they are only included in the JSON response body if they

  1. were found in the submitted document - documents in the wild have a lot of variation and not all documents of the same kind always contain the exact same set of information
  2. were found in the submitted document with sufficient certainty

This means that anywhere on the path to a nested value you need to anticipate and handle null values when using the /processing/{workflow} response JSON object.

The artificial example below illustrates this with three different variations of the extractions object.

"extractions": {
    "names": {
        "first_name": {"value": "John", "confidence": 0.99, ...},
        "last_name": {"value": "Doe", "confidence": 0.98, ...}
    }
}

"extractions": {
    "names": {
        "first_name": null,
        "last_name": {"value": "Doe", "confidence": 0.98, ...}
    }
}

"extractions": {
    "names": null
}

Many programming languages have their own idioms to handle optional values along object paths:

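# Python: "or {}" also covers keys that are present but null (None)
((response_json.get("extractions") or {}).get("names") or {}).get("first_name")

// JavaScript / TypeScript: optional chaining
response_json.extractions?.names?.first_name;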
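
For deeper paths, a small helper can keep null handling in one place. This is a minimal sketch in which dig is a hypothetical helper name. It assumes the response body has already been parsed into Python dicts, for example via response.json().

```python
from typing import Any


def dig(obj: Any, *keys: str) -> Any:
    """Walk `keys` into nested dicts; return None if any step is missing or null."""
    current = obj
    for key in keys:
        if not isinstance(current, dict):
            # Reached null (None) or a non-object value before the path ended.
            return None
        current = current.get(key)
    return current
```

Applied to the three variations above, dig(response_json, "extractions", "names", "first_name", "value") returns "John" for the first one and None for the other two, without raising an exception.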