Affinda's ID Extractor solution can easily be used via API and embedded into existing workflows and platforms.
Process
A typical process for extracting and validating the data from a file is:
- End-user submits the file for verification
- Customer sends the file to Affinda API for data extraction
- Affinda applies the AI model to the file and extracts data
- Data is returned via API, along with relevant metadata
- Customer uses some rules or logic to determine if human validation is required
- If human validation is required:
  - The reviewUrl found in the API response is opened and validation occurs via the validation interface
  - Data is retrieved from Affinda again, either via API or webhook
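The control flow of the steps above can be sketched roughly as follows. This is only an illustration: `run_extraction` and the four callables passed into it are hypothetical placeholders for your own integration code, and the sketch assumes the document-level metadata (including reviewUrl and identifier) is nested under a `meta` key in the response.

```python
def run_extraction(file_ref, submit, needs_review, send_for_review, fetch):
    """Walk one file through the process; returns the final API response.

    All four callables are hypothetical hooks supplied by the customer:
      submit(file_ref)      -> response dict from the Affinda API
      needs_review(resp)    -> True if human validation is required
      send_for_review(url)  -> hands the review URL to a human validator
      fetch(identifier)     -> retrieves the document again after validation
    """
    # Send the file to Affinda and receive the extracted data plus metadata.
    response = submit(file_ref)

    # Customer-defined rules decide whether a human needs to look at it.
    if not needs_review(response):
        return response

    # Assumption: document-level metadata lives under the "meta" key.
    meta = response["meta"]
    send_for_review(meta["reviewUrl"])

    # After validation, retrieve the data again (here via API, not webhook).
    return fetch(meta["identifier"])
```

Passing the steps in as callables keeps the sketch independent of any particular HTTP client or review queue.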
Using the API
Affinda's API for the ID Extractor is found at: https://api.affinda.com/v2/custom_documents
Submitting an ID to the Affinda API
The following fields may be included in the API POST request to Affinda. All fields are optional, but one of file or url must be included.
Field Name | Description |
file | File as binary data blob. Supported formats: PDF, DOC, DOCX, TXT, RTF, HTML, PNG, JPG |
url | URL to a file to download and process |
file_name | Optional filename of the file |
identifier | Unique identifier for the document. If creating a document and left blank, one will be automatically generated. |
wait | If "true" (default), will return a response only after processing has completed. If "false", will return an empty data object which can be polled at the GET endpoint until processing is complete. |
expiry_time | The date/time in ISO-8601 format when the document will be automatically deleted. Defaults to no expiry. |
document_type | The type of ID that has been submitted. Options are passports, aus_drivers_licence, or aus_birth_certificate. If not specified, Affinda will apply classification of what type of ID has been submitted and apply the relevant model to extract data. |
reject_duplicates | If "true", parsing will fail when the uploaded file is a duplicate of an existing file. If "false", will parse the document normally whether it is a duplicate or not. If blank, will revert to account level settings. |
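As a minimal sketch of how these fields might be assembled, the helper below builds (but does not send) a POST request that asks Affinda to download and process a file from a URL. The endpoint and field names come from this article; the function name `build_url_submission`, the Bearer-token Authorization header, and the form-encoded body are assumptions for illustration, so check them against your account's API reference before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as given in this article.
API_URL = "https://api.affinda.com/v2/custom_documents"


def build_url_submission(api_key, url, **options):
    """Build a POST request for a URL-based submission without sending it.

    `options` may hold any other field from the table, for example
    wait=False, document_type="passports" or reject_duplicates=True.
    """
    if not url:
        # One of file or url must be included; this sketch covers url only.
        raise ValueError("a url is required for a URL-based submission")

    fields = {"url": url}
    for name, value in options.items():
        if value is None:
            continue  # leave the field out so the API default applies
        # Booleans are sent as the strings "true" / "false".
        fields[name] = str(value).lower() if isinstance(value, bool) else str(value)

    return Request(
        API_URL,
        data=urlencode(fields).encode("utf-8"),  # assumption: form-encoded body
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: Bearer auth
        method="POST",
    )
```

The returned object could then be passed to `urllib.request.urlopen`. Uploading a binary file rather than a URL would need a multipart body, which is easier with a dedicated HTTP client library.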
Response from the Affinda API
The response from the Affinda API will include the data extracted from each field, as well as a range of metadata for each field and the document itself. If wait = true (the default) in the POST request, the data will be returned automatically in the response. If wait = false, submit a GET request with the identifier to return the data once extraction is complete.
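When wait = false is used, the GET request typically has to be repeated until processing finishes. The loop below is a sketch of that polling pattern, not an official client: `poll_until_ready` and its `fetch` argument (a callable that performs the GET request and returns the parsed JSON) are hypothetical, and it assumes the ready and failed flags sit under a `meta` key with any error details under a top-level `error` key.

```python
import time


def poll_until_ready(fetch, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Call `fetch()` repeatedly until the document is ready.

    Returns the full response once ready is true, raises RuntimeError if
    processing failed, and TimeoutError if max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        document = fetch()
        meta = document.get("meta", {})  # assumption: metadata under "meta"

        if meta.get("failed"):
            raise RuntimeError(f"processing failed: {document.get('error')}")
        if meta.get("ready"):
            return document

        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)

    raise TimeoutError("document was not ready after polling")
```

The interval and attempt limit are arbitrary defaults. A webhook avoids polling altogether where that option is available.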
Field level metadata
Metadata | Description |
rectangle | x/y coordinates for the rectangular bounding box containing the data |
confidence | Value that indicates the confidence that the model has that the data returned is correct |
isVerified | Indicates whether the data has been validated, either by a human using our validation tool or through auto-validation rules |
isClientVerified | Indicates whether the data has been validated by a human |
isAutoVerified | Indicates whether the data has been auto-validated |
raw | Raw data extracted from the document before any post-processing |
parsed | Parsed data extracted after post-processing steps, including reformatting or mapping to a defined taxonomy |
classification | The name of the field |
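The field-level metadata can drive the customer-side rule that decides whether human validation is needed. The function below is one possible rule, offered as a sketch: `needs_human_review` and the 0.9 threshold are hypothetical, it assumes each field is available as a dict carrying the keys listed above, and it assumes confidence is a number between 0 and 1.

```python
def needs_human_review(fields, min_confidence=0.9):
    """Return True if any unverified field falls below the confidence threshold.

    `fields` is an iterable of dicts with field-level metadata such as
    "confidence" and "isVerified".
    """
    for field in fields:
        # Fields already validated (by a human or by auto-validation) pass.
        if field.get("isVerified"):
            continue

        confidence = field.get("confidence")
        # A missing confidence value is treated as needing review.
        if confidence is None or confidence < min_confidence:
            return True

    return False
```

For example, a passport number extracted with a confidence of 0.62 and not yet verified would send the document to review under this rule, while the same field with isVerified set to true would not.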
Document level metadata
Metadata | Description |
identifier | Unique identifier for the document. If creating a document and left blank, one will be automatically generated. |
fileName | Optional filename of the file |
ready | If true, the document has finished processing. Particularly useful if the request specified wait=false: when polling, use this value to determine when to stop. |
readyDt | The datetime when the document was ready |
failed | If true, an exception was raised during processing. Check the 'error' field of the main return object. |
expiryTime | The date/time in ISO-8601 format when the document will be automatically deleted. Defaults to no expiry. |
language | The document's language |
pdf | The URL to the document's PDF (if the uploaded document is not already a PDF, it is converted to PDF as part of the parsing process) |
parentDocument | Unique identifier for the parent document if the document has been split |
childDocuments | Unique identifiers for any child documents if the document has been split |
pages | |
isVerified | True if the "confirm" button has been clicked in the Affinda validation tool |
reviewUrl | Signed URL (valid for 60 minutes) to access the validation tool. Not applicable for document types such as resumes. |
ocrConfidence | The overall confidence in the conversion of image to text (only applicable for images or PDF documents without a text layer) |
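The document-level flags can be combined into a single status for display or routing in a customer's own system. The helper below is an illustrative sketch: `document_status` and the status labels it returns are hypothetical, and it expects the document-level metadata as a plain dict.

```python
def document_status(meta):
    """Summarise document-level metadata as one status string."""
    if meta.get("failed"):
        # An exception was raised; details are in the response's error field.
        return "failed"
    if not meta.get("ready"):
        # Still processing, typically seen when the request used wait=false.
        return "processing"
    if meta.get("isVerified"):
        # The "confirm" button has been clicked in the validation tool.
        return "verified"
    return "awaiting validation"
```

The failed flag is checked first so that a document that stopped with an error is never reported as still processing.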