ID VERIFICATION OCR API
Need an accurate API to extract and verify ID documents? Affinda’s AI-powered system delivers it with our plug-and-play API. We can verify driver's licenses, passports, and birth certificates with a system that goes beyond OCR to read a document just like a human would.
Passport Verification OCR API
Accurate Passport Extraction and Verification. Our AI recognizes and processes all Passport fields with stunning accuracy, including Country, Date of Birth, and Place of Birth.
Driver's License Verification OCR API
Go beyond OCR with our AI for Driver's License Extraction. Our AI recognises all forms of driver's license and extracts all major fields, including License ID, Name and Date of Birth.
Birth Certificate OCR API
Use our API for accurate Birth Certificate extraction and processing. Use our service to verify Names and Dates of Birth amongst other fields.
Have Specific Needs? We Can Verify These Documents
Extract and verify card details
We can work with ID Cards from most countries.
Need to verify someone’s visa status? We can extract information from all forms of Permits and Visas.
Extract and verify information from Utility Bills using our AI. We can recognize and verify all major fields.
Vega: Our Intelligent Document Processing Engine
Vega processes documents with human-like understanding and computer-grade accuracy, combining the latest AI advancements.
Natural language processing
Where Affinda really stood out head and shoulders above the others is in their level of support and attention to the customer.
Overall, we are very happy with having switched. Problems are resolved quickly. We have some unique requirements and they were able to work with us on that.
AI Experts. Specialists in Document Extraction.
Affinda is a team of AI nerds headquartered in Melbourne, Australia, with team members around the world. We specialize in solving novel document extraction problems and have built custom AI solutions for use cases across Insurance, Banking, Investment, Logistics, Recruitment and Finance.
Get in Touch to Discuss Your Project
We can customize our technology to meet your specific needs. With teams in Europe, North America and Australia, we can help – no matter where you are.
Frequently asked questions:
Our technology is used to extract all the important information from an ID document. Modern ID document parsers like Affinda's leverage multiple AI neural networks and data science techniques to extract structured data. Typical fields being extracted may include the document number, first name, middle name, last name, date of birth, address — plus much more. Our ID document extraction tools can be integrated into a software platform, to provide near real time automation.
Affinda’s ID document OCR extractor is capable of extracting data from Passports, Driver’s Licenses and Birth Certificates. If you require additional ID documents and fields to be extracted, please reach out to our AI experts.
Good intelligent document processing requires a combination of technologies and approaches.
Our solution uses deep transfer learning in combination with recent open source language models, to segment, section, identify, and extract relevant fields.
- We use image-based object detection and proprietary algorithms developed over several years to segment and understand the document, and to identify the correct reading order and ideal segmentation.
- The structural information is then embedded in downstream sequence taggers, which perform Named Entity Recognition (NER) to extract key fields.
- Each document section is handled by a separate neural network.
- Post-processing of fields to clean up location data, phone numbers and more.
- Comprehensive skills matching using semantic matching and other data science techniques.
OCR stands for Optical Character Recognition. Affinda’s software uses OCR to convert an image that doesn’t have text data — such as a photo of an ID document — into a file that is machine-readable.
All uploaded information is stored in a secure location and encrypted. You can read all the details here.
We export the data from our ID documents in JSON format so that it can be most easily integrated into your platform or database.
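As a rough illustration of working with the JSON output, the Python sketch below flattens a response into a plain dictionary that could be written to a database row. The payload shape and field names (data, documentNumber, parsed, confidence) are assumptions made for this example and are not taken from Affinda's documented schema.

```python
import json

# Hypothetical response body; the real field names and nesting may differ.
SAMPLE_RESPONSE = """
{
  "data": {
    "documentNumber": {"parsed": "L898902C3", "confidence": 0.98},
    "dateOfBirth": {"parsed": "1974-08-12", "confidence": 0.95},
    "lastName": {"parsed": "ERIKSSON", "confidence": 0.91}
  }
}
"""


def flatten_fields(raw_json: str) -> dict:
    """Map each field name to its parsed value, dropping the metadata."""
    payload = json.loads(raw_json)
    return {name: field["parsed"] for name, field in payload["data"].items()}


record = flatten_fields(SAMPLE_RESPONSE)
print(record)
```

Keeping the flattening step in one small function means only that function needs to change if the real response is shaped differently.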
We’d be happy to build you an ID document parsing tool with custom fields specific to your use case. Just click here to get in touch with our development team!
We’d be happy to build you an ID document parsing tool for another document that we don't currently have listed. Click here to get in touch with our development team and we can discuss your use case.
Our Online App and ID document API will process documents in a matter of seconds. If you’re looking for a faster, integrated solution, simply get in touch with one of our AI experts.
Our ID document reading functionality is designed to be flexible, easy to integrate, and simple enough to white-label. Click here to get in touch with our sales team and discuss licensing within your app or platform.
Affinda's ID document extracting software uses machine learning and natural language processing to observe and adapt to differing documents. The software takes a dynamic approach: it uses human-like logic to recognise different document types and characteristics, and builds and applies a library of logical reasoning.
Pricing varies depending on the integrations you’ll be using, as well as the number of ID documents you’ll be parsing each month. Please get in touch with our sales team to learn more!
Simply sign up for a free trial and start uploading your passports, licenses and birth certificates in bulk, in any common image or text format, such as .pdf, .doc, .docx, .odt, .png or .jpg. That’s all there is to it!
Our free online ID document parser accepts .pdf, .doc, .docx, .odt, .png and .jpg formats.
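If you are uploading in bulk, it can help to filter out unsupported files before sending them. The sketch below checks file extensions against the formats listed above; the function name is made up for this example, and checking the extension alone does not prove the file contents are valid.

```python
from pathlib import Path

# Formats listed as accepted by the online ID document parser.
ACCEPTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".odt", ".png", ".jpg"}


def is_accepted(filename: str) -> bool:
    """Return True when the file extension is one of the accepted formats."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


candidates = ["passport.PDF", "licence.jpg", "notes.txt"]
to_upload = [name for name in candidates if is_accepted(name)]
print(to_upload)
```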
Our OCR software covers three types of ID documents, which are powered by our industry-leading technology.
- Passport OCR is capable of extracting information from Passports. The following fields will be extracted from Passports (list fields)
- Driver’s License OCR is capable of extracting information from Driver’s Licenses. The following fields will be extracted from Driver’s Licenses (list fields)
- Birth Certificate OCR is capable of extracting information from Birth Certificates. The following fields will be extracted from Birth Certificates (list fields)
The majority of customers benefit from Affinda’s ID document parser through our hosted solution. However, some users will need the ID Document Extractor to be deployed locally for various reasons. We have published an Affinda Self-Hosted Deployment Guide to aid customers who request this deployment method.
Affinda’s ID Document Extractor solution aims to accurately and quickly return key data from an ID document with as little human intervention as possible. The use of confidence levels and auto-validation is key to minimising this human intervention.
Artificial intelligence models are probabilistic. This means that the AI model will return the answer it is most confident is correct. Affinda returns the confidence levels to users, either via API or through the validation interface, so that users can more easily direct their attention to the fields that the model has less confidence in and that are therefore more likely to be incorrect.
The confidence levels shown take into consideration:
- That the data point selected by the model is correct
- In the case of scanned images, the confidence that the model has that the capture of the text via OCR is correct
To reduce the amount of human intervention, auto-validation rules and thresholds can be set so that users only need to validate a subset of all data fields. Within the Affinda web app, users can set their auto-validation threshold. Any data field whose confidence level is above this user-set threshold will be auto-validated and will not require a human to validate it.
By its nature, an auto-validated data field may from time to time be incorrect. As such, we recommend setting a high auto-validation threshold, above 90%. Any auto-validated fields that are identified as incorrect can still be corrected within the validation interface.
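The same thresholding idea can be applied on your side to confidence values returned via the API. The sketch below is only an illustration: the field names and the shape of each field (a dict with "parsed" and "confidence" keys, with confidence between 0 and 1) are assumptions, not Affinda's documented response format.

```python
def split_by_threshold(fields: dict, threshold: float) -> tuple[dict, dict]:
    """Separate fields into auto-validated and needs-review groups.

    A field is auto-validated only when its confidence is strictly above
    the threshold, mirroring the rule described above.
    """
    auto_validated, needs_review = {}, {}
    for name, field in fields.items():
        target = auto_validated if field["confidence"] > threshold else needs_review
        target[name] = field
    return auto_validated, needs_review


# Hypothetical extracted fields.
extracted = {
    "documentNumber": {"parsed": "L898902C3", "confidence": 0.98},
    "dateOfBirth": {"parsed": "1974-08-12", "confidence": 0.93},
    "placeOfBirth": {"parsed": "STOCKHOLM", "confidence": 0.61},
}

auto_validated, needs_review = split_by_threshold(extracted, threshold=0.9)
print(sorted(auto_validated), sorted(needs_review))
```

Sorting the needs-review group by ascending confidence is a simple way to put the least certain fields in front of a reviewer first.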
Affinda’s ID Document Extractor is capable of handling different date and number formats that exist across regions.
Dates will often be represented differently depending on the region. The most common date format difference is whether the month appears before or after days in the entire date. For example, dates in the United States are often presented as MM/DD/YY, whereas in the United Kingdom they are often presented as DD/MM/YY.
Additionally, in passports, date formats may include the month written in both English and the country’s primary or secondary language.
Affinda’s technology will parse all different date formats and return the extracted data in the ISO 8601 format YYYY-MM-DD.
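To show why the regional convention matters, here is a minimal Python sketch of date normalisation to ISO 8601. It is not Affinda's implementation: the function name, the day_first hint and the list of supported formats are assumptions for this example, and it only handles English month names.

```python
from datetime import datetime


def to_iso(raw: str, day_first: bool) -> str:
    """Normalise a date string to YYYY-MM-DD.

    day_first says whether ambiguous numeric dates are DD/MM (True, e.g.
    United Kingdom) or MM/DD (False, e.g. United States).
    """
    if day_first:
        numeric_formats = ["%d/%m/%Y", "%d/%m/%y"]
    else:
        numeric_formats = ["%m/%d/%Y", "%m/%d/%y"]
    textual_formats = ["%d %b %Y", "%d %B %Y"]  # e.g. "12 Aug 1974"

    for fmt in numeric_formats + textual_formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {raw!r}")


print(to_iso("03/04/2021", day_first=False))
print(to_iso("03/04/2021", day_first=True))
```

The same input string yields two different dates depending on the hint, which is why the issuing region has to be known or inferred before an ambiguous numeric date can be converted. Two-digit years are resolved using Python's default pivot, which may not suit dates of birth.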
The ID document OCR software does not yet automatically identify and categorise the type of ID document being submitted to the system. Currently, users should submit each document to the matching extractor, i.e. a Passport to the Passport Extractor. However, if an incorrect document has been submitted to an extractor (e.g. a Driver’s License submitted to the Passport Extractor), the quality of the information extracted and the model confidence levels will be very low. These metrics can be used to infer that an incorrect document has been submitted.
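One way to act on this signal is to flag a submission when too few fields come back or when the average confidence is low. The sketch below is a heuristic only; the function name and both cut-off values are assumptions chosen for illustration and would need tuning against real results.

```python
from statistics import mean


def looks_like_wrong_document(
    confidences: list[float],
    min_mean_confidence: float = 0.5,  # assumed cut-off, tune on real data
    min_fields: int = 3,               # assumed cut-off, tune on real data
) -> bool:
    """Flag a result whose fields are too few or too uncertain."""
    if len(confidences) < min_fields:
        return True
    return mean(confidences) < min_mean_confidence


print(looks_like_wrong_document([0.97, 0.95, 0.92, 0.88]))
print(looks_like_wrong_document([0.21, 0.15, 0.30]))
```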
The Machine Readable Zone (MRZ) is a codified element of identity documents which contains the basic personal details of the document holder, including full name, document number, date of birth, sex and document expiration date. The information extracted from the MRZ can be used to validate the information extracted from the body of the passport (excluding the MRZ). This enhances the accuracy of, and confidence in, the information extracted by the Passport Extractor.
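To make the cross-check concrete, the sketch below parses the second MRZ line of a passport and compares it with fields read from the body of the document. It assumes the two-line, 44-character passport layout (TD3) and the 7-3-1 check digit scheme from ICAO Doc 9303; it is not Affinda's implementation, and the function names and body field names are made up for this example.

```python
def char_value(ch: str) -> int:
    """Numeric value of an MRZ character: digits as-is, A-Z as 10-35, '<' as 0."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"Invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Check digit: weighted sum with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def parse_td3_line2(line: str) -> dict:
    """Extract fields from line 2 of a passport MRZ and verify their check digits."""
    if len(line) != 44:
        raise ValueError("A TD3 MRZ line must be 44 characters long")
    checked = {
        "document_number": (line[0:9], line[9]),
        "date_of_birth": (line[13:19], line[19]),   # YYMMDD
        "expiry_date": (line[21:27], line[27]),     # YYMMDD
    }
    for name, (value, digit) in checked.items():
        if not ("0" <= digit <= "9") or check_digit(value) != int(digit):
            raise ValueError(f"Check digit mismatch for {name}")
    return {
        "document_number": line[0:9].replace("<", ""),
        "nationality": line[10:13],
        "date_of_birth": line[13:19],
        "sex": line[20],
        "expiry_date": line[21:27],
    }


def mismatched_fields(mrz: dict, body: dict) -> list[str]:
    """Names of fields present in both sources whose values disagree."""
    return [name for name in mrz if name in body and body[name] != mrz[name]]


# Specimen line from ICAO Doc 9303 (fictional holder).
mrz = parse_td3_line2("L898902C36UTO7408122F1204159ZE184226B<<<<<10")
# Hypothetical values read from the body of the document.
body = {"document_number": "L898902C3", "date_of_birth": "740813", "sex": "F"}
print(mismatched_fields(mrz, body))
```

A mismatch does not say which source is wrong, only that the field deserves a closer look.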
Affinda has developed our data extraction solution to work either within our dedicated app or embedded within your platform. With the embedded option, your current workflows can remain largely unchanged, with customisations available so that the tool fits seamlessly into your platform. The validation interface can also be customised to fit within your platform. For more information on white-labeling and other customisations, see here.
How to embed the validation interface?
- When an ID document is posted to Affinda, most of its data is extracted and returned via the API
- Within this API response, we return a value called meta.reviewUrl
- This is a signed and authenticated URL which can be embedded as an iFrame within your platform, allowing members of your team to validate the predicted data and add any additional data points
- Once an ID document is validated, the validated data can be requested and entered into your platform with full confidence in the accuracy
Each signed URL is valid for 60 minutes, so we recommend not storing the URL locally. If users want to access the validation tool for an ID document that already exists in the Affinda system, we recommend retrieving a new URL only when the user clicks to validate the ID document. To do this, perform a request to /ID documents/<identifier>, then send the user to a page that embeds the URL retrieved from meta.reviewUrl.
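The "fetch a fresh URL on click" pattern can be sketched as follows. The only detail taken from the description above is that the response carries the signed URL at meta.reviewUrl. Everything else is an assumption for illustration: fetch_document stands in for whatever authenticated API call your platform makes, and the iFrame dimensions are arbitrary.

```python
from html import escape
from typing import Callable


def build_review_iframe(fetch_document: Callable[[str], dict], identifier: str) -> str:
    """Fetch the document at click time and return iFrame markup for its review URL.

    The signed URL is used immediately and never stored, because it expires.
    """
    document = fetch_document(identifier)
    review_url = document["meta"]["reviewUrl"]
    return (
        f'<iframe src="{escape(review_url, quote=True)}" '
        'width="100%" height="800"></iframe>'
    )


def fake_fetch(identifier: str) -> dict:
    """Stand-in for a real API call; returns a made-up response."""
    return {"meta": {"reviewUrl": f"https://example.invalid/review/{identifier}?sig=abc&exp=60"}}


print(build_review_iframe(fake_fetch, "doc123"))
```

Passing the fetch function in as a parameter keeps the markup logic separate from the HTTP client and makes it easy to test without network access.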
Why should I use the validation interface?
There are two key advantages of using the validation interface.
- Ensuring the integrity of data
Despite all of the benefits of using an AI-based engine for extracting data, the reality is that no model will ever be 100% accurate for every single ID document. The benefit of a 'human in the loop' model is that predictions made by the model are validated, so 100% accurate data can be guaranteed before further processing in your system.
The ease of use of the interface, and the already high accuracy of the model predictions, means that obtaining perfect data is now a matter of seconds, not minutes, per document.
- Creation of a feedback loop
One key strength of AI models is that they can continue to learn over time. When using the validation interface, the data from these human corrections can be fed back into our models so that our engine can begin to learn different state and country formats and accuracy will continue to improve over time, further reducing the amount of human intervention required.