How to Convert Image to Text Using Python

Thanks to the power of AI and OCR, you can extract text from various file formats, and you can automate the process even further using code. Let's learn how to convert images to text using Python.

June 15, 2023
9 minutes
Extract relevant data easily using python


Modern organizations are inundated with vast amounts of unstructured data in the form of images, PDFs, and scanned documents. Extracting relevant text information from these files manually is time-consuming, error-prone, and inefficient.  

However, with the advancements in artificial intelligence (AI), you now have the ability to automate this process using code. You can use AI-powered optical character recognition (OCR) algorithms to accurately extract text from images and make your data more accessible, searchable, and actionable.  

This article looks at different types of images and methods to extract text from both simple and complex images. We also look at limitations of some common methods and suggest practical ways to improve the output. Let’s begin by understanding why you need to convert images to text!

Why Extract Text from Images?

Many organizations have image data scanned from operational paperwork. The text in these scans is not searchable, editable, or usable for analysis until you extract it, that is, convert it into string data that you can store and process.

For example, you can extract supplier information, invoice date, invoice amount, and other text information from invoice images. You can store the data for tax and audits or use it to analyze supplier performance.

Other use cases for text extraction include:

  • Digital conversion of healthcare records, scans, and images.
  • Digital conversion of resumes and forms for recruitment and other HR processes.
  • Automatic scanning of ID documents like passports, voter IDs, and rental agreements as part of authorization and authentication workflows.
  • Scanning food labels and ingredients when adding products.
  • Identifying location details from images of places—like street signs, store names, and so on.

What Types of Images Can You Extract Text From?

Technically, you can extract text from all types of images in Python. However, the code complexity and output accuracy can vary greatly depending on the input you expect.  

You may need only a few lines of code if you expect simple input images, like the ones shown below. Such images have large text, few words, a simple font, and clear contrast between the text and the background.

However, most input images for text extraction have noisy backgrounds, varying fonts, shaded or skewed text, or handwriting, like the one shown below.

Such images require much more coding and testing effort in a DIY program. You have to preprocess the image before extraction and then further analyze and correct the text after extraction.

Convert Simple Images to Text in Python

The methods outlined below will work well for simple images.

#1: Tesseract and OpenCV

Tesseract is a widely used open-source OCR (optical character recognition) engine that provides accurate text extraction from images. Open Source Computer Vision Library (OpenCV) is a computer vision and machine learning software library that provides various functions and algorithms for working with images and videos. OpenCV is written in C++ and offers interfaces for various programming languages, including Python.

You can use Tesseract and OpenCV to extract information from images using Python.

Setup

To begin, install Tesseract on your system. You can install it by following the instructions specific to your operating system.  

Once Tesseract is set up, install the pytesseract library, a Python wrapper for Tesseract, along with OpenCV.

pip install pytesseract
pip install opencv-python
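Because pytesseract only calls the tesseract executable, it helps to confirm the binary is on your PATH before running OCR. The sketch below uses only the standard library; the helper name find_tesseract is our own. If Tesseract is installed somewhere that is not on your PATH, pytesseract lets you set pytesseract.pytesseract.tesseract_cmd to the executable's full path instead.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable on PATH, or None."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        # Without the binary, pytesseract raises TesseractNotFoundError when called.
        print("tesseract not found on PATH; install it, or set "
              "pytesseract.pytesseract.tesseract_cmd to the executable's full path.")
    else:
        print(f"Using Tesseract at {path}")
```

Once the binary is found, pytesseract.get_tesseract_version() is another quick sanity check from inside Python.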

After installing everything, follow these steps to convert the text in an image to a string using Tesseract.

Code Example

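  • Import the necessary libraries:

        import cv2
        import pytesseract

  • Read the image using OpenCV. cv2.imread returns None rather than raising an error if the file cannot be read, so check the path if later steps fail:

        image = cv2.imread('image.jpg')

  • Preprocess the image. If needed, you can apply preprocessing techniques such as resizing, denoising, or applying filters to enhance the accuracy of text extraction. At a minimum, convert from OpenCV's BGR channel order to the RGB order that pytesseract expects:

        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

  • Extract text using Tesseract:

        text = pytesseract.image_to_string(image)

  • Print the extracted text:

        print(text)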
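image_to_string returns one block of text. If you also want per-word confidence, pytesseract's image_to_data can return a dictionary (with output_type=pytesseract.Output.DICT) holding parallel lists such as text, conf, left, top, width, and height. The sketch below filters low-confidence words out of such a dictionary; the helper name and the threshold of 60 are illustrative assumptions, not pytesseract defaults.

```python
from typing import Dict, List, Sequence


def confident_words(data: Dict[str, Sequence], min_conf: float = 60.0) -> List[str]:
    """Keep words from an image_to_data-style dict whose confidence is >= min_conf.

    Tesseract reports a confidence of -1 for layout rows (pages, blocks, lines)
    that are not words; those rows have empty text and are skipped here.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        word = str(text).strip()
        # Depending on the pytesseract version, conf may be a str, int, or float.
        if word and float(conf) >= min_conf:
            words.append(word)
    return words


if __name__ == "__main__":
    # Hand-written sample shaped like image_to_data(..., output_type=Output.DICT).
    sample = {
        "text": ["", "Invoice", "N0.", "12345", ""],
        "conf": [-1, 96.2, 41.0, 91.5, -1],
    }
    print(confident_words(sample))  # ['Invoice', '12345']
```

With a real image, you would call data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT) and pass data to confident_words.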

#2: EasyOCR

EasyOCR is a user-friendly and efficient Python library for OCR. It provides a simple interface for extracting text from basic images. To get started with EasyOCR, install the library by running the following command:

pip install easyocr

Once installed, follow these steps to extract text from an image.

  • Import the EasyOCR library:

        import easyocr

  • Initialize the reader object. The list passed to Reader gives the language of the image text; 'en' stands for English. To recognize several languages, pass a list of several language codes. The first run downloads the detection and recognition models:

        reader = easyocr.Reader(['en'])

  • Read the image and extract text:

        result = reader.readtext('image.jpg')

  • Process the results. The readtext method returns a list of detections. Each detection is a tuple of the bounding box coordinates, the extracted text, and a confidence score, in that order. You can iterate over the results to access and process the text as your requirements dictate.

  • Print or manipulate the extracted text:

        for detection in result:
            text = detection[1]
            print(text)
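For images with several lines, the raw readtext list is easier to use after you drop low-confidence detections and put the rest in reading order. The sketch below groups detections into lines by the y coordinate of each box's top-left corner. The helper name, the 0.5 confidence cutoff, and the 10-pixel line tolerance are assumptions, and this simple grouping will not handle multi-column layouts or rotated text.

```python
from typing import List, Sequence, Tuple

# EasyOCR's readtext() yields (bounding_box, text, confidence) tuples, where
# bounding_box is four [x, y] corner points starting at the top-left.
Detection = Tuple[Sequence[Sequence[float]], str, float]


def to_lines(results: List[Detection], min_conf: float = 0.5,
             line_tol: float = 10.0) -> List[str]:
    """Drop low-confidence detections and join the rest into text lines.

    Detections whose top-left y differs from the first detection of the
    current line by at most line_tol pixels are treated as the same line.
    """
    kept = [(box[0][0], box[0][1], text) for box, text, conf in results if conf >= min_conf]
    kept.sort(key=lambda item: (item[1], item[0]))  # top-to-bottom, then left-to-right

    lines: List[List[Tuple[float, float, str]]] = []
    for x, y, text in kept:
        if lines and abs(y - lines[-1][0][1]) <= line_tol:
            lines[-1].append((x, y, text))
        else:
            lines.append([(x, y, text)])
    # Within each line, order words by their x coordinate.
    return [" ".join(t for _, _, t in sorted(line)) for line in lines]


if __name__ == "__main__":
    sample = [
        ([[120, 12], [200, 12], [200, 30], [120, 30]], "World", 0.93),
        ([[10, 10], [100, 10], [100, 30], [10, 30]], "Hello", 0.98),
        ([[10, 50], [90, 50], [90, 70], [10, 70]], "Second", 0.91),
        ([[100, 52], [150, 52], [150, 70], [100, 70]], "l1ne", 0.20),
    ]
    print(to_lines(sample))  # ['Hello World', 'Second']
```

In practice you would call to_lines(reader.readtext('image.jpg')). If you only need the strings, readtext also accepts detail=0, which returns a plain list of text without boxes or confidence scores.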

#3: Other Python Libraries

In addition to pytesseract and EasyOCR, other Python libraries also offer OCR capabilities for extracting text from images.

PyOCR

PyOCR is a Python wrapper that provides access to OCR engines such as Tesseract (through libtesseract or the tesseract command) and Cuneiform. It offers a unified interface for using these engines to extract text from images. Here's an example of how to use PyOCR with Tesseract.

  • Import libraries:

        import sys
        import PIL.Image
        import pyocr
        import pyocr.builders

  • Initialize the OCR engine. get_available_tools() returns the OCR tools PyOCR finds on your system, in its recommended order of use; the script stops if none is installed:

        tools = pyocr.get_available_tools()
        if len(tools) == 0:
            print("No OCR tool found.")
            sys.exit(1)
        ocr_tool = tools[0]

  • Load the image:

        image = PIL.Image.open('image.jpg')

  • Perform OCR on the image:

        text = ocr_tool.image_to_string(
            image, builder=pyocr.builders.TextBuilder()
        )

  • Print the extracted text:

        print(text)

OCRopus

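OCRopus is a collection of OCR tools and libraries whose development was originally sponsored by Google. It provides a framework for OCR research and includes components for layout analysis, character recognition, and post-processing. OCRopus can be used for both single-page and multi-page document OCR. Its Python library, ocrolib, does not offer a single one-call OCR function; in the ocropy implementation, the usual workflow is a pipeline of command-line scripts, as described in the project's README. Here's a basic example, which assumes the default English recognition model has been installed as the README describes.

  • Binarize the page image (this step also estimates and corrects skew), writing the results to a directory named book:

        ocropus-nlbin image.jpg -o book

  • Segment the binarized page into text lines:

        ocropus-gpageseg 'book/????.bin.png'

  • Recognize each text line. The recognized text for each line is written to a .txt file next to the line image:

        ocropus-rpred 'book/????/??????.bin.png'

  • Print the extracted text:

        cat book/????/??????.txt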

These Python libraries offer additional options and flexibility regarding OCR in Python. Depending on your specific requirements and the nature of your images, exploring these alternative libraries might provide you with different features and performance characteristics.

Limitations of Python libraries

Open-source Python libraries give good results for basic images but often fail for complex images. For example, they give inaccurate results if:

  • The background is pixellated, blurry, or the same colour as the text.
  • The image is a scanned copy of handwritten text.
  • The image has multiple columns or irregular text placement.

They also do not apply natural language processing (NLP) to check and improve the output. For example, when only part of a word or phrase is recognized, an NLP step can infer the missing text from context. These libraries cannot, so non-standard input simply produces incorrect results.
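Full NLP is out of scope for these libraries, but when you know the vocabulary to expect (for example, the field labels on an invoice), a simple post-processing pass can fix common character confusions. The sketch below, built only on Python's standard difflib module, snaps each word to its closest match in a known word list. This is fuzzy string matching rather than NLP, and the helper name, vocabulary, and 0.75 cutoff are illustrative assumptions.

```python
import difflib
from typing import Iterable


def correct_words(text: str, vocabulary: Iterable[str], cutoff: float = 0.75) -> str:
    """Replace each whitespace-separated word with its closest vocabulary entry.

    Words with no entry scoring at least `cutoff` (difflib's similarity ratio,
    0.0 to 1.0) are left unchanged. Matching is case-insensitive and corrected
    words are returned in lowercase; attached punctuation is not handled.
    """
    vocab = [v.lower() for v in vocabulary]
    corrected = []
    for word in text.split():
        matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


# OCR often confuses l/I and 0/O; the known vocabulary pulls them back.
print(correct_words("lnvoice t0tal due", ["invoice", "total", "due", "date"]))  # invoice total due
```

A cutoff that is too low will "correct" legitimate words you did not list, so it is safest to apply this only to fields with a small, fixed vocabulary.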

Improving Results When Using Python Libraries

You can improve text extraction results with Python libraries by converting the image first. Start by converting it to grayscale. Then convert the grayscale image to binary (pure black and white), where text is represented as black pixels and the background as white pixels.
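To make these two conversions concrete, here is a pure-Python sketch (no third-party packages) of grayscale conversion using the ITU-R BT.601 luma weights, which OpenCV also uses for its colour-to-gray conversion, followed by Otsu's method, a standard way to choose the binarization threshold automatically. It assumes dark text on a light background and, purely for illustration, represents an image as nested lists of (R, G, B) tuples.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def to_grayscale(rgb: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Convert RGB pixels to 0-255 luminance with the BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def otsu_threshold(gray: Sequence[Sequence[int]]) -> int:
    """Return the threshold t that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]           # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray: Sequence[Sequence[int]]) -> List[List[int]]:
    """Map pixels above the Otsu threshold to 255 (white) and the rest to 0 (black)."""
    t = otsu_threshold(gray)
    return [[255 if value > t else 0 for value in row] for row in gray]


# Dark grey "text" pixels on a light grey background.
page = [[(230, 230, 230), (30, 30, 30)], [(30, 30, 30), (230, 230, 230)]]
print(binarize(to_grayscale(page)))  # [[255, 0], [0, 255]]
```

With OpenCV and NumPy arrays, the same two steps are gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) followed by _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU), which is far faster on real images. For light text on a dark background, cv2.THRESH_BINARY_INV inverts the result.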

You can also write additional code for pre-processing images. Common preprocessing tasks include:

  • Apply filters to remove image speckles and improve clarity.
  • Adjust the contrast between the text and background.
  • Correct any skew or rotation in the image.
  • Adjust any varying text size to a single standard size.

It is important to note that while most of these image preprocessing tasks are basic, they still require significant coding and testing effort.

In practice, real-world images often require additional, more complex computer vision pre-processing, such as:

  • Image component analysis to find regions of text blocks.
  • Pre-labelling of image regions.
  • Contour analysis to find boundaries or edges of text regions.
  • Stroke thinning for handwritten text.

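The OCR libraries above do not perform this kind of computer vision pre-processing for you. You have to build it yourself, for example with OpenCV, which adds further coding and testing effort.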

Convert Complex Images to Text  

If you expect more complex images as input, you are better off choosing an enterprise solution.

#1: Cloud APIs

You can use fully managed OCR services provided by cloud providers for extracting text from images. The cloud providers handle the underlying complexity of text extraction. You pass the image to the API as input and get the string as output. The top three cloud OCR services are:

You can call whichever API fits your organization's cloud infrastructure. Below is an example using the Google Cloud Vision API. First, set up a Google Cloud project:

  • Visit the Google Cloud Console and create a new project.
  • Enable the Cloud Vision API for the project.
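  • Set up authentication credentials. The Python client library looks for Application Default Credentials, for example a service account key file whose path is stored in the GOOGLE_APPLICATION_CREDENTIALS environment variable.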

After setting up, follow the code example steps below.

  • Install the required Python library:

        pip install google-cloud-vision

  • Import the necessary library:

        from google.cloud import vision

  • Instantiate the Vision client:

        client = vision.ImageAnnotatorClient()

  • Read the image, send it to the API, and print the results. The first annotation holds the full detected text; the remaining annotations are individual words:

        with open('image.jpg', 'rb') as image_file:
            content = image_file.read()

        image = vision.Image(content=content)
        response = client.text_detection(image=image)
        if response.error.message:
            raise RuntimeError(response.error.message)

        texts = response.text_annotations
        for text in texts:
            print(text.description)

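The API analyzes the image and returns the extracted text along with additional information. Each entry in text_annotations includes a bounding polygon for the detected text, and the response's full_text_annotation field organizes the text into pages, blocks, paragraphs, and words, whose elements include a confidence field.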
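To work with word-level results, you can walk the full_text_annotation hierarchy (pages, blocks, paragraphs, words, symbols) and rebuild each word from its symbols. The sketch below does that by attribute access only, so it can be demonstrated with stand-in objects; the helper name and threshold are our own. In our understanding, confidence values are most meaningful when you call client.document_text_detection, the dense-text variant, rather than text_detection, so it is worth checking this against your own responses.

```python
from types import SimpleNamespace
from typing import List, Tuple


def words_with_confidence(annotation, min_conf: float = 0.0) -> List[Tuple[str, float]]:
    """Walk a Vision full_text_annotation and return (word, confidence) pairs
    whose confidence is at least min_conf."""
    results = []
    for page in annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # Each word is a sequence of symbols (characters); join them back up.
                    text = "".join(symbol.text for symbol in word.symbols)
                    if word.confidence >= min_conf:
                        results.append((text, word.confidence))
    return results


if __name__ == "__main__":
    # Stand-in objects with the same nesting as response.full_text_annotation.
    def fake_word(text, conf):
        return SimpleNamespace(symbols=[SimpleNamespace(text=c) for c in text], confidence=conf)

    fake = SimpleNamespace(pages=[SimpleNamespace(blocks=[SimpleNamespace(
        paragraphs=[SimpleNamespace(words=[fake_word("Total", 0.98), fake_word("$1O0", 0.42)])])])])
    print(words_with_confidence(fake, min_conf=0.5))  # [('Total', 0.98)]
```

With a real response, the call would be words_with_confidence(response.full_text_annotation, min_conf=0.8).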

Limitations of cloud services

Cloud APIs provide a convenient and scalable solution, allowing you to process large volumes of images without the need for infrastructure setup or maintenance. However, pricing can be unpredictable and outside your control. You often have to use multiple cloud services, such as storing your input images and output results in the cloud provider’s database, which adds to costs. You may also get locked into their infrastructure with legally binding contracts.

Most importantly, cloud providers only provide the infrastructure for your image extraction. You still have to write the code and build the image extraction applications yourself.

#2: Third-party AI tools

Third-party AI tools provide more comprehensive, AI-powered solutions for image processing. For example, Affinda’s document processing engine, Vega, powers many custom image extraction solutions like invoice processing, recruitment data extraction, and ID data extraction. Vega combines three AI techniques:

  • Computer vision technologies
  • Deep learning
  • Natural language processing

All three work together to pre-process documents, extract text, and post-process results. You just have to input the image and use the output as you like. You can use existing solutions out of the box, without writing any code. Alternatively, you can ask the Affinda team to develop a custom solution for you. Price and usage are entirely in your control!

Best Way to Convert Images to Text with Python

Before choosing a method, consider the complexity of your images, the expected quality of output, and the expected scale and speed of extraction. Open-source Python libraries are useful for simple image-to-text conversion. However, in most real-world applications, input images are complex, with varying text placement, background colours, and image quality, and open-source libraries give inaccurate and inconsistent results for such images. For complex legal and finance use cases, where there is little scope for error and text placement changes from image to image, consider a ready-to-use document processing engine like Vega.
