Introduction
What then is optical character recognition? Optical character recognition (OCR) involves converting an image of text into machine-readable text. This is possible thanks to technological advances that have led to the development of several optical character recognition tools and models.
You might now ask, what is the benefit of this to me? Optical character recognition was born out of necessity, to solve recurring problems for individuals and businesses worldwide. Technology has caused a massive shift from analogue ways of working to digital, automated ones.
This means that for businesses to thrive, they must adapt to these new realities or risk being phased out. The big problem is that many business records still exist only on paper or in other analogue formats, and transcribing them into digital formats by hand is costly and slow.
This is where optical character recognition comes to the rescue. It extracts the text from scanned images, eliminating the time lost in transcribing these documents manually.
Optical character recognition also has use cases in other fields of technology, such as data analysis and visualization, where it makes the information locked in documents available for use. This helps business owners stay productive and cost-effective. The finance sector also uses the technology to facilitate virtual payments and smooth financial transactions. These are only some of the use cases of optical character recognition.
In this tutorial, I intend to illustrate how to set up an easy-to-use character recognition application with Node.js as the backend and React as the frontend.
To get the most out of this tutorial, here are some prerequisites:
• Intermediate knowledge of Node.js
• Knowledge of Git and GitHub
• Knowledge of React
Optical Character Recognition Engines Available
Before diving in, here are some of the most popular optical character recognition tools.
• Amazon Textract
• Google Document AI
• IBM DataCap
• DocParser
• CamScanner
• Abbyy
• Base64.ai
And many more. However, for this tutorial, we will be using the Tesseract OCR engine because it is open source, well documented, and usable from Node.js, among other reasons. Let us now take a closer look at Tesseract.
A brief intro to Tesseract
Tesseract is an open-source optical character recognition engine and one of the most widely used OCR tools. It was originally developed at Hewlett-Packard between 1985 and 1994, released as open source in 2005, and developed by Google from 2006 until 2018; it is now maintained by its user community. It can be installed on the major operating systems. It supports over 100 languages, including English, French, German, and Spanish. At the time of writing, the latest version is 5.3.0.
Setting Up and Installation
To follow this tutorial, we need to put the required tools and file structure in place. First of all, we need to download and install the Tesseract OCR engine on our local machine. For this, I recommend its documentation site, where you can pick the installer that matches your operating system. For Windows users, after installing the OCR engine, which can be downloaded from this link, the installation folder must be added to the PATH environment variable so that Tesseract can be run from the command prompt. Linux and macOS users can install Tesseract via this link. With this done, let's now dive into the tutorial proper.
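Before moving on, it can help to confirm that the Tesseract executable is reachable from the command line. The snippet below is a minimal sketch of one way to check this from Node.js. The helper name getTesseractVersion is made up for this illustration, and the sketch assumes the executable is called tesseract and is on your PATH.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper (not part of any library): returns the first line
// printed by `<command> --version`, or null if the command could not be
// started, for example because it is not on PATH.
function getTesseractVersion(command = "tesseract") {
  const result = spawnSync(command, ["--version"], { encoding: "utf8" });
  if (result.error || result.status !== 0) {
    return null;
  }
  // Some builds print the version to stderr, so look at both streams.
  const output = `${result.stdout || ""}${result.stderr || ""}`.trim();
  return output.split(/\r?\n/)[0] || null;
}

const version = getTesseractVersion();
console.log(version ? `Found: ${version}` : "Tesseract was not found on PATH");
```

If the message says Tesseract was not found, revisit the installation step and the PATH setting before continuing.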
Demo Project
We intend to build a web application that integrates the Tesseract engine to provide an optical character recognition feature to the user. The backend will be built with Node.js and Express, and the frontend with React. To save time, we will focus on the backend functionality and only briefly discuss the frontend. Now let's dive in.
First of all, create your frontend and backend code folders. Navigate to the backend folder and install the node-tesseract-ocr, express, and multer libraries, along with cors, which the final code uses to accept requests from the frontend:
npm i node-tesseract-ocr express multer cors
node-tesseract-ocr is a Node.js wrapper around the Tesseract engine installed earlier. Multer parses and stores the uploaded images used in this tutorial, while Express provides the framework for the Node.js server.
After completing this, import the installed packages in your index.js file as follows. The built-in path module is imported as well, since it is used later to locate the uploads folder:
const express = require("express");
const app = express();
const multer = require("multer");
const tesseract = require("node-tesseract-ocr");
const path = require("path");
Next, we provide a configuration for Tesseract. This is a plain object defined with the following code.
const config = {
  lang: 'eng',
  oem: 1,
  psm: 3
}
This config object specifies the language to be recognized, in this case English. As mentioned earlier, Tesseract supports over 100 languages, so this can be changed to the language you are interested in.
The oem option selects the OCR engine mode. Tesseract ships two engines, the legacy engine and the neural-net LSTM engine, and offers four modes for choosing between them: 0 uses the legacy engine only, 1 uses the neural-net LSTM engine only, 2 combines the two, and 3 is the default, which picks based on what is available. The psm option sets the page segmentation mode, which tells Tesseract how to treat the layout of the image. There are 14 modes, numbered 0 to 13, but we will stick to 3 (the default, fully automatic page segmentation), as we don't restrict recognition to a particular region of the image.
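To make these options more concrete, the sketch below shows how the config values correspond to the flags of the tesseract command-line tool (-l, --oem, and --psm). The helper toTesseractArgs is hypothetical and written only for this illustration; it is not part of node-tesseract-ocr, which handles this for you.

```javascript
// Hypothetical helper for illustration: maps the config object to the
// equivalent Tesseract command-line flags and validates the ranges
// described above (oem 0 to 3, psm 0 to 13).
function toTesseractArgs(config) {
  const { lang = "eng", oem = 3, psm = 3 } = config;
  if (!Number.isInteger(oem) || oem < 0 || oem > 3) {
    throw new RangeError("oem must be an integer from 0 to 3");
  }
  if (!Number.isInteger(psm) || psm < 0 || psm > 13) {
    throw new RangeError("psm must be an integer from 0 to 13");
  }
  return ["-l", lang, "--oem", String(oem), "--psm", String(psm)];
}

const args = toTesseractArgs({ lang: "eng", oem: 1, psm: 3 });
// "image.png" is a placeholder file name; "stdout" tells Tesseract to
// print the recognized text instead of writing it to a file.
console.log(["tesseract", "image.png", "stdout", ...args].join(" "));
// prints: tesseract image.png stdout -l eng --oem 1 --psm 3
```

Running the printed command in a terminal against one of your own images is a quick way to try different oem and psm values before wiring them into the server.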
Next, we set up Express and Multer to handle the images uploaded from the frontend to the server.
// Serve previously uploaded images from the uploads folder
app.use("/uploads", express.static(path.join(__dirname, "/uploads")))
// Store uploads on disk in uploads/, keeping the original file name
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/')
  },
  filename: (req, file, cb) => {
    cb(null, file.originalname)
  },
})
const upload = multer({
  storage: storage
});
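Two details of this setup are easy to miss. When destination is given as a function, Multer expects the folder to exist already, and the storage above accepts any file type. The sketch below shows one possible way to handle both. The helper names ensureUploadDir and imageFileFilter, the list of allowed MIME types, and the 5 MB limit are assumptions made for this illustration, not part of the original project.

```javascript
const fs = require("fs");

// Multer does not create the folder when `destination` is a function,
// so make sure it exists before the first upload arrives.
function ensureUploadDir(dir) {
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Assumed allow-list of image types; adjust it to your needs.
const ALLOWED_MIME_TYPES = ["image/png", "image/jpeg", "image/tiff", "image/bmp"];

// Follows the (req, file, cb) signature of Multer's `fileFilter` option:
// cb(null, true) accepts the file, cb(null, false) skips it.
function imageFileFilter(req, file, cb) {
  cb(null, ALLOWED_MIME_TYPES.includes(file.mimetype));
}

// Possible usage with the storage defined above:
// ensureUploadDir("uploads");
// const upload = multer({
//   storage,
//   fileFilter: imageFileFilter,
//   limits: { fileSize: 5 * 1024 * 1024 }, // assumed 5 MB cap
// });
```

Note that when the filter skips a file, req.file is undefined in the route handler, so the handler should check for that case before using it.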
Now, we write a POST route that invokes Tesseract to analyze the pictures received from the frontend.
app.post("/img-upload", upload.single("file"), (req, res) => {
  const file = req.file.filename;
  // Multer saved the image in uploads/, so pass that path to Tesseract
  tesseract.recognize(`uploads/${file}`, config).then((text) => {
    console.log("text: " + text);
    res.status(200).json(text)
  }).catch((err) => {
    console.log(err)
    // Send the message, since an Error object serializes to {}
    res.status(500).json({ error: err.message })
  })
})
app.listen(5000, () => {
  console.log("Server listening on port 5000")
});
In the code above, a POST route is created at the img-upload endpoint, and the uploaded file is read from the request. Tesseract, which is already initialized, is invoked to recognize the image with the config object passed along. The text obtained is then sent to the frontend to be displayed. Any error during this process is caught and returned as well.
Here is the expected result.
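To try the endpoint without the frontend, a small client can send an image directly. The following is a sketch only, assuming Node.js 18 or later (where fetch, FormData, and Blob are available globally) and the server running on http://localhost:5000. The helper names buildUploadForm and uploadImage are made up for this example.

```javascript
// Builds the multipart body. The field name "file" must match the name
// passed to upload.single("file") on the server.
function buildUploadForm(bytes, filename, mimeType = "image/png") {
  const form = new FormData();
  form.append("file", new Blob([bytes], { type: mimeType }), filename);
  return form;
}

// Sends the image to the backend and resolves with the recognized text.
// The base URL is an assumption that matches the port used in this tutorial.
async function uploadImage(bytes, filename, baseUrl = "http://localhost:5000") {
  const response = await fetch(`${baseUrl}/img-upload`, {
    method: "POST",
    body: buildUploadForm(bytes, filename),
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.json(); // the route replies with the text as JSON
}

// Possible usage ("sample.png" is a placeholder file name):
// const fs = require("fs");
// uploadImage(fs.readFileSync("sample.png"), "sample.png").then(console.log);
```

In the browser the same idea applies, except that the File object from a file input can be appended to the FormData directly.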
Below is the final code for the backend of this project.
const express = require("express");
const app = express();
const multer = require("multer");
const tesseract = require("node-tesseract-ocr");
const path = require("path");
const cors = require("cors");
// Allow requests from the frontend dev server
app.use(cors({
  origin: 'http://localhost:5173',
  methods: ['GET', 'POST']
}))
app.use(express.json());
// Serve previously uploaded images from the uploads folder
app.use("/uploads", express.static(path.join(__dirname, "/uploads")))
// Store uploads on disk in uploads/, keeping the original file name
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/')
  },
  filename: (req, file, cb) => {
    cb(null, file.originalname)
  },
})
const upload = multer({
  storage: storage
});
// Tesseract options: English, LSTM engine, automatic page segmentation
const config = {
  lang: 'eng',
  oem: 1,
  psm: 3
}
app.post("/img-upload", upload.single('file'), (req, res) => {
  const file = req.file.filename;
  tesseract.recognize(`uploads/${file}`, config).then((text) => {
    console.log("text: " + text);
    res.status(200).json(text)
  }).catch((err) => {
    console.log(err)
    // Send the message, since an Error object serializes to {}
    res.status(500).json({ error: err.message })
  })
})
app.listen(5000, () => {
  console.log("Server listening on port 5000")
});
The frontend was minimally designed to test the functionality of the application and to upload the images to be transcribed. Here is a picture of the frontend screen.
You can click here for the code to the frontend design of this project.
Additional Information And Improvements
We have now come to the end of the tutorial. As a next step, tools like Docker and cloud platform providers can help you package the application and run it in the cloud. The OCR output can also be combined with data science tools to aid in data processing and visualization. Moreover, the extracted text can be stored in a database of your choice for further processing.
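As an illustration of that last point, raw OCR output usually benefits from a little tidying before it is stored. The sketch below shows one possible shape for such a step. The helper toOcrRecord and the fields of the record it returns are assumptions made for this example, not part of the project's code.

```javascript
// Hypothetical helper: tidies raw OCR output and wraps it in a plain
// object that could be saved to a database of your choice.
function toOcrRecord(filename, rawText, createdAt = new Date()) {
  const lines = rawText
    .replace(/\f/g, "")                  // drop form-feed page separators
    .split(/\r?\n/)                      // split into lines
    .map((line) => line.trim())          // remove stray leading/trailing spaces
    .filter((line) => line.length > 0);  // discard empty lines
  return {
    filename,
    text: lines.join("\n"),
    lineCount: lines.length,
    createdAt: createdAt.toISOString(),
  };
}

const record = toOcrRecord("receipt.png", "  Total: 42 \n\n Thank you\n\f");
console.log(record.text);
// prints:
// Total: 42
// Thank you
```

The returned object could be passed to whichever database client you use, or sent back from the route in place of the raw text.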
Conclusion
I sincerely hope you’ve learnt how optical character recognition works with Node.js and the Tesseract OCR engine, and how it can be put to use to improve our day-to-day activities.
Feel free to drop comments and questions, and also check out my other educational tech articles here. Till next time, keep on coding!