Optical Character Recognition (OCR): Extracting Text From Images with JavaScript

Sk - Aug 1 '23 - Dev Community

Powerful, full-featured applications running directly on the web?

Meet WebAssembly (Wasm), a technology that opened up a new world for the web.

It's a binary instruction format for a stack-based virtual machine, and it can serve as a compilation target for many programming languages, including C and C++. Yes, you read that right: C/C++ on the web!

In short, WebAssembly is efficient, fast, and safe, and it brings code written in many programming languages to the web.

In this article we will explore one incredible example: tesseract.js, a JavaScript library that wraps the Tesseract OCR engine, which is written in C/C++ and compiled to WebAssembly.

This means we can reap the benefits of a systems language while running directly in the browser at near-native speed, with no need for a separate server.

tesseract.js

To get started, let's create a vanilla project. I personally use Vite and TypeScript, but you can opt for JavaScript if you prefer:

npm create vite@latest


Install tesseract.js:

npm i tesseract.js


To run the application:

npm run dev

We only need the shell, so remove the excess HTML that the template generates in favor of the following:

// main.ts
// The non-null assertion (!) tells TypeScript that #app exists in index.html
document.querySelector<HTMLDivElement>('#app')!.innerHTML = `
  <div id="drop">
    Drop Image file
  </div>
`

The following is all the CSS we are going to need. Replace the contents of style.css with it:

:root {
  font-family: Inter, system-ui, Avenir, Helvetica, Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;

  color-scheme: light dark;
  color: rgba(255, 255, 255, 0.87);
  background-color: white;

  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
  -webkit-text-size-adjust: 100%;
}



body {
  margin: 0;
  display: flex;
  place-items: center;
  min-width: 320px;
  min-height: 100vh;
}

#app {
  max-width: 1280px;
  height: 100vh;
  margin: 0 auto;
  padding: 2rem;
  text-align: center;
  display: flex;
  flex-direction: column;
  gap: 5em;
  align-items: center;
}

#drop {
  display: grid;
  place-content: center;
  width: 500px;
  height: 200px;
  background: white;
  color: black;
  box-shadow: 0 10px 20px rgba(0,0,0,0.19), 0 6px 6px rgba(0,0,0,0.23);
}

.hide {
  display: none !important;
}

img {
  object-fit: contain;
}

To accept the image from the user, we will use a drag-and-drop zone instead of a file input because it's easier to style. In a production application, offering both makes for a better UX.

In main.ts, let's grab the div with id "drop" as our drop zone and register the drag-and-drop listeners on it:

// The div was created by the innerHTML assignment above, so it exists here
const dropbox = document.getElementById("drop")!;
// console.log(dropbox)
dropbox.addEventListener("dragenter", dragenter, false);
dropbox.addEventListener("dragover", dragover, false);
dropbox.addEventListener("drop", drop, false);



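The associated event handlers. Calling preventDefault() in dragenter and dragover marks the div as a valid drop target, and in drop it stops the browser from opening the file itself: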


function drop(e: DragEvent) {
  e.stopPropagation();
  e.preventDefault();
  // console.log(e, "drop")
}

function dragenter(e: DragEvent) {
  e.stopPropagation();
  e.preventDefault();
  // console.log(e, "enter")
}

function dragover(e: DragEvent) {
  e.stopPropagation();
  e.preventDefault();
  // console.log(e, "over")
}

On the drop event, when the user drops a file inside the div, we want to get that file:

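function drop(e: DragEvent) {
  e.stopPropagation();
  e.preventDefault();

  // dataTransfer can be null, so guard before reading the file list
  const files = e.dataTransfer?.files;
  if (files && files[0]) {
    // take the first dropped file, preview it and run OCR on it
    extractFile(files[0]);
    // hide the drop zone once we have a file
    dropbox.classList.add("hide");
  }
}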

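The drop handler above passes whatever file was dropped straight to the OCR step. A small guard could reject non-image files first. The sketch below is only an illustration: firstImageFile and HasMimeType are hypothetical names, not part of the article's code, and the helper relies only on the standard `type` property that browsers set on File objects.

```typescript
// Hypothetical helper: pick the first image out of a dropped file list.
// It only reads `type`, so it works with a real FileList as well as plain objects.
interface HasMimeType {
  type: string;
}

function firstImageFile<T extends HasMimeType>(files: ArrayLike<T>): T | undefined {
  for (let i = 0; i < files.length; i++) {
    // Browsers report MIME types such as "image/png" or "image/jpeg" for images
    if (files[i].type.startsWith("image/")) {
      return files[i];
    }
  }
  // No image among the dropped files
  return undefined;
}
```

With this in place, the drop handler could call firstImageFile on the file list and only hide the drop zone when it returns a file.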

Implementing extractFile:

function extractFile(file: File) {
  const reader = new FileReader();

  reader.onload = () => {
    // readAsDataURL always yields a string (a base64 data URL)
    const dataUrl = reader.result as string;
    const img = document.createElement("img");
    img.src = dataUrl;
    // width and height are numeric properties on an <img> element
    img.width = 600;
    img.height = 400;
    // display the image as a preview
    document.querySelector('#app')!.appendChild(img);
    // run OCR on the same data URL
    Optical(dataUrl);
  };

  // read the image with FileReader
  reader.readAsDataURL(file);
}


We want to display the image in the browser as a preview, before we pass it to tesseract for Optical Character Recognition.
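Reading the file as a data URL works, but it base64-encodes the whole image into one long string. As an alternative sketch for the preview step, assuming the standard URL.createObjectURL and URL.revokeObjectURL APIs, a short-lived blob: URL can be used instead. The helper name makePreviewUrl is hypothetical.

```typescript
// Hypothetical helper: create a blob: URL for previewing a dropped file,
// plus a function to release it once the preview has been loaded.
function makePreviewUrl(file: Blob): { url: string; release: () => void } {
  const url = URL.createObjectURL(file);
  return {
    url,
    // Object URLs keep the underlying data alive until they are revoked
    release: () => URL.revokeObjectURL(url),
  };
}
```

In extractFile, img.src could then be set to the returned url, with release() called from the image's load handler. The tesseract.js documentation also lists File and Blob objects among the inputs recognize accepts in the browser, so the data URL may not be needed for the OCR step either; verify that against the version you have installed.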

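Implementing Optical. Note that this uses the tesseract.js v4-style API, which was current when this article was written; newer major versions take the language as an argument to createWorker and no longer need loadLanguage or initialize: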

import { createWorker } from 'tesseract.js';

async function Optical(img: string) {
  // createWorker starts a Web Worker; the logger receives progress messages
  const worker = await createWorker({
    logger: m => console.log(m)
  });

  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize(img);

  const div = document.createElement("div");
  // layout properties live on div.style, not on the element itself
  div.style.width = "200px";
  div.style.height = "400px";
  div.style.margin = ".4em";
  div.style.padding = ".4em";
  div.style.color = "black";
  // console.log(text);
  // textContent inserts the OCR output as plain text rather than parsing it as HTML
  div.textContent = text;
  document.querySelector('#app')!.appendChild(div);
  await worker.terminate();
}

The following code creates a Web Worker (a separate thread) to run Tesseract, so OCR does not block the page; the logger callback receives progress messages:

const worker = await createWorker({
  logger: m => console.log(m)
});
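
The logger callback receives plain objects. In tesseract.js these carry, among other fields, a status string and a progress number between 0 and 1. As a small sketch of what could be done with them instead of console.log, the hypothetical formatProgress helper below turns such a message into a readable label. The LoggerMessage interface is an assumption that lists only the two fields used here.

```typescript
// Assumed shape of a logger message: only the two fields this sketch uses.
interface LoggerMessage {
  status: string;
  progress: number; // expected range: 0 to 1
}

function formatProgress(m: LoggerMessage): string {
  // Clamp defensively so an unexpected value never renders as e.g. "120%"
  const clamped = Math.min(1, Math.max(0, m.progress));
  return `${m.status}: ${Math.round(clamped * 100)}%`;
}
```

The logger option could then write formatProgress(m) into a status element on the page rather than to the console.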

We only initialize the engine to recognize English; you can load other languages if you want:

  await worker.loadLanguage('eng');
  await worker.initialize('eng');
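
Tesseract accepts several languages at once as a single string joined with +, for example eng+fra. The sketch below assumes the same v4-style loadLanguage and initialize methods used above and describes the worker structurally so that it stays self-contained. LanguageWorker and loadLanguages are hypothetical names.

```typescript
// Structural stand-in for the two worker methods this sketch calls
// (an assumption modelled on the v4-style API used in this article).
interface LanguageWorker {
  loadLanguage(langs: string): Promise<unknown>;
  initialize(langs: string): Promise<unknown>;
}

async function loadLanguages(worker: LanguageWorker, langs: string[]): Promise<string> {
  if (langs.length === 0) {
    throw new Error("at least one language code is required");
  }
  // Tesseract's multi-language syntax: codes joined with "+"
  const joined = langs.join("+");
  await worker.loadLanguage(joined);
  await worker.initialize(joined);
  return joined;
}
```

Keep in mind that every extra language means more trained data to download and usually slower recognition, so it is worth loading only the languages the images actually contain.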

Recognizing the text:

 const { data: { text } } = await worker.recognize(img);
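
Besides text, the data object returned by recognize also exposes confidence information. In the v4 API this includes per-word entries in data.words, each with a text and a confidence on a 0 to 100 scale; treat those field names as an assumption to check against your installed version. The sketch below, with the hypothetical names OcrWord and confidentWords, keeps only the words the engine was reasonably sure about.

```typescript
// Assumed shape of one recognized word: only the fields this sketch uses.
interface OcrWord {
  text: string;
  confidence: number; // 0 to 100
}

function confidentWords(words: OcrWord[], minConfidence: number): string {
  return words
    .filter((w) => w.confidence >= minConfidence) // drop low-confidence guesses
    .map((w) => w.text)
    .join(" ");
}
```

A threshold around 60 is a common starting point, but the right value depends on the images, so it is worth experimenting.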

Displaying the results:

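  const div = document.createElement("div")
  // layout properties live on div.style, not on the element itself
  div.style.width = "200px"
  div.style.height = "400px"
  div.style.margin = ".4em"
  div.style.padding = ".4em"
  div.style.color = "black"
  // console.log(text);
  // textContent inserts the OCR output as plain text rather than parsing it as HTML
  div.textContent = text
  // console.log(div)
  document.querySelector('#app').appendChild(div)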
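
OCR output is arbitrary text, so assigning it through innerHTML would let anything that looks like markup be parsed as HTML. Setting textContent sidesteps this. Where HTML output is really wanted, for example to wrap each paragraph in its own tag, escaping the text first is one option. A minimal sketch, with escapeHtml and toParagraphs as hypothetical helpers:

```typescript
// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Split OCR text on blank lines and wrap each non-empty chunk in a <p> element.
function toParagraphs(text: string): string {
  return text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => `<p>${escapeHtml(p)}</p>`)
    .join("");
}
```

The result of toParagraphs is safe to assign to innerHTML because every piece of recognized text has been escaped before being wrapped.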

And finally, terminating the worker to free memory:

 await worker.terminate();
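
If recognition throws (a corrupt image, a failed download of language data), a terminate call placed at the end of the function is never reached and the worker stays alive. One way to guard against that is a try/finally wrapper. The sketch below is generic on purpose: withWorker and Terminable are hypothetical names, and the only thing assumed about the worker is that it has a terminate method.

```typescript
// Anything with an async terminate() method qualifies as a worker here.
interface Terminable {
  terminate(): Promise<unknown>;
}

async function withWorker<W extends Terminable, R>(
  create: () => Promise<W>,
  use: (worker: W) => Promise<R>,
): Promise<R> {
  const worker = await create();
  try {
    // Run the caller's work and hand back its result
    return await use(worker);
  } finally {
    // Runs on success and on failure, so the worker is always released
    await worker.terminate();
  }
}
```

In Optical, the createWorker call would go in the first callback and the loadLanguage, initialize, and recognize calls in the second.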

That's all it takes to extract text from an image. With some creativity you can do more, or even train Tesseract with your own language data.

This example was straightforward; we used just one image to extract the text. Tesseract can do much more with some research and creativity, and its accuracy is directly influenced by image quality. I will post a more extended variation involving live image editing on ko-fi for those interested. Be sure to check it out!
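
Since image quality matters so much, a simple preprocessing step is sometimes worth trying before OCR. The sketch below converts RGBA pixels, laid out the way a canvas ImageData.data array stores them, to pure black and white using a fixed threshold. The name binarize and the default threshold of 128 are assumptions for illustration. Tesseract already does its own thresholding internally, so this may or may not improve results for a given image.

```typescript
// Convert RGBA pixels to pure black or white, in place.
// `pixels` holds 4 bytes per pixel: red, green, blue, alpha.
function binarize(pixels: Uint8ClampedArray, threshold = 128): Uint8ClampedArray {
  for (let i = 0; i + 3 < pixels.length; i += 4) {
    // Rec. 601 luma weights approximate perceived brightness
    const luma = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    const value = luma >= threshold ? 255 : 0;
    pixels[i] = value;
    pixels[i + 1] = value;
    pixels[i + 2] = value;
    // pixels[i + 3] (alpha) is left untouched
  }
  return pixels;
}
```

In the browser, the pixels would come from drawing the image onto a canvas and calling getImageData on its 2D context; after binarize, putImageData writes them back and the canvas can be passed on for recognition.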
