Powerful, feature-rich applications running directly on the web?
Meet WebAssembly (Wasm), the technology that opened a new world for the web.
It's a binary instruction format for a stack-based virtual machine,
and it serves as a compilation target for many programming languages, including C/C++.
Yes, you read that right: C/C++ on the web!
In short, WebAssembly is efficient, fast, and safe,
capable of bringing the world of many programming languages to the web.
In this article we will explore one remarkable example: tesseract.js, an OCR engine written in C/C++ and ported to WebAssembly.
This means we can reap the benefits of a systems language while running directly in the browser at near-native speed,
with no need for a separate server.
tesseract.js
To get started, let's scaffold a vanilla project. I personally use Vite with TypeScript; you can opt for JavaScript if you prefer:
npm create vite@latest
Install tesseract.js:
npm i tesseract.js
To run the application:
npm run dev
We only need the shell, so remove the excess HTML in favor of the following:
// main.ts
document.querySelector('#app')!.innerHTML = `
<div id="drop">
Drop Image file
</div>
`
The following is all the CSS we are going to need; replace the contents of style.css with it:
:root {
  font-family: Inter, system-ui, Avenir, Helvetica, Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;
  color-scheme: light dark;
  color: rgba(255, 255, 255, 0.87);
  background-color: white;
  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
  -webkit-text-size-adjust: 100%;
}

body {
  margin: 0;
  display: flex;
  place-items: center;
  min-width: 320px;
  min-height: 100vh;
}

#app {
  max-width: 1280px;
  height: 100vh;
  margin: 0 auto;
  padding: 2rem;
  text-align: center;
  display: flex;
  flex-direction: column;
  gap: 5em;
  align-items: center;
}

#drop {
  display: grid;
  place-content: center;
  width: 500px;
  height: 200px;
  background: white;
  color: black;
  box-shadow: 0 10px 20px rgba(0, 0, 0, 0.19), 0 6px 6px rgba(0, 0, 0, 0.23);
}

.hide {
  display: none !important;
}

img {
  object-fit: contain;
}
To accept an image from the user, we will use a drag-and-drop zone instead of a file input, because it's easier to style. In a production application, though, offering both is better UX.
Back in main.ts, let's grab the div with id "drop" and use it as our drop zone:
const dropbox = document.getElementById("drop")!;

dropbox.addEventListener("dragenter", dragenter, false);
dropbox.addEventListener("dragover", dragover, false);
dropbox.addEventListener("drop", drop, false);
Associated event listeners:
function drop(e: DragEvent) {
  e.stopPropagation();
  e.preventDefault();
}

function dragenter(e: DragEvent) {
  e.stopPropagation();
  e.preventDefault();
}

function dragover(e: DragEvent) {
  e.stopPropagation();
  e.preventDefault();
}
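The dragenter and dragover handlers are also a natural place to give visual feedback while a file hovers over the zone. A minimal sketch, assuming a `hover` class that you would add to style.css (neither the class nor this helper is part of the project above):

```typescript
// Pure helper modelling classList.add/remove on the drop zone,
// so the toggle logic can be exercised outside the browser.
function setDropHighlight(classes: Set<string>, active: boolean): Set<string> {
  if (active) classes.add("hover");
  else classes.delete("hover");
  return classes;
}

// Hypothetical wiring against the real element would look like:
// dropbox.addEventListener("dragenter", () => dropbox.classList.add("hover"));
// dropbox.addEventListener("dragleave", () => dropbox.classList.remove("hover"));
// dropbox.addEventListener("drop", () => dropbox.classList.remove("hover"));
```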
On the drop event, when the user drops a file inside the div, we want to get that file:
function drop(e: DragEvent) {
  e.stopPropagation();
  e.preventDefault();

  const files = e.dataTransfer?.files;
  if (files && files[0]) {
    // hand the uploaded file to the extractor and hide the drop zone
    extractFile(files[0]);
    dropbox.classList.add("hide");
  }
}
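As noted earlier, a production app would offer a regular file input alongside the drop zone. A hedged sketch of that fallback; the `picker` element and the `isImageFile` guard are additions of this sketch, not part of the project above:

```typescript
// Guard: only hand image files to the OCR pipeline.
function isImageFile(file: { type: string }): boolean {
  return file.type.startsWith("image/");
}

// Hypothetical wiring: add <input type="file" id="picker" accept="image/*" />
// to the shell, then feed its selection into the same extractFile path:
// const picker = document.getElementById("picker") as HTMLInputElement;
// picker.addEventListener("change", () => {
//   const file = picker.files?.[0];
//   if (file && isImageFile(file)) {
//     extractFile(file);
//   }
// });
```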
Implementing extractFile:
function extractFile(file: File) {
  const reader = new FileReader();
  reader.onload = (evt) => {
    const dataUrl = evt.target!.result as string;
    // display the image as a preview
    const img = document.createElement("img");
    img.src = dataUrl;
    img.width = 600;
    img.height = 400;
    document.querySelector('#app')!.appendChild(img);
    // pass the same data URL to the OCR step
    Optical(dataUrl);
  };
  // read the image with FileReader as a data URL
  reader.readAsDataURL(file);
}
We want to display the image in the browser as a preview, before we pass it to tesseract for Optical Character Recognition.
Implementing Optical:
import { createWorker } from 'tesseract.js';

async function Optical(img: string) {
  const worker = await createWorker({
    logger: m => console.log(m)
  });
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize(img);

  const div = document.createElement("div");
  div.style.width = "200px";
  div.style.height = "400px";
  div.style.margin = ".4em";
  div.style.padding = ".4em";
  div.style.color = "black";
  div.innerHTML = text;
  document.querySelector('#app')!.appendChild(div);

  await worker.terminate();
}
The following code spins up a dedicated Web Worker (a separate thread) to run the Tesseract engine:
const worker = await createWorker({
  logger: m => console.log(m)
});
We initialize the engine to recognize English only; you can add other languages if you want:
await worker.loadLanguage('eng');
await worker.initialize('eng');
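tesseract.js accepts a '+'-separated string of language codes (the names follow Tesseract's traineddata files), so recognizing English and French together is a one-line change. A small sketch:

```typescript
// Build a combined language string; tesseract.js accepts e.g. "eng+fra".
const languages = ["eng", "fra"].join("+");

// Against the real worker this would become:
// await worker.loadLanguage(languages); // downloads both traineddata files
// await worker.initialize(languages);   // recognition considers both languages
```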
Recognizing the text:
const { data: { text } } = await worker.recognize(img);
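recognize returns more than the text: the result object also carries a confidence score (0-100) and per-word detail. A sketch of surfacing confidence alongside the text; the formatting helper is an assumption of this sketch, while data.confidence comes from tesseract.js:

```typescript
// Format an OCR result for display; confidence is tesseract.js's 0-100 score.
function formatOcrResult(text: string, confidence: number): string {
  return `${text.trim()} (confidence: ${confidence.toFixed(1)}%)`;
}

// Usage against the real worker would look like:
// const { data } = await worker.recognize(img);
// div.innerHTML = formatOcrResult(data.text, data.confidence);
```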
Displaying the results:
div.style.width = "200px";
div.style.height = "400px";
div.style.margin = ".4em";
div.style.padding = ".4em";
div.style.color = "black";
div.innerHTML = text;
document.querySelector('#app')!.appendChild(div);
And finally, terminating the worker (freeing its memory):
await worker.terminate();
That's all it takes to extract text from an image;
with some creativity, you can do more, or even "train" Tesseract with your own language data.
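Pointing the worker at your own traineddata files is done through createWorker's langPath option. A configuration sketch; the "/models" path is a placeholder assumption, not part of this project:

```typescript
import { createWorker } from "tesseract.js";

// Hypothetical configuration: serve your own traineddata from your server.
// tesseract.js fetches `${langPath}/${lang}.traineddata.gz` when gzip is true.
const worker = await createWorker({
  langPath: "/models", // placeholder path, e.g. serving /models/eng.traineddata.gz
  gzip: true,
  logger: m => console.log(m),
});
await worker.loadLanguage("eng");
await worker.initialize("eng");
```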
This example was straightforward: we used a single image and extracted its text. Tesseract is capable of much more with some research and creativity, and its results are directly influenced by image quality. I will post a more extended variation involving live image editing on Ko-fi for those interested. Be sure to check it out!