Hey everyone! In this article, we'll be creating a simple application to perform image text detection using AWS Rekognition with Node.js.
What is AWS Rekognition?
Amazon Rekognition is a service that makes it easy to add image and video analysis to your applications. It offers features like text detection, facial recognition, and even celebrity detection.
While Rekognition can analyze images or videos stored in S3, for this tutorial, we'll be working without S3 to keep things simple.
We'll be using Express for the backend and React for the frontend.
First Steps
Before we start, you'll need to create an AWS account and set up an IAM user. If you already have these, you can skip this section.
Creating IAM user
- Log in to AWS: Start by logging into your AWS root account.
- Search for IAM: In the AWS console, search for IAM and select it.
- Go to the Users section and click Create User.
- Set the user name, and under Set Permissions, choose Attach policies directly.
- Search for and select a Rekognition policy (for example, the AWS managed policy AmazonRekognitionFullAccess), then click Next and create the user.
- Create Access Keys: After creating the user, select the user, and under the Security credentials tab, create an access key. Be sure to download the .csv file containing your access key and secret access key.
- For more detailed instructions, refer to the official AWS documentation: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html
Configuring the AWS CLI
- Install AWS CLI: Install the AWS CLI on your system.
- Verify Installation: Open a terminal or command prompt and run aws --version to ensure the CLI is installed correctly.
- Configure the AWS CLI: Run aws configure and provide the access key and secret access key from the .csv file you downloaded, along with the AWS region you want to use. The SDK in our Node.js code will pick up these settings automatically.
Project Directory
my-directory/
│
├── client/
│   ├── src/
│   │   └── App.jsx
│   ├── public/
│   ├── package.json
│   └── ... (other React project files)
│
└── server/
    ├── index.js
    └── rekognition/
        └── aws.rek.js
Setting up the frontend
Inside the client folder, run:
npm create vite@latest . -- --template react
This creates the React project in the client folder.
In App.jsx:
import { useState } from "react";
function App() {
  const [img, setImg] = useState(null);
  const handleImg = (e) => {
    setImg(e.target.files[0]); // Store the selected image in state
  };
  const handleSubmit = (e) => {
    e.preventDefault();
    if (!img) return;
    const formData = new FormData();
    formData.append("img", img);
    // Logging formData itself shows no entries, so log the stored file instead
    console.log(formData.get("img"));
  };
  return (
    <div>
      <form onSubmit={handleSubmit}>
        <input type="file" name="img" accept="image/*" onChange={handleImg} />
        <br />
        <button type="submit">Submit</button>
      </form>
    </div>
  );
}
export default App;
Let's test this out by ensuring the image is logged to the console after submitting.
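Logging a FormData object directly tends to print an empty-looking object, so walking its entries can make this check easier. The sketch below is one way to do that. The describeFormData helper and the fake "sample.png" blob are made up for illustration and are not part of the project; in the browser you would pass the formData built in handleSubmit.

```javascript
// Hypothetical helper: summarize each entry of a FormData object as a string.
function describeFormData(formData) {
  const lines = [];
  for (const [name, value] of formData.entries()) {
    if (typeof value === "string") {
      // Plain text fields are stored as strings
      lines.push(`${name}: ${value}`);
    } else {
      // File entries expose name, type, and size
      lines.push(`${name}: ${value.name} (${value.type}, ${value.size} bytes)`);
    }
  }
  return lines;
}

// Made-up stand-in for a file picked through the <input type="file"> element.
const demoData = new FormData();
demoData.append(
  "img",
  new Blob(["fake image bytes"], { type: "image/png" }),
  "sample.png"
);

console.log(describeFormData(demoData));
```

Seeing the file name, type, and size in the console is a quick sign that the image really made it into the form data.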
Now let's move to the backend, which is the heart of this project.
Initializing the backend
In the server folder, run:
npm init -y
npm install express cors nodemon multer @aws-sdk/client-rekognition
I created a separate rekognition folder to hold the analysis logic, with a file named aws.rek.js inside it. Because the server code uses import syntax, also add "type": "module" to the server's package.json.
//aws.rek.js
import {
  RekognitionClient,
  DetectTextCommand,
} from "@aws-sdk/client-rekognition";
// Credentials and region are read from the AWS CLI configuration
const client = new RekognitionClient({});
export const Reko = async (params) => {
  try {
    const command = new DetectTextCommand({
      Image: {
        Bytes: params, // we are using Bytes directly instead of S3
      },
    });
    const response = await client.send(command);
    return response;
  } catch (error) {
    console.log(error.message);
    throw error; // rethrow so the caller does not receive undefined
  }
};
Explanation
- We initialize a RekognitionClient object. Since we've already configured credentials with aws configure, we can leave the braces empty.
- We create an async function Reko to process the image. In this function, we initialize a DetectTextCommand object, which takes an image in Bytes.
- DetectTextCommand is specifically used for text detection.
- The function waits for the response and returns it.
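The response from DetectTextCommand carries a TextDetections array in which each entry has a DetectedText, a Type (LINE or WORD), and a Confidence score. Since each line is also reported word by word, it can help to keep only the LINE entries above some confidence. The sketch below shows one way to do that; the extractLines name, the threshold of 80, and the sample response are assumptions made up for illustration, not part of the original project.

```javascript
// Hypothetical helper: keep only LINE detections at or above a confidence threshold.
function extractLines(response, minConfidence = 80) {
  // Fall back to an empty list if the response is missing or malformed
  const detections = response?.TextDetections ?? [];
  return detections
    .filter((d) => d.Type === "LINE" && d.Confidence >= minConfidence)
    .map((d) => d.DetectedText);
}

// Made-up sample shaped like a DetectText response (values are illustrative).
const sampleResponse = {
  TextDetections: [
    { Id: 0, Type: "LINE", DetectedText: "Hello World", Confidence: 99.1 },
    { Id: 1, Type: "LINE", DetectedText: "blurry", Confidence: 42.0 },
    { Id: 2, ParentId: 0, Type: "WORD", DetectedText: "Hello", Confidence: 99.3 },
    { Id: 3, ParentId: 0, Type: "WORD", DetectedText: "World", Confidence: 98.9 },
  ],
};

console.log(extractLines(sampleResponse));
```

Filtering like this avoids showing the same text twice, once as a line and again as separate words.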
Creating the API
In the server folder, create a file named index.js (or any name you prefer).
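//index.js
import express from "express";
import multer from "multer";
import cors from "cors";
import { Reko } from "./rekognition/aws.rek.js";
const app = express();
app.use(cors());
// Keep uploaded files in memory so the bytes can be passed to Rekognition
const storage = multer.memoryStorage();
const upload = multer({ storage });
app.post("/img", upload.single("img"), async (req, res) => {
  const file = req.file;
  if (!file) {
    return res.status(400).send("No image uploaded");
  }
  try {
    const data = await Reko(file.buffer);
    // Build a fresh array per request so results do not pile up across uploads
    const texts = data.TextDetections.map((item) => item.DetectedText);
    res.status(200).send(texts);
  } catch (error) {
    res.status(500).send("Text detection failed");
  }
});
app.listen(3000, () => {
  console.log("server started");
});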
Explanation
- We initialize Express and start the server.
- We use multer to handle the multipart form data and keep the uploaded file temporarily in memory as a Buffer.
- We create a POST route to receive the image from the user. The handler is an async function.
- After the user uploads the image, it is available in req.file.
- req.file has several properties, including a buffer property that holds the image data as raw bytes.
- We pass req.file.buffer to the Reko function. After analyzing the image, Rekognition returns a response whose TextDetections field is an array of objects.
- We send the detected text from those objects back to the user.
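Rekognition works with JPEG and PNG images, and images sent as raw bytes are limited to 5 MB, so it can be worth rejecting unsuitable uploads before calling the service. A minimal sketch of such a check follows. The validateUpload helper and its messages are hypothetical, and it assumes a multer-style file object with mimetype, size, and buffer fields.

```javascript
// Assumed limits: 5 MB for images sent as bytes, JPEG and PNG only
const MAX_BYTES = 5 * 1024 * 1024;
const ALLOWED_TYPES = ["image/jpeg", "image/png"];

// Hypothetical helper: returns an error message, or null when the file looks usable.
function validateUpload(file) {
  if (!file || !file.buffer) return "No image was uploaded.";
  if (!ALLOWED_TYPES.includes(file.mimetype)) {
    return "Only JPEG and PNG images are supported.";
  }
  if (file.size > MAX_BYTES) return "The image must be 5 MB or smaller.";
  return null;
}

// Made-up file object shaped like multer's req.file with memory storage.
const fakeFile = {
  mimetype: "image/png",
  size: 1024,
  buffer: Buffer.alloc(1024),
};

console.log(validateUpload(fakeFile));
```

In the route, a non-null result could be sent back with a 400 status before the image is passed to Reko.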
Coming back to the frontend
Install Axios in the client folder with npm install axios, then update App.jsx:
import axios from "axios";
import { useState } from "react";
import "./App.css";
function App() {
  const [img, setImg] = useState(null);
  const [pending, setPending] = useState(false);
  const [texts, setTexts] = useState([]);
  const handleImg = (e) => {
    setImg(e.target.files[0]);
  };
  const handleSubmit = async (e) => {
    e.preventDefault();
    if (!img) return;
    const formData = new FormData();
    formData.append("img", img);
    try {
      setPending(true);
      const response = await axios.post("http://localhost:3000/img", formData);
      setTexts(response.data);
    } catch (error) {
      console.log("Error uploading image:", error);
    } finally {
      setPending(false);
    }
  };
  return (
    <div className="app-container">
      <div className="form-container">
        <form onSubmit={handleSubmit}>
          <input type="file" name="img" accept="image/*" onChange={handleImg} />
          <br />
          <button type="submit" disabled={pending}>
            {pending ? "Uploading..." : "Upload Image"}
          </button>
        </form>
      </div>
      <div className="result-container">
        {pending && <h1>Loading...</h1>}
        {texts.length > 0 && (
          <ul>
            {texts.map((text, index) => (
              <li key={index}>{text}</li>
            ))}
          </ul>
        )}
      </div>
    </div>
  );
}
export default App;
- We use Axios to post the image and store the response in the texts state.
- We display the texts. For now, I am using the index as the key, although using the index as a key is discouraged.
- I have also added a loading state and some styles.
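One way to avoid index keys is to have the server send each detection's Id along with its text, since Rekognition gives every entry in TextDetections an Id. The sketch below shows that mapping. The toListItems helper and the sample data are hypothetical, and the API in this article would need to return these objects instead of plain strings for it to apply.

```javascript
// Hypothetical helper: turn detections into { id, text } pairs for rendering.
function toListItems(textDetections) {
  return textDetections.map((d) => ({ id: d.Id, text: d.DetectedText }));
}

// Made-up detections shaped like entries of TextDetections.
const sampleDetections = [
  { Id: 0, DetectedText: "Hello World", Type: "LINE" },
  { Id: 1, DetectedText: "Hello", Type: "WORD" },
];

console.log(toListItems(sampleDetections));
// In the component, each item could then be rendered with its id as the key:
// items.map((item) => <li key={item.id}>{item.text}</li>)
```

The ids are only meaningful within a single response, which is enough here because the list is replaced on every upload.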
Final Output
After clicking the "Upload Image" button, the backend processes the image and returns the detected text, which is then displayed to the user.
For the complete code, check out my GitHub Repo.
Thank You!!!