KhataLook - Face Recognition Meets Retail Debt Tracking in React

Akshat Gautam · Aug 18 · Dev Community

Welcome to my weekend project, KhataLook!

For those unfamiliar with Hindi, "KhataLook" combines two words: Khata, meaning ledger, and Look, meaning search. So, KhataLook translates to "looking through the ledger" — but with a modern twist.

Borrowed money is a significant challenge for retail shop owners, and keeping track of it can be a cumbersome task. This inspired me to create KhataLook, a face recognition system designed to simplify this process. With KhataLook, shop owners can register faces of borrowers in the system, and when a recognized face enters the store, the system automatically announces the amount pending 💀

So, without procrastinating any further, I built a prototype for this idea, and here it is!

GitHub Repository: KhataLook

Made the Logo using Canva


Project Overview

KhataLook relieves shop owners of having to remember who has borrowed money and how much is still pending.
For the prototype, I decided to build a web application.
Tech Stack:

  • Frontend: React
  • Face Recognition: face-api.js
  • Database: Cloud Firestore
  • Text-to-Speech conversion: Google Cloud TTS API

Project Setup and Dependencies

After initializing the React project, the first step was to install the dependencies:

  • Material UI (I just love it)
  • Axios
  • face-api.js
  • Firebase
  • React-Router
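
For reference, installing these on a fresh React project looks roughly like this (the exact MUI and router package names depend on the versions used, so treat this as a sketch):
npm install @mui/material @emotion/react @emotion/styled axios face-api.js firebase react-router-dom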

Database Setup

I chose to go with a NoSQL database, and since I am a Google Cloud enthusiast, I went ahead with Cloud Firestore.


  • Went to Run > Cloud Firestore and created a database


My database was ready within minutes

  • Went to Project Settings > Add app and copied my Firebase Configuration
// Sample Configuration
import { initializeApp } from "firebase/app";
import { getFirestore } from "firebase/firestore";

const firebaseConfig = {
  apiKey: "YOUR_API_KEY",
  authDomain: "YOUR_PROJECT_ID.firebaseapp.com",
  projectId: "YOUR_PROJECT_ID",
  storageBucket: "YOUR_PROJECT_ID.appspot.com",
  messagingSenderId: "YOUR_MESSAGING_SENDER_ID",
  appId: "YOUR_APP_ID",
};

const app = initializeApp(firebaseConfig);
const db = getFirestore(app);

export { db };

Face Registration

  • Started by downloading the face-api.js models and putting them in the public/models folder


  • Loading the models
// Load face-api.js models from the public/models folder
import * as faceapi from 'face-api.js';

export const loadModels = async () => {
  await faceapi.nets.ssdMobilenetv1.loadFromUri('/models');     // face detection
  await faceapi.nets.faceLandmark68Net.loadFromUri('/models');  // facial landmarks
  await faceapi.nets.faceRecognitionNet.loadFromUri('/models'); // 128-D face descriptors
};
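
The models have to be loaded before any detection runs. In a React component this would typically happen once on mount; a minimal sketch of that wiring (assumed, not necessarily the exact code in the repo):
// Inside the capture component: load the models once, then open the webcam
useEffect(() => {
  loadModels().then(() => startVideo());
}, []);
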
  • Fired up the video stream
const startVideo = () => {
    navigator.mediaDevices
        .getUserMedia({ video: true })
        .then((stream) => {
            videoRef.current.srcObject = stream;
            videoRef.current.play();
            setCapturing(true);
        })
        .catch((err) => {
            console.error("Error accessing the webcam: ", err);
        });
};
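
Before a face can be registered, a frame has to be captured and turned into a 128-dimensional descriptor. The recognition code later on references a detectFace helper; here is a minimal sketch of what it might look like with face-api.js (the function name and exact structure are assumptions):
import * as faceapi from 'face-api.js';

// Detect a single face in a video/canvas element and return its descriptor, or null
export const detectFace = async (input) => {
  const detection = await faceapi
    .detectSingleFace(input)   // SSD MobileNet v1, loaded in loadModels()
    .withFaceLandmarks()
    .withFaceDescriptor();
  return detection ? detection.descriptor : null; // descriptor is a Float32Array
};
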
  • Helper function to push the registered face to Firestore
// Save the borrower's details and face descriptor to Firestore
// (the './firebase' path is wherever the config from above is exported)
import { addDoc, collection } from 'firebase/firestore';
import { db } from './firebase';

export const registerFace = async (name, mobileNumber, amount_pending, descriptor) => {
  try {
    await addDoc(collection(db, 'users'), {
      name,
      mobileNumber,
      amount_pending,
      faceDescriptor: Array.from(descriptor),  // Convert Float32Array to a plain array for Firestore
    });
    alert('Face registered successfully!');
  } catch (e) {
    console.error('Error adding document: ', e);
  }
};
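
Tying it together, a register button handler might look roughly like this, assuming name, mobileNumber, and amountPending come from the form state (a sketch, not the repo's exact code):
const handleRegister = async () => {
  // Grab the current frame from the video element onto an off-screen canvas
  const canvas = document.createElement('canvas');
  canvas.width = videoRef.current.videoWidth;
  canvas.height = videoRef.current.videoHeight;
  canvas.getContext('2d').drawImage(videoRef.current, 0, 0, canvas.width, canvas.height);

  const descriptor = await detectFace(canvas);
  if (!descriptor) {
    alert('No face detected, please try again.');
    return;
  }
  await registerFace(name, mobileNumber, amountPending, descriptor);
};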


  • And ta-da, the face is registered!


Face Recognition

  • Fired up the video stream from the camera, pulled all the registered faces from the database, and searched through them
const recognizeFace = async () => {
    // Draw the current video frame onto the canvas
    const context = canvasRef.current.getContext('2d');
    context.drawImage(videoRef.current, 0, 0, canvasRef.current.width, canvasRef.current.height);

    const descriptor = await detectFace(canvasRef.current);
    if (!descriptor) return;

    // Compare the detected face with the registered faces
    const labeledDescriptors = registeredFacesRef.current.map(face =>
        new faceapi.LabeledFaceDescriptors(
            face.name,
            // Firestore returns plain arrays; face-api.js expects Float32Array
            [new Float32Array(face.faceDescriptor)]
        )
    );
    const faceMatcher = new faceapi.FaceMatcher(labeledDescriptors, 0.6); // 0.6 is the similarity threshold

    const bestMatch = faceMatcher.findBestMatch(descriptor);
    if (bestMatch.label !== 'unknown') {
        const recognizedFace = registeredFacesRef.current.find(face => face.name === bestMatch.label);
        setRecognizedName(recognizedFace.name);
        setAmountPending(recognizedFace.amount_pending);
        playAudioMessage(recognizedFace.name, recognizedFace.amount_pending);
    } else {
        setRecognizedName('Face not recognized');
        setAmountPending(0);
    }
};
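
The registeredFacesRef used above holds the documents pulled from Firestore. A minimal sketch of that fetch, assuming the same 'users' collection from the registration helper:
import { collection, getDocs } from 'firebase/firestore';
import { db } from './firebase'; // assumed path to the config shown earlier

// Fetch every registered borrower (name, mobile number, pending amount, face descriptor)
export const fetchRegisteredFaces = async () => {
  const snapshot = await getDocs(collection(db, 'users'));
  return snapshot.docs.map((doc) => ({ id: doc.id, ...doc.data() }));
};

On the recognition page, this can be called once on mount and the result stored in registeredFacesRef.current.
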
  • And here is the result!


Audio Announcement


  • Pasted the API key into this helper function
// Call the Google Cloud Text-to-Speech REST API and return the audio as an ArrayBuffer
import axios from 'axios';

const getSpeechAudio = async (text) => {
    try {
        const response = await axios.post(
            `https://texttospeech.googleapis.com/v1/text:synthesize?key=YOUR_API_KEY_HERE`,
            {
                input: { text: text },
                voice: { languageCode: 'hi-IN', ssmlGender: 'FEMALE' }, // Language and gender of the voice
                audioConfig: { audioEncoding: 'MP3' },
            }
        );
        // The API returns base64-encoded audio; decode it into a byte buffer
        const binaryString = window.atob(response.data.audioContent);
        const binaryLen = binaryString.length;
        const bytes = new Uint8Array(binaryLen);
        for (let i = 0; i < binaryLen; i++) {
            bytes[i] = binaryString.charCodeAt(i);
        }
        return bytes.buffer;
    } catch (error) {
        console.error('Error generating speech:', error);
        return null;
    }
};
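
Rather than hard-coding the key, it can live in an environment variable; for a Create React App setup that might look like this (an assumption about the tooling, not necessarily how the repo does it):
// .env (kept out of version control): REACT_APP_TTS_API_KEY=your-key-here
const TTS_URL = `https://texttospeech.googleapis.com/v1/text:synthesize?key=${process.env.REACT_APP_TTS_API_KEY}`;
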
  • Finally played the audio
const playAudioMessage = async (name, amount) => {
    // This text is in Hindi; modify it as per your preference
    // Roughly: "{name} Ji, you owe {amount} rupees."
    const message = `${name} Ji, aapke ${amount} rupye udhaar hai.`;

    const audioContent = await getSpeechAudio(message);

    if (audioContent && audioRef.current) { // Check that the audio element is mounted and the audio is valid
        const audioBlob = new Blob([audioContent], { type: 'audio/mp3' });
        const audioUrl = URL.createObjectURL(audioBlob);
        audioRef.current.src = audioUrl;
        audioRef.current.play();
    } else {
        console.error('Audio element is not available or audio content is invalid.');
    }
};
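
playAudioMessage expects an audio element to be rendered in the component and captured with audioRef; in JSX that is just something like this (assumed markup):
{/* Invisible audio element that plays the announcement */}
<audio ref={audioRef} />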

Clone the project, set it up on your local system, and try out the application (and the audio announcements) for yourself.


Conclusion

This was just a very simple prototype of a random, spontaneous idea.

If anyone is interested in taking this project further, feel free to contribute by submitting a pull request.

I’d love to hear your thoughts in the comments! Let me know if you think this idea is feasible.

It could just be another fun project, or maybe even a popular product!
