Translating with AI - Using node/bun with Transformers.js, poolifier & Xenova/nllb-200-distilled-600M from Huggingface

Calum Knott - Dec 6 '23 - - Dev Community

Introduction

The world of software development moves quickly, so this will likely be out of date in a few months, but as of early Dec 2023, I have a working translation service running in node.

To get here, I experimented with running translations sequentially, but eventually decided to try threading.

I have no idea if there is a better way of doing this - very happy for someone to present a different idea in the comments.

This isn't really intended to be a tutorial, but more like notes to push people in the right direction... so... a quick guide to using Xenova/nllb-200-distilled-600M with Transformers.js and Poolifier.

Demo

Here is a video showing the current (beta) implementation in my software stack

Notes :

Ideally I would use Bun, but as of 06/12/23, due to this issue I am unable to use Bun, and instead fell back to node. (This is annoying, as this is part of a much larger Bun monorepo.)

Running 4/5 threads simultaneously seems to be the limit for my MacBook Pro M1 Max 32GB.
I haven't done any real benchmarking, but it seems to use about 2GB per thread (and I have other programs running of course, using about 13GB at idle).


Translating with nllb-200 is easy, but takes some time.
This is because loading the model itself is slow, and it then needs to stay running constantly in the background.

import { env, pipeline } from '@xenova/transformers';
env.allowRemoteModels = false;
env.localModelPath = __dirname+'/models/';

const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');

// translate from English to Dutch
const output = await translator("Hello everyone", {
        src_lang: "eng_Latn",
        tgt_lang: "nld_Latn",
});


Translating takes time and is resource-intensive, so we can use poolifier to thread requests.

Poolifier is a wrapper around node Worker Threads, but it also implements a queue system, making it ideal for this kind of heavy async task.
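For context, here's a stripped-down sketch of what poolifier builds on: node's built-in worker_threads, where a worker receives a message, does the heavy work off the main thread, and posts the result back. The inline worker below is purely illustrative (toUpperCase() stands in for the expensive translation call); poolifier adds the pool and queue management on top of this primitive.

```javascript
import { Worker } from 'node:worker_threads';

// Inline worker source for illustration only. With { eval: true } the
// source string is evaluated as CommonJS, hence the require() inside.
const workerSource = `
const { parentPort } = require('node:worker_threads');
parentPort.on('message', (text) => {
  // stand-in for the expensive translation step
  parentPort.postMessage(text.toUpperCase());
});
`;

// send one job to a worker and resolve with its reply
function runJob(worker, text) {
  return new Promise((resolve) => {
    worker.once('message', resolve);
    worker.postMessage(text);
  });
}

const worker = new Worker(workerSource, { eval: true });
runJob(worker, 'hello everyone').then((result) => {
  console.log(result); // "HELLO EVERYONE"
  worker.terminate();
});
```

Managing a fixed set of these workers, dispatching jobs to idle ones, and queueing the overflow is exactly the boilerplate poolifier saves you from.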

We set up the workers in our index.js script.

import { FixedThreadPool } from 'poolifier'

// a fixed worker_threads pool
const pooly = new FixedThreadPool(4, './_worker.js', {
  errorHandler: (e) => console.error(e),
  onlineHandler: () => console.info('worker is online')
})

// some async function calls the translation workers (maybe a post request)
// this is not my real code so might be buggy!
let exampleFunction = async (req,res) => {
    let input = {
        text: req.body.text,
        src_lang: req.body.src_lang,
        tgt_lang: req.body.tgt_lang
    };

    return pooly.execute(input).then(result => {
        // for example send back post reply
        res.send(result)
    });
}
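Because of poolifier's queue, you can safely fire more requests than there are threads; the overflow just waits its turn. To make that queueing idea concrete, here is a tiny standalone concurrency limiter. This is not poolifier's API — `createLimiter` and `fakeTranslate` are hypothetical names for illustration only.

```javascript
// Illustrative sketch of the queueing idea: at most `limit` async jobs
// run at once; further jobs wait in a FIFO queue until a slot frees up.
function createLimiter(limit) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { job, resolve, reject } = queue.shift();
    job()
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // a slot freed up, start the next queued job
      });
  };
  return (job) =>
    new Promise((resolve, reject) => {
      queue.push({ job, resolve, reject });
      next();
    });
}

// usage: at most 2 fake "translations" in flight at any moment
const limit = createLimiter(2);
const fakeTranslate = (text) =>
  new Promise((res) => setTimeout(() => res(text.toUpperCase()), 10));

Promise.all(['a', 'b', 'c', 'd'].map((t) => limit(() => fakeTranslate(t))))
  .then((out) => console.log(out)); // [ 'A', 'B', 'C', 'D' ]
```

Poolifier does the same bookkeeping, but with real worker threads as the slots instead of in-process promises.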

And then the _worker.js script carries out the translation.

import { pipeline } from '@xenova/transformers'
import { ThreadWorker } from 'poolifier'

// load the model once per worker; it stays resident for every job
const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M')

// simplified example
let job = async (input) => {
    let output = await translator(input.text, {
        src_lang: input.src_lang,
        tgt_lang: input.tgt_lang,
    });
    output = output[0].translation_text
    return output
}
export default new ThreadWorker(job)