Introduction
The world of software development moves quickly, so this will likely be out of date in a few months, but as of early Dec 2023, I have a working translation service running in Node.
To do this I experimented with running translations sequentially, but eventually decided to try threading.
I have no idea if there is a better way of doing this - very happy for someone to present a different idea in the comments.
This isn't really intended to be a tutorial, but more like notes to push people in the right direction... so... a quick guide to using Xenova/nllb-200-distilled-600M with Transformers.js and Poolifier.
Demo
Here is a video showing the current (beta) implementation in my software stack
Notes :
Ideally I would use Bun, but as of 6 Dec 2023, due to this issue, I am unable to use Bun and instead fell back to Node. (This is annoying, as this is part of a much larger Bun monorepo.)
Running 4 or 5 threads simultaneously seems to be the limit for my MacBook Pro (M1 Max, 32GB).
I haven't done any real benchmarking, but it seems to use about 2GB per thread (and I have other programs running of course, using about 13GB at idle).
Translating with nllb-200 is easy, but takes some time.
This is because actually loading the model takes time, and it needs to run constantly in the background.
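import { env, pipeline } from '@xenova/transformers';
// only load models from disk, never from the remote hub
env.allowRemoteModels = false;
// note: __dirname only exists in CommonJS; in a native ES module, derive it from import.meta.url
env.localModelPath = __dirname + '/models/';
// loading the model is the slow part, so do it once and reuse the translator
const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');
// translate from English to Dutch
const output = await translator("Hello everyone", {
  src_lang: "eng_Latn",
  tgt_lang: "nld_Latn",
});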
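Since loading the model is the slow part, one way to avoid paying for it more than once per process is to cache the promise returned by the loader. The snippet below is only a rough sketch of that idea: `loadOnce` and `getTranslator` are made-up names, and the loader here is a fake stand-in so the example runs without the model files. In a real service the loader would be the `pipeline(...)` call shown above.

```javascript
// Hypothetical helper: wraps any async loader so it runs at most once.
function loadOnce(loader) {
  let cached = null; // holds the in-flight or already-resolved promise
  return () => {
    if (cached === null) {
      cached = Promise.resolve()
        .then(loader)
        .catch((err) => {
          cached = null; // forget the failure so a later call can retry
          throw err;
        });
    }
    return cached;
  };
}

// Fake stand-in for the slow model load (assumption, for illustration only).
// The real loader would be:
//   () => pipeline('translation', 'Xenova/nllb-200-distilled-600M')
let loadCount = 0;
const getTranslator = loadOnce(async () => {
  loadCount += 1;
  return async (text) => [{ translation_text: `(translated) ${text}` }];
});
```

Every caller then does `const translator = await getTranslator()`, and concurrent callers share the same pending load instead of each starting their own.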
Translating takes time, and is resource-intensive.
So we can use Poolifier to spread requests across threads.
Poolifier is a wrapper around Node Worker Threads, but also implements a queue system, making it ideal for this kind of heavy async task.
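To make the "fixed number of workers plus a queue" idea concrete, here is a small sketch of a concurrency limiter built on plain promises. This is not how Poolifier is implemented (it uses real worker threads), and `createLimiter` is a made-up name; it only illustrates how extra tasks wait until one of the running slots frees up.

```javascript
// Hypothetical illustration of a fixed-size pool with a queue.
// NOT Poolifier's implementation, just the general idea.
function createLimiter(maxConcurrent) {
  const waiting = []; // tasks that have not started yet
  let running = 0; // tasks currently in progress

  const next = () => {
    if (running >= maxConcurrent || waiting.length === 0) return;
    const { task, resolve, reject } = waiting.shift();
    running += 1;
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        running -= 1;
        next(); // a slot is free, start the next queued task
      });
  };

  // Returns a promise that settles with the task's own result.
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}
```

With `const limit = createLimiter(4)`, calling `limit(() => someAsyncJob())` ten times starts four jobs right away and holds the other six in the queue, which is roughly the behaviour I rely on from the pool.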
We set up the workers in our index.js script.
import { FixedThreadPool } from 'poolifier'
// a fixed worker_threads pool with 4 workers
// (Node's Worker needs an absolute path or a relative one starting with ./ or ../)
const pooly = new FixedThreadPool(4, './_worker.js', {
  errorHandler: (e) => console.error(e),
  onlineHandler: () => console.info('worker is online')
})
// some async function calls the translation workers (maybe a post request)
// this is not my real code so might be buggy!
const exampleFunction = async (req, res) => {
  const input = {
    text: req.body.text,
    src_lang: req.body.src_lang,
    tgt_lang: req.body.tgt_lang
  };
  // queue the job on the pool and wait for a free worker to finish it
  return pooly.execute(input).then(result => {
    // for example send back post reply
    res.send(result)
  });
}
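One thing the handler above does not cover is a failed translation: if `execute` rejects, no reply is ever sent. A possible variant is sketched below. It assumes an Express-style `res` with `status()` and `send()`, and `makeTranslateHandler` is a made-up name; the pool is passed in so that anything with an `execute(input)` method works, including a fake one for testing.

```javascript
// Hypothetical factory: `pool` is anything with an execute(input) method,
// e.g. the FixedThreadPool created in index.js.
function makeTranslateHandler(pool) {
  return async (req, res) => {
    const { text, src_lang, tgt_lang } = req.body;
    try {
      const result = await pool.execute({ text, src_lang, tgt_lang });
      res.send(result);
    } catch (err) {
      // assumption: an Express-style response object
      console.error(err);
      res.status(500).send('translation failed');
    }
  };
}
```

Usage would look like `const exampleFunction = makeTranslateHandler(pooly)`.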
And then the _worker.js script carries out the translation:
...
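import { ThreadWorker } from 'poolifier'
// simplified example; translate() is assumed to be defined in the part elided above
const job = async (input) => {
  const output = await translate(input.text, {
    src_lang: input.src_lang,
    tgt_lang: input.tgt_lang,
  });
  // the result is an array; keep only the translated string
  return output[0].translation_text
}
// register the job so the pool in index.js can send tasks to this worker
export default new ThreadWorker(job)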