Bot Invasion To Automated Defense: My Journey With ML Deployment

Athreya aka Maneshwar - May 12 - Dev Community

Remember the bot invasion of my newsletter? I fought back with ML, but building a fancy bot zapper wasn't in the budget. Instead, I deployed a cost-effective model to keep those pesky bots out. Here is how I did it!

In my previous article, "Bots Invaded My Newsletter. Here's How I Fought Back with ML", I shared my experience building a bot-signup detector using machine learning to tackle a surge of unwanted bot signups on my newsletter.

While it wasn't the most sophisticated solution, it was a valuable learning experience.

A few people on Reddit were curious about deploying their own models.

Since I'm still learning ML deployment myself, I wanted to share what I've found so far about an easy, cost-effective approach, and hopefully hear about more feasible approaches from you.

Even though I'm just starting out, hopefully this can be helpful for others who are in the same boat!

Backstory Of Identifying The Enemy

My newsletter had been hit by a bot invasion, and I knew what the bot signups looked like (name and email).

Their data was already in my database, collected from signups, so I used that as the dataset.

For my weapon of choice, I picked a BERT transformer.

I trained it on a small set of emails (144, to be exact) to learn the difference between human and bot names and emails, and it worked most of the time.

So it was all ready; I just needed to deploy it live and use it in the signup process.

Preparing My Weapon To Fire Against The Enemy

Now that I had my trusty bot detector trained, it was time to figure out how to load it into the battlefield (deployment).

Here's what I learned about deploying a machine learning model in a simple and cost-effective way.

Integrating the bot detector with my newsletter signup process was an exciting adventure.

It felt like discovering a whole new system, just like writing the final line of code that unlocks a new functionality!

Previously, I had a transformer that took the name and email as input and returned a boolean indicating whether the signup was a bot.

For deployment, we didn't want to spin up a new VM and server just to keep listening for calls, and we didn't want to bolt this onto our existing services either. So we went with a serverless deployment on AWS Lambda.

Can't Use Lambda Straight Away

When I tried to deploy the transformer model, I realized I couldn't use Lambda the usual way, because the function needs heavy installations such as transformers, scikit-learn, and many more.

So the alternative was to run Lambda from a Docker container image.

This was a good exploration. Basically, you build a Docker image with all the dependencies you need installed and host it as a Lambda function.

Docker Container Images (But Too Big!)

I loaded up my previously built transformer, a 419 MB .bin file, and installed transformers, scikit-learn, and many other packages. By the time I built the image, it was 9.2 GB!

Clearly that was a horrible solution for such a basic problem.

Logistic Regression - Smaller and Faster

I moved on to Logistic Regression, which took less time to train and prepare than the transformer, and the crazy thing was that the model binary is just 27 KB :D

I went on to add the deps and the logic, and voila: an 820 MB Docker image.
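To make the size difference concrete, here is a minimal sketch of logistic regression on signup data. It is a toy, not the model from the notebook: the features in `featurize`, the training loop, and the made-up signups are all assumptions for illustration, written in pure Python so it runs without any extra packages.

```python
import math
import pickle

def featurize(name: str, email: str) -> list[float]:
    """Turn a signup into a few hand-picked numeric features (illustrative only)."""
    local = email.split("@", 1)[0]
    digits = sum(ch.isdigit() for ch in local)
    return [
        1.0,                                   # bias term
        digits / max(len(local), 1),           # share of digits in the email local part
        1.0 if " " in name.strip() else 0.0,   # name looks like "first last"
        min(len(local), 30) / 30.0,            # normalised local-part length
    ]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def is_bot(weights, name: str, email: str) -> bool:
    """Classify a signup: True when the predicted bot probability is at least 0.5."""
    x = featurize(name, email)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x))) >= 0.5

# Made-up toy signups: (name, email, label) where 1 = bot, 0 = human.
SIGNUPS = [
    ("Jane Doe", "jane.doe@example.com", 0),
    ("Ravi Kumar", "ravi.k@example.com", 0),
    ("Maria Silva", "msilva@example.com", 0),
    ("xkqzv", "xkqzv83920174@example.com", 1),
    ("a8f3k2", "a8f3k2991823@example.com", 1),
    ("qqwwee", "qq7712883401ww@example.com", 1),
]

weights = train([featurize(n, e) for n, e, _ in SIGNUPS], [y for *_, y in SIGNUPS])
model_bytes = pickle.dumps(weights)  # the whole "model" is just a handful of floats
```

The point of the sketch is the last line: a logistic regression model boils down to a short list of weights, which is why its serialized file stays tiny next to a transformer checkpoint.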

So I went ahead and pushed the Docker image to Elastic Container Registry.

ECR is like Docker Hub, a place where I can store the Docker images I build.

Then I created a Lambda function that uses the Docker image from the ECR repo I created earlier. So the cannon was loaded with shot and powder; I just had to figure out the firing mechanism.
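For anyone who hasn't written one, a Lambda function in Python is just a module with a handler that AWS calls with an event and a context. The sketch below shows that shape only. `predict_is_bot` is a hypothetical stand-in rule, not the trained model, and the `name` and `email` event keys are assumptions about the payload.

```python
# Hypothetical stand-in for the real model. In a container image you would
# typically load the pickled classifier at import time (outside the handler)
# so that warm invocations reuse it instead of reloading it on every call.
def predict_is_bot(name: str, email: str) -> bool:
    local = email.split("@", 1)[0]
    # Toy rule: flag the signup when more than half the local part is digits.
    return sum(ch.isdigit() for ch in local) > len(local) / 2

def handler(event, context):
    """Lambda entry point: receives the invocation payload as a dict."""
    name = event.get("name", "")
    email = event.get("email", "")
    return {"name": name, "email": email, "is_bot": predict_is_bot(name, email)}
```

Calling `handler({"name": "Jane Doe", "email": "jane@example.com"}, None)` locally is a cheap way to check the logic before building and pushing the image.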

Firing The Weapon

The initial plan was to trigger the Lambda function directly using the AWS CLI or Boto3 library.
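A direct invocation with Boto3 could look roughly like the sketch below. The function name `bot-detector` and the payload keys are assumptions, and the client is passed in as a parameter so the helper can be tried out with a fake client and no AWS account.

```python
import json

def invoke_bot_detector(lambda_client, name: str, email: str,
                        function_name: str = "bot-detector"):
    """Call the Lambda synchronously and return its decoded JSON result."""
    response = lambda_client.invoke(
        FunctionName=function_name,        # hypothetical function name
        InvocationType="RequestResponse",  # wait for the function's result
        Payload=json.dumps({"name": name, "email": email}).encode("utf-8"),
    )
    # The response payload is a stream, so it has to be read before decoding.
    return json.loads(response["Payload"].read())

# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   result = invoke_bot_detector(boto3.client("lambda"), "Jane Doe", "jane@example.com")
```

This works fine from a backend or a script, but it needs AWS credentials on the caller's side, which is what makes it a poor fit for a signup form.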

However, I needed a more user-friendly way to activate the bot detector from the frontend.

This led me to explore API Gateway.

It's a good service that allows you to create a public endpoint (like a trigger point) that accepts requests and forwards them to your Lambda function behind the scenes.

This was exactly what I needed – a way to invoke the Lambda function using a simple API call.

Integrating the API Gateway with my signup form wasn't completely smooth sailing.

I encountered some challenges mapping the data received by the API Gateway to the format expected by the Lambda function.
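One common source of this kind of mismatch, assuming the Lambda proxy integration is used (the setup here may have been different), is that API Gateway wraps the request: the JSON arrives as a string under `body`, sometimes base64-encoded, and the function has to return a `statusCode` plus a string `body`. A minimal sketch of handling both shapes, with hypothetical helper names:

```python
import base64
import json

def extract_signup(event: dict) -> dict:
    """Return {"name", "email"} from either a proxy event or a direct invocation."""
    if "body" in event:  # API Gateway proxy integration wraps the request
        body = event["body"] or "{}"
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        payload = json.loads(body)
    else:  # direct invocation (CLI / Boto3) passes the payload as-is
        payload = event
    return {"name": payload.get("name", ""), "email": payload.get("email", "")}

def make_response(status: int, data: dict) -> dict:
    """Build the response shape the proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(data),  # must be a string, not a dict
    }
```

Printing the raw `event` at the top of the handler is usually the quickest way to see which of the two shapes is actually arriving.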

Luckily, CloudWatch logs came to the rescue.

With its detailed logs, I could easily debug the issue and get everything working seamlessly.

Killing The Enemy

Now, whenever someone signs up for my newsletter, the API in my frontend form automatically triggers the Lambda function. Here's the magic that happens behind the scenes:

  1. The signup data is sent to the Lambda function.
  2. The function analyzes the data using the trained model to identify potential bots.
  3. If a bot is detected, the function automatically blocks the subscriber using Listmonk's built-in Block API.
  4. Finally, the function sends a notification to my Discord channel, keeping me informed about signup activity (including any blocked bots).
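The four steps above can be sketched as one small function. Everything specific here is an assumption: the URLs are placeholders, the `subscriber_id` field is hypothetical, authentication is omitted, and the blocklist path follows my reading of Listmonk's subscriber API, so check it against the docs for your version. The `send` parameter defaults to a real HTTP call but can be swapped for a stub.

```python
import json
import urllib.request

LISTMONK_URL = "https://listmonk.example.com"             # hypothetical placeholder
DISCORD_WEBHOOK = "https://discord.example.com/webhook"   # hypothetical placeholder

def blocklist_request(subscriber_id: int) -> urllib.request.Request:
    """Build the request that blocklists one subscriber (authentication omitted)."""
    url = f"{LISTMONK_URL}/api/subscribers/{subscriber_id}/blocklist"
    return urllib.request.Request(url, method="PUT")

def discord_request(message: str) -> urllib.request.Request:
    """Build a webhook request; Discord webhooks accept JSON with a "content" field."""
    data = json.dumps({"content": message}).encode("utf-8")
    return urllib.request.Request(
        DISCORD_WEBHOOK, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )

def process_signup(signup: dict, is_bot, send=urllib.request.urlopen) -> bool:
    """Classify the signup, block it if it is a bot, then notify. Returns the verdict."""
    bot = is_bot(signup["name"], signup["email"])
    if bot:
        send(blocklist_request(signup["subscriber_id"]))
    verdict = "bot (blocked)" if bot else "not bot"
    send(discord_request(f"Signup {signup['email']}: {verdict}"))
    return bot
```

Passing `send=some_list.append` collects the requests instead of sending them, which makes it easy to check the flow without touching Listmonk or Discord.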

Not bot

bot

With this system in place, I've successfully automated bot detection and eliminated the need for manual intervention.

This feels like a victory in the fight against newsletter bot signups.

Continue reading How to setup a bot detector for yourself here.

Final Thoughts

This journey of deploying a machine learning model to fight newsletter bots has been a valuable learning experience.

In my previous article, "Bots Invaded My Newsletter. Here's How I Fought Back with ML ⚔️", I covered building the bot detector model.

Now, we've explored the deployment side – a crucial step for putting your model to practical use.

Here are some resources to help you get started on your own AI/ML adventure:

  • Building a Logistic Regression Model (ipynb file): Logistic_regression.ipynb (This file demonstrates how I built the simpler and more efficient logistic regression model.)
  • Lightweight Model File (23 KB): dt_model_file.pkl (Feel free to download and use this pre-trained model for basic bot detection in your own newsletter signup process.)
  • Lambda Function Code Repository: bot_detect_lambda (This repository contains the code for integrating the bot detection model with AWS Lambda for a serverless deployment.)

Spread the Knowledge!

Share this blog post with your friends who are interested in getting started with AI and machine learning.

Want to learn more or connect with me?

Reddit: athreyaaaa
LinkedIn: maneshwar-athreya