Not sure which users to follow on dev.to for your newsletter?
This project uses an LLM to summarize a user's posts and generate an insight into the topics they cover. It can also assess how relevant those posts are to your preferred topics, making it easier to choose which users to follow for your blog feed. This project uses two models:
@cf/facebook/bart-large-cnn to summarize the post content
@hf/mistral/mistral-7b-instruct-v0.2 to paraphrase the post summaries and to generate post relevancy
I started this app in a creative fever, nearing the end of the challenge. I originally wanted to submit just one, but then new ideas started coming at the last minute, and now here we are.
This time, my idea is to use LLMs to summarize a dev.to user's posts and give me insight into their key topics, so I can judge whether the posts match my interests. This should help me pick which users to follow.
The process starts by using the dev.to API to get the latest articles posted under a given dev.to username. The app then scrapes each article's content and summarizes it with a BART model. Next, Mistral 7B condenses these summaries further and, if you provide a preference, also suggests whether the topics in those articles align with it.
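The first step, listing a user's latest posts, could look roughly like the sketch below. It is not the app's actual code: it assumes the public dev.to `GET /api/articles` endpoint with its `username` and `per_page` query parameters, and the names `buildArticlesUrl`, `fetchLatestArticles`, and `ArticleRef` are made up for illustration.

```typescript
// Subset of the fields returned by the dev.to articles endpoint that this sketch uses.
interface ArticleRef {
  title: string;
  url: string;
}

// Smallest response shape the sketch relies on, so a fake can be passed in tests.
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Build the listing URL for a user's most recent articles.
function buildArticlesUrl(username: string, perPage = 5): string {
  const url = new URL("https://dev.to/api/articles");
  url.searchParams.set("username", username);
  url.searchParams.set("per_page", String(perPage));
  return url.toString();
}

// Fetch the article list. fetchFn is injectable (an assumption of this sketch)
// so the function can be exercised without network access.
async function fetchLatestArticles(
  username: string,
  perPage = 5,
  fetchFn: (url: string) => Promise<JsonResponse> = (url) => fetch(url),
): Promise<ArticleRef[]> {
  const res = await fetchFn(buildArticlesUrl(username, perPage));
  if (!res.ok) throw new Error(`dev.to API returned ${res.status}`);
  const articles = (await res.json()) as ArticleRef[];
  // Keep only the fields needed by the later scraping step.
  return articles.map(({ title, url }) => ({ title, url }));
}
```

Each returned `url` is what the scraping step would then fetch and pass on to the summarizer.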
Models used:
Text Summarization: @cf/facebook/bart-large-cnn to summarize the post content
Text Generation: @hf/mistral/mistral-7b-instruct-v0.2 to paraphrase the post summaries and to generate post relevancy
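As a rough illustration of how the two models could be chained, here is a minimal sketch. It assumes a Workers AI binding (usually exposed as `env.AI`) whose `run` method takes `input_text` and returns `summary` for the summarization model, and takes `messages` and returns `response` for the text generation model. The prompt wording and the names `buildInsightPrompt` and `summarizeAndAssess` are hypothetical, since the app's real prompt is not shown here.

```typescript
// Minimal shape of the Workers AI binding used here (normally provided as env.AI).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<Record<string, unknown>>;
}

const SUMMARIZER = "@cf/facebook/bart-large-cnn";
const GENERATOR = "@hf/mistral/mistral-7b-instruct-v0.2";

// Hypothetical prompt: numbers the summaries and optionally asks about relevance.
function buildInsightPrompt(summaries: string[], preference?: string): string {
  const lines = summaries.map((s, i) => `${i + 1}. ${s}`).join("\n");
  const base =
    `Here are summaries of a user's recent posts:\n${lines}\n` +
    `Describe the key topics this user writes about.`;
  return preference
    ? `${base}\nThen say whether these topics are relevant to someone interested in: ${preference}.`
    : base;
}

// Summarize each article with BART, then ask Mistral for an overall insight.
async function summarizeAndAssess(
  ai: AiBinding,
  articles: string[],
  preference?: string,
): Promise<string> {
  const summaries: string[] = [];
  for (const text of articles) {
    const out = await ai.run(SUMMARIZER, { input_text: text });
    summaries.push(String(out.summary ?? ""));
  }
  const out = await ai.run(GENERATOR, {
    messages: [{ role: "user", content: buildInsightPrompt(summaries, preference) }],
  });
  return String(out.response ?? "");
}
```

Passing the binding in as a parameter keeps the function easy to test with a fake `run` implementation.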
As before, I built the web app using Vite, React, and Mantine, and the backend using Cloudflare Pages Functions.
What I Learned
In this app, I did web scraping in a Cloudflare Worker, and one part of that is extracting text from HTML. Most libraries available on npm rely on DOM APIs to extract text, but the Workers runtime doesn’t provide a DOM. Fortunately, the runtime offers an alternative: HTMLRewriter. It is intended for transforming HTML rather than extracting data, but it works for extraction too. This was my first time using the API, and it was surprisingly simple.
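To give an idea of what text extraction with HTMLRewriter can look like, here is a minimal sketch, not the code this app actually uses. It relies on the `on(selector, handlers)` and `transform(response)` methods and on the `text` handler, which receives chunks with a `text` property. The `#article-body` selector and the names `TextCollector`, `RewriterLike`, and `extractArticleText` are assumptions for illustration.

```typescript
// Collects the text chunks that HTMLRewriter streams for elements matching a selector.
class TextCollector {
  private parts: string[] = [];

  // Called once per text chunk; a single text node may arrive in several chunks.
  text(chunk: { text: string }): void {
    this.parts.push(chunk.text);
  }

  // Join the chunks and collapse runs of whitespace into single spaces.
  result(): string {
    return this.parts.join("").replace(/\s+/g, " ").trim();
  }
}

// Minimal shape of the Workers-only HTMLRewriter global, declared locally so the
// sketch type-checks without the Workers type definitions.
interface RewriterLike {
  on(selector: string, handlers: { text(chunk: { text: string }): void }): RewriterLike;
  transform(response: Response): Response;
}

// Extract the visible text under `selector` from an HTML response.
// "#article-body" is an assumed selector; adjust it to the page being scraped.
async function extractArticleText(
  response: Response,
  selector = "#article-body",
): Promise<string> {
  const Rewriter = (globalThis as any).HTMLRewriter as new () => RewriterLike;
  const collector = new TextCollector();
  // Handlers only run while the transformed body is read, so consume it fully.
  await new Rewriter().on(selector, collector).transform(response).arrayBuffer();
  return collector.result();
}
```

In a Worker, this could be called as `extractArticleText(await fetch(articleUrl))`. The transformed response itself is thrown away, since only the collected text matters here.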
The future plan for this project is to integrate it with other Cloudflare services. For example, it could automatically scrape and summarize articles using Cron Triggers, and then send the results to an email address or a database. This way, we can have our own personalized newsfeed.