Under the Hood
The story begins in mid-August last year, when I got a call from a startup about a job opportunity.
The company does a lot of data scraping for competitor analysis of products in the e-commerce business.
In simple words, the company scrapes tonnes of data to analyse the market value of a specific product.
I was curious how the company generated revenue, and when I asked, they simply said that they sell the data to big companies.
Now that was cool. Data scraping is not hard, and with OpenAI and GPT it is not going to get any harder, so one can easily sell the data and make money.
But it wasn't as easy as it sounds, because scraped data is not the only thing being sold. Buyers expect a complete market analysis along with crucial information about the products, pricing, ratings, and reviews, and that changes the game.
E-Commerce Product Pricing Data
But the initial steps are easy to execute, and this might become a small go-to project for a lot of you backend developers.
- All you need is to pick 4 or 5 e-commerce platforms in your country or worldwide.
- Scrape them and fetch the price and complete details of the top 10 products in categories such as clothing, shirts, accessories, food items and so on.
- Lastly, compare prices for the same product category across the 4 or 5 platforms. Repeat the process for more products and platforms to build an extensive list of competitors' prices for each kind of product.
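The comparison step above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the scraping is already done and the results sit in a list of tuples, and the platform names, products and prices (`shop_a`, `shop_b` and so on) are invented for illustration.

```python
from collections import defaultdict

# Hypothetical scraped records: (platform, category, product name, price).
# Platform names and prices are made up for this example.
SCRAPED = [
    ("shop_a", "shirts", "Plain white shirt", 19.99),
    ("shop_b", "shirts", "Plain white shirt", 17.49),
    ("shop_c", "shirts", "Plain white shirt", 21.00),
    ("shop_a", "accessories", "Leather belt", 12.00),
    ("shop_b", "accessories", "Leather belt", 14.50),
]


def compare_prices(records):
    """Group prices by category and report the cheapest platform for each."""
    by_category = defaultdict(dict)
    for platform, category, _name, price in records:
        # Keep the lowest price seen per platform within a category.
        current = by_category[category].get(platform)
        if current is None or price < current:
            by_category[category][platform] = price

    report = {}
    for category, prices in by_category.items():
        report[category] = {
            "prices": dict(sorted(prices.items())),
            "cheapest": min(prices, key=prices.get),
            # Gap between the most and least expensive platform.
            "spread": round(max(prices.values()) - min(prices.values()), 2),
        }
    return report
```

Calling `compare_prices(SCRAPED)` returns one summary per category. Adding more platforms or categories only means adding more rows to the list.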
Not just for Analysis
Scraping is not only used to understand and analyse competitors. Data has become the new oil of the century, and since scraping is how that data gets collected, scraping is a long game.
Scraping takes time: the data has to be cleaned, and do not forget getting past firewalls and protected pages.
- Scraping is widely used to improve SEO by finding the pain points in a website that hold back its Google ranking.
- Scraping is also used to aggregate data for training
- Scraping is used in prediction by understanding trends
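The cleaning step mentioned above is where a lot of the time goes. Here is a rough sketch of what it can look like for scraped prices. It assumes each row is a dict with `name` and `price` strings (a made-up shape, not any particular scraper's output), and it assumes `.` is the decimal separator and `,` a thousands separator, which will not hold for every locale.

```python
import re


def clean_price(raw):
    """Turn a scraped price string such as ' $1,299.00 ' into a float.

    Returns None when no number can be found.
    Assumption: '.' is the decimal separator, ',' separates thousands.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def clean_records(rows):
    """Normalise names and prices, then drop bad rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = " ".join(row["name"].split())  # collapse repeated whitespace
        price = clean_price(row["price"])
        key = (name.lower(), price)
        if not name or price is None or key in seen:
            continue  # skip rows with no name, no price, or already seen
        seen.add(key)
        cleaned.append({"name": name, "price": price})
    return cleaned
```

Real pages will need more rules than this (currencies, ranges like "10 to 15", missing fields), but the pattern of parse, normalise and de-duplicate stays the same.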
Why Scraping Should be Learned
Think of it this way: if you are a developer applying for jobs, why would you want to search again and again on LinkedIn and other platforms?
Automate this repetitive process using scraping and AI.
Your next idea needs some data-backed decisions, so scrape the relevant data and make good choices.
I use ChatGPT a lot for data-backed decisions, but I need my GPT to have access to the data of my choice and answer accordingly.
Scraping with GPT can automate tasks with a few lines of code, and that can also become the next project for your resume.
Tweets Scraper
Twitter is a very good platform to scrape tonnes of data from.
Website links, images, profile pictures, and text content can all be extracted from Twitter.
Scraping Twitter is not easy; we need to categorise it accordingly, for example:
- Get the latest tweets about the Shadcn UI library
- Get the latest tweets about the tldraw.com website
- Get the latest blogs on OpenAI API integration
- Get top tweets about Langchain and LLM framework
The above tweets provide tonnes of data with links, images, and text as the data type.
This simple tool gives enough information about what is trending in a particular domain and which libraries are gaining popularity.
We can easily scrape job tweets, or tweets with a jobs hashtag, to find the latest available jobs.
I can think of tonnes of ways to scrape data from Twitter and use it wherever I want.
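To make the categorising idea a bit more concrete, here is a small sketch. It does not talk to Twitter at all: it assumes the tweet texts have already been collected somehow, and the `TOPICS` keywords and the set of job hashtags are my own guesses based on the examples above.

```python
import re

# Hypothetical topic keywords, based on the example queries above.
TOPICS = {
    "shadcn": {"shadcn"},
    "tldraw": {"tldraw"},
    "openai": {"openai"},
    "langchain": {"langchain", "llm"},
}

# Assumed set of hashtags that mark a tweet as job related.
JOB_TAGS = {"jobs", "job", "hiring"}

HASHTAG = re.compile(r"#(\w+)")


def categorise(tweet_text):
    """Return the sorted topics whose keywords appear as whole words."""
    words = set(re.findall(r"\w+", tweet_text.lower()))
    return sorted(topic for topic, keys in TOPICS.items() if words & keys)


def is_job_tweet(tweet_text):
    """True when the tweet carries a hashtag such as #jobs or #hiring."""
    tags = {tag.lower() for tag in HASHTAG.findall(tweet_text)}
    return bool(tags & JOB_TAGS)
```

Matching on whole words rather than substrings avoids false hits (for example, "llm" hiding inside "hallmark"), at the cost of missing variations like "LLMs" unless you add them to the keyword sets.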
Real World Example
Recently, I wanted to decide which colour to pick for the first t-shirt of my clothing brand.
I turned to Google and other platforms and used an API to get the best colour of 2023 for a t-shirt brand, and finally settled on one.
But the point is that you can use scraped data from Twitter and e-commerce websites to make sharp decisions.
Another example is that I wanted to finalise the product I should use to start my clothing brand online.
I collected the top-searched, most-googled and most-loved websites for online clothing businesses and later chose one, but reaching this decision again needed data, and for that scraping is a good option.
Scraping also helps you make the best choice because you have multiple options to compare afterwards.
Sometimes I need to decide which coffee is better. A coffee aggregator platform would be so helpful for that, and scraping is the key to building such a platform.
Aggregation APIs are becoming popular because there are tonnes of websites and IG pages out there. Most of us are content creators in some way, and so many websites confuse us and reduce our capacity to decide, so a single aggregator platform for this bombardment of information is, and will remain, a must.
Data scraping is what helped daily.dev become a top platform for developers.
daily.dev | Where developers grow together
We need data aggregation platforms in multiple domains, such as:
- News platforms (so many in numbers)
- E-commerce comparison platforms (for making sensible choices)
- Fast and accurate availability of data for any domain of agriculture
The last one is also important. For example, if you work in agriculture and need help understanding crops for hybrid cultivation, you again need data, and most of the time that data is not easily available, so scraping becomes the last resort.
The same goes for other domains, not just agriculture. Why should you search for jobs on LinkedIn? Simply scrape AngelList or even LinkedIn, and in one go all the posted jobs will be on your table.
This jobs data platform serves a jobs API for $80/month 😄, and it uses scraping via ATS under the hood to collect recently posted jobs.
Simple Job Data API
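If you scrape jobs from more than one place, the same posting will show up several times. Below is a minimal sketch of the aggregation step. It assumes each scraped job is a dict with `title`, `company` and `source` keys, which is a hypothetical shape chosen for this example and not the format of any real jobs API.

```python
def normalise(text):
    """Lowercase and collapse whitespace so near-identical strings match."""
    return " ".join(text.lower().split())


def merge_jobs(*sources):
    """Merge job lists, keeping one entry per (title, company) pair.

    Each merged entry records every source the job was seen on.
    """
    merged = {}
    for jobs in sources:
        for job in jobs:
            key = (normalise(job["title"]), normalise(job["company"]))
            if key not in merged:
                merged[key] = {
                    "title": job["title"],
                    "company": job["company"],
                    "sources": [job["source"]],
                }
            elif job["source"] not in merged[key]["sources"]:
                merged[key]["sources"].append(job["source"])
    return list(merged.values())
```

Matching on title and company alone is a simplification. Two different openings with the same title at the same company would be merged, so a real aggregator would probably add location or posting date to the key.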
So scraping is not dead. For training AI models, especially your own custom model, we need data, and scraping becomes the last resort.
A few months down the line, I want to hire some developers to scrape tonnes of data for me in all the domains I want to run my businesses in.
Conclusion
Scraping is not dead yet.
Scraping is not only for analysis; it is more than that.
Scraping with AI is a deadly combination for automating any task on the computer.
Scraping data is not going anywhere because of its usage and application in almost every domain in the world.
Scraping algorithms are easy to write, but it is hard to refine the data into something tangible.
See you in the next one.
Shrey
iHateReading