I built my own search engine

FDiaz - Nov 1 - Dev Community

The Beninging

I went from building React applications to building my own search engine and web crawler for indexing. I'm happy to introduce you to Zensearch, a search engine where you, as a user, have more control over what your searches cover. You can create entries to crawl different websites, and if you already have indexed data in the database you can keep using the search functionality while the crawler does its work. Now, I know this may not be the most complex or sophisticated search engine in the world, like Google or Brave Search, but I built this thing to gauge how much I can do on my own and to learn as much as I can while doing it, and oh boy, I've learned a lot.

It all started when I was building a React web application, a sort of commonplace book where you insert your favorite quotes or add notes to a specific page, as if you're conversing with the author or typing down what you were thinking at that moment, on a page that corresponds to the page of your physical book. It's not a bad project, but I just got so bored of building React applications. Not that React is bad, but it felt like I was not going anywhere with it: there was no technical depth to what I was doing, and I was not learning anything from building those React projects.

So I started studying computer networking, operating systems, computer architecture and so on. After a few months of studying, I built my own application-layer protocol, something like WebSocket, that handles multiple users, where each user can join different rooms or namespaces and communicate with the others. I felt ecstatic, alive even. I felt like I could do so many things as long as I understood how the computer works, e.g. threads, semaphores, processes, memory layout, interrupt signals, etc. So I thought to myself: what projects can I do to use some of the things that I've learned?

Oh, and I'm self-taught, by the way. I used The Odin Project to learn programming and web development, so shout out to those guys, because they taught me how to study independently and refused to hand-hold programmers throughout the curriculum.

Challenges

I had only ever programmed in Node.js; that was my bread and butter, along with TypeScript. So I built the web crawler using Node.js... pretty stupid, right? The plan was to create crawlers that take an array of source URLs from the front-end, with each crawler sending its extracted data to the database. And as we all know, Yabascript is single-threaded, and every asynchronous task is handled by the environment where Yavascript is running, e.g. the browser's APIs, Node, Deno, Bun and done.

So doing multitasking in Node.js was a suicide mission, and it was. I had to encode the webpage object into an 8-bit buffer, but the shared array buffer, the way I was using it, could only transport 64-bit arrays because of data alignment. So I had to convert the 8-bit buffer to a 64-bit one by adding some offset padding, then convert it back from 64-bit to 8-bit after sending the data from the crawler to the main thread, and then finally parse it into a vajascript object... wow, that was fun. There is another way to do message passing, but it creates a copy in the main thread of the same data that is in the crawler, so I didn't want to do that since it would take so much memory.

I had to handle race conditions using JavaScript's Atomics object, and to this day I still don't understand how it even works, to be honest. It annoyed me so much that I turned to Golang. I love this language so much. It's so easy to spin up goroutines (Go's lightweight threads) and handle race conditions using semaphores and wait groups. I haven't needed a mutex yet, but I'm excited to learn it, so maybe in the future; context would be fun to learn too.
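To give a rough idea of what "semaphores and wait groups" can look like in Go, here is a minimal sketch. It is not the actual Zensearch crawler: `fetchTitle`, `crawlAll` and the example URLs are hypothetical names made up for illustration, and the "crawl" is faked so the example runs without network access. A buffered channel plays the role of the counting semaphore, and each goroutine writes only to its own slot in the results slice, which is why no mutex is needed here.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// fetchTitle is a hypothetical stand-in for real crawling work (no network
// access here); it just derives a fake "title" from the URL.
func fetchTitle(url string) string {
	return "title of " + strings.TrimPrefix(url, "https://")
}

// crawlAll visits every URL with at most maxConcurrent goroutines running
// at once. Each goroutine writes only to its own index in results, so the
// writes never overlap and no mutex is required.
func crawlAll(urls []string, maxConcurrent int) []string {
	if maxConcurrent < 1 {
		maxConcurrent = 1 // an unbuffered semaphore would block forever
	}
	results := make([]string, len(urls))
	sem := make(chan struct{}, maxConcurrent) // buffered channel as a counting semaphore
	var wg sync.WaitGroup

	for i, url := range urls {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while all slots are taken
		go func(i int, url string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this crawler is done
			results[i] = fetchTitle(url)
		}(i, url)
	}

	wg.Wait() // block until every crawler has finished
	return results
}

func main() {
	urls := []string{"https://example.com", "https://example.org", "https://example.net"}
	for _, title := range crawlAll(urls, 2) {
		fmt.Println(title)
	}
}
```

The wait group answers "is everyone finished?", while the semaphore answers "how many are allowed to run right now?", so the two are usually used together like this.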

Let's move on to the front-end, shall we? Have any of you read the Frontend Masters article "You Might Not Need That Framework"? Remember when I said that I got bored of React? Well, building without one made me appreciate frameworks because of their reusability and their data-binding mechanisms.

I don't want to get into too much detail about the front-end, but I used a PubSub pattern to update the UI whenever data changes, and I used web components along with the shadow DOM to create reusable components. The shadow DOM was a pain to access from JavaScript and to style, since it is isolated from the rest of the DOM tree, so reaching into it from the outside with regular CSS selectors and DOM API calls won't work. So yeah, those were the only challenges I had, but it was fun... it was fun when I was migrating the crawler from Node.js to Go.

Things to consider

There are some features that I have not implemented yet because I was so eager to show off the project, but that doesn't matter much to me since this is an ongoing project. This won't be a one-and-done project; I will keep improving Zensearch in the future. So for now, here are some key things that are missing:

  • Display a list of already indexed websites to users on the front-end.

  • Save the most recently crawled web page for continuation.

  • Create cancellation for crawling but still save the indexed pages up to that point.

  • Handle RabbitMQ's message size limit for scaling. If the database contains more data than the default maximum message size set in RabbitMQ, the message broker will throw an error and crash. To avoid this, I will try to implement a windowing algorithm like the one used in TCP: a pipelining mechanism where the array of web pages is broken into segments and sent to the search engine N at a time, where N is the size of the window. I still need to think about this.

  • Give users the ability to remove their indexed websites.
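For the message size item in the list above, the segmenting part of the idea could look roughly like this in Go. This is only a sketch under assumptions: the `WebPage` struct and `splitIntoWindows` are hypothetical names, not code from Zensearch, and the actual publishing to the broker is left as a comment. It is also plain fixed-size chunking, not a full TCP-style sliding window, which would additionally track acknowledgements before sending the next segment.

```go
package main

import "fmt"

// WebPage is a hypothetical shape for an indexed page; the real fields in
// Zensearch may differ.
type WebPage struct {
	URL      string
	Contents string
}

// splitIntoWindows breaks pages into consecutive segments of at most
// windowSize pages each, so every segment can be sent as its own message.
func splitIntoWindows(pages []WebPage, windowSize int) [][]WebPage {
	if windowSize < 1 {
		windowSize = 1 // avoid an infinite loop on a non-positive size
	}
	var segments [][]WebPage
	for start := 0; start < len(pages); start += windowSize {
		end := start + windowSize
		if end > len(pages) {
			end = len(pages) // the last segment may be shorter
		}
		segments = append(segments, pages[start:end])
	}
	return segments
}

func main() {
	pages := make([]WebPage, 5)
	for i := range pages {
		pages[i] = WebPage{URL: fmt.Sprintf("https://example.com/page/%d", i)}
	}
	for i, segment := range splitIntoWindows(pages, 2) {
		// Publishing the segment to the message broker would go here.
		fmt.Printf("segment %d: %d pages\n", i, len(segment))
	}
}
```

Note that N here counts pages, not bytes, so a window of very large pages could still exceed the broker's limit; sizing the window by encoded byte length would be a natural next step.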

Epilogue

I would like to write more about what I learned and some nuances of my development journey, but I think this post would get too long. So for now I want to show off my greatest project, and I would be happy to get some feedback from you guys if you have the time. Let me know if there are any problems or improvements I could make to make Zensearch better. Oh, and this is all thanks to ThePrimeagen. This guy inspired me to go deeper into things and to learn the fundamentals instead of just running npm create vite@latest my-vue-app -- --template react-ts in the terminal. That admittedly made me insecure about myself as a programmer and about the things that I know, but because of that insecurity I've learned new things. Now I'm always striving to learn more, and I would be happy to learn from YOUR feedback, so thank you for listening to my TED talk.

GitHub repository for Zensearch
