Have you ever wondered how LinkedIn knows you and your potential connections are just a few clicks away? It's neither magic, nor some clever engineering hidden behind the scenes. What if I tell you that it's just Graph? Yes It is!
So let's dive into the fascinating world of graph algorithms and see how LinkedIn connects you to your professional network!
And don't worry It's not going to be too complex.
Beyond 1st, 2nd, and 3rd Degrees:
We all know those little icons next to LinkedIn profiles indicating our connection level (1st, 2nd, or 3rd degree). But how does LinkedIn calculate these connections? It all starts with a powerful tool called a graph algorithm. Imagine a giant map where people are represented as dots and connections are lines. This complex map, known as a graph, stores information about who's connected to whom.
How is it possible with millions of users and connections?
The challenge of scaling is real as navigating this map efficiently becomes a challenge. To overcome this, LinkedIn uses a special type of graph algorithm called bi-directional BFS (Breadth-First Search). This algorithm simultaneously searches from you and your potential connection, meeting somewhere in the middle to determine your connection level.
But wait, there's more! As you interact and your network grows, constantly searching the entire graph becomes impractical. So, LinkedIn employs a clever caching strategy. It stores your second-degree connections (friends of your friends) locally, allowing for faster lookups without needing to traverse the entire network every time.
Scaling to Millions:
Imagine storing everyone's second-degree connections on a single server! Not feasible. That's why LinkedIn distributes this data across multiple servers, dividing it based on user IDs. But what happens if a server crashes? To ensure redundancy, each shard (portion of data) is replicated on different servers.
Real twist: Speed vs Efficiency
Now comes the real twist. While replicating data ensures availability, it also adds complexity. To avoid hitting every server for each query, LinkedIn uses a technique called set cover. This fancy term basically means finding the smallest number of servers that hold all the information needed for your query, minimizing the number of hops and maximizing speed.
The Secret Sauce: Greedy Set Cover:
LinkedIn uses a modified version of the greedy set cover algorithm, prioritizing servers that hold connections most relevant to your search. Think of it like finding the shortest route on a map by visiting only the essential points. This clever approach reduces the number of servers needed, making queries faster and more efficient.
The End Result: A Connected us!
Thanks to these complex algorithms and clever caching strategies, LinkedIn can efficiently navigate its massive network and show you relevant connections within milliseconds. So, the next time you see those degree icons, remember the invisible technology working tirelessly to connect you with your professional world!
And for the tech-savvy:
A question: Does Facebook's 'Friends of friends' feature work the same way?
Or do they use TAO(Memcached) or something else?
What do you think about it?
If you're curious about the nitty-gritty, the research paper linked in the original content delves deeper into the specific algorithms and optimizations used by LinkedIn.
But for everyone else, hopefully, this blog has shed some light on the magic behind those connection degrees!
There still might be things I am missing on so do let me know in comments.
Inspired by various online discussions and Gaurav Sen.
If you enjoyed this blog, you can follow me on:
It's Valentine Day and I'm not feeling lonely because my keyboard is definitely getting touched tonight XD
#SingleCodersDay
If you'd like to support me, you can sponsor me on GitHub or buy me a coffee.