System design: TinyURL

Jayaprasanna Roddam - Oct 7 - - Dev Community

1. Introduction to TinyURL

A URL shortening service converts long URLs into short, unique URLs. When a user provides a long URL, the service generates a shortened version (e.g., from https://www.example.com/some/really/long/url to https://tinyurl.com/abc123). When the short URL is clicked, it redirects the user to the original long URL.

2. Functional and Non-functional Requirements

Functional Requirements:

  1. Shorten URL: Convert a long URL to a shortened version.
  2. Redirect: When someone accesses the short URL, they should be redirected to the original long URL.
  3. Custom Alias (Optional): Users can create custom short URLs if desired (e.g., https://tinyurl.com/myAlias).

Non-Functional Requirements:

  1. Scalability: The system should handle billions of URL requests daily.
  2. Availability: It should be highly available and responsive.
  3. Latency: Shortened URL redirection should occur with minimal delay.
  4. Fault Tolerance: The system should handle failures gracefully.

3. Scale Estimation and Traffic Considerations

Let’s estimate the scale of the system to understand the volume it needs to handle:

  • Number of Users: Let’s assume 1 billion users over a few years.
  • URLs Shortened per Day: Assume around 100 million new URLs shortened per day globally.
  • Number of Redirects: Each shortened URL could be clicked many times. Assuming each URL is accessed 10 times on average, we’ll have 1 billion redirects per day.
  • Data Retention: Assume we retain all URLs indefinitely, meaning the database size grows with time.

4. URL Generation Strategy

The primary goal is to generate a short, unique identifier for each URL. We aim to generate a small, readable identifier while ensuring uniqueness.

Base62 Encoding:

  • We need to generate unique short strings composed of digits (0-9), uppercase (A-Z), and lowercase letters (a-z), giving us 62 possible characters for each position.
  • To calculate how many short URLs we can generate for a given string length L:

[
N = 62^L
]

  • For a string of length 6: [ N = 62^6 \approx 56.8 \text{ billion unique URLs}. ]
  • For a string of length 7: [ N = 62^7 \approx 3.5 \text{ trillion unique URLs}. ]

Given that we can generate 56.8 billion URLs with just 6 characters, a string of length 6 should be more than enough to handle global scale. If the system grows, we can easily extend to 7 characters.

URL Generation Techniques:

  • Auto-Incrementing IDs:

    • Each new URL is assigned a unique ID from a sequence (e.g., 1, 2, 3, …).
    • We then convert this ID to Base62 to create the short URL. For example, ID 12345 becomes abc123.
    • This approach ensures a unique, collision-free short URL.
  • Randomized Strings (Alternative):

    • Another option is to randomly generate a 6-character string, then check the database to ensure it’s unique before saving it. However, this could lead to collisions, so using a counter-based approach is preferable for simplicity and efficiency.

5. Database Design

We need to store two key pieces of data for each URL:

  1. Short URL (Base62 string): The unique identifier.
  2. Original Long URL: The long URL that the short one maps to.

A simplified schema could look like this:

ID (Auto-increment) Short URL (Base62) Long URL (text) Creation Time
1 abc123 http://example.com 2024-10-07

Key Database Operations:

  1. Insert Operation: When a new long URL is shortened, we insert a new row with the auto-incrementing ID and corresponding Base62 short URL.
  2. Read Operation: When someone accesses the short URL, we look up the corresponding long URL and return it for redirection.

Optimizing Reads and Writes:

  • Caching: To improve performance, frequently accessed URLs can be cached using a service like Redis. This significantly reduces the load on the database for popular URLs.
  • Partitioning: Given the massive scale, we can use horizontal partitioning (sharding) to split the data across multiple database instances.

6. Handling Redirects

The core functionality of the service is to redirect users from a short URL to the original long URL. Here’s how it works:

  1. The user enters the short URL in their browser.
  2. The system extracts the unique identifier (e.g., abc123) from the URL.
  3. A lookup is performed in the database (or cache) to retrieve the corresponding long URL.
  4. The user is redirected to the long URL using an HTTP 302 (Found) or HTTP 301 (Permanent Redirect) response.

7. Scaling the Service

Scaling Reads:

As the system will experience billions of read requests (URL redirects), optimizing for reads is critical. Techniques include:

  • Read Replicas: Deploy multiple read replicas of the database to distribute the load.
  • Caching: Use Redis or Memcached to cache popular URL mappings in memory for quick retrieval.

Scaling Writes:

Although the number of new URLs (writes) is much lower than reads, writes still need to be efficiently handled.

  • Write Sharding: We can shard the database based on the auto-incrementing ID to distribute writes across multiple database servers.

8. Failure Handling

To ensure the system is fault-tolerant:

  • Database Replication: Use master-slave replication for databases to handle failures. If the primary database goes down, a replica can take over.
  • Load Balancing: Distribute traffic across multiple instances of the application to handle sudden spikes or server failures.
  • Backup Strategy: Regular backups of the database ensure data is not lost in case of catastrophic failure.

9. Additional Features

Expiry of URLs (Optional):

Some URL shortening services allow URLs to expire after a certain period. This feature can be implemented by adding an expiry_time column to the database schema, which will check before redirection if the URL has expired.

Analytics (Optional):

Track the number of times each short URL is accessed. This can be useful for users and also for scaling decisions (e.g., which URLs to cache).

10. Mathematical Calculations and Capacity Planning

Let’s go through the key mathematical calculations involved:

  1. Base62 Calculation: We already calculated that with 6 characters, we can generate 56.8 billion unique URLs. This should be sufficient for several years.

  2. Storage Requirements:

    • Assume that each URL mapping requires 1 KB of storage (including metadata).
    • With 100 million URLs shortened per day, after 1 year (365 days), we’ll have: [ 100 \times 10^6 \times 1 \text{ KB} \times 365 = 36.5 \text{ TB} ]
    • So, after 1 year, we’d need around 36.5 TB of storage for URL mappings, which is manageable with modern distributed storage systems.
  3. Read Capacity:

    • If we expect 1 billion redirects per day, and if each read request takes 1 ms to process, we’ll need: [ 1 \text{ billion requests} \times 1 \text{ ms/request} = 1,000,000 \text{ seconds} ] of processing time. This would require multiple read replicas and caching to handle traffic at this scale.

Conclusion

This is a high-level design for a URL-shortening service like TinyURL. We’ve covered everything from the URL generation strategy (Base62 encoding, unique identifiers) to database design, scaling considerations, and performance optimizations. We also factored in scalability, redundancy, and caching to ensure the service can handle billions of requests daily.

This design ensures the system is scalable, efficient, and reliable, addressing both functional and non-functional requirements.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .