Note: This is a companion problem to the System Design problem: Design TinyURL.
TinyURL is a URL shortening service where you enter a URL such as https://leetcode.com/problems/design-tinyurl
and it returns a short URL such as http://tinyurl.com/4e9iAk
. Design a class to encode a URL and decode a tiny URL.
There is no restriction on how your encode/decode algorithm should work. You just need to ensure that a URL can be encoded to a tiny URL and the tiny URL can be decoded to the original URL.
Implement the Solution
class:
Solution()
Initializes the object of the system.String encode(String longUrl)
Returns a tiny URL for the given longUrl
.String decode(String shortUrl)
Returns the original long URL for the given shortUrl
. It is guaranteed that the given shortUrl
was encoded by the same object.Example 1:
Input: url = "https://leetcode.com/problems/design-tinyurl" Output: "https://leetcode.com/problems/design-tinyurl" Explanation: Solution obj = new Solution(); string tiny = obj.encode(url); // returns the encoded tiny url. string ans = obj.decode(tiny); // returns the original url after decoding it.
Constraints:
1 <= url.length <= 104
url
is guranteed to be a valid URL.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The simplest approach is to generate a random short code for each long web address. If we happen to generate a code that we've already used, we just keep trying to generate new random codes until we find one that's available. We'll maintain a record of which short code corresponds to which long address.
Here's how the algorithm would work step-by-step:
class Codec:
def __init__(self):
self.url_database = []
self.alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
self.short_url_length = 6
def encode(self, longUrl: str) -> str:
while True:
# Keep generating random names until we find one that is not already in our database.
potential_short_name = "".join([random.choice(self.alphabet) for _ in range(self.short_url_length)])
is_name_taken = False
for _, existing_short_name in self.url_database:
if existing_short_name == potential_short_name:
is_name_taken = True
break
if not is_name_taken:
# Once an unused short name is found, we record the pair in our 'notebook' for future lookups.
self.url_database.append((longUrl, potential_short_name))
return "http://tinyurl.com/" + potential_short_name
def decode(self, shortUrl: str) -> str:
short_code_to_find = shortUrl.replace("http://tinyurl.com/", "")
# To find the original address, we must look through every entry one-by-one, just like a notebook.
for original_address, short_code in self.url_database:
if short_code == short_code_to_find:
return original_address
return ""
The strategy is to assign a unique, ever-increasing number to each new long web address, much like a ticket at a coat check. This number is then converted into a very short string, and we simply remember which short string goes with which long address, guaranteeing no two addresses get the same short version.
Here's how the algorithm would work step-by-step:
class Codec:
def __init__(self):
self.alphabet_characters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
self.base_url_prefix = "http://tinyurl.com/"
self.long_urls_list = []
self.url_to_short_code_map = {}
def encode(self, long_url: str) -> str:
# To avoid creating duplicates, we first check if this URL already has a short version.
if long_url in self.url_to_short_code_map:
return self.url_to_short_code_map[long_url]
# Storing the new URL gives it a unique, sequential ID based on its list position.
self.long_urls_list.append(long_url)
unique_id_number = len(self.long_urls_list) - 1
# The ID is converted to a compact base-62 representation for the short URL code.
short_identifier_code = ""
if unique_id_number == 0:
short_identifier_code = self.alphabet_characters[0]
else:
temporary_id_number = unique_id_number
while temporary_id_number > 0:
remainder_index = temporary_id_number % 62
short_identifier_code = self.alphabet_characters[remainder_index] + short_identifier_code
temporary_id_number //= 62
full_short_url = self.base_url_prefix + short_identifier_code
self.url_to_short_code_map[long_url] = full_short_url
return full_short_url
def decode(self, short_url: str) -> str:
short_identifier_code = short_url.replace(self.base_url_prefix, "")
unique_id_number = 0
# The short code must be converted back to its original numeric ID to find the long URL.
for character_in_code in short_identifier_code:
unique_id_number = unique_id_number * 62 + self.alphabet_characters.find(character_in_code)
# The decoded ID directly corresponds to the index of the original URL in our master list.
return self.long_urls_list[unique_id_number]
Case | How to Handle |
---|---|
The encode method is called multiple times with the exact same longUrl. | The implementation should return the pre-existing shortUrl for that longUrl rather than creating a new one to ensure idempotency. |
The decode method receives a full URL string, including the base path like 'http://tinyurl.com/', not just the unique code. | The solution must parse the input string to extract only the unique short code before performing the lookup. |
The number of unique URLs to encode exhausts all possible combinations for a fixed-length short code. | A counter-based approach handles this by naturally increasing the length of the generated code as the counter value grows. |
Two distinct longUrls map to the same shortUrl code, causing a collision. | A collision must be detected and resolved, for example by regenerating the code until a unique one is found. |
The input longUrl is at the maximum allowed length of 10,000 characters. | The data structure, typically a hash map, must efficiently store and retrieve these large strings without significant performance degradation. |
The encode method is called with a URL that is itself a previously generated short URL. | This input should be treated like any other valid URL and encoded to a new, different short URL. |
An internal counter used for generating unique IDs exceeds the maximum value for its data type, like Integer.MAX_VALUE. | Using a 64-bit integer type (long) for the counter provides a much larger keyspace to prevent overflow for practical purposes. |
The data structures for forward (long-to-short) and reverse (short-to-long) lookups become inconsistent. | Both mappings must be updated atomically within the encode function to ensure that every encoded URL is decodable. |