Design a search autocomplete system for a search engine.
Initialization: The system is initialized with a list of sentences and their corresponding frequencies. For example:
sentences = ["i love you", "island", "iroman", "i love leetcode"]
times = [5, 3, 2, 2]
Input: The system receives a character as input. It should return the top 3 sentences that start with the current input. The ranking criteria are:
Termination: When the input character is '#', it signifies the end of a sentence. The current sentence (formed by all inputs since the last '#') should be added to the system with a frequency of 1 (or incremented if it already exists).
Example:
AutocompleteSystem(sentences, times)
input('i') // Returns ["i love you", "island", "i love leetcode"]
input(' ') // Returns ["i love you", "i love leetcode"]
input('a') // Returns []
input('#') // Returns []
input('i') // Returns ["i love you", "island", "i love leetcode"]
Constraints:
Explain how you would design and implement such a system efficiently, considering both time and space complexity.
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force approach to autocomplete involves checking every possible search term prefix against every term in our dataset. We will check each term in the dataset to see if any of them start with the given prefix. Then return all the terms that begin with this prefix.
Here's how the algorithm would work step-by-step:
def autocomplete_brute_force(search_terms, prefix):
suggestions = []
# Iterate through each term to find matches
for search_term in search_terms:
# Check if the current term starts with the prefix
if search_term.startswith(prefix):
# Add matching term to the suggestions
suggestions.append(search_term)
return suggestions
The best way to build an autocomplete system is to create a structure to quickly find popular search terms that start with what the user has typed. We use a tree-like structure that helps us efficiently retrieve suggestions and prioritize them based on their frequency.
Here's how the algorithm would work step-by-step:
class TrieNode:
def __init__(self):
self.children = {}
self.is_end_of_word = False
self.popularity = 0
self.top_suggestions = []
class AutocompleteSystem:
def __init__(self, sentences, times):
self.root = TrieNode()
self.current_node = self.root
self.current_prefix = ""
for i in range(len(sentences)):
self.add_sentence(sentences[i], times[i])
def add_sentence(self, sentence, popularity_count):
node = self.root
for char in sentence:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end_of_word = True
node.popularity = popularity_count
def get_top_suggestions(self, node):
suggestions = []
self.dfs(node, self.current_prefix, suggestions)
suggestions.sort(key=lambda item: (-item[0], item[1]))
return [item[1] for item in suggestions[:3]]
def dfs(self, node, current_string, suggestions):
if node.is_end_of_word:
suggestions.append((node.popularity, current_string))
for char, child in node.children.items():
self.dfs(child, current_string + char, suggestions)
def input(self, character):
if character == '#':
self.add_sentence(self.current_prefix, 1) # Add the sentence with popularity 1
self.current_node = self.root
self.current_prefix = ""
return []
self.current_prefix += character
if character not in self.current_node.children:
self.current_node.children[character] = TrieNode()
self.current_node = self.current_node.children[character]
# Need to create a new node to continue search
return []
self.current_node = self.current_node.children[character]
# Prioritize suggestions with high frequency
return self.get_top_suggestions(self.current_node)
Case | How to Handle |
---|---|
Empty search query | Return an empty list or the top k most frequent results, depending on design preference. |
Null or invalid characters in the search query | Filter out invalid characters or handle them according to the problem's requirements (e.g., replace with a space). |
Database of suggestions is empty | Return an empty list if there are no suggestions in the database. |
Maximum query length exceeds system limitations | Truncate the query or return an error if the query exceeds the allowed maximum length. |
Large number of suggestions matching the prefix | Implement pagination or limit the number of results returned to maintain performance. |
Suggestions with identical prefixes but different relevance scores | Sort suggestions with identical prefixes by their relevance scores to prioritize more relevant results. |
User types very fast, generating multiple queries in quick succession | Debounce or throttle the requests to the autocomplete system to prevent overloading the server. |
Unicode characters in search query or suggestions | Ensure proper encoding and handling of Unicode characters to prevent display or indexing issues. |