Design a search autocomplete system that provides top 3 most frequent relevant sentences based on user input.
Initialization: The system is initialized with a list of sentences and their corresponding frequencies (hotness). For example:
sentences = ["i love you", "island", "ironman", "i love coding"]
times = [5, 3, 2, 2]
This means "i love you" has a frequency of 5, "island" has a frequency of 3, and so on.
Input: The system receives user input character by character. After each character, it should return the top 3 most frequent sentences starting with the current input.
If the current input exactly matches a stored sentence, that sentence counts as a match (a sentence is a prefix of itself).
If the user inputs the special character '#', it means the user has finished typing a sentence. In that case, add the sentence to the data structure with a frequency of 1, or increment its frequency if the sentence already exists.
Sentences are ranked by frequency (hotness) first and then by lexicographical order.
If fewer than 3 sentences match the current input, return all matching sentences.
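The ranking rule above can be expressed as a single sort key. A minimal sketch (the matches dict here is illustrative, not part of the system's API):

```python
# Hypothetical set of sentences matching some prefix, mapped to hotness.
matches = {"i love you": 5, "island": 3, "ironman": 2, "i love coding": 2}

# Sort by hotness descending, breaking ties lexicographically, keep the top 3.
top3 = sorted(matches, key=lambda s: (-matches[s], s))[:3]
print(top3)  # ['i love you', 'island', 'i love coding']
```

Negating the frequency lets a single ascending sort handle both criteria: higher hotness first, with lexicographic order as the tiebreaker.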
Example:
AutocompleteSystem(["i love you", "island", "ironman", "i love coding"], [5, 3, 2, 2])
input("i") -> ["i love you", "island", "i love coding"]
input(" ") -> ["i love you", "i love coding"]
input("a") -> []
input("#") -> []
(the user then types the characters of "i love it" one at a time; the intermediate results are omitted here)
input("#") -> [] (the new sentence "i love it" is now stored with frequency 1)
What data structures and algorithms would you use to implement this system efficiently? Consider edge cases and analyze the time and space complexity of your solution.
Let's discuss how to design a search autocomplete system. This system should provide search suggestions as the user types a query.
A simple approach would be to store all sentences in a list. For each character the user types, iterate through the entire list of sentences and find the sentences that start with the current input. Sort these matching sentences by hotness (frequency), and return the top 3.
Example:
sentences = ["i love you", "island", "ironman", "i love coding"]
If the user types "i ", the algorithm would:
1. Scan the list and keep the sentences that start with "i ": ["i love you", "i love coding"].
2. Look up their frequencies: {"i love you": 5, "i love coding": 2}.
3. Sort by frequency descending and then lexicographically: ["i love you", "i love coding"].
4. Return the top 3: ["i love you", "i love coding"] (since there are only two matches).
Big O Analysis:
With n sentences of average length k, every call to input scans all n sentences and performs a prefix check of up to k characters, which is O(n * k). Sorting the m matching sentences adds O(m log m). Space is O(n * k) to store the sentence list.
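A minimal sketch of this brute-force approach (the function and variable names are my own, not a fixed API):

```python
def autocomplete_bruteforce(sentences, times, prefix):
    """Return the top 3 hottest sentences starting with prefix, via a linear scan."""
    freq = dict(zip(sentences, times))
    # Keep only the sentences that start with the current input.
    matches = [s for s in sentences if s.startswith(prefix)]
    # Rank by hotness descending, then lexicographically.
    matches.sort(key=lambda s: (-freq[s], s))
    return matches[:3]

sentences = ["i love you", "island", "ironman", "i love coding"]
times = [5, 3, 2, 2]
print(autocomplete_bruteforce(sentences, times, "i "))
# ['i love you', 'i love coding']
```

This is fine for small data sets, but rescanning every sentence on every keystroke is what the trie below avoids.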
A more efficient solution involves using a Trie data structure. A Trie (also known as a prefix tree) is a tree-like data structure that stores strings such that common prefixes share the same node. This allows for fast prefix-based search.
Example:
Using the same sentences as before:
sentences = ["i love you", "island", "ironman", "i love coding"]
For the prefix "i ", we walk from the root through the nodes for 'i' and ' ', then collect every sentence stored in that subtree: ["i love you", "i love coding"].
Code (Python):

class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_sentence = False  # True if a sentence ends at this node
        self.frequency = 0        # hotness of the sentence ending here

class AutocompleteSystem:
    def __init__(self, sentences, times):
        self.root = TrieNode()
        self.current_prefix = ""
        for i, sentence in enumerate(sentences):
            self.add_sentence(sentence, times[i])

    def add_sentence(self, sentence, time):
        node = self.root
        for char in sentence:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_sentence = True
        node.frequency += time  # increment, so repeated '#' inputs accumulate hotness

    def input(self, c):
        if c == '#':
            # The user finished typing: store the sentence and reset the prefix.
            if self.current_prefix:  # ignore a '#' typed on an empty prefix
                self.add_sentence(self.current_prefix, 1)
            self.current_prefix = ""
            return []
        self.current_prefix += c
        # Walk down to the node matching the current prefix.
        node = self.root
        for char in self.current_prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        # Collect every sentence in this subtree, then rank.
        results = []
        self.dfs(node, self.current_prefix, results)
        results.sort(key=lambda x: (-x[0], x[1]))  # hotness desc, then lexicographic
        return [sentence for _, sentence in results[:3]]

    def dfs(self, node, sentence, results):
        if node.is_sentence:
            results.append((node.frequency, sentence))
        for char, child in node.children.items():
            self.dfs(child, sentence + char, results)
Big O Analysis:
add_sentence inserts a sentence of length k in O(k). Each call to input walks the current prefix in O(p), runs a DFS over the t nodes of the matching subtree, and sorts the s collected sentences in O(s log s), for O(p + t + s log s) per character. Inputting '#' costs O(p) to store the sentence, after which current_prefix is reset to an empty string. Space is O(total number of characters across all sentences) for the trie.