Taro Logo

Design Search Autocomplete System

Hard
Pinterest logo
Pinterest
2 views
Topics:
StringsTrees

Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#'). For each character they type except '#', you need to return the top 3 historical hot sentences that match the prefix of the part sentence before the current character. For each input sentence, when the character is '#', it means the sentence ends, and you need to store the complete sentence into the historical data.

Implement the AutocompleteSystem class:

  • AutocompleteSystem(String[] sentences, int[] times) Initializes the object with a list of historical sentences and their corresponding number of times each sentence was typed.
  • List<String> input(char c)
    • If c == '#', it means the sentence ends, and you need to store the complete sentence into the historical data.
    • Otherwise, you need to return the top 3 historical hot sentences that match the prefix of the part sentence before the current character.

Example 1:

Input
["AutocompleteSystem", "input", "input", "input", "input"]
[[["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2]], ["i"], [" "], ["a"], ["#"]]
Output
[null, ["i love you", "island", "i love leetcode"], [], [], []]

Explanation
AutocompleteSystem obj = new AutocompleteSystem(["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2]);
obj.input("i"); // return ["i love you", "island", "i love leetcode"], because "i love you" has highest times.
obj.input(" "); // return []. There's no match.
obj.input("a"); // return []. There's no match.
obj.input("#"); // return []. The sentence "i a" was typed. Store the string to the record.

Example 2:

Input
["AutocompleteSystem", "input", "input", "input", "input"]
[[["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2]], ["i"], [" "], ["a"], ["#"]]
Output
[null, ["i love you", "island", "i love leetcode"], [], [], []]

Explanation
AutocompleteSystem obj = new AutocompleteSystem(["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2]);
obj.input("i"); // return ["i love you", "island", "i love leetcode"], because "i love you" has highest times.
obj.input(" "); // return []. There's no match.
obj.input("a"); // return []. There's no match.
obj.input("#"); // return []. The sentence "i a" was typed. Store the string to the record.

Constraints:

  • 1 <= sentences.length <= 100
  • 1 <= sentences[i].length <= 100
  • 1 <= times.length <= 100
  • times[i] >= 1
  • c is a lowercase English letter, space ' ' or special character '#'
  • Each input will be ended with the character '#'.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a sentence in `sentences` and what is the maximum number of sentences in the `sentences` array?
  2. Are the counts in the `times` array guaranteed to be non-negative, and is there any limit to their maximum value?
  3. If multiple sentences have the same frequency and are among the top 3, how should they be ordered lexicographically relative to the other sentences?
  4. If the input character `c` does not match any existing sentences, should I return an empty list or add it as a new sentence with a default frequency of 0?
  5. Can I assume that the input will only contain lowercase English letters and spaces, or should I handle other characters, punctuation, or uppercase letters?

Brute Force Solution

Approach

The brute force approach to search autocomplete is like trying every possible search query starting with the given input. We check each of these queries against all available search terms to see which ones match. This ensures we find all relevant suggestions, even if it's slow.

Here's how the algorithm would work step-by-step:

  1. Take the user's input as the beginning of a search query.
  2. Consider every possible word or phrase that could follow the user's input to form a complete search query.
  3. For each of these possible search queries, compare it against a list of all available search terms.
  4. If a possible search query exactly matches a search term, then include that search term as a suggested autocomplete option.
  5. Do this for every possible search query that starts with the user's input.
  6. Rank the suggested autocomplete options (for example, based on popularity) and present the top results to the user.

Code Implementation

def search_autocomplete_brute_force(search_terms, user_input):
    suggested_terms = []
    
    # Iterate through all possible extensions of the user's input
    for search_term in search_terms:
        if search_term.startswith(user_input):
            # Add the search term to suggested terms if it matches.
            suggested_terms.append(search_term)

    return suggested_terms

def search_autocomplete_with_ranking(
    search_terms,
    user_input,
    popularity_scores
):
    suggested_terms = search_autocomplete_brute_force(
        search_terms,
        user_input
    )

    # Filter for terms where popularity scores are known
    available_suggestions = [
        term for term in suggested_terms if term in popularity_scores
    ]

    # Sort by popularity. 
    ranked_suggestions = sorted(
        available_suggestions,
        key=lambda term: popularity_scores[term],
        reverse=True
    )
    
    # Only show the top 5 suggestions.
    return ranked_suggestions[:5]

Big(O) Analysis

Time Complexity
O(K * S * P)The brute force autocomplete approach iterates through all possible search queries starting with the user's input. Let K be the maximum length of a possible search query, S be the number of available search terms, and P be the number of possible words or phrases that could follow the user's input. We generate each possible query, which takes O(K) time, then compare it against each of the S search terms. Doing this for all P possible continuations results in O(K * S * P) time complexity, where K represents the maximum length of a search query to ensure we check all characters.
Space Complexity
O(1)The brute force approach iterates through all possible search queries and compares them against available search terms. While the number of possible queries can be large, the plain English explanation does not specify the use of any auxiliary data structures that grow with the number of possible queries, search terms, or user input. The algorithm essentially compares strings one at a time, implying constant space for string comparisons. Therefore, no significant auxiliary space is used beyond a few temporary variables, resulting in O(1) space complexity.

Optimal Solution

Approach

To design an efficient search autocomplete system, we'll build a special data structure that organizes all possible search terms. This structure will let us quickly find and rank relevant suggestions as the user types, without needing to search through every single term each time.

Here's how the algorithm would work step-by-step:

  1. First, we'll store all our search terms in a tree-like structure where each branch represents a letter and each path from the root to a node represents a prefix (part of a word).
  2. With this tree, we can efficiently find all search terms that start with a given prefix. When a user types something, we simply traverse down the tree following the letters they've typed.
  3. At each node in the tree, we’ll also keep track of the most popular or relevant search terms that contain that prefix. This helps us rank the suggestions.
  4. When the user types, we quickly traverse the tree to find the appropriate node, retrieve the ranked suggestions associated with that node, and display them to the user.
  5. To keep the suggestions relevant over time, we'll update the popularity scores of the search terms whenever someone selects a suggestion. This ensures that the most frequently chosen terms are always ranked higher.

Code Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.suggestions = []

class AutocompleteSystem:
    def __init__(self, sentences, times):
        self.root = TrieNode()
        self.current_prefix = ""
        
        for i in range(len(sentences)):
            self.add_sentence(sentences[i], times[i])

    def add_sentence(self, sentence, hot):
        node = self.root
        for char in sentence:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.suggestions.append((sentence, hot))
            node.suggestions.sort(key=lambda item: (-item[1], item[0]))
            if len(node.suggestions) > 3:
                node.suggestions.pop()

    def get_suggestions(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        
        return [item[0] for item in node.suggestions]

    def input(self, character):
        if character == '#':
            self.add_sentence(self.current_prefix, 1)
            self.current_prefix = ""
            return []

        self.current_prefix += character
        return self.get_suggestions(self.current_prefix)

# The Trie data structure is initialized.
# Each node represents a character in a potential search term.
def example_usage():
    sentences = ["i love you", "island", "iroman", "i love leetcode"]
    times = [5, 3, 2, 2]
    auto_complete_system = AutocompleteSystem(sentences, times)

    print(auto_complete_system.input('i'))
    print(auto_complete_system.input(' '))
    print(auto_complete_system.input('l'))
    print(auto_complete_system.input('o'))
    print(auto_complete_system.input('v'))
    print(auto_complete_system.input('e'))
    print(auto_complete_system.input(' '))
    print(auto_complete_system.input('y'))
    print(auto_complete_system.input('o'))
    print(auto_complete_system.input('u'))
    print(auto_complete_system.input('#'))

    print(auto_complete_system.input('i'))

# Adding a new sentence or incrementing the 'hot' score
# updates the Trie's suggestion lists based on popularity.

# Every typed character updates current_prefix, which then
# queries the trie and returns top suggestions.

# This function demonstrates the AutocompleteSystem in action
# It initializes the system, processes input, and prints the output.
if __name__ == '__main__':
    example_usage()

Big(O) Analysis

Time Complexity
O(m + klogk)Building the Trie data structure initially involves inserting all search terms, where m is the total length of all search terms. This insertion process takes O(m) time. When a user types a prefix, we traverse the Trie, which takes O(p) time where p is the length of the prefix (p <= max length of search term). Finding the top k most popular terms at a node involves sorting, which takes O(klogk) time, where k is the number of suggestions to return. The overall complexity is dominated by Trie construction and retrieving top k suggestions at each Trie node during autocompletion, making it O(m + klogk) assuming p is small in comparision to m.
Space Complexity
O(N * L)The space complexity is primarily determined by the Trie data structure, where N is the number of search terms and L is the average length of these terms. Each node in the Trie stores a character, and the structure branches out to represent all prefixes of the search terms. Therefore, in the worst-case scenario, the Trie will store all characters of all search terms, leading to a space usage proportional to the sum of the lengths of all terms. The storage of popular search terms at each node contributes to this space as well, keeping track of ranked suggestions based on popularity.

Edge Cases

CaseHow to Handle
sentences is null or emptyInitialize the AutocompleteSystem with an empty trie and an empty frequency map.
times is null or empty or has different length from sentencesThrow an IllegalArgumentException or return an error code indicating invalid input.
Input character 'c' is non-ASCII or contains special control charactersHandle only ASCII characters or sanitize the input to prevent unexpected behavior.
sentences contains empty stringsTreat empty strings as valid sentences, but their frequency should be considered accordingly when suggesting top sentences.
times contains negative frequenciesTreat negative frequencies as zero, or throw an IllegalArgumentException, ensuring frequency is non-negative.
Prefix search returns more than 3 matches.Sort the matching sentences based on frequency and lexicographical order, and return only the top 3.
All sentences start with the same prefixEnsure that sorting is still done correctly with all sentences matching, returning top 3 based on frequency and lexicographical order.
The same sentence is added multiple times with #Update the sentence's frequency correctly each time it's entered with #, adding to the total frequency.