Taro Logo

Search Suggestions System

Medium
Two Sigma logo
Two Sigma
0 views
Topics:
ArraysStringsTwo PointersBinary Search

You are given an array of strings products and a string searchWord.

Design a system that suggests at most three product names from products after each character of searchWord is typed. Suggested products should have common prefix with searchWord. If there are more than three products with a common prefix return the three lexicographically minimums products.

Return a list of lists of the suggested products after each character of searchWord is typed.

Example 1:

Input: products = ["mobile","mouse","moneypot","monitor","mousepad"], searchWord = "mouse"
Output: [["mobile","moneypot","monitor"],["mobile","moneypot","monitor"],["mouse","mousepad"],["mouse","mousepad"],["mouse","mousepad"]]
Explanation: products sorted lexicographically = ["mobile","moneypot","monitor","mouse","mousepad"].
After typing m and mo all products match and we show user ["mobile","moneypot","monitor"].
After typing mou, mous and mouse the system suggests ["mouse","mousepad"].

Example 2:

Input: products = ["havana"], searchWord = "havana"
Output: [["havana"],["havana"],["havana"],["havana"],["havana"],["havana"]]
Explanation: The only word "havana" will be always suggested while typing the search word.

Constraints:

  • 1 <= products.length <= 1000
  • 1 <= products[i].length <= 3000
  • 1 <= sum(products[i].length) <= 2 * 104
  • All the strings of products are unique.
  • products[i] consists of lowercase English letters.
  • 1 <= searchWord.length <= 1000
  • searchWord consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the length of `products` and `searchWord`? Are they large enough that performance should be a major concern?
  2. If, for a given prefix, there are fewer than 3 product names that match the prefix, should I return all of the matching product names, or pad the result with empty strings/null values?
  3. Are the `products` already sorted lexicographically, or do I need to sort them first?
  4. Can the `products` array contain empty strings or null values? What should I do if `searchWord` is empty or null?
  5. Is the search case-sensitive? For example, should "mouse" match "Mouse"?

Brute Force Solution

Approach

Imagine you're typing a search query, and as you type, the system suggests products. The brute force way to do this means, for each letter you type, we check all the products to see which ones start with what you've typed so far. We then pick the top few suggestions to display.

Here's how the algorithm would work step-by-step:

  1. When the user types the first letter, go through the entire list of products.
  2. For each product, see if it starts with that first letter.
  3. Make a list of all the products that do start with that letter.
  4. Sort that list alphabetically.
  5. Show the first three products from that sorted list as suggestions.
  6. When the user types the next letter, repeat the process. Go through the original list of products.
  7. This time, check if each product starts with the two letters the user has typed.
  8. Again, make a list of the matching products, sort them, and show the top three as suggestions.
  9. Keep repeating this process for each new letter the user types.

Code Implementation

def search_suggestions_system(products, search_word):
    search_results = []
    prefix = ""

    for character in search_word:
        prefix += character
        matching_products = []

        # Iterate over products to find those that start with the current prefix
        for product in products:
            if product.startswith(prefix):
                matching_products.append(product)

        # Sort the matching products alphabetically
        matching_products.sort()

        # Select the top 3 matching products (or fewer, if there aren't 3)
        suggestions = matching_products[:3]

        # Add the suggestions to the overall results
        search_results.append(suggestions)

    return search_results

Big(O) Analysis

Time Complexity
O(m*n*log(n))For each character typed in the search word (of length m), the algorithm iterates through all products (of size n) to check if they start with the prefix. For each prefix, building the matching list takes O(n) time. Then, the matching list is sorted alphabetically, which takes O(n*log(n)) time in the worst case. Therefore, the time complexity for each character is O(n*log(n)), and since this is repeated for each of the m characters in the search word, the overall time complexity is O(m*n*log(n)).
Space Complexity
O(N)For each letter typed, the algorithm creates a list of matching products. In the worst-case scenario, where the first letter matches all N products, a list of size N is created. Sorting this list also requires O(N) space in some sorting algorithms. Thus, the auxiliary space used is proportional to the number of products, N, leading to a space complexity of O(N).

Optimal Solution

Approach

The most efficient way to provide search suggestions is to first organize all the products alphabetically. Then, as someone types, we quickly narrow down the possibilities to only those products that start with what they've typed so far, returning only the first three matching results.

Here's how the algorithm would work step-by-step:

  1. First, sort all the product names alphabetically to make searching faster later.
  2. As the user types each letter, build a search string from the letters they've typed so far.
  3. Using the sorted product list, find the products that begin with the search string. This is done efficiently, avoiding checking every single product.
  4. If there are more than three matching products, only return the first three alphabetically.
  5. If there are three or fewer matching products, return all of them.
  6. Repeat this process each time the user types another letter, showing new suggestions immediately.

Code Implementation

def search_suggestions(products, search_word):
    products.sort()
    result = []
    search_string = ""

    for char in search_word:
        search_string += char
        matching_products = []

        # Only iterate products starting with the prefix
        for product_name in products:

            if product_name.startswith(search_string):

                matching_products.append(product_name)

        # Limit the number of suggestions to a maximum of 3

        suggestions = sorted(matching_products)[:3]
        result.append(suggestions)

    return result

Big(O) Analysis

Time Complexity
O(n log n + m*k)Sorting the product list initially takes O(n log n) time, where n is the number of products. The outer loop iterates 'm' times, where 'm' is the length of the search word. Inside this loop, we find the products that start with the current prefix. Finding these prefixes can take up to O(k) time in the worst case for each of the m searches, where k is the number of products to be searched for prefix matches. Therefore, the overall time complexity is dominated by O(n log n + m*k), where k represents the number of searches.
Space Complexity
O(N)The primary space complexity comes from sorting the input list of products, where N is the number of products. While some sorting algorithms can be done in-place (modifying the original array directly), many common and efficient sorting algorithms like merge sort or quicksort (in its typical implementation) require auxiliary space proportional to the number of elements being sorted. This is needed to store temporary copies of the product list during the sorting process. Thus, the space complexity is O(N).

Edge Cases

CaseHow to Handle
products is null or emptyReturn an empty list if the input products array is null or empty.
searchWord is null or emptyReturn an empty list if the searchWord is null or empty.
products array contains empty stringsFilter out empty strings from the products list before processing.
products array contains strings that don't start with a prefix of searchWordThe solution should correctly filter out products that do not match the prefix.
searchWord is longer than any string in productsThe solution should return the top 3 lexicographically smallest matches until the prefix is longer than any word.
products array contains duplicate stringsThe solution should handle duplicate strings correctly by including them in the ranked matches if they are among the top 3 lexicographically smallest.
products array contains strings with mixed caseConvert both the products and searchWord to lowercase for case-insensitive matching.
When there are fewer than 3 products that match a given prefixReturn all matching products for that prefix, even if fewer than 3.