Taro Logo

Prefix and Suffix Search

Hard
Google logo
Google
4 views
Topics:
ArraysStrings

Design a special dictionary that searches the words in it by a prefix and a suffix. Implement the WordFilter class with the following methods:

  1. WordFilter(string[] words): Initializes the object with the words in the dictionary.
  2. f(string pref, string suff): Returns the index of the word in the dictionary, which has the prefix pref and the suffix suff. If there is more than one valid index, return the largest of them. If there is no such word in the dictionary, return -1.

For example:

WordFilter wordFilter = new WordFilter(["apple"]);
wordFilter.f("a", "e"); // return 0, because the word at index 0 has prefix = "a" and suffix = "e".

Constraints:

  • 1 <= words.length <= 10^4
  • 1 <= words[i].length <= 7
  • 1 <= pref.length, suff.length <= 7
  • words[i], pref and suff consist of lowercase English letters only.
  • At most 10^4 calls will be made to the function f.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a word, and what is the maximum number of words that will be provided during initialization?
  2. Are the prefix and suffix case-sensitive? Should I treat 'Apple' and 'apple' as the same?
  3. If multiple words match both the prefix and suffix, should I return the index of the word with the largest index, or the smallest, or is it arbitrary?
  4. Can the prefix or suffix be empty strings? If so, what should the function return if no words match the empty prefix/suffix?
  5. Will the input words contain only lowercase letters, or can they include other characters (e.g., uppercase, numbers, special characters)?

Brute Force Solution

Approach

The goal is to find a word from a given list that matches both a given prefix and a given suffix. The brute force method involves checking every single word in the list to see if it satisfies the given prefix and suffix conditions. It is like comparing the given prefix and suffix against all possibilities one by one until we find a match.

Here's how the algorithm would work step-by-step:

  1. Start with the very first word in the list.
  2. Check if the beginning part of that word perfectly matches the provided prefix.
  3. If it doesn't, move to the next word.
  4. If it does match the prefix, then check if the ending part of that same word perfectly matches the provided suffix.
  5. If both the prefix and suffix match for that word, remember that word as a potential result.
  6. Repeat this process, going through each word in the entire list, checking for prefix and suffix matches.
  7. If multiple words satisfy both the prefix and suffix conditions, select the one that appears last in the original word list.

Code Implementation

def prefix_suffix_search_brute_force(word_list, prefix, suffix):
   matching_word_index = -1

   for word_index, word in enumerate(word_list):
       # Check if the word starts with the given prefix
       if word.startswith(prefix):

           # Check if the word ends with the given suffix
           if word.endswith(suffix):

               # Store the index of the current word
               matching_word_index = word_index

   return matching_word_index

Big(O) Analysis

Time Complexity
O(n*k)The provided solution iterates through each word in the input list of words, which we can represent as n. For each word, we need to check if it matches the given prefix and suffix. Let's assume the average length of a word, and the prefix and suffix is k. The prefix and suffix matching process involves comparing characters, which takes O(k) time. Therefore, for each word in the list, we perform O(k) operations. Consequently, the overall time complexity is O(n*k).
Space Complexity
O(1)The described brute force algorithm iterates through the input word list, but it doesn't create any auxiliary data structures to store intermediate results or visited words. It only uses a constant amount of extra space to store a potential result (the index of the matching word). Thus, the auxiliary space complexity is independent of the input size N (the number of words in the list) and remains constant.

Optimal Solution

Approach

The trick to solving this problem efficiently is to combine the prefix and suffix into a single, searchable term. This allows us to quickly find words matching both criteria without needing to examine every word individually. We can use a specialized data structure to speed up the search process.

Here's how the algorithm would work step-by-step:

  1. First, create a special combined version of each word in the list by joining the word itself with the same word. This combined version will be used to create all possible prefix-suffix combinations.
  2. Next, create all the possible prefix-suffix combinations of these combined words. For example, combine all possible prefixes of the word with all possible suffixes of the word, with a special separator in between.
  3. Store all these prefix-suffix combinations in a data structure that allows you to quickly find the items. Think of it as a super-fast lookup table.
  4. When you are given a specific prefix and suffix to search for, combine them together with the special separator used earlier.
  5. Now, look for this combined prefix-suffix in your fast lookup table. The lookup table will quickly give you all the matching words from the initial list.

Code Implementation

class WordFilter:

    def __init__(self, words):
        self.prefix_suffix_map = {}
        self.separator = '#'

        for word_index, word in enumerate(words):
            combined_word = word + self.separator + word

            for i in range(len(word) + 1):
                prefix = combined_word[:i]
                for j in range(len(word) + 1):
                    suffix = combined_word[len(word) + 1 : len(combined_word) - len(word) + j + len(word)]
                    prefix_suffix = prefix + self.separator + word[len(word) - len(suffix):]

                    # Store the index of the word for fast retrieval
                    self.prefix_suffix_map[prefix_suffix] = word_index

    def find_index(self, prefix, suffix):
        # Construct the combined prefix-suffix key for lookup
        combined_search_term = prefix + self.separator + suffix
        
        # Retrieve the index from the map, return -1 if not found
        if combined_search_term in self.prefix_suffix_map:
            return self.prefix_suffix_map[combined_search_term]
        else:
            return -1

Big(O) Analysis

Time Complexity
O(N * K^2)The proposed solution iterates through each of the N words in the input list. For each word, it generates all possible prefixes and suffixes. Assuming the maximum length of a word is K, there can be K prefixes and K suffixes. The algorithm then generates all prefix-suffix combinations, leading to O(K^2) combinations for each word. Therefore, the overall time complexity is O(N * K^2), dominated by generating prefix-suffix combinations for all words. The lookup operation is assumed to be O(1) after pre-processing.
Space Complexity
O(N * K^2)The auxiliary space is dominated by the storage of prefix-suffix combinations in the lookup table, where N is the number of words in the input list and K is the maximum length of a word. For each word, we generate all possible prefixes and suffixes, resulting in approximately K prefixes and K suffixes. Each prefix-suffix combination (of which there are K * K = K^2) is stored. Therefore, across all N words, the space complexity is O(N * K^2) due to storing these combinations in the lookup table. The combined words are of length 2K so do not affect the overall space complexity. Specifically, this lookup table is likely implemented using a hash map or a trie, and its size grows proportionally to N * K^2.

Edge Cases

CaseHow to Handle
Empty words arrayReturn -1, as no word can match an empty dictionary.
Null words array or null prefix/suffixThrow IllegalArgumentException to indicate invalid input.
Words array contains empty stringsHandle empty strings appropriately; the weight of an empty string should be considered.
Prefix or suffix longer than any word in the words arrayNo match should be found, return -1.
Large words array affecting memory usage and search timeOptimize Trie data structure for memory efficiency and use efficient search algorithms.
Prefix and suffix are the same and match multiple wordsReturn the largest index of the matching words.
Prefix and suffix are both empty stringsReturn the index of the last word in the words array, representing the highest possible index.
Integer overflow if word lengths are very large during weight calculationUse appropriate data types (e.g., long) or modular arithmetic to prevent overflow.