Taro Logo

Design Add and Search Words Data Structure

Medium
Google logo
Google
2 views
Topics:
StringsTreesRecursion

Design a data structure that supports adding new words and finding if a string matches any previously added string.

Implement the WordDictionary class:

  • WordDictionary() Initializes the object.
  • void addWord(word) Adds word to the data structure, it can be matched later.
  • bool search(word) Returns true if there is any string in the data structure that matches word or false otherwise. word may contain dots '.' where dots can be matched with any letter.

For example:

WordDictionary wordDictionary = new WordDictionary();
wordDictionary.addWord("bad");
wordDictionary.addWord("dad");
wordDictionary.addWord("mad");
wordDictionary.search("pad"); // return False
wordDictionary.search("bad"); // return True
wordDictionary.search(".ad"); // return True
wordDictionary.search("b.."); // return True

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a word that will be added to the data structure, and what is the maximum number of words that will be added?
  2. Will the words consist only of lowercase English letters, or can they contain other characters?
  3. When searching with a wildcard '.', does it only match a single character, or can it match multiple characters or no character at all?
  4. If the search word doesn't exist in the data structure, should the `search` method return `false`?
  5. How frequently will the `addWord` and `search` operations be called relative to each other? Is one operation significantly more common than the other?

Brute Force Solution

Approach

We're building a system to store words and search for them later. A brute force approach for searching simply checks every word we have stored until we find one that matches, trying all possibilities. This is like searching for a book in a library by looking at every book on every shelf.

Here's how the algorithm would work step-by-step:

  1. When adding a word, we simply put it into our word list.
  2. When searching for a word, and the search term doesn't have any special 'wildcard' characters, we go through our entire word list and compare each word to the search term. We report 'found' if we find an exact match, and 'not found' if we reach the end of the list without finding a match.
  3. If the search term does have a wildcard character (let's say a dot, which means 'any letter'), we need to be more careful. We still go through our entire list of words.
  4. For each word in our list, we compare it to the search term. If we find a dot in the search term, we allow the letter in the word at that same position to be anything.
  5. We only consider the word a match if the letters match everywhere *except* where the search term has a dot, where we allow any letter to match.
  6. We continue checking all words in the list until we find a match. If we reach the end without a match, we report 'not found'.

Code Implementation

class WordDictionary:

    def __init__(self):
        self.word_list = []

    def addWord(self, word: str) -> None:
        self.word_list.append(word)

    def search(self, word: str) -> bool:
        # Iterate through each word we have stored
        for existing_word in self.word_list:
            if len(existing_word) != len(word):
                continue

            match = True
            # Check each letter in the current stored word
            for index in range(len(word)):
                if word[index] == '.':
                    continue
                # If chars do not match, not a valid word.
                if word[index] != existing_word[index]:
                    match = False
                    break

            # Return true if the current word matches
            if match:
                return True

        return False

Big(O) Analysis

Time Complexity
O(n*m)Adding a word takes O(1) as it's a simple append operation. The search function iterates through all n words in the data structure. For each word, it compares it with the search term of length m. Therefore, the time complexity for the search operation is proportional to the product of the number of words (n) and the length of the search term (m), resulting in O(n*m).
Space Complexity
O(1)The described solution stores words in a list, but this list is part of the data structure's state, not auxiliary space used within the search or add operations. The search operation iterates through this list but does not create any additional data structures that scale with the number of words stored. Therefore, the auxiliary space used is constant, independent of the number of words (N) added to the data structure.

Optimal Solution

Approach

We will create a data structure that efficiently stores words and allows us to search for words, even with wildcard characters. The key is to use a structure that lets us quickly narrow down the search instead of checking every word individually.

Here's how the algorithm would work step-by-step:

  1. First, organize the words into a tree-like structure where each level represents a letter in the word. This is like a family tree, but for words.
  2. When adding a new word, trace its letters down the tree, creating new branches if needed. The end of each word will be marked specially.
  3. For searching, follow the letters of the search word down the tree. If you hit a wildcard character, try all possible branches at that point. This is how we handle unknown letters.
  4. If you reach the end of the search word and are at a marked end-of-word spot in the tree, you've found a match! Otherwise, it's not a match.

Code Implementation

class WordDictionary:

    def __init__(self):
        self.trie_root = {}

    def addWord(self, word: str) -> None:
        current_node = self.trie_root
        for char in word:
            if char not in current_node:
                current_node[char] = {}
            current_node = current_node[char]
        current_node['#'] = True

    def search(self, word: str) -> bool:
        def search_in_node(word, node):
            for i, char in enumerate(word):
                if char == '.':
                    # Explore all branches for a wildcard
                    for child in node:
                        if child != '#' and search_in_node(word[i+1:], node[child]):
                            return True
                    return False
                else:
                    if char not in node:
                        return False
                    node = node[char]

            # Check if end of word is marked at the end of search
            return '#' in node

        return search_in_node(word, self.trie_root)

Big(O) Analysis

Time Complexity
O(M * 26^N)The addWord operation takes O(M) time, where M is the length of the word being added, as it traverses the Trie once for each character. The search operation's worst-case time complexity is O(M * 26^N), where M is the length of the search word and N is the number of '.' wildcard characters in the search word. Each wildcard can branch to 26 possible characters in the Trie, leading to exponential growth in the search space. If there are no wildcards, the search is O(M).
Space Complexity
O(M)The primary space usage comes from the Trie data structure, where M is the total number of characters across all added words. Each node in the Trie stores references to its children (up to 26 for each letter), and nodes are created for each unique character in each added word. Thus, in the worst case, if all the added words have completely distinct characters, the Trie might need to store a node for each character across all words, leading to a space complexity proportional to the total number of characters.

Edge Cases

CaseHow to Handle
Null or empty word being addedTreat null or empty words as invalid inputs and either ignore them or throw an exception.
Null or empty search patternAn empty search pattern should likely return true if and only if an empty string was added before; otherwise, it is handled like any other pattern.
Very long words or a very large number of words addedThe Trie data structure's memory usage and search time could become a bottleneck; ensure efficient memory allocation and consider alternative data structures if scalability is a major concern.
Search pattern with a large number of '.' charactersExcessive wildcard characters could lead to exponential backtracking, potentially causing timeouts, so limit the depth of recursion or use dynamic programming.
Words containing characters other than lowercase letters 'a'-'z'Handle non-lowercase characters by either rejecting them or extending the Trie to support them, based on problem requirements.
Adding the same word multiple timesAdding duplicates should not affect the search functionality; Trie structure inherently handles multiple insertions of same word correctly.
No words have been added before searchingSearching should return false for any non-empty pattern if no words have been added to the data structure.
Integer overflow when calculating indices in large TriesEnsure that index calculations within the Trie remain within integer bounds to avoid unexpected behavior.