
Design Add and Search Words Data Structure

Medium
Datadog
Topics: Strings, Recursion

Design a data structure that supports adding new words and finding if a string matches any previously added string.

Implement the WordDictionary class:

  • WordDictionary() Initializes the object.
  • void addWord(word) Adds word to the data structure so it can be matched later.
  • bool search(word) Returns true if there is any string in the data structure that matches word, or false otherwise. word may contain dots '.', where a dot matches any single letter.

Example:

Input
["WordDictionary","addWord","addWord","addWord","search","search","search","search"]
[[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]
Output
[null,null,null,null,false,true,true,true]

Explanation
WordDictionary wordDictionary = new WordDictionary();
wordDictionary.addWord("bad");
wordDictionary.addWord("dad");
wordDictionary.addWord("mad");
wordDictionary.search("pad"); // return False
wordDictionary.search("bad"); // return True
wordDictionary.search(".ad"); // return True
wordDictionary.search("b.."); // return True

Constraints:

  • 1 <= word.length <= 25
  • word in addWord consists of lowercase English letters.
  • word in search consists of '.' or lowercase English letters.
  • There will be at most 2 dots in word for search queries.
  • At most 10^4 calls will be made to addWord and search.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a word that will be added to the data structure?
  2. What characters can the words consist of other than lowercase English letters?
  3. How many `addWord` and `search` operations should I expect in a typical test case? This will help me consider time complexity implications.
  4. Can I assume that the input `word` to the `search` function will only contain lowercase English letters and the '.' character?
  5. Are there any memory constraints I should be aware of when designing this data structure?

Brute Force Solution

Approach

The brute force approach to this word search problem is to simply store every word we are given and then, when asked to search, check each stored word against the pattern. A wildcard in the pattern matches any letter, so it is automatically compatible with whatever character the stored word has at that position.

Here's how the algorithm would work step-by-step:

  1. When a new word is added, just store it in a list or a container to keep track of all the words we have.
  2. When asked to search for a word (pattern), go through our entire list of stored words one by one.
  3. For each stored word, compare it to the search pattern character by character.
  4. If the search pattern has a wildcard character at a certain position, treat it as a match for whatever letter the stored word has at that position.
  5. If the current stored word matches the search pattern (including wildcard replacements), return True to indicate a match is found.
  6. If we've gone through all the stored words and none of them match the search pattern, return False.

Code Implementation

class WordDictionary:

    def __init__(self):
        self.word_list = []

    def addWord(self, word: str) -> None:
        self.word_list.append(word)

    def search(self, word: str) -> bool:
        # Iterate through each word we've stored
        for stored_word in self.word_list:
            if len(stored_word) != len(word):
                continue

            # Compare the characters
            match = True
            for index in range(len(word)):
                if word[index] == '.':
                    continue
                if word[index] != stored_word[index]:
                    match = False
                    break

            # Found a match, return True
            if match:
                return True

        # If we made it here, no matches found
        return False
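
As a quick sanity check, the snippet below replays the example from the prompt against the brute force class above (a usage sketch; the variable names are illustrative only):

word_dictionary = WordDictionary()
word_dictionary.addWord("bad")
word_dictionary.addWord("dad")
word_dictionary.addWord("mad")

print(word_dictionary.search("pad"))  # False: no stored word matches "pad"
print(word_dictionary.search("bad"))  # True: exact match with a stored word
print(word_dictionary.search(".ad"))  # True: '.' matches the 'b' in "bad" (and 'd', 'm')
print(word_dictionary.search("b.."))  # True: the two dots match 'a' and 'd' in "bad"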

Big(O) Analysis

Time Complexity
O(N*M) — Let N be the number of stored words and M be the length of the search word. For each stored word of the same length, we compare it to the pattern character by character, and a '.' in the pattern matches any letter without any branching, so each comparison costs O(M). Repeating this check for all N stored words gives O(N*M) per search.
Space Complexity
O(N) — The brute force approach stores every added word in a list. Let N be the total number of characters across all added words; the space used by this list scales linearly with N, so the auxiliary space complexity is O(N).

Optimal Solution

Approach

We'll store the words in a trie (prefix tree), a structure that makes both adding new words and searching for them efficient, even with wildcard characters. Each level of the tree represents one character position, so a search simply walks down the tree one character at a time.

Here's how the algorithm would work step-by-step:

  1. When adding a word, follow the path in the tree that matches the letters of the word. If a path doesn't exist, create it.
  2. Mark the end of the word at the final character node so we know it's a complete word.
  3. When searching for a word (potentially with wildcard characters), start from the root of the tree.
  4. If the character is a letter, follow the corresponding path in the tree.
  5. If the character is a wildcard, explore all possible paths from that point. This involves checking every branch at that level of the tree.
  6. Continue down each possible path until you either find the end of a word or run out of paths.
  7. If you reach the end of the search string and you're at a node that marks the end of a word, then the search is successful.

Code Implementation

class WordDictionary:

    def __init__(self):
        self.trie_root = {}

    def addWord(self, word: str) -> None:
        current_node = self.trie_root
        for char in word:
            if char not in current_node:
                current_node[char] = {}
            current_node = current_node[char]
        current_node['$'] = True

    def search(self, word: str) -> bool:
        def search_in_node(word, node):
            for i, char in enumerate(word):
                if char == '.':
                    # Explore all branches for wildcard.
                    for child in node:
                        if child != '$' and search_in_node(word[i+1:], node[child]):
                            return True
                    return False
                else:
                    if char not in node:
                        return False
                    node = node[char]

            # Check if the path leads to a complete word.
            return '$' in node

        # Initiate search from the root of the trie.
        return search_in_node(word, self.trie_root)
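
To make the nested-dictionary trie concrete, the short demo below (a sketch assuming the class above; variable names are illustrative) adds the example words and inspects the resulting structure:

trie_dictionary = WordDictionary()
trie_dictionary.addWord("bad")
trie_dictionary.addWord("dad")
trie_dictionary.addWord("mad")

# Each word becomes a chain of nested dictionaries ending at the '$' marker:
# {'b': {'a': {'d': {'$': True}}}, 'd': {'a': {'d': {'$': True}}}, 'm': {'a': {'d': {'$': True}}}}
print(trie_dictionary.trie_root)

print(trie_dictionary.search(".ad"))  # True: the wildcard branches into the 'b', 'd', and 'm' subtrees
print(trie_dictionary.search("b.."))  # True: follow 'b', then the dots match 'a' and 'd'
print(trie_dictionary.search("pad"))  # False: there is no 'p' branch at the root

Using plain nested dictionaries keeps the code short; an explicit TrieNode class with a children map and an is_word flag is a common alternative that avoids the '$' sentinel key.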

Big(O) Analysis

Time Complexity
O(26^n) — Adding a word of length n takes O(n) time, since we traverse the trie character by character. The search function is more complex because of the wildcard character '.'. In the worst case, where the search pattern consists only of wildcards (e.g., "..."), the search may have to explore every branch at every level: with up to 26 children per node and n positions, that is up to 26^n paths, so the worst-case search is O(26^n). Under this problem's constraint of at most 2 dots per query, the branching is bounded by 26^2, so each search is effectively O(26^2 * n).
Space Complexity
O(N) — The space complexity is dominated by the trie itself. In the worst case, where the added words share no common prefixes, the trie stores one node per character, i.e., N nodes where N is the total number of characters across all added words (each node holds at most 26 child references). The recursive search with wildcards also uses a call stack whose depth is at most the length of the search word. Overall, space scales linearly with the number of characters stored, giving O(N).

Edge Cases

  • Null or empty word input for addWord — Ignore null/empty words or throw an exception, depending on requirements; either way the trie's structure is not altered.
  • Null or empty word input for search — With the implementation above, an empty search string matches only if the empty string was previously added; agree on the expected behavior up front.
  • Word containing only '.' for search, and an empty trie — Return false, because an empty trie cannot match even a wildcard search.
  • Word containing only '.' for search, and a trie containing only words of a different length — Return false, since no stored word has the same length as the search word.
  • Very long words added to the dictionary, causing deep trie branches — The trie can store very long words, but deep recursion in search may overflow the call stack; an iterative DFS avoids this (see the sketch after this list).
  • Adding the same word multiple times — Repeated adds traverse the existing nodes and simply re-mark the end-of-word node ('$'), which does not change search results.
  • Words with unicode characters — Trie nodes should either be able to store unicode characters or be explicitly limited to lowercase ASCII per the constraints.
  • Maximum number of addWord/search calls leading to excessive memory usage — Consider memory usage and, if needed, a mechanism such as an LRU cache to bound the trie's size.
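
For the deep-recursion case flagged above, the recursive search can be replaced with an explicit stack. The sketch below is a hypothetical helper (not part of the original class) that assumes the same nested-dictionary trie and '$' end-of-word marker used in the optimal solution:

def search_iterative(trie_root: dict, word: str) -> bool:
    # Each stack entry is (node, index): a trie node plus the position in the
    # pattern that still needs to be matched starting from that node.
    stack = [(trie_root, 0)]
    while stack:
        node, index = stack.pop()
        if index == len(word):
            # The whole pattern is consumed; succeed only at an end-of-word node.
            if '$' in node:
                return True
            continue
        char = word[index]
        if char == '.':
            # Wildcard: push every child branch, skipping the end-of-word marker.
            for child_key, child_node in node.items():
                if child_key != '$':
                    stack.append((child_node, index + 1))
        elif char in node:
            stack.append((node[char], index + 1))
    return False

A WordDictionary.search implementation could delegate to search_iterative(self.trie_root, word) to keep the call stack flat regardless of word length.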