Taro Logo

Implement Trie II (Prefix Tree)

Medium
Amazon logo
Amazon
2 views
Topics:
StringsTrees

Let's implement a Trie (also known as a prefix tree) with a few specific functionalities. A Trie is a tree-like data structure used for efficient retrieval of strings. Each node in the Trie represents a character, and the path from the root to a node forms a prefix or a complete word.

Your task is to implement a Trie class with the following methods:

  1. Trie(): Initializes the Trie object.
  2. insert(word: str): Inserts the string word into the Trie. If the word is already present, increment the count of that word.
  3. countWordsEqualTo(word: str) -> int: Returns the number of times the string word is present in the Trie.
  4. countWordsStartingWith(prefix: str) -> int: Returns the number of strings in the Trie that start with the given prefix.
  5. erase(word: str): Decrements the count of the string word in the Trie. If the count reaches zero, effectively remove the word. If the word does not exist, nothing happens.

For example:

Trie trie = new Trie();
trie.insert("apple");
trie.insert("apple");
trie.countWordsEqualTo("apple"); // Returns 2
trie.countWordsStartingWith("app"); // Returns 2
trie.insert("app");
trie.countWordsStartingWith("app"); // Returns 3
trie.erase("apple");
trie.countWordsEqualTo("apple"); // Returns 1
trie.countWordsStartingWith("app"); // Returns 2
trie.erase("apple");
trie.countWordsEqualTo("apple"); // Returns 0
trie.erase("apple"); // does nothing as apple's count is already 0

Consider edge cases such as inserting an empty string, searching for a non-existent word or prefix, and erasing a word that is not in the Trie.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a word that can be inserted into the trie?
  2. Will the input consist of only lowercase English letters, or can it include other characters?
  3. If I call erase(word) and the word doesn't exist, or its count is already zero, should the function do nothing or return an error?
  4. What are the expected number of calls to each of the four methods (insert, countWordsEqualTo, countWordsStartingWith, and erase), and what is the overall scale of operations?
  5. Can I assume that the input `word` or `prefix` will never be null or empty?

Brute Force Solution

Approach

The goal is to build a system to store words so we can quickly find out how many words start with a particular sequence of letters, and how many times a specific word was inserted. A brute force approach directly uses the words themselves for these operations, without any special structure.

Here's how the algorithm would work step-by-step:

  1. To add a new word, simply store it in a list of all added words. We don't worry about organizing them in any special way.
  2. To count how many words start with a given sequence, we'll go through every word we've stored.
  3. For each stored word, we'll check if the sequence is at the beginning of the word. If it is, we increase our count.
  4. After checking all the stored words, we return the total count.
  5. To count how many times a specific word was added, we'll go through our list of stored words again.
  6. This time, we'll count how many times the exact word appears in our list.
  7. Finally, we'll return the count of how many times the exact word was added.

Code Implementation

class Trie:

    def __init__(self):
        self.word_list = []

    def insert(self, word: str) -> None:
        # Simply append the word to our word list.
        self.word_list.append(word)

    def countWordsEqualTo(self, word: str) -> int:
        word_count = 0
        # Iterate to count exact matches.
        for existing_word in self.word_list:
            if existing_word == word:
                word_count += 1
        return word_count

    def countWordsStartingWith(self, prefix: str) -> int:
        prefix_count = 0
        # We need to check all stored words.
        for existing_word in self.word_list:

            # Check if word starts with prefix.
            if existing_word.startswith(prefix):
                prefix_count += 1

        return prefix_count

    def erase(self, word: str) -> None:
        #Remove the first occurrence of the word.
        if word in self.word_list:
            self.word_list.remove(word)

Big(O) Analysis

Time Complexity
O(n*m)Adding a word takes O(1) time as we simply append it to a list. Counting words with a given prefix requires iterating through all n stored words, and for each word, checking if it starts with the given prefix of length m, which takes O(m) time. Counting occurrences of a specific word also requires iterating through all n stored words and comparing each stored word with the target word, which takes O(m) time per comparison. Therefore, the overall time complexity is O(n*m) for both counting operations.
Space Complexity
O(N)The provided solution stores all added words in a list. In the worst case, where all N words added are unique, the list will store all N words. Therefore, the auxiliary space used is proportional to the number of words added, N. No other significant data structures are used. Thus, the space complexity is O(N).

Optimal Solution

Approach

We'll use a special tree-like structure to store words. This structure helps us quickly find words with the same starting parts and also keep track of how many times a word was added or if it's a real word.

Here's how the algorithm would work step-by-step:

  1. Imagine a tree where each branch represents a letter. Start with an empty root.
  2. To add a word, follow the branches corresponding to the letters in the word, creating new branches if needed.
  3. At the end of each word's path, mark that it's a complete word and keep a counter of how many times we've added it.
  4. Also, at each branch along the word's path, keep a counter of how many words pass through that branch.
  5. To check how many times a word appears, follow its path and check the counter at the end of the word's path.
  6. To find out how many words start with a certain prefix, follow the path of the prefix and check the counter on the last branch of the prefix. This tells you how many complete words exist that contain that prefix.

Code Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.word_count = 0
        self.prefix_count = 0

class Trie:

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        current_node = self.root
        for char in word:
            if char not in current_node.children:
                current_node.children[char] = TrieNode()
            current_node = current_node.children[char]
            current_node.prefix_count += 1

        # Mark the end of the word and increase its count
        current_node.word_count += 1

    def countWordsEqualTo(self, word: str) -> int:
        current_node = self.root
        for char in word:
            if char not in current_node.children:
                return 0
            current_node = current_node.children[char]

        # Return the number of times the word was inserted
        return current_node.word_count

    def countWordsStartingWith(self, prefix: str) -> int:
        current_node = self.root
        for char in prefix:
            if char not in current_node.children:
                return 0
            current_node = current_node.children[char]

        # Return the number of words starting with the prefix
        return current_node.prefix_count

    def erase(self, word: str) -> None:
        current_node = self.root
        for char in word:
            if char not in current_node.children:
                return
            current_node = current_node.children[char]
            current_node.prefix_count -= 1

        # Only decrement if word exists
        if current_node.word_count > 0:
            current_node.word_count -= 1

Big(O) Analysis

Time Complexity
O(m)The time complexity for insert, countWordsEqualTo, countWordsStartingWith, and erase operations in a Trie (Prefix Tree) depends on the length of the input string (word or prefix), which we denote as 'm'. The insert operation traverses the tree along the path defined by the word, creating nodes if they don't exist. The count operations also traverse a path matching the word or prefix. Similarly, the erase operation follows the word's path to decrement counts. Therefore, each of these operations takes time proportional to the length of the input string, resulting in O(m) time complexity.
Space Complexity
O(M * L)The Trie data structure's space complexity depends on the number of words added (M) and the maximum length of any word (L). Each node in the Trie stores a character and potentially references to other Trie nodes, representing the next character in a word. In the worst-case scenario, where the words have no common prefixes, each word will create its own branch of length L, consuming space proportional to L for each word. Therefore, the space complexity is proportional to the total number of characters across all added words, which is O(M * L), considering M words of maximum length L. The counts at each node (words passing through and complete words) contribute constant space per node and are thus included in the M * L estimation.

Edge Cases

CaseHow to Handle
Null or empty word inserted.Treat as a valid insertion (empty word has its own presence) or raise IllegalArgumentException based on requirements.
Inserting the same word multiple times.Increment the count of the word at the terminal node to reflect its multiple occurrences.
Searching for an empty prefix.Return the total number of words in the trie (count of root node occurrences) which will effectively give words starting with empty prefix.
Erasing a word that doesn't exist.Do nothing or raise an exception; ensure no underflow of counts at Trie nodes, and count does not dip below 0.
Erasing a word with count > 1.Decrement the count of the word at the terminal node; also update the count in the path (prefix words).
Very long words are inserted, impacting memory usage.Consider limiting the maximum word length or switching to a more memory-efficient Trie implementation if memory becomes a bottleneck (e.g., compressed trie).
Inserting words with non-alphabetic characters (e.g., numbers, symbols).Define a character set (alphabet) and handle characters outside that set, possibly raising an error or mapping them to a default character.
Large number of insert/delete operations potentially leading to integer overflow of word counts.Use a larger integer type (e.g., long) to store word counts to prevent overflow, and consider handling potential maximum count limits.