Taro Logo

Implement Trie II (Prefix Tree)

Medium
Asked by:
Profile picture
Profile picture
6 views
Topics:
StringsTrees

A Trie (also known as a prefix tree) is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellchecker.

Implement the Trie class:

  • Trie() Initializes the trie object.
  • void insert(String word) Inserts the string word into the trie.
  • int countWordsEqualTo(String word) Returns the number of instances of the string word in the trie.
  • int countWordsStartingWith(String prefix) Returns the number of strings in the trie that have the string prefix as a prefix.
  • void erase(String word) Erases the string word from the trie.

Example 1:

Input
["Trie", "insert", "insert", "countWordsEqualTo", "countWordsStartingWith", "erase", "countWordsEqualTo", "countWordsStartingWith", "erase", "countWordsStartingWith"]
[[], ["apple"], ["apple"], ["apple"], ["app"], ["apple"], ["apple"], ["app"], ["apple"], ["app"]]
Output
[null, null, null, 2, 2, null, 1, 1, null, 0]

Explanation
Trie trie = new Trie();
trie.insert("apple");   // Inserts "apple".
trie.insert("apple");   // Inserts another "apple".
trie.countWordsEqualTo("apple"); // Returns 2, because there are two "apple" strings in the trie.
trie.countWordsStartingWith("app"); // Returns 2, because there are two strings which start with "app".
trie.erase("apple"); // Erases one "apple".
trie.countWordsEqualTo("apple"); // Returns 1, because there is still one "apple" in the trie.
trie.countWordsStartingWith("app"); // Returns 1, because there is still one string which starts with "app".
trie.erase("apple"); // Erases "apple". Now the trie is empty.
trie.countWordsStartingWith("app"); // Returns 0, because there are no strings which start with "app".

Constraints:

  • 1 <= word.length, prefix.length <= 2000
  • word and prefix consist only of lowercase English letters.
  • At most 3 * 104 calls in total will be made to insert, countWordsEqualTo, countWordsStartingWith, and erase.
  • It is guaranteed that for any operation, the string word will exist in the trie when it is called.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a word that can be inserted into the trie?
  2. Will the input consist of only lowercase English letters, or can it include other characters?
  3. If I call erase(word) and the word doesn't exist, or its count is already zero, should the function do nothing or return an error?
  4. What are the expected number of calls to each of the four methods (insert, countWordsEqualTo, countWordsStartingWith, and erase), and what is the overall scale of operations?
  5. Can I assume that the input `word` or `prefix` will never be null or empty?

Brute Force Solution

Approach

The goal is to build a system to store words so we can quickly find out how many words start with a particular sequence of letters, and how many times a specific word was inserted. A brute force approach directly uses the words themselves for these operations, without any special structure.

Here's how the algorithm would work step-by-step:

  1. To add a new word, simply store it in a list of all added words. We don't worry about organizing them in any special way.
  2. To count how many words start with a given sequence, we'll go through every word we've stored.
  3. For each stored word, we'll check if the sequence is at the beginning of the word. If it is, we increase our count.
  4. After checking all the stored words, we return the total count.
  5. To count how many times a specific word was added, we'll go through our list of stored words again.
  6. This time, we'll count how many times the exact word appears in our list.
  7. Finally, we'll return the count of how many times the exact word was added.

Code Implementation

class Trie:

    def __init__(self):
        self.word_list = []

    def insert(self, word: str) -> None:
        # Simply append the word to our word list.
        self.word_list.append(word)

    def countWordsEqualTo(self, word: str) -> int:
        word_count = 0
        # Iterate to count exact matches.
        for existing_word in self.word_list:
            if existing_word == word:
                word_count += 1
        return word_count

    def countWordsStartingWith(self, prefix: str) -> int:
        prefix_count = 0
        # We need to check all stored words.
        for existing_word in self.word_list:

            # Check if word starts with prefix.
            if existing_word.startswith(prefix):
                prefix_count += 1

        return prefix_count

    def erase(self, word: str) -> None:
        #Remove the first occurrence of the word.
        if word in self.word_list:
            self.word_list.remove(word)

Big(O) Analysis

Time Complexity
O(n*m)Adding a word takes O(1) time as we simply append it to a list. Counting words with a given prefix requires iterating through all n stored words, and for each word, checking if it starts with the given prefix of length m, which takes O(m) time. Counting occurrences of a specific word also requires iterating through all n stored words and comparing each stored word with the target word, which takes O(m) time per comparison. Therefore, the overall time complexity is O(n*m) for both counting operations.
Space Complexity
O(N)The provided solution stores all added words in a list. In the worst case, where all N words added are unique, the list will store all N words. Therefore, the auxiliary space used is proportional to the number of words added, N. No other significant data structures are used. Thus, the space complexity is O(N).

Optimal Solution

Approach

We'll use a special tree-like structure to store words. This structure helps us quickly find words with the same starting parts and also keep track of how many times a word was added or if it's a real word.

Here's how the algorithm would work step-by-step:

  1. Imagine a tree where each branch represents a letter. Start with an empty root.
  2. To add a word, follow the branches corresponding to the letters in the word, creating new branches if needed.
  3. At the end of each word's path, mark that it's a complete word and keep a counter of how many times we've added it.
  4. Also, at each branch along the word's path, keep a counter of how many words pass through that branch.
  5. To check how many times a word appears, follow its path and check the counter at the end of the word's path.
  6. To find out how many words start with a certain prefix, follow the path of the prefix and check the counter on the last branch of the prefix. This tells you how many complete words exist that contain that prefix.

Code Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.word_count = 0
        self.prefix_count = 0

class Trie:

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        current_node = self.root
        for char in word:
            if char not in current_node.children:
                current_node.children[char] = TrieNode()
            current_node = current_node.children[char]
            current_node.prefix_count += 1

        # Mark the end of the word and increase its count
        current_node.word_count += 1

    def countWordsEqualTo(self, word: str) -> int:
        current_node = self.root
        for char in word:
            if char not in current_node.children:
                return 0
            current_node = current_node.children[char]

        # Return the number of times the word was inserted
        return current_node.word_count

    def countWordsStartingWith(self, prefix: str) -> int:
        current_node = self.root
        for char in prefix:
            if char not in current_node.children:
                return 0
            current_node = current_node.children[char]

        # Return the number of words starting with the prefix
        return current_node.prefix_count

    def erase(self, word: str) -> None:
        current_node = self.root
        for char in word:
            if char not in current_node.children:
                return
            current_node = current_node.children[char]
            current_node.prefix_count -= 1

        # Only decrement if word exists
        if current_node.word_count > 0:
            current_node.word_count -= 1

Big(O) Analysis

Time Complexity
O(m)The time complexity for insert, countWordsEqualTo, countWordsStartingWith, and erase operations in a Trie (Prefix Tree) depends on the length of the input string (word or prefix), which we denote as 'm'. The insert operation traverses the tree along the path defined by the word, creating nodes if they don't exist. The count operations also traverse a path matching the word or prefix. Similarly, the erase operation follows the word's path to decrement counts. Therefore, each of these operations takes time proportional to the length of the input string, resulting in O(m) time complexity.
Space Complexity
O(M * L)The Trie data structure's space complexity depends on the number of words added (M) and the maximum length of any word (L). Each node in the Trie stores a character and potentially references to other Trie nodes, representing the next character in a word. In the worst-case scenario, where the words have no common prefixes, each word will create its own branch of length L, consuming space proportional to L for each word. Therefore, the space complexity is proportional to the total number of characters across all added words, which is O(M * L), considering M words of maximum length L. The counts at each node (words passing through and complete words) contribute constant space per node and are thus included in the M * L estimation.

Edge Cases

Null or empty word inserted.
How to Handle:
Treat as a valid insertion (empty word has its own presence) or raise IllegalArgumentException based on requirements.
Inserting the same word multiple times.
How to Handle:
Increment the count of the word at the terminal node to reflect its multiple occurrences.
Searching for an empty prefix.
How to Handle:
Return the total number of words in the trie (count of root node occurrences) which will effectively give words starting with empty prefix.
Erasing a word that doesn't exist.
How to Handle:
Do nothing or raise an exception; ensure no underflow of counts at Trie nodes, and count does not dip below 0.
Erasing a word with count > 1.
How to Handle:
Decrement the count of the word at the terminal node; also update the count in the path (prefix words).
Very long words are inserted, impacting memory usage.
How to Handle:
Consider limiting the maximum word length or switching to a more memory-efficient Trie implementation if memory becomes a bottleneck (e.g., compressed trie).
Inserting words with non-alphabetic characters (e.g., numbers, symbols).
How to Handle:
Define a character set (alphabet) and handle characters outside that set, possibly raising an error or mapping them to a default character.
Large number of insert/delete operations potentially leading to integer overflow of word counts.
How to Handle:
Use a larger integer type (e.g., long) to store word counts to prevent overflow, and consider handling potential maximum count limits.