Taro Logo

Implement Trie (Prefix Tree)

Medium
Roblox logo
Roblox
0 views
Topics:
StringsTrees

A trie (pronounced as "try") or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellchecker.

Implement the Trie class:

  • Trie() Initializes the trie object.
  • void insert(String word) Inserts the string word into the trie.
  • boolean search(String word) Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.
  • boolean startsWith(String prefix) Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.

Example 1:

Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]

Explanation
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple");   // return True
trie.search("app");     // return False
trie.startsWith("app"); // return True
trie.insert("app");
trie.search("app");     // return True

Constraints:

  • 1 <= word.length, prefix.length <= 2000
  • word and prefix consist only of lowercase English letters.
  • At most 3 * 104 calls in total will be made to insert, search, and startsWith.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What characters will the words and prefixes consist of? Can I assume they are all lowercase English letters?
  2. Are there any limits on the length of the words that will be inserted or searched for?
  3. How frequently will each operation (insert, search, startsWith) be called, relative to the others? Should I optimize for any particular operation?
  4. Is the Trie case-sensitive? For example, should `search("apple")` return true if only `insert("Apple")` was called?
  5. What should `search` or `startsWith` return if the input word/prefix is an empty string?

Brute Force Solution

Approach

Let's build a system to store words in a way that makes it easy to find words that start with a specific sequence of letters. A brute force approach involves checking every single existing word to see if it starts with the input sequence.

Here's how the algorithm would work step-by-step:

  1. To start, we keep a simple list of all the words we need to store.
  2. When we want to add a new word, we just add it to our list.
  3. If someone asks us to find words that start with a specific sequence of letters, we go through every single word in our list.
  4. For each word, we check if it begins with the given sequence of letters.
  5. If a word does start with the sequence, we add it to a separate list of matching words.
  6. After checking all the words in our original list, we return the list of matching words.

Code Implementation

class BruteForceTrie:
    def __init__(self):
        self.word_list = []

    def insert(self, word):
        self.word_list.append(word)

    def search_prefix(self, prefix):
        matching_words = []

        # Iterate through the list of stored words
        for existing_word in self.word_list:

            # Check if the current word starts with the given prefix
            if existing_word.startswith(prefix):
                matching_words.append(existing_word)

        return matching_words

Big(O) Analysis

Time Complexity
Space Complexity
O(K)The provided algorithm uses an auxiliary list to store the matching words. In the worst case, every word in the original list of N words starts with the given prefix. If K is the number of matching words that start with the input sequence, then the algorithm will allocate a list of size K to store these words. Thus, the auxiliary space complexity is O(K), where K is the number of matching words. In the worst case, K could be equal to N.

Optimal Solution

Approach

To efficiently store and search words, we'll create a special tree-like structure. Each level of the tree represents a character, and following a path down the tree spells out a word.

Here's how the algorithm would work step-by-step:

  1. Imagine each node in the tree as a letter in a word.
  2. Starting with an empty root node, when we add a word, we go down the tree creating new nodes for each letter if they don't already exist.
  3. If a node already exists for a letter, we just follow that existing path.
  4. At the end of each word, we mark the last node to indicate that a word ends there.
  5. To search for a word, we follow the path of letters in the tree.
  6. If we reach the end of the word and the last node is marked as the end of a word, then the word exists.
  7. To check for a prefix, we follow the letters in the tree.
  8. If we can find all the letters of the prefix, then the prefix exists, even if the last node isn't the end of a complete word.

Code Implementation

class Trie:

    def __init__(self):
        self.root_node = {}

    def insert(self, word):
        current_node = self.root_node
        for char in word:
            if char not in current_node:
                # Create node if it doesn't exist
                current_node[char] = {}

            current_node = current_node[char]

        # Mark the end of a word
        current_node['#'] = True

    def search(self, word):
        current_node = self.root_node
        for char in word:
            if char not in current_node:
                return False
            current_node = current_node[char]

        # Check if the node represents the end of a word
        return '#' in current_node

    def startsWith(self, prefix):
        current_node = self.root_node
        for char in prefix:
            if char not in current_node:
                return False
            current_node = current_node[char]

        # Prefix found if we reach here
        return True

Big(O) Analysis

Time Complexity
O(m)The time complexity of the insert, search, and startsWith operations in a Trie is primarily determined by the length of the input string, represented by 'm'. When inserting a word, we iterate through each character of the word to create or traverse nodes in the Trie. Similarly, for search and startsWith, we traverse the Trie following the characters of the input string. Therefore, the number of operations grows linearly with the length 'm' of the word being inserted, searched, or checked for prefix existence, resulting in a time complexity of O(m).
Space Complexity
O(M*L)The space complexity is primarily determined by the Trie's node structure. For each word inserted, nodes are created for each character in the word, up to the word's length. In the worst case, where there are M words and each word has a maximum length of L and shares no common prefixes, the Trie could contain M * L nodes. Each node stores a fixed number of pointers (one for each possible character), contributing to the space usage per node. Therefore, the auxiliary space grows proportionally to the total number of characters across all words, resulting in O(M*L) space complexity.

Edge Cases

CaseHow to Handle
Null or empty word for insertTreat as valid and insert empty node or return immediately without inserting anything, depending on the requirements/API specification.
Null or empty prefix for startsWithReturn true if the Trie is not empty, false otherwise, according to if the root exists.
Null or empty word for searchReturn true only if a node representing the empty word was previously inserted and marked as the end of a word.
Inserting the same word multiple timesThe Trie should correctly update the 'isEndOfWord' flag on the last character's node for each insertion, without creating duplicates.
Searching for a prefix that is a complete wordShould return true if the prefix is a word and has been inserted.
Inserting words with non-alphabetic charactersThe solution needs to specify supported charsets and handle them accordingly (e.g., throw an exception or ignore characters if outside the supported set).
Large number of words inserted, memory constraintsThe Trie's memory usage can grow significantly with the number of words; solution should be designed to be memory-efficient, if memory limit is imposed.
Words with extremely long lengthCheck for stack overflow possibility, if using recursion, and potentially use iterative approach instead.