Taro Logo

Implement Trie (Prefix Tree)

Medium
5 views
2 months ago

A trie (pronounced as "try") or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. Implement the Trie class with the following methods:

  • Trie(): Initializes the trie object.
  • void insert(String word): Inserts the string word into the trie.
  • boolean search(String word): Returns true if the string word is in the trie, and false otherwise.
  • boolean startsWith(String prefix): Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.

For example:

Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]

Implement the Trie class in your preferred programming language. Discuss the time and space complexity of each operation. Also, consider edge cases and how to handle them effectively.

Sample Answer

Trie Implementation

A trie (pronounced "try"), or prefix tree, is a tree-like data structure used for efficient retrieval of keys in a dataset of strings. Each node in a trie represents a prefix, and the root node represents an empty string. This structure allows for quick prefix-based search and autocomplete functionalities.

Naive Approach (Not Efficient)

A naive approach would involve storing all the strings in a list and then, for search and startsWith operations, iterating through the list to find matching strings. This method is inefficient for a large number of strings or frequent search/prefix operations.

Optimal Solution: Trie Implementation

The optimal solution uses a Trie data structure. Each node in the trie stores a character, and the path from the root to a node represents a prefix. Each node also has a flag indicating whether the prefix represented by the node is a complete word.

Code

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_word = True

    def search(self, word: str) -> bool:
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_word

    def startsWith(self, prefix: str) -> bool:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

# Example Usage
trie = Trie()
trie.insert("apple")
print(trie.search("apple"))   # return True
print(trie.search("app"))     # return False
print(trie.startsWith("app")) # return True
trie.insert("app")
print(trie.search("app"))     # return True

Diagram

Consider inserting "apple", "app", and "apricot" into the Trie. The resulting trie structure would look like this:

Root
└── a
    ├── p
    │   ├── p (is_word: True)
    │   │   └── l
    │   │       └── e (is_word: True)
    │   └── r
    │       └── i
    │           └── c
    │               └── o
    │                   └── t (is_word: True)

Big(O) Run-time Analysis

  • Insert: O(m), where m is the length of the word.
  • Search: O(m), where m is the length of the word.
  • startsWith: O(p), where p is the length of the prefix.

The time complexity for all operations is proportional to the length of the key being inserted or searched, making it very efficient for large datasets with common prefixes.

Big(O) Space Usage Analysis

  • The space complexity is O(N*k), where N is the number of words in the trie and k is the average length of the word or the alphabet size. Each node stores references to potentially all characters of the alphabet in its children dictionary. Therefore, the space used depends on the number of words and their lengths as well as the size of the alphabet.

Edge Cases

  1. Empty String:

    • Inserting an empty string should be handled gracefully, usually by setting the is_word flag at the root node.
    • Searching for an empty string should return True if the root node's is_word flag is set, and False otherwise.
    • startsWith with an empty string should always return True as every trie effectively starts with an empty string.
  2. Non-lowercase Characters:

    • The code should either handle non-lowercase characters (e.g., by converting them to lowercase before insertion/search) or explicitly reject words containing such characters.
  3. Large Number of Strings with No Common Prefixes:

    • If the trie contains a large number of strings with no common prefixes, the space usage can become significant, as each string will add a separate branch to the trie.
  4. Prefix Longer than Stored Words:

    • The startsWith function should return False if the given prefix is longer than any word stored in the trie and does not match any existing prefixes.