Taro Logo

Stream of Characters

Hard
Asked by:
Profile picture
Profile picture
5 views
Topics:
StringsArrays

Design an algorithm that accepts a stream of characters and checks if a suffix of these characters is a string of a given array of strings words.

For example, if words = ["abc", "xyz"] and the stream added the four characters (one by one) 'a', 'x', 'y', and 'z', your algorithm should detect that the suffix "xyz" of the characters "axyz" matches "xyz" from words.

Implement the StreamChecker class:

  • StreamChecker(String[] words) Initializes the object with the strings array words.
  • boolean query(char letter) Accepts a new character from the stream and returns true if any non-empty suffix from the stream forms a word that is in words.

Example 1:

Input
["StreamChecker", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query"]
[[["cd", "f", "kl"]], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"], ["g"], ["h"], ["i"], ["j"], ["k"], ["l"]]
Output
[null, false, false, false, true, false, true, false, false, false, false, false, true]

Explanation
StreamChecker streamChecker = new StreamChecker(["cd", "f", "kl"]);
streamChecker.query("a"); // return False
streamChecker.query("b"); // return False
streamChecker.query("c"); // return False
streamChecker.query("d"); // return True, because 'cd' is in the wordlist
streamChecker.query("e"); // return False
streamChecker.query("f"); // return True, because 'f' is in the wordlist
streamChecker.query("g"); // return False
streamChecker.query("h"); // return False
streamChecker.query("i"); // return False
streamChecker.query("j"); // return False
streamChecker.query("k"); // return False
streamChecker.query("l"); // return True, because 'kl' is in the wordlist

Constraints:

  • 1 <= words.length <= 2000
  • 1 <= words[i].length <= 200
  • words[i] consists of lowercase English letters.
  • letter is a lowercase English letter.
  • At most 4 * 104 calls will be made to query.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a word in the `words` list, and what is the maximum number of words we can expect in the list?
  2. Can the characters in the stream and words be any Unicode character, or are they limited to a specific character set, like ASCII or lowercase English letters?
  3. Are the words in the `words` list guaranteed to be unique, or could there be duplicates?
  4. If multiple suffixes of the stream match words in the `words` list, does the `query` function need to indicate all of them, or just return true if at least one exists?
  5. What should the data structure do upon initialization with an empty `words` list? Should it still be able to process the stream or throw an exception when querying?

Brute Force Solution

Approach

We want to see if characters from a stream form a word from a given dictionary. The brute force method simply checks every possible prefix of the stream against all words in the dictionary.

Here's how the algorithm would work step-by-step:

  1. Take the first character from the stream.
  2. Check if that single character is a word in the dictionary.
  3. Take the first two characters from the stream.
  4. Check if that two-character sequence is a word in the dictionary.
  5. Keep adding characters one at a time from the stream to your sequence.
  6. For each new sequence, check if the sequence is a word in the dictionary.
  7. If you find a sequence that's a word, remember that you found it.
  8. Continue this process as the stream provides more characters.

Code Implementation

def stream_of_characters_brute_force(words):
    streamed_characters = []
    def query(character):
        streamed_characters.append(character)
        for start_index in range(len(streamed_characters)):

            # Iterate through possible start indices in the stream
            current_string = "".join(streamed_characters[start_index:])

            # Check if current_string matches any word
            if current_string in words:
                return True

        #If no match, we have to return False
        return False

    return query

Big(O) Analysis

Time Complexity
O(n*m*k)Let n be the length of the stream, m be the number of words in the dictionary, and k be the average length of a word in the dictionary. For each prefix of the stream, which can have lengths from 1 to n, we iterate through the dictionary of m words. For each word in the dictionary, we compare it to the current prefix, which takes O(k) time in the worst case if the word's length is k. Thus, for each of the n prefixes, we do m comparisons, each costing k, resulting in a time complexity of O(n*m*k).
Space Complexity
O(1)The provided algorithm iteratively builds a sequence of characters from the stream and checks if this sequence exists in the dictionary. While the sequence itself grows with each character from the stream, the algorithm doesn't persistently store all prefixes. It only keeps track of the current sequence being checked. Since the size of the dictionary is fixed and the algorithm only stores the current prefix being checked and a boolean to indicate if any word was found, the auxiliary space used is constant, independent of the length of the stream.

Optimal Solution

Approach

We're given a stream of characters and need to quickly check if any recent sequence of characters matches a given set of words. The trick is to build a specialized data structure that efficiently stores the words we're looking for, enabling fast prefix matching as new characters arrive.

Here's how the algorithm would work step-by-step:

  1. First, we'll create a special tree-like structure from the list of words. Each level of the tree represents a character in a word, and paths from the root to a leaf represent complete words.
  2. We'll also keep track of which nodes in the tree represent the end of a valid word.
  3. As the stream of characters comes in, we'll start at the root of our tree.
  4. For each new character, we'll try to follow a matching path in the tree. If we can, we move down the tree.
  5. If there's no matching path, we go back to the root and start again with the new character.
  6. If we reach a node that marks the end of a valid word, we know we've found a match in the recent character stream.
  7. This lets us quickly check for words without having to compare the incoming characters to every single word in our list each time.

Code Implementation

class StreamChecker:

    def __init__(self, words):
        self.trie = { }
        self.words = words
        for word in words:
            current_node = self.trie
            for char in reversed(word):
                if char not in current_node:
                    current_node[char] = { }
                current_node = current_node[char]
            current_node['#'] = True
        self.current_string = ""

    def query(self, letter):
        self.current_string += letter
        current_node = self.trie
        for char in reversed(self.current_string):
            if char not in current_node:
                return False

            current_node = current_node[char]
            # Check for complete word. 
            if '#' in current_node:
                return True

        return False

# Your StreamChecker object will be instantiated and called as such:
# stream_checker = StreamChecker(words)
# result = stream_checker.query(letter)

Big(O) Analysis

Time Complexity
O(m*k)Let m be the length of the character stream and k be the maximum length of any word in the dictionary. For each character in the stream (m), we traverse the Trie. In the worst case, for each character, we might traverse a path in the Trie of length k, the maximum word length, before either finding a match or resetting to the root. Therefore, the time complexity is O(m*k).
Space Complexity
O(T)The auxiliary space is dominated by the Trie (tree-like structure) created from the given set of words. The space used by the Trie depends on the total number of characters across all words and how much overlap they have. In the worst case, where there is minimal overlap between the words, the Trie would store all the characters of all the words. Therefore, if T represents the total number of characters in all the words we are searching for, the space complexity is O(T).

Edge Cases

CaseHow to Handle
words is null or emptyInitialize the data structure with an empty Trie and return false for all queries.
words contains an empty stringTreat the empty string as a valid word in the Trie.
words contains duplicate wordsThe Trie construction will naturally handle duplicates without issues.
Stream contains only a single character which matches a wordShould correctly return true as the single character is a word suffix.
Long stream where no suffix matches any word in the 'words' arrayConsistently return false, demonstrating the algorithm avoids false positives.
A word in 'words' is a prefix of another word in 'words'Trie should correctly identify both words as valid suffixes if they appear in the stream.
Maximum length of the stream or the words is extremely large, exceeding available memoryThe Trie data structure's memory consumption should be carefully considered to prevent memory overflow, perhaps by limiting the max length of words in words.
The character stream is extremely long and the prefix is a substring of a word. This substring does not represent a valid word until the stream ends with the remainder of the word.The Trie correctly identifies valid words when they are suffixes of the stream.