Taro Logo

Occurrences After Bigram

Easy
Google logo
Google
2 views
Topics:
ArraysStrings

Given two strings first and second, consider occurrences in some text of the form "first second third", where second comes immediately after first, and third comes immediately after second.

Return an array of all the words third for each occurrence of "first second third".

Example 1:

Input: text = "alice is a good girl she is a good student", first = "a", second = "good"
Output: ["girl","student"]

Example 2:

Input: text = "we will we will rock you", first = "we", second = "will"
Output: ["we","rock"]

Constraints:

  • 1 <= text.length <= 1000
  • text consists of lowercase English letters and spaces.
  • All the words in text are separated by a single space.
  • 1 <= first.length, second.length <= 10
  • first and second consist of lowercase English letters.
  • text will not have any leading or trailing spaces.

Write a function that takes the text, first word, and second word as input and returns a list of all third words that occur after the first and second words in the specified order. What are the time and space complexities of your solution? How does your solution handle edge cases like empty text or when the first and second words are not found?

Solution


Naive Solution

A straightforward approach is to split the input text into words and then iterate through the words, checking for the occurrences of first and second in sequence. If found, the subsequent word is added to the result.

Code

def find_third_word_naive(text: str, first: str, second: str) -> list[str]:
    words = text.split()
    result = []
    for i in range(len(words) - 2):
        if words[i] == first and words[i + 1] == second:
            result.append(words[i + 2])
    return result

Big O Runtime

  • Time Complexity: O(N), where N is the number of words in the text.
  • Space Complexity: O(N), to store the words array.

Optimal Solution

The optimal solution is essentially the same as the naive approach since we need to iterate through the text to find the occurrences. However, we can improve readability and maintainability.

Code

def find_third_word_optimal(text: str, first: str, second: str) -> list[str]:
    words = text.split()
    result = []
    n = len(words)
    for i in range(n - 2):
        if words[i] == first and words[i+1] == second:
            result.append(words[i+2])
    return result

Big O Runtime

  • Time Complexity: O(N), where N is the number of words in the text.
  • Space Complexity: O(M), where M is the number of occurrences of the third word.

Edge Cases

  • Empty input text: Return an empty list.
  • Text with fewer than three words: Return an empty list.
  • first or second not found in the text: Return an empty list.
  • Overlapping occurrences: The algorithm should handle cases where first and second appear multiple times and their third words need to be captured.

Summary of Optimizations

Since the problem requires us to find all occurrences, we must iterate through the text at least once. Thus, both the naive and optimal approaches have the same time complexity of O(N), where N is the number of words. Space complexity depends on storing the words in a list and the number of matching "third" words found. No significant optimizations are possible without changing the problem constraints.