Taro Logo

Occurrences After Bigram

Easy
Asked by:
Profile picture
10 views
Topics:
ArraysStrings

Given two strings first and second, consider occurrences in some text of the form "first second third", where second comes immediately after first, and third comes immediately after second.

Return an array of all the words third for each occurrence of "first second third".

Example 1:

Input: text = "alice is a good girl she is a good student", first = "a", second = "good"
Output: ["girl","student"]

Example 2:

Input: text = "we will we will rock you", first = "we", second = "will"
Output: ["we","rock"]

Constraints:

  • 1 <= text.length <= 1000
  • text consists of lowercase English letters and spaces.
  • All the words in text are separated by a single space.
  • 1 <= first.length, second.length <= 10
  • first and second consist of lowercase English letters.
  • text will not have any leading or trailing spaces.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input text consist of characters other than lowercase English letters and spaces?
  2. What should I return if the bigram 'first second' does not appear in the text at all?
  3. If the 'third' word appears multiple times after the bigram, should I return all occurrences?
  4. Are 'first' and 'second' guaranteed to be different words?
  5. What are the maximum lengths of the input text string and the individual words 'first' and 'second'?

Brute Force Solution

Approach

We're given some text and two words (a bigram). The goal is to find all words that appear directly after the bigram in the text. The brute force method involves examining every possible position in the text and checking if the bigram exists at that position.

Here's how the algorithm would work step-by-step:

  1. Start at the beginning of the text.
  2. Check if the first and second words in the text match our bigram.
  3. If they do match, record the word immediately following the bigram.
  4. Move to the next position in the text (one word forward).
  5. Again, check if the two words starting at this new position match our bigram.
  6. If they do, record the word after the bigram.
  7. Continue this process, moving one word at a time, until we reach the end of the text.
  8. Finally, list all the words we recorded as appearing after the bigram. These are our results.

Code Implementation

def find_occurrences_after_bigram(text, first_word, second_word):
    words = text.split()
    result = []

    # Iterate through the words, stopping early enough to check for the bigram
    for index in range(len(words) - 2):

        # Check if the current word and the next word match the bigram
        if words[index] == first_word and words[index + 1] == second_word:

            # Add the word after the bigram to the result
            result.append(words[index + 2])

    return result

Big(O) Analysis

Time Complexity
O(n)The described algorithm iterates through the text, word by word, up to n times, where n is the number of words in the text. At each position, a constant amount of work is done: checking if the current word and the next word match the given bigram and, if they do, recording the subsequent word. Therefore, the time complexity is directly proportional to the number of words in the text, resulting in O(n).
Space Complexity
O(N)The primary auxiliary space usage comes from storing the words that appear after the bigram. In the worst-case scenario, every occurrence of the bigram might be followed by a unique word, which would mean storing up to N words, where N is the number of words in the input text. The list used to store these words therefore grows linearly with the input size. This results in an auxiliary space complexity of O(N).

Optimal Solution

Approach

The problem asks us to find words that appear immediately after a specific pair of words. The trick is to efficiently scan the text only once, keeping track of the previous two words. If they match our target pair, we record the following word.

Here's how the algorithm would work step-by-step:

  1. Start by splitting the given text into a sequence of individual words.
  2. Go through the list of words, one by one, from beginning to end.
  3. For each word, check if the two words before it match the specific pair of words we are looking for.
  4. If the two previous words match the target pair, then add the current word to our list of results.
  5. Continue doing this until we reach the end of the word list.
  6. Finally, return the list of words that appeared after the target pair.

Code Implementation

def find_occurrences(text, first_word, second_word):
    words = text.split()
    result = []
    # Iterate through the words starting from the third word.
    for i in range(2, len(words)):
        # Check if the previous two words match the target bigram.
        if words[i - 2] == first_word and words[i - 1] == second_word:
            # Add the current word to the result.
            result.append(words[i])

    return result

Big(O) Analysis

Time Complexity
O(n)The algorithm splits the text into n words, where n is the total number of words in the input text. It then iterates through this list of words once. Inside the loop, the check to see if the previous two words match the target bigram takes constant time, O(1). Therefore, the overall time complexity is dominated by the single iteration through the n words, resulting in O(n) time complexity.
Space Complexity
O(N)The primary space complexity comes from the list of results we are building to store the words that appear after the target bigram. In the worst-case scenario, every word after the first two in the input text could potentially follow the bigram. Therefore, in the worst case, the size of the list of results could grow linearly with the input size N (the number of words in the text). Thus, the auxiliary space complexity is O(N).

Edge Cases

Empty text string
How to Handle:
Return an empty list immediately, as there can be no bigram occurrences.
Text string with less than three words
How to Handle:
Return an empty list immediately, as a bigram requires at least two words followed by a potential third.
bigram first and second words are identical
How to Handle:
The solution should correctly identify occurrences even if the bigram words are the same.
Text contains only the bigram, repeated multiple times
How to Handle:
The solution should identify all occurrences where the bigram is followed by another word.
Very long text string (scalability)
How to Handle:
Ensure the solution's time complexity is linear or near-linear (e.g., O(n)) for efficient processing.
bigram words are at the very end of the text string
How to Handle:
The code needs to avoid index out-of-bounds errors when checking for the third word following the bigram.
The third word (after the bigram) appears multiple times in the text.
How to Handle:
The solution should correctly identify all instances of the third word following the specified bigram, and add them to the output.
Case sensitivity in word matching (e.g., 'the' vs 'The').
How to Handle:
Convert all words to lowercase before comparison to ensure case-insensitive matching.