Taro Logo

Minimum Window Subsequence

Hard
Amazon logo
Amazon
4 views
Topics:
StringsDynamic ProgrammingTwo Pointers

Given two strings S and T, find the minimum window in S which will contain all the characters in T in subsequence. In other words, find the shortest subsequence of S such that T is a subsequence of it.

Example 1:

S = "abcdebdde", T = "bde"
Output: "bcde"
Explanation: "bcde" is the shortest subsequence of S that contains T as a subsequence.

Example 2:

S = "jmeqqgkqlhg", T = "u"
Output: ""
Explanation: T is not a subsequence of S.

Constraints:

  • 1 <= S.length, T.length <= 10^5
  • S and T consist of lowercase English letters.

Could you implement a function that takes two strings S and T as input and returns the minimum window subsequence of S that contains T?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the strings S and T contain duplicate characters, and if so, how should they be handled?
  2. If there is no subsequence in S that contains T, what should the function return (e.g., null, empty string, or a specific error code)?
  3. What are the maximum lengths of strings S and T? Are there any memory constraints I should be aware of?
  4. Is T guaranteed to be non-empty? What if T is an empty string?
  5. If there are multiple minimum window subsequences, is it acceptable to return any one of them?

Brute Force Solution

Approach

The brute force approach to finding the shortest window involves checking every possible small chunk within the larger piece of text. We see if that chunk contains our target sequence, and keep track of the smallest valid chunk we've found so far.

Here's how the algorithm would work step-by-step:

  1. Start with the very first small part of the big text.
  2. See if the small part has all the pieces of the sequence we're looking for, and in the right order.
  3. If it does, remember its size and where it begins.
  4. Now, look at the next small part of the big text (shifted by one).
  5. Again, check if this new part contains the whole sequence in the correct order.
  6. If it does and it's smaller than the smallest part we've found so far, then remember this new part instead.
  7. Keep moving through the big text, checking every possible small part one by one.
  8. When you reach the end of the big text, the small part you remembered will be the shortest part that contains the entire sequence.

Code Implementation

def min_window_subsequence_brute_force(text_string, sequence_string):
    minimum_window_start = ""
    minimum_window_length = float('inf')

    for text_start_index in range(len(text_string)):
        for text_end_index in range(text_start_index, len(text_string)):
            subsequence_index = 0
            #Iterate over the text chunk.

            for text_index in range(text_start_index, text_end_index + 1):
                #Compare characters in text with subsequence.
                if subsequence_index < len(sequence_string) and \
                   text_string[text_index] == sequence_string[subsequence_index]:
                    subsequence_index += 1

            # See if the subsequence was fully found.
            if subsequence_index == len(sequence_string):

                current_window_length = text_end_index - text_start_index + 1

                #If this window is smaller, update it.
                if current_window_length < minimum_window_length:

                    minimum_window_start = text_string[text_start_index : text_end_index + 1]
                    minimum_window_length = current_window_length

    return minimum_window_start

Big(O) Analysis

Time Complexity
O(m*n)The outer loop iterates through all possible starting positions in the larger string (m), taking O(m) time. For each starting position, we potentially iterate through the larger string (m) to find a matching subsequence of the smaller string (n) , resulting in O(n) operations in the worst case. Therefore, the overall time complexity is O(m*n), where m is the length of the larger string and n is the length of the subsequence.
Space Complexity
O(1)The brute force approach described only involves keeping track of the start and end indices of the current window and the best (smallest) window found so far. These indices are stored in a few constant-size variables. Therefore, the amount of extra memory used doesn't depend on the size of the input strings. The space complexity is constant.

Optimal Solution

Approach

The most efficient way to solve this problem is to use a two-pointer approach that focuses on finding the shortest possible chunk of the first string that contains the second string as a subsequence. We slide a 'window' across the first string, adjusting its start and end points to find the smallest valid window.

Here's how the algorithm would work step-by-step:

  1. Start with the smallest possible window, beginning at the start of both strings.
  2. Move the right edge of the window until we find a subsequence of the second string inside the window.
  3. Once you have a valid window, try shrinking it from the left side to find a smaller valid window.
  4. Keep track of the smallest valid window you've found so far.
  5. Slide the right edge of the window again to find the next potential valid window and repeat steps 3 and 4.
  6. Continue this process until the right edge of the window reaches the end of the first string.
  7. The smallest window found during this process is the minimum window subsequence.

Code Implementation

def minimum_window_subsequence(string1, string2):
    string1_length = len(string1)
    string2_length = len(string2)

    minimum_window_start = 0
    minimum_window_length = float('inf')

    string2_index = 0

    for string1_index in range(string1_length):
        if string1[string1_index] == string2[string2_index]:
            string2_index += 1

        # When we find a match for string2, 
        # we attempt to minimize the window size.
        if string2_index == string2_length:
            window_end = string1_index + 1
            window_start = string1_index
            string2_index -= 1

            while string2_index >= 0:
                if string1[window_start] == string2[string2_index]:
                    string2_index -= 1
                window_start -= 1

            window_start += 1
            window_length = window_end - window_start

            # Updating our minimum window.
            if window_length < minimum_window_length:
                minimum_window_length = window_length
                minimum_window_start = window_start

            string2_index = 0

    # Return empty string if no valid window found.
    if minimum_window_length == float('inf'):
        return ""

    # Extracting the min window subsequence from string1.
    return string1[minimum_window_start:minimum_window_start + minimum_window_length]

Big(O) Analysis

Time Complexity
O(m*n)Let n be the length of string s1 and m be the length of string s2. The outer loop iterates through s1 (length n). The inner loop potentially iterates through both s1 and s2 to find a subsequence (potentially length m). The worst-case scenario involves repeatedly scanning portions of s1 to match the subsequence s2 at each step of the outer loop. Thus the time complexity is O(m*n).
Space Complexity
O(1)The algorithm uses a constant amount of extra space. It primarily stores a few integer variables to keep track of the window's start and end positions, and the best window found so far. No auxiliary data structures like arrays, hash maps, or recursion are used that would scale with the input string lengths. Therefore, the space complexity is independent of the input size and remains constant.

Edge Cases

CaseHow to Handle
Empty string S or TReturn an empty string since no subsequence can be formed if either string is empty.
Length of S is less than length of TReturn an empty string since T cannot be a subsequence of S if S is shorter.
T is not a subsequence of SReturn an empty string when no valid window is found.
S and T are identicalReturn S (or T) as the minimum window subsequence since they are the same.
T contains characters not present in SReturn an empty string since a character in T must be present in S to be a subsequence.
S and T contain only one characterIf the characters match, return that single character; otherwise, return an empty string.
Long strings S and T, causing potential performance issuesOptimize the solution using dynamic programming with memoization to avoid redundant calculations and potentially reduce time complexity.
Multiple minimum window subsequences existThe algorithm should return only one of the minimum window subsequences; the specific one depends on the implementation.