Taro Logo

Minimum Window Substring

Hard
Snap logo
Snap
2 views
Topics:
StringsSliding WindowsTwo Pointers

Given two strings s and t of lengths m and n respectively, return the minimum window substring of s such that every character in t (including duplicates) is included in the window. If there is no such substring, return the empty string "".

The testcases will be generated such that the answer is unique.

Example 1:

Input: s = "ADOBECODEBANC", t = "ABC"
Output: "BANC"
Explanation: The minimum window substring "BANC" includes 'A', 'B', and 'C' from string t.

Example 2:

Input: s = "a", t = "a"
Output: "a"
Explanation: The entire string s is the minimum window.

Example 3:

Input: s = "a", t = "aa"
Output: ""
Explanation: Both 'a's from t must be included in the window.
Since the largest window of s only has one 'a', return empty string.

Constraints:

  • m == s.length
  • n == t.length
  • 1 <= m, n <= 105
  • s and t consist of uppercase and lowercase English letters.

Follow up: Could you find an algorithm that runs in O(m + n) time?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected behavior if string `t` is longer than string `s`? Should I return an empty string or is that an invalid input?
  2. Can the input strings `s` and `t` contain characters outside of the ASCII range, or are they limited to a specific character set?
  3. If there are multiple minimum windows, is it acceptable to return any one of them, or is there a specific criterion for choosing one?
  4. What should I return if string `s` does not contain all the characters in string `t`? Should I return an empty string or null?
  5. Can string `t` contain duplicate characters, and if so, how should that impact my search for the minimum window in string `s`?

Brute Force Solution

Approach

The brute force approach for finding the smallest section of text containing specific characters involves checking every possible section, one by one. We'll look at progressively larger sections until we find one that has all the characters we need. We then remember the smallest section we found that meets the requirement and continue until no more sections are possible.

Here's how the algorithm would work step-by-step:

  1. Start by considering sections containing only the first character of the main text.
  2. Next, look at sections containing the first two characters, then the first three, and so on, checking each section to see if it contains all the required characters.
  3. If a section contains all the required characters, note its size and the section itself.
  4. Continue this process, shifting the starting point of the section to the second character of the text, and again examining sections of increasing length.
  5. Repeat this process of shifting the starting point and checking sections of increasing length until you've looked at all possible sections of the main text.
  6. Finally, compare all the sections you found that contain all the required characters, and identify the smallest one. This is your answer.

Code Implementation

def min_window_substring_brute_force(text_string, characters_to_include):
    minimum_window = ""
    minimum_window_length = float('inf')

    for starting_index in range(len(text_string)):
        for ending_index in range(starting_index, len(text_string)):
            substring = text_string[starting_index:ending_index + 1]

            # Verify this substring contains all target chars
            if all_characters_present(substring, characters_to_include):

                # Keep track of the min valid window
                current_window_length = len(substring)

                if current_window_length < minimum_window_length:
                    minimum_window_length = current_window_length
                    minimum_window = substring

    return minimum_window

def all_characters_present(substring, characters_to_include):
    for character in characters_to_include:
        if character not in substring:
            return False

    return True

Big(O) Analysis

Time Complexity
O(n³)The outer loop iterates n times, once for each starting position of the substring within the main text of length n. The inner loop also iterates up to n times, defining the length of the substring being considered. For each substring, we iterate up to n times to check if it contains all the required characters from the target string. Thus, the total operations approximate n * n * n which simplifies to O(n³).
Space Complexity
O(1)The brute force approach, as described, iteratively checks substrings without using auxiliary data structures that scale with the input size. It primarily involves comparing substrings of the main text with the required characters. The algorithm does not mention storing intermediate substrings or character counts in extra memory. Therefore, the space complexity is dominated by a few constant-size variables used for indexing and comparison, resulting in O(1) space complexity.

Optimal Solution

Approach

We need to find the smallest part of a larger piece of text that contains all the characters from a specific set of characters. Instead of checking every possible part of the larger text, we cleverly slide a window along the text, expanding or shrinking it as needed to find the minimum sized part that meets our requirements.

Here's how the algorithm would work step-by-step:

  1. First, we figure out how many times each character appears in the set of required characters.
  2. Then, we start a window at the beginning of the larger text and expand it to the right until it contains all the required characters at least as many times as needed.
  3. Once we have a valid window (meaning it contains all the required characters), we try to shrink the window from the left to make it as small as possible while still containing all the required characters.
  4. While shrinking, we keep track of the smallest valid window we've found so far.
  5. After we can't shrink the window anymore, we expand it again to the right to find the next valid window.
  6. We keep expanding and shrinking the window, always remembering the smallest valid window, until we reach the end of the larger text.
  7. Finally, the smallest valid window we found is our answer.

Code Implementation

def minimum_window_substring(text_string, characters_to_include):
    if not text_string or not characters_to_include:
        return ""

    required_character_counts = {}
    for character in characters_to_include:
        required_character_counts[character] = required_character_counts.get(character, 0) + 1

    window_start = 0
    window_end = 0
    formed = 0
    window_character_counts = {}
    minimum_window_length = float('inf')
    minimum_window_start = 0

    while window_end < len(text_string):
        character = text_string[window_end]
        window_character_counts[character] = window_character_counts.get(character, 0) + 1

        if character in required_character_counts and \
           window_character_counts[character] == required_character_counts[character]:
            formed += 1

        # Shrink the window while all characters are included.
        while formed == len(required_character_counts):
            
            if window_end - window_start + 1 < minimum_window_length:
                minimum_window_length = window_end - window_start + 1
                minimum_window_start = window_start

            character = text_string[window_start]
            window_character_counts[character] -= 1

            # Decrement 'formed' only if the char is required and its count drops.
            if character in required_character_counts and \
               window_character_counts[character] < required_character_counts[character]:
                formed -= 1

            window_start += 1

        window_end += 1

    # If no valid window was found.
    if minimum_window_length == float('inf'):
        return ""
    else:
        # Return the minimum window substring.
        return text_string[minimum_window_start : minimum_window_start + minimum_window_length]

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the string 's' using two pointers, 'left' and 'right', effectively sliding a window across it. The 'right' pointer expands the window, and the 'left' pointer shrinks it. Each pointer moves from left to right at most once, covering the entire string. Therefore, the number of operations is proportional to the length of the string 's', denoted as 'n', leading to a time complexity of O(n).
Space Complexity
O(1)The algorithm uses a fixed number of hash maps to store character frequencies, and these hash maps can store at most the number of unique characters in the target string `t`. Since the number of unique characters is independent of the length of the input string `s`, the space used by these hash maps is constant. In addition, a few integer variables are used to track the window's start, end, and required character counts, but their space usage is also constant. Therefore, the overall auxiliary space complexity is O(1).

Edge Cases

CaseHow to Handle
Empty string s or tReturn an empty string immediately since a substring cannot be formed.
String t is longer than string sReturn an empty string immediately as t cannot be a substring of s.
String t contains characters not present in string sThe algorithm should return an empty string since a substring fulfilling the requirement is impossible.
String s and t are identicalReturn s itself as it is the minimum window.
String t contains duplicate characters and s doesn't have enough of themThe algorithm will correctly determine that no substring exists and return an empty string.
String s is very long and contains multiple valid substrings of tThe sliding window approach efficiently finds the minimum window among all valid substrings, and scales linearly with s length.
String t contains all the same characters.The algorithm should find the smallest contiguous block of those characters in s.
String s or t contain unicode or extended ASCII characters.Ensure the character frequency maps support the full character set, often requiring using wider character types or libraries.