Taro Logo

Minimum Window Substring

Hard
6 views
10 years ago

Okay, so for this question, we're going to be working on a string manipulation problem. The goal is to write a function that finds the smallest substring within a larger string that contains all the characters of a given target string. Think of it like finding the smallest 'window' in the larger string that 'covers' all the characters in the target. The substring needs to have all the target characters, but they don't necessarily need to be in the same order.

Specifically, you will be given two strings:

  • S: The main string where you will search for the substring.
  • T: The target string containing the characters that the substring must include.

Your function should return the shortest substring of S that contains all characters of T. If no such substring exists, return an empty string "".

Let's clarify with a few examples:

  1. S = "ADOBECODEBANC", T = "ABC"

    • The minimum window substring is "BANC".
  2. S = "aa", T = "aa"

    • The minimum window substring is "aa".
  3. S = "ADOBECODEBANC", T = "AABC"

    • The minimum window substring is "ADOBECODEBA". Because we need 2 'A's
  4. S = "a", T = "aa"

    • The minimum window substring is "".
  5. S = "abc", T = "d"

    • The minimum window substring is "".
  6. S = "aaaaaaaaaaaabbbbbcddddd", T = "abcdd"

  • The minimum window substring is "abbbbbcddddd"

Consider edge cases and aim for an efficient solution. Feel free to ask clarifying questions before you start coding.

Sample Answer

Minimum Window Substring

Okay, I understand the question. We're given two strings, s and t, and we need to find the smallest substring of s that contains all the characters in t. Let's start with a brute-force approach, then optimize.

1. Brute-Force Solution

The brute-force approach involves iterating through all possible substrings of s and checking if each substring contains all the characters of t.

Here's how it would work:

  1. Generate all substrings of s. This can be done using nested loops.
  2. For each substring, check if it contains all characters of t with the required frequencies. We can use a helper function to determine this.
  3. Keep track of the smallest substring that satisfies the condition.

Code Example (Python):

python def brute_force_min_window(s, t): if not s or not t: return ""

min_window = ""
min_length = float('inf')

for i in range(len(s)):
    for j in range(i, len(s)):
        substring = s[i:j+1]
        if contains_all_chars(substring, t):
            if len(substring) < min_length:
                min_length = len(substring)
                min_window = substring

return min_window

def contains_all_chars(substring, t): t_counts = {} for char in t: t_counts[char] = t_counts.get(char, 0) + 1

substring_counts = {}
for char in substring:
    substring_counts[char] = substring_counts.get(char, 0) + 1

for char, count in t_counts.items():
    if char not in substring_counts or substring_counts[char] < count:
        return False

return True

Explanation:

  • The outer loops iterate through all possible start and end indices of substrings in s.
  • The contains_all_chars function checks if a substring contains all characters of t with the necessary frequencies.

Big O Analysis:

  • Time Complexity: O(n3), where n is the length of s. We have O(n2) for generating all substrings, and O(n) in the worst case for contains_all_chars.
  • Space Complexity: O(m), where m is the length of t, due to the hash maps used to store character counts.

2. Optimal Solution (Sliding Window)

The optimal solution uses the sliding window technique to achieve a linear time complexity for most inputs.

Algorithm:

  1. Create two hash maps: t_counts to store the character counts of t, and window_counts to store the character counts of the current window in s.
  2. Initialize two pointers, left and right, to 0. These pointers define the sliding window.
  3. Initialize formed to 0. This variable keeps track of how many characters in t have their required counts satisfied in the current window.
  4. Initialize required to the number of distinct characters in t.
  5. Expand the window by moving the right pointer to the right, one character at a time.
  6. For each character added to the window, update window_counts. If the count of that character in window_counts matches its count in t_counts, increment formed.
  7. While formed == required, which means the window contains all characters of t with the required frequencies:
    • Update the minimum window if the current window is smaller than the current minimum window.
    • Shrink the window by moving the left pointer to the right, one character at a time.
    • For each character removed from the window, update window_counts. If the count of that character in window_counts falls below its count in t_counts, decrement formed.
  8. Return the minimum window found.

Code Example (Python):

python from collections import defaultdict

def min_window(s, t): if not s or not t: return ""

t_counts = defaultdict(int)
for char in t:
    t_counts[char] += 1

required = len(t_counts)
left, right = 0, 0
formed = 0
window_counts = defaultdict(int)

min_window = ""
min_length = float('inf')

while right < len(s):
    char = s[right]
    window_counts[char] += 1

    if char in t_counts and window_counts[char] == t_counts[char]:
        formed += 1

    while left <= right and formed == required:
        char = s[left]

        if right - left + 1 < min_length:
            min_length = right - left + 1
            min_window = s[left:right + 1]

        window_counts[char] -= 1
        if char in t_counts and window_counts[char] < t_counts[char]:
            formed -= 1

        left += 1

    right += 1

return min_window

Explanation:

  • t_counts stores the required frequencies of each character in t.
  • window_counts stores the frequencies of characters in the current sliding window.
  • formed keeps track of how many distinct characters in t have their required frequencies met in the current window.
  • required is the number of distinct characters in t.
  • The while loop expands and shrinks the window until the minimum window is found.

Big O Analysis:

  • Time Complexity: O(n), where n is the length of s. The left and right pointers each move at most n steps.
  • Space Complexity: O(m), where m is the number of distinct characters in s and t. In the worst case, we may need to store all characters of s and t in the hash maps.

3. Edge Cases

  • s or t is empty: Return an empty string.
  • t is longer than s: Return an empty string since it's impossible to find a substring of s that contains all characters of t.
  • s does not contain all characters of t: Return an empty string.
  • t contains duplicate characters: The algorithm should handle duplicate characters correctly by maintaining the required frequencies in the t_counts map.

4. Conclusion

The sliding window technique provides an efficient solution to the Minimum Window Substring problem. It achieves linear time complexity, making it suitable for large input strings.