Minimum Deletions to Make String Balanced Interview Question for Amazon

Problem Description

You are given a string s consisting only of characters 'a' and 'b'. You can delete any number of characters in s to make s balanced. s is balanced if there is no pair of indices (i,j) such that i < j and s[i] = 'b' and s[j]= 'a'. Return the minimum number of deletions needed to make s balanced.

Example 1:

Input: s = "aababbab"
Output: 2
Explanation: You can either:
Delete the characters at 0-indexed positions 2 and 6 ("aa<u>b</u>abb<u>a</u>b" -> "aaabbb"), or
Delete the characters at 0-indexed positions 3 and 6 ("aab<u>a</u>bb<u>a</u>b" -> "aabbbb").

Example 2:

Input: s = "bbaaaaabb"
Output: 2
Explanation: The only solution is to delete the first two characters.

Brute Force Solution

A brute-force solution would involve generating all possible subsequences of the input string s and checking if each subsequence is balanced. We then find the balanced subsequence with the maximum length, and the minimum number of deletions would be the difference between the original string's length and the maximum balanced subsequence's length.

Here's a conceptual Python implementation:

def is_balanced(s):
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] == 'b' and s[j] == 'a':
                return False
    return True

def brute_force_min_deletions(s):
    n = len(s)
    max_len = 0

    for i in range(1 << n): # Iterate through all possible subsequences
        subsequence = ""
        for j in range(n):
            if (i >> j) & 1:
                subsequence += s[j]

        if is_balanced(subsequence):
            max_len = max(max_len, len(subsequence))

    return n - max_len

# Example Usage
s = "aababbab"
print(brute_force_min_deletions(s)) # Output: 2

However, this brute-force approach is highly inefficient due to the exponential number of subsequences it needs to check (2^n). It's not suitable for larger strings, hence the need for a more optimized solution.

Optimal Solution (Dynamic Programming)

We can solve this problem more efficiently using dynamic programming. The key idea is to consider two cases for each character in the string s:

Keep the character.
Delete the character.

We maintain a DP table dp[i][j] where:

i represents the index of the character in s (from 0 to n-1).
j represents the state: 0 means we haven't encountered any 'b' yet, and 1 means we have encountered at least one 'b'.
dp[i][j] stores the minimum number of deletions needed to make the substring s[0...i] balanced given the state j.

Here's the Python code:

def min_deletions(s):
    n = len(s)
    dp = [[0, 0] for _ in range(n + 1)]

    for i in range(1, n + 1):
        if s[i-1] == 'a':
            # If the current char is 'a',
            # We can either keep it, which means the 'b' encountered state remains the same,
            # Or delete it, which increases the deletion count.
            dp[i][0] = dp[i-1][0]
            dp[i][1] = min(dp[i-1][1], dp[i-1][0])
        else:
            # If the current char is 'b',
            # If we are in state 0 (no 'b' encountered yet), we transition to state 1.
            # Or we can delete the 'b'.
            dp[i][0] = dp[i-1][0] + 1
            dp[i][1] = dp[i-1][1] + 1

        dp[i][0] = min(dp[i][0], dp[i-1][0] + 1)
        dp[i][1] = min(dp[i][1], dp[i-1][1] + 1)

    return min(dp[n][0], dp[n][1])


# Example Usage
s = "aababbab"
print(min_deletions(s))  # Output: 2

s = "bbaaaaabb"
print(min_deletions(s)) # Output: 2

Explanation

Initialization: dp[0][0] = dp[0][1] = 0 (empty string is balanced with 0 deletions).
Iteration: For each character s[i], we update dp[i][0] and dp[i][1] based on the following logic:
- If s[i] == 'a':
  - dp[i][0] = dp[i-1][0] (keep 'a', no 'b' encountered before)
  - dp[i][1] = min(dp[i-1][0], dp[i-1][1]) (keep 'a', 'b' might have been encountered before)
- If s[i] == 'b':
  - dp[i][0] = dp[i-1][0] + 1 (keep 'b', but it must be deleted. Also, no 'b' encountered before).
  - dp[i][1] = dp[i-1][1] + 1 (keep 'b', but it must be deleted. 'b' encountered before).
Result: min(dp[n][0], dp[n][1]) gives the minimum deletions needed for the entire string s.

Space-Optimized Solution

We can further optimize the space complexity by realizing that dp[i] only depends on dp[i-1]. Therefore, we can maintain only two variables to store the previous and current DP values.

def min_deletions_optimized(s):
    n = len(s)
    no_b = 0  # Minimum deletions if no 'b' has been encountered
    has_b = 0  # Minimum deletions if 'b' has been encountered

    for char in s:
        if char == 'a':
            has_b = min(no_b, has_b)
        else:
            no_b += 1
            has_b += 1

        no_b = min(no_b, has_b) #Crucial step for cases like "bbbbba"

    return min(no_b, has_b)

# Example Usage
s = "aababbab"
print(min_deletions_optimized(s))  # Output: 2

s = "bbaaaaabb"
print(min_deletions_optimized(s)) # Output: 2

Time Complexity Analysis

The optimal solution (both the DP and space-optimized versions) iterates through the string once. Therefore, the time complexity is O(n), where n is the length of the string s.

Space Complexity Analysis

DP Solution: The DP solution uses a 2D table of size (n+1) x 2, where n is the length of the string. Thus, the space complexity is O(n).
Space-Optimized Solution: The space-optimized solution only uses a constant number of variables ( no_b and has_b). Therefore, the space complexity is O(1).

Edge Cases

Empty String: If the input string is empty, the function should return 0 because an empty string is considered balanced.
String with only 'a's: If the input string contains only 'a's (e.g., "aaaa"), the function should return 0 because it's already balanced.
String with only 'b's: If the input string contains only 'b's (e.g., "bbbb"), the function should return 0 because it's already balanced.
String with alternating 'a's and 'b's: If the input string alternates between 'a's and 'b's (e.g., "ababab"), the function should return the number of 'b's that precede an 'a', or vice versa, whichever is smaller. In this case, dynamic programming provides the correct solution.
String with all 'b's followed by all 'a's: For strings like "bbbbba", there needs to be a crucial step that ensures either all leading b's are removed or all trailing a's are removed. The space-optimized code handles this with no_b = min(no_b, has_b) after processing each character.