Taro Logo

Number of Substrings Containing All Three Characters

Medium
a month ago

Given a string s consisting only of characters a, b and c.

Return the number of substrings containing at least one occurrence of all these characters a, b and c.

Example 1:

Input: s = "abcabc"
Output: 10
Explanation: The substrings containing at least one occurrence of the characters a, b and c are "abc", "abca", "abcab", "abcabc", "bca", "bcab", "bcabc", "cab", "cabc" and "abc" (again).

Example 2:

Input: s = "aaacb"
Output: 3
Explanation: The substrings containing at least one occurrence of the characters a, b and c are "aaacb", "aacb" and "acb".

Example 3:

Input: s = "abc"
Output: 1

Constraints:

  • 3 <= s.length <= 5 x 10^4
  • s only consists of a, b or c characters.
Sample Answer

Problem Description

Given a string s consisting only of characters a, b and c, the task is to return the number of substrings containing at least one occurrence of all these characters a, b and c.

Naive Solution

The brute-force solution involves generating all possible substrings of the given string and checking if each substring contains at least one occurrence of each character 'a', 'b', and 'c'.

def count_substrings_naive(s):
    n = len(s)
    count = 0
    for i in range(n):
        for j in range(i, n):
            sub = s[i:j+1]
            if 'a' in sub and 'b' in sub and 'c' in sub:
                count += 1
    return count

Optimal Solution

A more efficient approach uses a sliding window technique. We maintain a window and expand it to the right until it contains at least one 'a', 'b', and 'c'. Then, we shrink the window from the left until one of the required characters is no longer present. For each valid window, we can calculate the number of substrings that include that window.

def count_substrings(s):
    n = len(s)
    count = 0
    left = 0
    freq = {'a': 0, 'b': 0, 'c': 0}
    
    for right in range(n):
        freq[s[right]] += 1
        
        while freq['a'] > 0 and freq['b'] > 0 and freq['c'] > 0:
            count += (n - right)
            freq[s[left]] -= 1
            left += 1
            
    return count

Example

For s = "abcabc":

  1. Initially, left = 0, freq = {'a': 0, 'b': 0, 'c': 0}.
  2. right iterates from 0 to 5.
  3. When right = 2, freq = {'a': 1, 'b': 1, 'c': 1}. The condition is met.
  4. count += (6 - 2) = 4. freq becomes {'a': 0, 'b': 1, 'c': 1}, and left becomes 1.
  5. When right = 3, freq = {'a': 1, 'b': 1, 'c': 1}. The condition is met.
  6. count += (6 - 3) = 3. freq becomes {'a': 1, 'b': 0, 'c': 1}, and left becomes 2.
  7. When right = 4, freq = {'a': 1, 'b': 1, 'c': 1}. The condition is met.
  8. count += (6 - 4) = 2. freq becomes {'a': 1, 'b': 1, 'c': 0}, and left becomes 3.
  9. When right = 5, freq = {'a': 1, 'b': 1, 'c': 1}. The condition is met.
  10. count += (6 - 5) = 1. freq becomes {'a': 1, 'b': 1, 'c': 0}, and left becomes 4.

The final count is 4 + 3 + 2 + 1 = 10.

Big(O) Runtime Analysis

The optimal solution has a time complexity of O(n), where n is the length of the string. This is because we iterate through the string once with the right pointer and, in the worst case, the left pointer also iterates through the string once. The operations inside the loops are constant time operations.

The naive solution has a time complexity of O(n^3) because we are generating all possible substrings which takes O(n^2) time and for each substring we are checking for presence of a, b, c which takes O(n) time in the worst case.

Big(O) Space Usage Analysis

The optimal solution uses a constant amount of extra space for the frequency map freq, which stores the counts of characters 'a', 'b', and 'c'. Therefore, the space complexity is O(1).

The naive solution has space complexity of O(n) since we are storing the substring of length n in the worst case.

Edge Cases

  1. Empty String: If the input string is empty, the function should return 0.
  2. String with Length Less Than 3: If the string length is less than 3, it cannot contain all three characters, so the function should return 0.
  3. String Containing Only One or Two Unique Characters: If the string contains only one or two of the required characters ('a', 'b', 'c'), the function should return 0.

These edge cases are handled correctly by the optimal solution, as the while loop condition will never be met if any of the characters are missing, and count will remain 0.