Taro Logo

Number of Substrings Containing All Three Characters

Medium
Meta logo
Meta
Topics:
StringsTwo PointersSliding Windows

Given a string s consisting only of characters a, b and c.

Return the number of substrings containing at least one occurrence of all these characters a, b and c.

For example:

  • s = "abcabc" should return 10. The substrings are "abc", "abca", "abcab", "abcabc", "bca", "bcab", "bcabc", "cab", "cabc" and "abc".
  • s = "aaacb" should return 3. The substrings are "aaacb", "aacb" and "acb".
  • s = "abc" should return 1.

Can you come up with an efficient solution?

Solution


Naive Approach

A brute force solution would involve generating all possible substrings of the given string s and then checking each substring to see if it contains at least one occurrence of each of the characters 'a', 'b', and 'c'.

Algorithm:

  1. Iterate through all possible start indices of the substring (from 0 to n-1).
  2. For each start index, iterate through all possible end indices (from start to n-1).
  3. Extract the substring.
  4. Check if the substring contains 'a', 'b', and 'c'.
  5. If it does, increment the counter.
  6. Return the counter.

Code:

def count_substrings_naive(s):
    n = len(s)
    count = 0
    for i in range(n):
        for j in range(i, n):
            sub = s[i:j+1]
            if 'a' in sub and 'b' in sub and 'c' in sub:
                count += 1
    return count

Time Complexity: O(n^3)

The outer loops iterate n times each, resulting in O(n^2). The check for 'a', 'b', and 'c' within the inner loop takes O(n) in the worst case, giving a total complexity of O(n^3).

Space Complexity: O(1)

The space complexity is O(1) as we are only using a constant amount of extra space.

Optimal Approach

A more efficient approach utilizes the sliding window technique.

Algorithm:

  1. Initialize two pointers, left and right, to 0.
  2. Initialize a counter to 0.
  3. Expand the right pointer until the window [left, right] contains at least one 'a', 'b', and 'c'.
  4. Once a valid window is found, increment the counter by n - right. This is because all substrings starting from left and ending at or after right will also be valid.
  5. Move the left pointer to the right until the window is no longer valid (i.e., it doesn't contain at least one 'a', 'b', and 'c').
  6. Repeat steps 3-5 until the right pointer reaches the end of the string.
  7. Return the counter.

Code:

def count_substrings(s):
    n = len(s)
    count = 0
    left = 0
    freq = {'a': 0, 'b': 0, 'c': 0}
    required = 3
    formed = 0

    for right in range(n):
        char = s[right]
        freq[char] += 1
        if freq[char] == 1:
            formed += 1

        while formed == required:
            count += n - right
            char = s[left]
            freq[char] -= 1
            if freq[char] == 0:
                formed -= 1
            left += 1

    return count

Time Complexity: O(n)

The right pointer iterates through the string once. The left pointer may iterate multiple times, but in total, it will not iterate more than n times. Therefore, the time complexity is O(n).

Space Complexity: O(1)

We use a dictionary freq to store the frequencies of the characters 'a', 'b', and 'c'. The size of this dictionary is constant regardless of the size of the input string. Thus, the space complexity is O(1).

Edge Cases:

  • Empty string: The problem statement specifies that the string length will be at least 3, so we don't need to worry about empty string edge case.
  • String with length less than 3: Similarly, the minimum length of string is 3, so it is not a concern.
  • String contains characters other than 'a', 'b', or 'c': The problem statement states that the string will only contain characters 'a', 'b', or 'c', so we don't have to validate the input.
  • String doesn't contain one or more of the characters 'a', 'b', and 'c': The algorithm correctly handles the cases where one or more characters is not present. It returns 0 in these cases since no substring can contain all three characters.