Number of Substrings Containing All Three Characters

Medium
Topics:
Strings, Two Pointers, Sliding Windows

You are given a string s consisting only of the characters a, b and c.

Return the number of substrings containing at least one occurrence of all these characters a, b and c.

Example 1:

Input: s = "abcabc"
Output: 10
Explanation: The substrings containing at least one occurrence of the characters a, b and c are "abc", "abca", "abcab", "abcabc", "bca", "bcab", "bcabc", "cab", "cabc" and "abc" (again).

Example 2:

Input: s = "aaacb"
Output: 3
Explanation: The substrings containing at least one occurrence of the characters a, b and c are "aaacb", "aacb" and "acb".

Example 3:

Input: s = "abc"
Output: 1

Constraints:

  • 3 <= s.length <= 5 x 10^4
  • s only consists of a, b or c characters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input string contain characters other than 'a', 'b', and 'c'?
  2. What is the expected return value if the input string doesn't contain all three characters ('a', 'b', and 'c')?
  3. Is the input string case-sensitive (should I treat 'A' the same as 'a')?
  4. What is the maximum length of the input string?
  5. By substring, do you mean a contiguous sequence of characters?

Brute Force Solution

Approach

The brute force approach to this problem is very straightforward. We will look at every possible substring of the given string and then, for each one, check if it meets our criteria of containing all three characters.

Here's how the algorithm would work step-by-step:

  1. First, consider all possible substrings of length 1. Check if each one contains all three required characters.
  2. Then, consider all possible substrings of length 2. Check if each one contains all three required characters.
  3. Continue this process, increasing the substring length by one each time, until you reach the length of the entire original string.
  4. While checking, keep a running count of the substrings that contain all three required characters.
  5. Once you have looked at every possible substring length, the total count is your answer.

Code Implementation

def number_of_substrings(input_string):
    string_length = len(input_string)
    substring_count = 0

    for substring_length in range(1, string_length + 1):
        for starting_index in range(string_length - substring_length + 1):
            substring = input_string[starting_index:starting_index + substring_length]

            # Check if the substring contains all three characters.
            if ('a' in substring) and ('b' in substring) and ('c' in substring):
                substring_count += 1

    # Return the total count of substrings that meet the criteria.
    return substring_count

Big(O) Analysis

Time Complexity
O(n³): The outer loop runs over every substring length from 1 to n, where n is the length of the input string, and the inner loop runs over every starting position for that length, giving O(n²) substrings in total. Building each substring and checking it for the three characters takes O(n) time in the worst case. The total time complexity is therefore O(n * n * n) = O(n³).
Space Complexity
O(1): The approach needs only a few constant-size variables (the loop indices and the running count) if each substring is checked in place, so the auxiliary space is independent of the input length n. Note that the Python implementation above builds each substring with a slice, which temporarily copies up to n characters; as written, it therefore uses O(n) auxiliary space.
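
The O(1) space bound assumes each substring is checked without copying it. Here is a minimal sketch of that idea; it is a variation, not the implementation shown above. It fixes a starting index, extends the end one character at a time, and tracks which of the three characters have been seen so far. The function name count_substrings_in_place is hypothetical. Because it no longer rescans each substring, this sketch also runs in O(n²) time rather than O(n³).

```python
def count_substrings_in_place(input_string):
    """Brute force over all (start, end) pairs without building substrings."""
    string_length = len(input_string)
    substring_count = 0

    for starting_index in range(string_length):
        # Flags for whether 'a', 'b', 'c' appear between starting_index
        # and the current ending_index (inclusive). Constant size.
        seen = {'a': False, 'b': False, 'c': False}
        for ending_index in range(starting_index, string_length):
            character = input_string[ending_index]
            if character in seen:
                seen[character] = True
            # Once all three flags are set, this substring qualifies.
            if all(seen.values()):
                substring_count += 1

    return substring_count
```

On the three examples from the problem statement this returns 10, 3 and 1 respectively.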

Optimal Solution

Approach

The trick is to find the point at which the current window first contains all three characters. Once the window from the left pointer to the right pointer is valid, every extension of it to the right is also valid, so we can count all substrings that start at the left pointer in one step without checking each one. We then advance the left pointer and repeat, so each pointer moves across the string only once.

Here's how the algorithm would work step-by-step:

  1. Keep a count of how many of each character is inside the current window.
  2. Start with both ends of the window at the beginning of the string and move the right end forward one character at a time, adding each new character to the counts.
  3. As soon as the window contains at least one of each character, every substring that starts at the window's left end and ends at the current right end or anywhere after it is valid. There are (string length - right index) of them, so add that number to the answer.
  4. Slide the left end one position to the right and remove the character you left behind from the counts.
  5. If the window still contains all three characters, repeat steps 3 and 4 for the new left end.
  6. If the window lost its last copy of a character, go back to moving the right end forward until that character appears again.
  7. Keep going until the right end reaches the end of the string. The sum of all the counts is the answer.

Code Implementation

def number_of_substrings(input_string):
    string_length = len(input_string)
    count = 0
    left = 0
    character_counts = {'a': 0, 'b': 0, 'c': 0}

    for right in range(string_length):
        character_counts[input_string[right]] += 1

        # While the window holds all three characters, count and shrink it
        while (
            character_counts['a'] > 0
            and character_counts['b'] > 0
            and character_counts['c'] > 0
        ):
            # Count valid substrings that start at left and end at right or later
            count += string_length - right

            # Shrink the window from the left
            character_counts[input_string[left]] -= 1
            left += 1

    return count
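
To see how the count accumulates, the following sketch repeats the same sliding-window loop but records how much each position of right adds to the total. The helper name window_contributions is hypothetical and exists only for illustration.

```python
def window_contributions(input_string):
    """Return a list where entry i is the amount added to the count while right == i."""
    string_length = len(input_string)
    contributions = [0] * string_length
    left = 0
    character_counts = {'a': 0, 'b': 0, 'c': 0}

    for right in range(string_length):
        character_counts[input_string[right]] += 1

        while all(value > 0 for value in character_counts.values()):
            # Substrings starting at left and ending at
            # right, right + 1, ..., string_length - 1.
            contributions[right] += string_length - right
            character_counts[input_string[left]] -= 1
            left += 1

    return contributions


print(window_contributions("abcabc"))  # [0, 0, 4, 3, 2, 1], which sums to 10
print(window_contributions("aaacb"))   # [0, 0, 0, 0, 3], which sums to 3
```

In "aaacb" nothing is counted until the final character arrives, and then the left end slides three times in a row, adding 1 each time.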

Big(O) Analysis

Time Complexity
O(n): The 'right' pointer visits each of the n characters exactly once. The 'left' pointer only ever moves forward and never passes 'right', so the inner while loop runs at most n times in total across the whole run. Each step does a constant amount of work (three counter checks and one update), so the time complexity is O(n), where n is the length of the string.
Space Complexity
O(1): The algorithm uses a fixed-size counter (or hash map) to track the counts of the three characters. The size of this counter is independent of the input string length n. The only other variables are the window indices and the running count, which take constant space. The auxiliary space therefore remains constant regardless of the input size.
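
One way to build confidence in the linear-time solution is to compare it against a direct brute force on many small random strings. The sketch below is a self-contained check under that assumption; the names count_brute_force, count_sliding_window and agree_on_random_inputs are hypothetical, and the trial count, maximum length and seed are arbitrary choices.

```python
import random


def count_brute_force(text):
    """Reference answer: test every substring of length >= 3 directly."""
    n = len(text)
    return sum(
        1
        for start in range(n)
        for end in range(start + 3, n + 1)
        if {'a', 'b', 'c'} <= set(text[start:end])
    )


def count_sliding_window(text):
    """Same sliding-window logic as the optimal solution."""
    n = len(text)
    counts = {'a': 0, 'b': 0, 'c': 0}
    total = 0
    left = 0
    for right in range(n):
        counts[text[right]] += 1
        while counts['a'] > 0 and counts['b'] > 0 and counts['c'] > 0:
            total += n - right
            counts[text[left]] -= 1
            left += 1
    return total


def agree_on_random_inputs(trials=200, max_length=12, seed=0):
    """Return True if both functions agree on every generated string."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        length = rng.randint(0, max_length)
        text = "".join(rng.choice("abc") for _ in range(length))
        if count_brute_force(text) != count_sliding_window(text):
            return False
    return True
```

Calling agree_on_random_inputs() should return True; a False result would point to a bug in one of the two functions.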

Edge Cases

Empty string input
How to Handle:
Return 0 since an empty string cannot contain all three characters.
String with length less than 3
How to Handle:
Return 0 as a string of length less than 3 cannot contain all three distinct characters.
String containing only one or two of the required characters (a, b, c)
How to Handle:
Return 0, because the substring can never contain all three distinct characters.
Very long string with many repeated characters
How to Handle:
The sliding window approach uses constant extra space and a single pass, so it handles the maximum input size efficiently. Confirm that the integer count does not overflow.
String with all 'a's, 'b's, and 'c's clustered together at the beginning
How to Handle:
The sliding window should correctly identify the first valid substring and proceed to count the rest efficiently.
String where the last occurrence of one of the characters is near the end
How to Handle:
The sliding window should extend correctly to include the last occurrence and terminate appropriately.
String with characters other than 'a', 'b', or 'c'
How to Handle:
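The constraints rule this out, but if it must be supported, ignore the other characters and only update the counts for 'a', 'b', and 'c' using conditional checks within the loop. Without such checks, the sliding window implementation above raises a KeyError on any other character.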
Integer overflow when calculating the number of valid substrings
How to Handle:
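In languages with fixed-width integers, use a 64-bit integer type (long) to store the number of substrings. Python integers do not overflow. Under the stated constraint s.length <= 5 x 10^4, the count is at most n(n+1)/2 = 1,250,025,000, which still fits in a signed 32-bit integer, so overflow only becomes a concern for longer inputs.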
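
Several of the edge cases above can be checked with a small variation of the sliding window. The sketch below assumes one possible reading of "ignore the other characters": characters outside 'a', 'b', 'c' may appear inside a substring but never help satisfy the requirement. That interpretation, and the function name count_substrings_tolerant, are assumptions for illustration and are not part of the original problem.

```python
def count_substrings_tolerant(text):
    """Sliding-window count that skips bookkeeping for characters outside 'abc'."""
    required = "abc"
    counts = {ch: 0 for ch in required}
    total = 0
    left = 0
    n = len(text)

    for right, ch in enumerate(text):
        if ch in counts:  # conditional check: only track a, b and c
            counts[ch] += 1

        while all(counts[r] > 0 for r in required):
            # Every substring starting at left and ending at right or later is valid.
            total += n - right
            left_ch = text[left]
            if left_ch in counts:
                counts[left_ch] -= 1
            left += 1

    return total


# Empty, too-short and missing-character inputs never enter the while loop.
print(count_substrings_tolerant(""))       # 0
print(count_substrings_tolerant("ab"))     # 0
print(count_substrings_tolerant("aabb"))   # 0
# Under the assumption above, only the whole string qualifies here.
print(count_substrings_tolerant("axbxc"))  # 1
```

No special-case branches are needed for the zero-result inputs: the window simply never becomes valid, so nothing is added to the total.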