
Count Unique Characters of All Substrings of a Given String

Hard
Topics:
Strings

Let's define a function countUniqueChars(s) that returns the number of unique characters in s.

  • For example, if s = "LEETCODE", then "L", "T", "C", "O", "D" are the unique characters because each appears exactly once in s, so countUniqueChars(s) = 5.
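Here is a minimal Python sketch of the helper. The snake_case name count_unique_chars is assumed; the problem only names it countUniqueChars. The sketch counts characters whose frequency is exactly one, which is not the same as counting distinct characters.

```python
from collections import Counter


def count_unique_chars(s: str) -> int:
    """Return how many characters occur exactly once in s."""
    counts = Counter(s)
    # Only characters with a frequency of exactly 1 are "unique"
    return sum(1 for freq in counts.values() if freq == 1)


print(count_unique_chars("LEETCODE"))  # 5 ("L", "T", "C", "O", "D")
```

For "ABA" the helper returns 1, not 2, because "A" appears twice.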

Given a string s, return the sum of countUniqueChars(t) where t is a substring of s. The test cases are generated such that the answer fits in a 32-bit integer.

Note that some substrings may occur more than once; each occurrence must be counted separately.

Example 1:

Input: s = "ABC"
Output: 10
Explanation: All possible substrings are: "A","B","C","AB","BC" and "ABC".
Every substring is composed of unique letters only.
The sum of the lengths of all substrings is 1 + 1 + 1 + 2 + 2 + 3 = 10.

Example 2:

Input: s = "ABA"
Output: 8
Explanation: The same as example 1, except countUniqueChars("ABA") = 1.
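The sketch below shows where Example 2's total of 8 comes from. It lists each substring occurrence of "ABA" with its count. The repeated "A" is counted twice, and "ABA" contributes only 1 because "A" appears twice in it. Both helper names are hypothetical.

```python
from collections import Counter


def unique_count(t: str) -> int:
    # Characters that appear exactly once in t
    return sum(1 for freq in Counter(t).values() if freq == 1)


def substring_breakdown(s: str) -> list[tuple[str, int]]:
    """List every substring occurrence of s with its unique-character count."""
    breakdown = []
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            t = s[start:end]
            breakdown.append((t, unique_count(t)))
    return breakdown


for t, c in substring_breakdown("ABA"):
    print(t, c)
# A 1 / AB 2 / ABA 1 / B 1 / BA 2 / A 1  -> total 8
```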

Example 3:

Input: s = "LEETCODE"
Output: 92

Constraints:

  • 1 <= s.length <= 10^5
  • s consists of uppercase English letters only.

Solution


Clarifying Questions

When you are asked this question in a real interview, it will often be ambiguous (especially at FAANG). In that case, make sure to ask these questions:

  1. What is the maximum length of the input string, and are there any specific performance requirements?
  2. Is the input string guaranteed to contain only ASCII characters, or can it include Unicode characters?
  3. By 'substring', do you mean a contiguous sequence of characters? For example, is 'abc' a substring of 'abcdefg'?
  4. If the input string is empty, what should the function return?
  5. Are we concerned about integer overflow when summing the counts of unique characters across all substrings?

Brute Force Solution

Approach

The brute force strategy examines every possible substring of the input string. For each substring, we count how many characters appear exactly once in it, then add up those counts across all substrings.

Here's how the algorithm would work step-by-step:

  1. First, consider every possible starting position in the string.
  2. For each starting position, consider every possible ending position at or after it. Each (start, end) pair gives one substring.
  3. For each substring, identify its unique characters. A character is unique if it appears exactly once in the substring; a character that appears two or more times does not count at all.
  4. Count how many unique characters you found in that substring.
  5. Add that count to a running total.
  6. Repeat the process of finding substrings, counting unique characters, and adding to the total, until you have considered every possible substring of the original string.
  7. The final total is the answer: the sum of the unique-character counts over all substrings.

Code Implementation

from collections import Counter


def count_unique_characters_of_all_substrings(given_string):
    string_length = len(given_string)
    total_unique_character_count = 0

    for starting_index in range(string_length):
        for ending_index in range(starting_index, string_length):
            substring = given_string[starting_index : ending_index + 1]

            # Tally how many times each character occurs in this substring
            character_counts = Counter(substring)

            # A character is unique only if it appears exactly once, so a set
            # of distinct characters would overcount (e.g. "ABA" would give 2)
            unique_in_substring = sum(
                1 for count in character_counts.values() if count == 1
            )

            # Add the unique character count of the substring to the running total
            total_unique_character_count += unique_in_substring

    return total_unique_character_count

Big(O) Analysis

Time Complexity
O(n³)
The algorithm visits every substring. The outer loop picks one of n starting positions and the inner loop picks one of up to n ending positions, giving n(n+1)/2 substrings. Slicing each substring and counting its characters takes time proportional to its length, which is O(n) in the worst case. The overall time complexity is therefore O(n · n · n) = O(n³).
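Summing the work by substring length makes the cubic bound concrete. There are n - k + 1 substrings of length k, and slicing and counting each one costs about k steps:

```latex
\sum_{k=1}^{n} \underbrace{(n - k + 1)}_{\text{substrings of length } k} \cdot \underbrace{k}_{\text{work per substring}}
  = \frac{n(n+1)(n+2)}{6} = \Theta(n^3)
```

For n = 10^5 this comes to roughly 1.7 × 10^14 character visits.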
Space Complexity
O(n)
Each iteration creates a substring slice of up to n characters, which takes O(n) space. The per-substring character tally holds at most 26 entries because the input contains only uppercase English letters, so the slice dominates. The space complexity is therefore O(n), where n is the length of the input string.

Optimal Solution

Approach

The efficient approach turns the question around. Instead of examining each substring, it asks, for each character position, how many substrings that character is unique in. The answer depends only on where the nearest identical characters sit on either side of it. Summing these per-position counts gives the total number of unique characters across all substrings.

Here's how the algorithm would work step-by-step:

  1. Consider each character in the string one at a time.
  2. For the character at index i, find the nearest occurrence of the same character before it (prev) and after it (next). If there is no earlier occurrence, use prev = -1; if there is no later occurrence, use next = n, where n is the string length.
  3. The character is unique in a substring exactly when the substring contains index i but contains neither prev nor next. So the substring must start somewhere in prev + 1 .. i and end somewhere in i .. next - 1.
  4. Count those substrings by multiplying the number of choices on each side. There are i - prev possible start positions and next - i possible end positions, so the character is unique in (i - prev) × (next - i) substrings.
  5. Add this count to a running total.
  6. Repeat this process for every character in the string.
  7. The final running total is the desired result: the total number of unique characters across all substrings.
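The steps above can be written as a single sum. The notation here is chosen for this sketch: p_i is the index of the previous occurrence of s[i] (or -1), and q_i is the index of the next occurrence (or n). Applying it to Example 2, s = "ABA" (n = 3), reproduces the expected 8:

```latex
\text{total} = \sum_{i=0}^{n-1} (i - p_i)(q_i - i)

\begin{aligned}
i = 0\ (\texttt{A}) &: (0 - (-1))(2 - 0) = 1 \cdot 2 = 2 \\
i = 1\ (\texttt{B}) &: (1 - (-1))(3 - 1) = 2 \cdot 2 = 4 \\
i = 2\ (\texttt{A}) &: (2 - 0)(3 - 2) = 2 \cdot 1 = 2 \\
\text{total} &= 2 + 4 + 2 = 8
\end{aligned}
```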

Code Implementation

def count_unique_characters(input_string):
    total_unique_count = 0
    string_length = len(input_string)

    for index in range(string_length):
        # Find the nearest same char index to the left.
        left_index = -1
        for search_left in range(index - 1, -1, -1):
            if input_string[search_left] == input_string[index]:
                left_index = search_left
                break

        # Find the nearest same char index to the right.
        right_index = string_length
        for search_right in range(index + 1, string_length):
            if input_string[search_right] == input_string[index]:
                right_index = search_right
                break

        # The character at index is unique in substrings that start after
        # left_index and end before right_index: count start and end choices
        number_left = index - left_index
        number_right = right_index - index

        # Every (start, end) pair gives one substring where this character is unique
        total_unique_count += number_left * number_right

    return total_unique_count
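One way to gain confidence in the contribution formula is to compare it against the brute force on many small random strings. The sketch below is self-contained. brute_force mirrors the corrected brute-force solution, and contribution_sum is a compact restatement of count_unique_characters. Both names, the three-letter alphabet and the trial counts are illustrative assumptions.

```python
import random
from collections import Counter


def brute_force(s: str) -> int:
    # O(n^3) reference: count characters occurring exactly once in every substring
    total = 0
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            counts = Counter(s[start:end])
            total += sum(1 for freq in counts.values() if freq == 1)
    return total


def contribution_sum(s: str) -> int:
    # Same nearest-occurrence contribution formula as count_unique_characters
    n = len(s)
    total = 0
    for i in range(n):
        left = i - 1
        while left >= 0 and s[left] != s[i]:
            left -= 1
        right = i + 1
        while right < n and s[right] != s[i]:
            right += 1
        # left is -1 or the previous match; right is n or the next match
        total += (i - left) * (right - i)
    return total


def cross_check(trials: int = 500, max_len: int = 12, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, max_len)
        # A small alphabet forces many repeated characters
        s = "".join(rng.choice("ABC") for _ in range(length))
        if brute_force(s) != contribution_sum(s):
            print("Mismatch on", repr(s))
            return False
    return True


print(cross_check())  # True
```

A small alphabet is used on purpose, because repeated characters are where a "distinct characters" bug or an off-by-one in the boundaries would show up.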

Big(O) Analysis

Time Complexity
O(n)
The algorithm visits each of the n characters once. For each one, it scans left and right only until it reaches the nearest occurrence of the same character. For a given letter, these scans cover the gaps between its consecutive occurrences, so they add up to at most about 2n steps per letter. Because the input uses only the 26 uppercase English letters, the total work is O(26 · n) = O(n). For an unrestricted alphabet the scans could cost O(n²), for example when every character is distinct. Precomputing each position's previous and next occurrence with a hashmap in O(n) time keeps the bound linear.
Space Complexity
O(1)
Apart from the input, the algorithm uses only a few integer variables: the current index, the nearest matching index on each side, and the running total. Their number does not depend on the length n of the input string, and no auxiliary data structures that grow with the input are used. The auxiliary space complexity is therefore O(1).
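The time-complexity note mentions precomputing each position's neighboring occurrences instead of scanning for them. Below is a minimal sketch of that idea; the name count_unique_characters_linear is hypothetical. Two passes with a dictionary of last-seen indices fill prev/next arrays. This gives O(n) time for any alphabet, at the cost of O(n) extra space instead of the O(1) of the scanning version.

```python
def count_unique_characters_linear(s: str) -> int:
    n = len(s)
    prev = [-1] * n  # index of previous occurrence of s[i], or -1
    nxt = [n] * n    # index of next occurrence of s[i], or n
    last_seen = {}   # character -> most recent index seen so far

    # Left-to-right pass: record the previous occurrence of each character
    for i, ch in enumerate(s):
        prev[i] = last_seen.get(ch, -1)
        last_seen[ch] = i

    last_seen.clear()
    # Right-to-left pass: record the next occurrence of each character
    for i in range(n - 1, -1, -1):
        ch = s[i]
        nxt[i] = last_seen.get(ch, n)
        last_seen[ch] = i

    # Each position is unique in (i - prev[i]) * (nxt[i] - i) substrings
    return sum((i - prev[i]) * (nxt[i] - i) for i in range(n))


print(count_unique_characters_linear("LEETCODE"))  # 92
```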

Edge Cases

Null or empty input string
How to Handle:
Return 0 immediately as there are no substrings to process.
String of length 1
How to Handle:
Return 1 since the string itself is the only substring and it contains one unique character.
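String with all identical characters (e.g., 'AAAA')
How to Handle:
Only the length-1 substrings contain a unique character. Every longer substring contains the repeated letter at least twice and contributes 0. The answer therefore equals the string length (4 for 'AAAA'), and the algorithm needs no special case.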
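String with all unique characters (e.g., 'ABCDEFG')
How to Handle:
Every substring's unique count equals its length, so the answer is the sum of all substring lengths, n(n+1)(n+2)/6 (84 for 'ABCDEFG'). This is the largest possible total for a string of that length.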
String with maximum possible length (10^5 characters)
How to Handle:
The O(n³) brute force is far too slow at this size. Use the linear-time contribution approach, which also needs only O(1) extra space.
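String containing only two distinct characters repeated many times (e.g., 'ABABABAB')
How to Handle:
Each occurrence is bounded by nearby repeats of the same letter, so its contribution (i - prev) × (next - i) stays small (2 × 2 = 4 for an interior character). The per-character formula handles this without special-casing, giving 28 for 'ABABABAB'.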
String where the unique character appears only at the beginning or end
How to Handle:
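Use -1 as the previous-occurrence index and n as the next-occurrence index when no matching character exists on that side. This ensures characters at the ends of the string, or characters that occur only once, are counted correctly.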
Integer overflow when calculating the sum of unique character counts
How to Handle:
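The problem guarantees the answer fits in a 32-bit integer, so every per-character product (a nonnegative term of that sum) fits as well. Python integers are arbitrary-precision, so no special handling is needed here. In languages with fixed-width integers, accumulating in a 64-bit type such as long is a cheap safeguard.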
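The edge cases above can be turned into quick assertions. So that the snippet runs on its own, it restates the optimal function compactly using the same nearest-occurrence scan. The expected values for 'AAAA', 'ABCDEFG' and 'ABABABAB' follow from the per-character formula (i - prev) × (next - i).

```python
def count_unique_characters(input_string: str) -> int:
    # Compact restatement of the optimal solution's contribution formula
    n = len(input_string)
    total = 0
    for i, ch in enumerate(input_string):
        left = i - 1
        while left >= 0 and input_string[left] != ch:
            left -= 1
        right = i + 1
        while right < n and input_string[right] != ch:
            right += 1
        total += (i - left) * (right - i)
    return total


edge_cases = {
    "": 0,            # no substrings
    "A": 1,           # single substring with one unique character
    "AAAA": 4,        # only the four length-1 substrings contribute
    "ABCDEFG": 84,    # n(n+1)(n+2)/6 with n = 7
    "ABABABAB": 28,   # alternating letters
    "LEETCODE": 92,   # Example 3
}

for s, expected in edge_cases.items():
    assert count_unique_characters(s) == expected, (s, expected)
print("all edge cases pass")
```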