Taro Logo

Count Unique Characters of All Substrings of a Given String

Hard
Amazon logo
Amazon
1 view
Topics:
StringsDynamic Programming

Let's define a function countUniqueChars(s) that returns the number of unique characters in s. A unique character is one that appears only once in the string.

For example, if s = "LEETCODE" then countUniqueChars(s) = 5 because the unique characters are "L", "T", "C", "O", and "D".

Given a string s, return the sum of countUniqueChars(t) where t is a substring of s. The test cases are generated such that the answer fits in a 32-bit integer.

Note that some substrings can be repeated, so you have to count the repeated ones too.

Example 1:

Input: s = "ABC" Output: 10 Explanation: All possible substrings are: "A","B","C","AB","BC" and "ABC". Every substring is composed of only unique letters. The sum of lengths of all substrings is 1 + 1 + 1 + 2 + 2 + 3 = 10

Example 2:

Input: s = "ABA" Output: 8 Explanation: The same as example 1, except countUniqueChars("ABA") = 1.

Example 3:

Input: s = "LEETCODE" Output: 92

Constraints:

  • 1 <= s.length <= 10^5
  • s consists of uppercase English letters only.

Solution


Let's analyze the problem of counting unique characters in substrings.

Naive Approach

A straightforward approach is to generate all possible substrings of the given string s, then for each substring, count the number of unique characters. Summing these counts will give us the final answer.

Code (Python):

def countUniqueChars(s):
    counts = {}
    unique_count = 0
    for char in s:
        counts[char] = counts.get(char, 0) + 1
    for char in counts:
        if counts[char] == 1:
            unique_count += 1
    return unique_count

def solveNaive(s):
    n = len(s)
    total_unique_chars = 0
    for i in range(n):
        for j in range(i, n):
            substring = s[i:j+1]
            total_unique_chars += countUniqueChars(substring)
    return total_unique_chars

# Example usage:
s = "LEETCODE"
print(solveNaive(s)) # Output: 92

Time Complexity: O(n^3), where n is the length of the string. This is because generating all substrings takes O(n^2) time, and counting unique characters in each substring takes O(n) time.

Space Complexity: O(n), where n is the length of the string, because in the worst case, we might store all the characters of a substring in a hash map.

Optimal Approach

A more efficient approach involves considering each character in the string and calculating its contribution to the total count of unique characters in all substrings.

For each character s[i], we need to find how many substrings exist where s[i] appears exactly once. Let left be the index of the previous occurrence of s[i] in the string (or -1 if it's the first occurrence) and right be the index of the next occurrence of s[i] in the string (or n if it's the last occurrence).

The number of substrings in which s[i] appears exactly once is (i - left) * (right - i). This is because there are (i - left) choices for the start index of the substring (from left + 1 to i) and (right - i) choices for the end index of the substring (from i to right - 1).

Code (Python):

def solveOptimal(s):
    n = len(s)
    left = [-1] * n
    right = [n] * n
    last_occurrence = {}

    # Calculate left boundaries
    for i in range(n):
        if s[i] in last_occurrence:
            left[i] = last_occurrence[s[i]]
        last_occurrence[s[i]] = i

    last_occurrence = {}

    # Calculate right boundaries
    for i in range(n - 1, -1, -1):
        if s[i] in last_occurrence:
            right[i] = last_occurrence[s[i]]
        last_occurrence[s[i]] = i

    total_unique_chars = 0
    for i in range(n):
        total_unique_chars += (i - left[i]) * (right[i] - i)

    return total_unique_chars

# Example usage:
s = "LEETCODE"
print(solveOptimal(s)) # Output: 92

Time Complexity: O(n), where n is the length of the string. We iterate through the string a few times, but each iteration takes O(n) time.

Space Complexity: O(n), where n is the length of the string. This is because we store the left and right arrays, each of size n.

Edge Cases

  • Empty string: If the input string is empty, the function should return 0.
  • String with one character: If the string has only one character, the function should return 1.
  • String with all same characters: The algorithm handles this correctly, as the contribution of each character will be calculated based on its left and right boundaries.
  • Large input string: The algorithm is designed to handle large input strings efficiently with O(n) time complexity.