Taro Logo

Sum of Beauty of All Substrings

#421 Most AskedMedium
Topics:
StringsArraysSliding Windows

The beauty of a string is the difference in frequencies between the most frequent and least frequent characters.

  • For example, the beauty of "abaacc" is 3 - 1 = 2.

Given a string s, return the sum of beauty of all of its substrings.

Example 1:

Input: s = "aabcb"
Output: 5
Explanation: The substrings with non-zero beauty are ["aab","aabc","aabcb","abcb","bcb"], each with beauty equal to 1.

Example 2:

Input: s = "aabcbaa"
Output: 17

Constraints:

  • 1 <= s.length <= 500
  • s consists of only lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of the input string `s`?
  2. Can the input string `s` contain characters other than lowercase English letters?
  3. Could the input string `s` be empty or null?
  4. Could you define the term 'beauty' of a string explicitly, especially how the maximum and minimum frequency characters are determined when there are multiple characters with the same frequency?
  5. If the string `s` consists of only one unique character, what would be the beauty of that string?

Brute Force Solution

Approach

To find the sum of beauty of all substrings, we'll look at every possible substring of the given string. For each substring, we'll calculate its beauty, and then add it to a running total.

Here's how the algorithm would work step-by-step:

  1. Start with the first letter of the string. Consider it as a substring all by itself.
  2. Calculate the 'beauty' of this single-letter substring. 'Beauty' is determined by some rules involving the counts of the most and least frequent characters.
  3. Add this beauty value to your running total.
  4. Next, consider the first two letters as a substring. Calculate its beauty.
  5. Add this beauty value to the running total.
  6. Continue extending the substring from the beginning, one letter at a time, calculating the beauty and adding it to the total each time.
  7. Once you've considered all substrings starting from the first letter, move on to the second letter and repeat the process.
  8. Keep doing this until you've considered all possible substrings that begin at each possible starting position within the original string.
  9. The final running total will be the sum of the beauty values of all substrings.

Code Implementation

def sum_of_beauty_of_all_substrings(input_string):
    total_beauty = 0

    for starting_index in range(len(input_string)):
        for ending_index in range(starting_index, len(input_string)):
            substring = input_string[starting_index:ending_index + 1]
            
            # Calculate the beauty of the current substring
            total_beauty += calculate_beauty(substring)

    return total_beauty

def calculate_beauty(substring):
    if not substring:
        return 0

    character_counts = {}
    for char in substring:
        character_counts[char] = character_counts.get(char, 0) + 1

    min_frequency = float('inf')
    max_frequency = float('-inf')
    
    # Find the minimum and maximum character frequencies
    for char in character_counts:
        frequency = character_counts[char]
        min_frequency = min(min_frequency, frequency)
        max_frequency = max(max_frequency, frequency)

    # Beauty is the difference between max and min frequencies
    return max_frequency - min_frequency

Big(O) Analysis

Time Complexity
O(n^3)The algorithm iterates through all possible substrings of the given string of length n. The outer loop iterates from the starting position of the substring (0 to n-1), and the inner loop iterates to extend the substring to the end of the string. Within each substring, the beauty is calculated which involves finding the frequency of each character. Finding the beauty of each substring involves looping through the substring to count characters and find the minimum and maximum frequencies which takes O(n) time. Since we have nested loops to generate each substring which has complexity O(n^2), and the beauty calculation of each substring takes O(n), the overall complexity is O(n^3).
Space Complexity
O(1)The algorithm iterates through substrings, calculating beauty on the fly and adding it to a running total. No auxiliary data structures of significant size are created to store intermediate substrings or character counts across all substrings. The beauty calculation for each substring likely involves temporary variables to track character frequencies, but this usage is constant for each substring and doesn't scale with the input size, N, the length of the original string. Therefore, the space complexity is O(1).

Optimal Solution

Approach

The efficient solution avoids checking every substring individually. Instead, it concentrates on figuring out the frequency of each character within each substring, and cleverly calculates the 'beauty' score based on these frequencies without repetitive recalculations.

Here's how the algorithm would work step-by-step:

  1. For every possible substring within the given string, we need to calculate a 'beauty' score.
  2. To calculate the beauty score of a substring, first count how many times each letter of the alphabet appears in that substring.
  3. Find the most frequent letter in the substring and the least frequent letter in the substring.
  4. The beauty score for that substring is the difference between the count of the most frequent letter and the count of the least frequent letter.
  5. To avoid recalculating the letter counts for overlapping substrings, we use a strategy where we consider all substrings starting at a specific position in the main string and extending to different lengths.
  6. As we extend a substring by one character, we update the letter frequency counts from the previous substring. This allows us to quickly compute the beauty score for the newly extended substring based on the previous substring's counts, instead of starting from scratch.
  7. Sum all the beauty scores of all the possible substrings to get the final answer.

Code Implementation

def sum_of_beauty_of_all_substrings(input_string):
    total_beauty_score = 0
    string_length = len(input_string)

    for left in range(string_length):
        character_counts = {}
        for right in range(left, string_length):
            current_character = input_string[right]

            if current_character in character_counts:
                character_counts[current_character] += 1
            else:
                character_counts[current_character] = 1

            # Find the maximum and minimum frequency of characters in substring
            max_frequency = 0
            min_frequency = string_length

            for char in character_counts:
                current_frequency = character_counts[char]
                max_frequency = max(max_frequency, current_frequency)
                min_frequency = min(min_frequency, current_frequency)

            #Beauty score is the difference between the most and least frequent character
            substring_beauty = max_frequency - min_frequency
            total_beauty_score += substring_beauty

    return total_beauty_score

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through all possible substrings of the input string. There is an outer loop that iterates 'n' times (where 'n' is the length of the string) to select the starting position of the substring. The inner process extends each substring from that starting position, iterating up to 'n' times in the worst case. The frequency counting and min/max operations within the inner loop take constant time, but the dominant cost is the nested loop structure that generates all possible substrings leading to approximately n * n/2 calculations. Therefore the time complexity simplifies to O(n²).
Space Complexity
O(1)The algorithm uses a frequency count array (or similar data structure) to store the frequency of each character within a substring. Since the number of possible characters is fixed (26 for lowercase English letters), the size of this array remains constant regardless of the input string's length, N. No other data structures are used that scale with the input size. Therefore, the auxiliary space complexity is constant.

Edge Cases

Empty string input
How to Handle:
Return 0 immediately as there are no substrings to process.
String with a single character
How to Handle:
Return 0 since the min and max frequency will be the same for the only character in the substring.
String with all identical characters (e.g., 'aaaa')
How to Handle:
The beauty of each substring will be 0, as the min and max frequencies will be equal.
String with highly skewed character distribution (e.g., 'aaaaabc')
How to Handle:
The algorithm should correctly calculate min/max frequencies based on the actual character counts.
Very long string input to test performance
How to Handle:
Ensure the time complexity is O(n^2) where n is string length and avoid unnecessary computations.
String with a wide range of Unicode characters
How to Handle:
The frequency count array needs to be large enough to accommodate all possible characters, or a hash map can be used.
String contains only two distinct characters repeated many times (e.g., 'ababab')
How to Handle:
Test correct min/max frequency calculations for all possible substrings with only two characters.
String where one character appears only once, and all other characters appear many times.
How to Handle:
The minimum frequency will always be 1, affecting the beauty calculation.
0/1037 completed