Taro Logo

Substring With Largest Variance

Hard
Asked by:
Profile picture
3 views
Topics:
StringsDynamic Programming

The variance of a string is defined as the largest difference between the number of occurrences of any 2 characters present in the string. Note the two characters may or may not be the same.

Given a string s consisting of lowercase English letters only, return the largest variance possible among all substrings of s.

A substring is a contiguous sequence of characters within a string.

Example 1:

Input: s = "aababbb"
Output: 3
Explanation:
All possible variances along with their respective substrings are listed below:
- Variance 0 for substrings "a", "aa", "ab", "abab", "aababb", "ba", "b", "bb", and "bbb".
- Variance 1 for substrings "aab", "aba", "abb", "aabab", "ababb", "aababbb", and "bab".
- Variance 2 for substrings "aaba", "ababbb", "abbb", and "babb".
- Variance 3 for substring "babbb".
Since the largest possible variance is 3, we return it.

Example 2:

Input: s = "abcde"
Output: 0
Explanation:
No letter occurs more than once in s, so the variance of every substring is 0.

Constraints:

  • 1 <= s.length <= 104
  • s consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible characters in the input string, and is the input case-sensitive?
  2. Is an empty string a valid input, and if so, what should the returned variance be?
  3. If no substring with at least two distinct characters exists, what should the returned value be?
  4. Can the same character be considered as both the major and minor character within a single substring?
  5. By 'variance,' do you mean the maximum difference between the counts of two distinct characters in any substring of the given string?

Brute Force Solution

Approach

We're going to find the largest difference between how often two letters appear in any part of a given piece of text. The simplest way to do this is to look at every possible piece of the text and compare the counts of all possible letter pairs.

Here's how the algorithm would work step-by-step:

  1. First, consider every possible starting point in the text.
  2. From each of those starting points, consider every possible ending point later in the text.
  3. For each piece of text defined by those start and end points, count how many times each letter of the alphabet shows up.
  4. Compare every possible pair of letters to find the biggest difference in their counts within that specific piece of text.
  5. Remember the largest difference you find among all the pairs of letters for that specific piece of text.
  6. Repeat this process for every possible piece of text (every combination of start and end points).
  7. Finally, after checking all possible pieces of text, pick the overall largest difference that you found.

Code Implementation

def largest_variance_brute_force(text):
    maximum_variance = 0
    text_length = len(text)

    for start_index in range(text_length):
        for end_index in range(start_index, text_length):
            sub_string = text[start_index : end_index+1]

            # Create a dictionary to count character occurrences
            character_counts = {}
            for char in sub_string:
                character_counts[char] = character_counts.get(char, 0) + 1

            # Find the variance between each character pair
            for first_char in character_counts:
                for second_char in character_counts:
                    first_char_count = character_counts.get(first_char, 0)
                    second_char_count = character_counts.get(second_char, 0)

                    #Ensure both characters appear in the substring
                    if first_char_count > 0 and second_char_count > 0:
                        variance = abs(first_char_count - second_char_count)
                        maximum_variance = max(maximum_variance, variance)

    return maximum_variance

Big(O) Analysis

Time Complexity
O(n^3)The algorithm iterates through all possible substrings of the input string of length n. This is done using two nested loops, one to select the starting index and another to select the ending index, resulting in O(n^2) substrings. For each substring, it counts the occurrences of each letter of the alphabet (26 letters). Then it compares all pairs of letters which is a constant time operation O(26*26). However, counting the occurrences of each letter within a substring of length n takes O(n) time in the worst case. Therefore, the overall time complexity is O(n^2 * n) which simplifies to O(n^3).
Space Complexity
O(1)The provided solution iterates through all possible substrings using nested loops, and within each substring, it counts letter frequencies and calculates the largest variance. The only auxiliary space used is for storing a few integer variables to track counts and the maximum variance seen so far. The number of such variables is fixed and does not depend on the size of the input string (N). Therefore, the space complexity is constant.

Optimal Solution

Approach

The key is to avoid checking every possible substring. Instead, focus on each pair of characters to find the biggest difference between their occurrences within the string, simulating a reset whenever a count becomes non-positive to effectively restart the search for a more optimal variance from that point forward.

Here's how the algorithm would work step-by-step:

  1. Consider each possible pair of different characters in the string.
  2. For each pair, walk through the string keeping track of the difference between how many times the first character appears versus the second character.
  3. If the count goes negative, it means we've seen more of the second character than the first. At this point, reset the count to zero because any further substring starting from before the negative count will have less variance than one starting now.
  4. Keep track of the largest difference we've seen so far for each pair of characters. This represents the biggest variance we've found.
  5. To ensure we're not missing cases where one of the characters doesn't appear, check if the second character exists at all. If it doesn't, move on to the next pair of characters.
  6. After considering every pair of characters, the largest variance we've kept track of is the solution.

Code Implementation

def substring_with_largest_variance(string):
    max_variance = 0
    unique_characters = set(string)

    for main_char in unique_characters:
        for comparison_char in unique_characters:
            if main_char == comparison_char:
                continue

            main_char_count = 0
            comparison_char_count = 0

            has_comparison_char = False
            for char in string:
                if char == main_char:
                    main_char_count += 1
                elif char == comparison_char:
                    comparison_char_count += 1
                    has_comparison_char = True

                #If comparison char doesn't exist, no variance
                if not has_comparison_char:
                    continue

                variance = main_char_count - comparison_char_count

                max_variance = max(max_variance, variance)

                # Reset when count is negative
                if variance < 0:
                    main_char_count = 0
                    comparison_char_count = 0

    return max_variance

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through all possible pairs of distinct characters in the input string. In the worst case, where all characters are distinct, the number of pairs is a constant related to the size of the alphabet (e.g., 26 choose 2 for lowercase English letters), making it an O(1) operation. For each pair, the algorithm iterates through the string once to calculate the variance, which takes O(n) time, where n is the length of the input string. Because the alphabet size is constant, the overall time complexity is dominated by the single pass through the string n, therefore O(n).
Space Complexity
O(1)The algorithm iterates through pairs of characters and maintains a count, but does not use any data structures like arrays, lists, or hashmaps that scale with the input string's length. The space usage is dominated by a few integer variables to store the current difference, maximum variance, and potentially flags for character existence. Therefore, the auxiliary space remains constant regardless of the input string length N.

Edge Cases

CaseHow to Handle
Empty stringReturn 0 immediately since no variance can exist in an empty string.
String with only one distinct characterReturn 0 immediately because variance requires at least two distinct characters.
String with only two characters, but the less frequent character appears only once.The algorithm should correctly calculate the variance by identifying the most and least frequent characters and iterating through the string.
Long string with a large number of distinct charactersThe solution should scale efficiently to handle a large input size without exceeding reasonable time or memory limits, especially the distinct character counting part.
String with many occurrences of one character and very few of anotherEnsure that the less frequent character is correctly identified as the 'min' character and that the algorithm correctly accounts for its rare occurrences.
String where no valid substring produces a positive variance (e.g. 'aaaa')The algorithm should return 0 in this case, as there's no substring with positive variance.
String with extreme length potentially causing integer overflow in count calculationsUse a data type that can accommodate large counts, such as long, to prevent integer overflow during calculations.
String where the maximum variance substring is at the beginning or end of the input string.The algorithm must properly consider substrings starting from the beginning and ending at the end of the string to find the maximal variance.