The variance of a string is defined as the largest difference between the number of occurrences of any 2
characters present in the string. Note the two characters may or may not be the same.
Given a string s
consisting of lowercase English letters only, return the largest variance possible among all substrings of s
.
A substring is a contiguous sequence of characters within a string.
Example 1:
Input: s = "aababbb" Output: 3 Explanation: All possible variances along with their respective substrings are listed below: - Variance 0 for substrings "a", "aa", "ab", "abab", "aababb", "ba", "b", "bb", and "bbb". - Variance 1 for substrings "aab", "aba", "abb", "aabab", "ababb", "aababbb", and "bab". - Variance 2 for substrings "aaba", "ababbb", "abbb", and "babb". - Variance 3 for substring "babbb". Since the largest possible variance is 3, we return it.
Example 2:
Input: s = "abcde" Output: 0 Explanation: No letter occurs more than once in s, so the variance of every substring is 0.
Constraints:
1 <= s.length <= 104
s
consists of lowercase English letters.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
We're going to find the largest difference between how often two letters appear in any part of a given piece of text. The simplest way to do this is to look at every possible piece of the text and compare the counts of all possible letter pairs.
Here's how the algorithm would work step-by-step:
def largest_variance_brute_force(text):
maximum_variance = 0
text_length = len(text)
for start_index in range(text_length):
for end_index in range(start_index, text_length):
sub_string = text[start_index : end_index+1]
# Create a dictionary to count character occurrences
character_counts = {}
for char in sub_string:
character_counts[char] = character_counts.get(char, 0) + 1
# Find the variance between each character pair
for first_char in character_counts:
for second_char in character_counts:
first_char_count = character_counts.get(first_char, 0)
second_char_count = character_counts.get(second_char, 0)
#Ensure both characters appear in the substring
if first_char_count > 0 and second_char_count > 0:
variance = abs(first_char_count - second_char_count)
maximum_variance = max(maximum_variance, variance)
return maximum_variance
The key is to avoid checking every possible substring. Instead, focus on each pair of characters to find the biggest difference between their occurrences within the string, simulating a reset whenever a count becomes non-positive to effectively restart the search for a more optimal variance from that point forward.
Here's how the algorithm would work step-by-step:
def substring_with_largest_variance(string):
max_variance = 0
unique_characters = set(string)
for main_char in unique_characters:
for comparison_char in unique_characters:
if main_char == comparison_char:
continue
main_char_count = 0
comparison_char_count = 0
has_comparison_char = False
for char in string:
if char == main_char:
main_char_count += 1
elif char == comparison_char:
comparison_char_count += 1
has_comparison_char = True
#If comparison char doesn't exist, no variance
if not has_comparison_char:
continue
variance = main_char_count - comparison_char_count
max_variance = max(max_variance, variance)
# Reset when count is negative
if variance < 0:
main_char_count = 0
comparison_char_count = 0
return max_variance
Case | How to Handle |
---|---|
Empty string | Return 0 immediately since no variance can exist in an empty string. |
String with only one distinct character | Return 0 immediately because variance requires at least two distinct characters. |
String with only two characters, but the less frequent character appears only once. | The algorithm should correctly calculate the variance by identifying the most and least frequent characters and iterating through the string. |
Long string with a large number of distinct characters | The solution should scale efficiently to handle a large input size without exceeding reasonable time or memory limits, especially the distinct character counting part. |
String with many occurrences of one character and very few of another | Ensure that the less frequent character is correctly identified as the 'min' character and that the algorithm correctly accounts for its rare occurrences. |
String where no valid substring produces a positive variance (e.g. 'aaaa') | The algorithm should return 0 in this case, as there's no substring with positive variance. |
String with extreme length potentially causing integer overflow in count calculations | Use a data type that can accommodate large counts, such as long, to prevent integer overflow during calculations. |
String where the maximum variance substring is at the beginning or end of the input string. | The algorithm must properly consider substrings starting from the beginning and ending at the end of the string to find the maximal variance. |