Given a string s
, return the number of homogenous substrings of s
. Since the answer may be too large, return it modulo 109 + 7
.
A string is homogenous if all the characters of the string are the same.
A substring is a contiguous sequence of characters within a string.
Example 1:
Input: s = "abbcccaa" Output: 13 Explanation: The homogenous substrings are listed as below: "a" appears 3 times. "aa" appears 1 time. "b" appears 2 times. "bb" appears 1 time. "c" appears 3 times. "cc" appears 2 times. "ccc" appears 1 time. 3 + 1 + 2 + 1 + 3 + 2 + 1 = 13.
Example 2:
Input: s = "xy" Output: 2 Explanation: The homogenous substrings are "x" and "y".
Example 3:
Input: s = "zzzzz" Output: 15
Constraints:
1 <= s.length <= 105
s
consists of lowercase letters.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force way to count homogenous substrings is very straightforward. We look at every possible piece of the whole string. Then, for each of those pieces, we check if it's made up of only one character.
Here's how the algorithm would work step-by-step:
def count_homogenous_substrings_brute_force(input_string):
string_length = len(input_string)
homogenous_substring_count = 0
# Iterate through all possible substring lengths
for substring_length in range(1, string_length + 1):
# Iterate through all possible starting positions for the substring
for starting_index in range(string_length - substring_length + 1):
# Extract the substring
substring = input_string[starting_index:starting_index + substring_length]
# Check if the substring is homogenous
is_homogenous = True
#Checking if the substring has zero length. Added for safety.
if len(substring) == 0:
is_homogenous = False
# Iterate through the substring characters
for character_index in range(1, len(substring)):
#Check the current character against the previous
if substring[character_index] != substring[0]:
is_homogenous = False
break
# Increment the count if the substring is homogenous
if is_homogenous:
homogenous_substring_count += 1
return homogenous_substring_count
The efficient way to count homogenous substrings involves identifying contiguous blocks of identical characters. Instead of checking every possible substring, we focus on calculating the number of homogenous substrings within each of these blocks and then summing them up. This avoids redundant calculations and drastically improves performance.
Here's how the algorithm would work step-by-step:
def count_homogenous_substrings(input_string):
string_length = len(input_string)
total_homogenous_substrings = 0
start_index = 0
while start_index < string_length:
end_index = start_index
# Find the end of the current homogenous section
while (end_index < string_length and
input_string[start_index] == input_string[end_index]):
end_index += 1
# Calculate the length of the homogenous section
homogenous_length = end_index - start_index
# Calculate number of substrings
total_homogenous_substrings +=
homogenous_length * (homogenous_length + 1) // 2
# Move to the next section
start_index = end_index
return total_homogenous_substrings
Case | How to Handle |
---|---|
Null or empty input string | Return 0 as there are no substrings. |
String with a single character | Return 1 since a single character is a homogenous substring of length 1. |
String with all identical characters (e.g., 'aaaaa') | The formula n*(n+1)/2 efficiently calculates the count of substrings in this case. |
String with alternating characters (e.g., 'ababab') | The solution should correctly count each single character substring as a homogenous substring. |
Very long string (approaching maximum string length supported by the language) | Ensure the solution uses an efficient algorithm (O(n)) to avoid exceeding time limits. |
String containing non-ASCII characters | The character comparison should handle Unicode characters correctly, which it will natively in most languages. |
Integer overflow when calculating the number of substrings | Use a 64-bit integer type (long) to store the count of homogenous substrings if necessary. |
String containing mixed case characters, but we only care about homogenity of same characters | Ensure the count resets to zero if casing is different ( 'aAaa' would have substrings 'a', 'A', 'aa') |