Taro Logo

Count Substrings With K-Frequency Characters I

Medium
Google logo
Google
2 views
Topics:
Strings

Given a string s and an integer k, return the total number of substrings of s where at least one character appears at least k times.

Example 1:

Input: s = "abacb", k = 2

Output: 4

Explanation:

The valid substrings are:

  • "aba" (character 'a' appears 2 times).
  • "abac" (character 'a' appears 2 times).
  • "abacb" (character 'a' appears 2 times).
  • "bacb" (character 'b' appears 2 times).

Example 2:

Input: s = "abcde", k = 1

Output: 15

Explanation:

All substrings are valid because every character appears at least once.

Constraints:

  • 1 <= s.length <= 3000
  • 1 <= k <= s.length
  • s consists only of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected range for 'k', and what data type is it?
  2. Is the input string guaranteed to be non-empty, and what characters can it contain (e.g., only lowercase English letters)?
  3. If no substrings contain a character exactly 'k' times, what should the function return?
  4. Are overlapping substrings counted multiple times? For example, in 'abab' with k=2, is 'abab' itself a valid substring?
  5. Is 'k' guaranteed to be non-negative?

Brute Force Solution

Approach

The brute force approach to counting substrings involves checking every possible substring within a given string. For each substring, we count how many times each character appears. We then check if any character appears exactly 'k' times and count it if it does.

Here's how the algorithm would work step-by-step:

  1. Consider every possible substring of the input string, starting from the very first character and a length of one.
  2. Extend the length of the substring one character at a time, continuing to the end of the string.
  3. For each substring, count the number of times each character appears within it.
  4. Check if there exists at least one character whose frequency is exactly equal to the given value 'k'.
  5. If the condition is met, increment a counter. This counter keeps track of the number of substrings satisfying the requirement.
  6. Move to the next starting character in the main string and repeat the process of finding all the substrings beginning at this new position.
  7. Continue this process until you've considered every possible starting position in the main string.
  8. The final value of the counter represents the total number of substrings that have exactly one character appearing 'k' times.

Code Implementation

def count_substrings_with_k_frequency(input_string, k_frequency):
    substring_count = 0
    string_length = len(input_string)

    for start_index in range(string_length):
        for end_index in range(start_index, string_length):
            substring = input_string[start_index:end_index + 1]
            frequency_map = {}

            for character in substring:
                frequency_map[character] = frequency_map.get(character, 0) + 1

            # Check if any character's frequency is exactly k
            found_k_frequency = False
            for character in frequency_map:
                if frequency_map[character] == k_frequency:
                    found_k_frequency = True
                    break

            # Increment the counter if a character appears k times
            if found_k_frequency:
                substring_count += 1

    return substring_count

Big(O) Analysis

Time Complexity
O(n^3)The algorithm iterates through all possible substrings of the input string. There are approximately n*(n+1)/2 such substrings, which is O(n^2). For each substring, we count the frequency of each character, which takes O(n) time in the worst case since the substring can be of length n. Therefore, the overall time complexity is O(n^2) * O(n) = O(n^3).
Space Complexity
O(1)The auxiliary space complexity is O(1) because for each substring considered, the algorithm uses a constant amount of extra memory to store character counts. Specifically, it only needs to store a fixed-size character count data structure (e.g., an array of size 26 for lowercase English letters, or a hash map) to store the frequency of each character. The size of this character count data structure is independent of the length of the input string (N), so the space used is constant.

Optimal Solution

Approach

To efficiently count substrings where a character appears exactly K times, we avoid checking every possible substring. The strategy focuses on smartly expanding substrings from each possible start position and efficiently updating counts.

Here's how the algorithm would work step-by-step:

  1. Consider each position in the string as a potential starting point of a substring.
  2. From each starting position, expand the substring one character at a time to the right.
  3. While expanding, keep track of the frequency of each character in the current substring.
  4. For each expanded substring, check if there is exactly one character whose frequency is exactly equal to K and all other characters have a frequency not equal to K.
  5. If the condition in the previous step is met, increment a counter to keep track of valid substrings.
  6. After expanding from each starting position, return the final counter, which represents the number of substrings satisfying the given criteria.

Code Implementation

def count_substrings_with_k_frequency_characters_i(input_string, target_frequency):
    string_length = len(input_string)
    substring_count = 0

    for start_index in range(string_length):
        character_frequencies = {}
        for end_index in range(start_index, string_length):
            current_char = input_string[end_index]
            character_frequencies[current_char] = character_frequencies.get(current_char, 0) + 1

            # Crucial check to identify chars with target frequency.
            chars_with_target_frequency_count = 0
            for char in character_frequencies:
                if character_frequencies[char] == target_frequency:
                    chars_with_target_frequency_count += 1

            # This condition is the core of the problem.
            is_valid_substring = False
            if chars_with_target_frequency_count == 1:
                is_valid_substring = True

            if is_valid_substring:
                substring_count += 1

    # Need to return final calculated result.
    return substring_count

Big(O) Analysis

Time Complexity
O(n²)The outer loop iterates through each of the n characters of the string as a potential start position. The inner loop expands the substring from the starting position to the right, potentially up to n characters. Within the inner loop, character frequencies are updated and checked, which takes constant time. Therefore, the dominant operations are driven by the nested loops, resulting in approximately n * n/2 operations in the worst case, simplifying to O(n²).
Space Complexity
O(1)The described algorithm utilizes a fixed-size frequency counter for characters within each substring, which is independent of the input string's length (N). Regardless of the substring's length, the character frequency counter stores counts for all possible characters, which is constant. Therefore, the auxiliary space used is constant and does not scale with the input size N.

Edge Cases

CaseHow to Handle
Null or empty string inputReturn 0 immediately, as there are no substrings to analyze.
String length is less than kReturn 0 immediately, as no substring of length at least k can exist.
k is zeroReturn the count of substrings with no repeating character that do not repeat, and of length at least 1.
k is negativeTreat k as if it were zero and handle it accordingly.
String contains only one distinct character, and k is greater than 1Return 0 because no substrings will have a character appearing exactly k times unless k is 1.
String contains characters with ASCII values outside the standard range (e.g., extended ASCII or Unicode)Ensure the character counting data structure (e.g., hash map or array) can accommodate the full range of possible character values.
Very long string causing potential integer overflow in substring counts or frequency countsUse a 64-bit integer type to store counts or consider a modular arithmetic approach if necessary.
k is larger than the length of the stringReturn 0, as it is impossible to have any characters with frequency equal to k.