Taro Logo

Longest Substring with At Least K Repeating Characters

Medium
Asked by:
Profile picture
Profile picture
Profile picture
Profile picture
+2
More companies
Profile picture
Profile picture
36 views
Topics:
StringsRecursion

Given a string s and an integer k, return the length of the longest substring of s such that the frequency of each character in this substring is greater than or equal to k.

if no such substring exists, return 0.

Example 1:

Input: s = "aaabb", k = 3
Output: 3
Explanation: The longest substring is "aaa", as 'a' is repeated 3 times.

Example 2:

Input: s = "ababbc", k = 2
Output: 5
Explanation: The longest substring is "ababb", as 'a' is repeated 2 times and 'b' is repeated 3 times.

Constraints:

  • 1 <= s.length <= 104
  • s consists of only lowercase English letters.
  • 1 <= k <= 105

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected range for the value of 'k', and what is the maximum possible length of the input string 's'?
  2. Can the input string 's' be empty or null? What should I return in that case?
  3. If no substring satisfies the condition (every character appears at least k times), what value should I return?
  4. Does the string 's' consist only of lowercase English letters, or can it contain other characters like uppercase letters, numbers, or special symbols?
  5. If there are multiple longest substrings that meet the criteria, can I return the length of any one of them?

Brute Force Solution

Approach

The brute force approach to this problem is to examine every possible substring of the given string. We'll then check if each of these substrings satisfies the condition that every character appears at least 'k' times. Finally, we'll pick the longest substring that meets this requirement.

Here's how the algorithm would work step-by-step:

  1. Start by considering all substrings of length 1 from the given string.
  2. Then consider all substrings of length 2, then length 3, and so on, up to the length of the entire string.
  3. For each substring we consider, count how many times each character appears in it.
  4. Check if every character that appears in the substring appears at least 'k' times.
  5. If a substring satisfies this condition and is longer than any other substring we've found so far that also meets the condition, remember this substring.
  6. After checking all possible substrings, the longest substring we remembered is the answer.

Code Implementation

def longest_substring_with_k_repeating_characters_brute_force(string, k_repeats):
    longest_substring_found = ""

    # Iterate through all possible substring lengths
    for substring_length in range(1, len(string) + 1):
        # Iterate through all possible start indices for a given length
        for start_index in range(len(string) - substring_length + 1):
            end_index = start_index + substring_length
            substring = string[start_index:end_index]

            # Count character frequencies
            character_counts = {}
            for char in substring:
                character_counts[char] = character_counts.get(char, 0) + 1

            # Check if every character appears at least k times
            meets_k_repeats_condition = True
            for char, count in character_counts.items():
                if count < k_repeats:
                    meets_k_repeats_condition = False
                    break

            # If the substring meets the criteria and is longer than the current longest
            if meets_k_repeats_condition:

                # Update the longest substring found so far
                if len(substring) > len(longest_substring_found):
                    longest_substring_found = substring

    return longest_substring_found

Big(O) Analysis

Time Complexity
O(n³)The described brute force approach iterates through all possible substrings of the input string. There are approximately n * (n+1) / 2 possible substrings, which simplifies to O(n²). For each of these substrings, we count the frequency of each character, which takes O(n) time in the worst case (where the substring is nearly unique characters). Therefore, the overall time complexity is O(n²) * O(n) = O(n³).
Space Complexity
O(1)The brute force approach, as described, primarily uses a sliding window defined by start and end indices to generate substrings. The dominant space usage arises from counting character frequencies within each substring to check if each character appears at least k times. While a hash map or array could be used for counting character frequencies, its size is bounded by the size of the character set, which is constant (26 for lowercase English letters, or a fixed number for any defined character set), regardless of the input string's length (N). Therefore, the space complexity is O(1).

Optimal Solution

Approach

The goal is to find the longest section of text where every character appears at least a certain number of times. The best approach is to split the text at problem spots and then check the smaller parts individually.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each character appears in the entire text.
  2. Identify the characters that appear fewer times than required. These are the problem characters.
  3. If there are no problem characters, the entire text is the answer.
  4. If there are problem characters, split the text into smaller pieces wherever these characters appear.
  5. Now, treat each of these smaller pieces as a new, smaller problem and repeat the process from the beginning.
  6. Keep doing this until you find pieces of text that either have no problem characters or are too short to possibly meet the length requirement.
  7. The longest valid piece you find during this process is the answer.

Code Implementation

def longestSubstring(input_string, required_repetitions):
    string_length = len(input_string)

    # Base case: if the string is too short,
    if string_length < required_repetitions:
        return 0

    character_counts = {}
    for char in input_string:
        character_counts[char] = character_counts.get(char, 0) + 1

    split_characters = []
    for char, count in character_counts.items():
        if count < required_repetitions:
            split_characters.append(char)

    # If no characters need splitting, the whole string is valid.
    if not split_characters:
        return string_length

    max_length = 0
    start = 0
    for index, char in enumerate(input_string):
        if char in split_characters:
            # Split the string and recursively check substrings.

            sub_string = input_string[start:index]
            sub_string_length = longestSubstring(sub_string, required_repetitions)
            max_length = max(max_length, sub_string_length)
            start = index + 1

    #Process the remaining substring after the last split character
    sub_string = input_string[start:]
    sub_string_length = longestSubstring(sub_string, required_repetitions)
    max_length = max(max_length, sub_string_length)

    return max_length

Big(O) Analysis

Time Complexity
O(n²)The algorithm's time complexity is dominated by the recursive splitting and processing of substrings. In the worst-case scenario, each character could potentially be a 'problem character', leading to a split at nearly every position in the string. Each split results in a recursive call that potentially iterates through the remaining substrings to count character frequencies and further split, resulting in nested iterations. Therefore, in the worst case, the algorithm might perform operations on substrings of varying lengths repeatedly, approximating n * n/2 operations. This simplifies to O(n²).
Space Complexity
O(N)The space complexity is dominated by the recursion depth. In the worst-case scenario, where the string is repeatedly split into very small subproblems, the recursion could go as deep as the length of the input string (N), leading to N stack frames. Each stack frame consumes a constant amount of space, but the total space used across all stack frames is proportional to N. The character count also uses O(1) space since the number of characters is limited, such as 26 for lowercase English alphabet, so it does not affect the overall space complexity.

Edge Cases

Empty string s
How to Handle:
Return 0 immediately, as there's no substring to evaluate.
k is less than or equal to 0
How to Handle:
Return the length of s, since any character appears at least 0 times, effectively making any substring valid.
k is greater than the length of s
How to Handle:
Return 0, since no substring of s can have characters appearing at least k times.
String s contains only one character repeated
How to Handle:
If the repetition count is at least k, return the length of s; otherwise, return 0.
String s contains all unique characters, and k > 1
How to Handle:
Return 0, since no character appears at least k times.
String s is very long and recursive solution could lead to stack overflow
How to Handle:
Consider an iterative or divide-and-conquer approach to avoid deep recursion.
Integer overflow if calculating lengths of substrings exceeds max integer value
How to Handle:
Ensure all length calculations are performed in a way that avoids overflow, possibly by returning early when a maximum length is reached or using a larger data type.
String contains unicode characters
How to Handle:
The character frequency map or array should be large enough to accommodate the full range of possible unicode characters.