Longest Substring with At Least K Repeating Characters

Medium

Google

Topics:

StringsRecursionSliding Windows

Given a string s and an integer k, return the length of the longest substring of s such that the frequency of each character in this substring is greater than or equal to k. If no such substring exists, return 0.

For example:

If s = "aaabb" and k = 3, the output should be 3 because the longest substring is "aaa", where 'a' is repeated 3 times.
If s = "ababbc" and k = 2, the output should be 5 because the longest substring is "ababb", where 'a' is repeated 2 times and 'b' is repeated 3 times.

Write a function to efficiently solve this problem. Consider edge cases and optimize for both time and space complexity. The length of s will be between 1 and 10,000, and k will be between 1 and 100,000. The string s consists of only lowercase English letters.

Longest Substring with At Least K Repeating Characters

Medium

Google

Topics:

StringsRecursionSliding Windows

For example:

If s = "aaabb" and k = 3, the output should be 3 because the longest substring is "aaa", where 'a' is repeated 3 times.
If s = "ababbc" and k = 2, the output should be 5 because the longest substring is "ababb", where 'a' is repeated 2 times and 'b' is repeated 3 times.

Solution

Naive Approach

The most straightforward approach is to generate all possible substrings of the given string s and then, for each substring, check if the frequency of each character in the substring is greater than or equal to k. The longest substring that satisfies this condition is the answer.

Algorithm

Generate all substrings of s.
For each substring, create a frequency map (dictionary) of characters.
Iterate through the frequency map. If any character has a frequency less than k, the substring is invalid.
If all characters in the substring have a frequency greater than or equal to k, update the maximum length.

Code (Python)

def longestSubstringNaive(s: str, k: int) -> int:
    n = len(s)
    max_len = 0
    for i in range(n):
        for j in range(i, n):
            sub = s[i:j+1]
            freq = {}
            for char in sub:
                freq[char] = freq.get(char, 0) + 1
            valid = True
            for char in freq:
                if freq[char] < k:
                    valid = False
                    break
            if valid:
                max_len = max(max_len, len(sub))
    return max_len

Time Complexity

The time complexity is O(N^3) where N is the length of the string s. Generating all substrings takes O(N^2) time, and checking the frequency of characters in each substring takes O(N) time in the worst case.

Space Complexity

The space complexity is O(N), where N is the length of the string s, due to the space used by the frequency map.

Optimal Approach (Divide and Conquer)

The optimal solution employs a divide and conquer strategy. The idea is to find a character in the string whose frequency is less than k. If such a character exists, it means that this character cannot be part of any valid substring. Therefore, we can split the string into substrings based on this character and recursively solve the problem for each substring.

Algorithm

Create a frequency map of characters in the string s.
Find a character c whose frequency is less than k.
If no such character exists, return the length of the string s.
If such a character c exists, split the string s into substrings using c as a delimiter.
Recursively call the function on each substring and return the maximum length among them.

Code (Python)

def longestSubstring(s: str, k: int) -> int:
    n = len(s)
    if n == 0:
        return 0
    
    freq = {}
    for char in s:
        freq[char] = freq.get(char, 0) + 1
    
    for char in freq:
        if freq[char] < k:
            max_len = 0
            for sub in s.split(char):
                max_len = max(max_len, longestSubstring(sub, k))
            return max_len
    
    return n

Time Complexity

The time complexity is O(N), where N is the length of the string s. In the worst case, the string may be split into many substrings, but each character is only visited a limited number of times, keeping the overall complexity linear. Splitting the string contributes a factor, but it doesn't alter the overall linear time complexity.

Space Complexity

The space complexity is O(N), where N is the length of the string s, due to the recursion depth. In the worst-case scenario, the recursion depth can be proportional to the length of the string if the string is split into very small substrings recursively. Additionally, the frequency map takes O(1) space since there are only 26 lowercase English letters, so it's a constant factor. The space usage for s.split creates new strings but are short-lived and don't affect the overall asymptotic space complexity.

Edge Cases

If the input string s is empty, the function should return 0.
If k is greater than the length of s, and if any character exists, the result will be 0 as any character will have a length less than k.
If no character appears less than k times, the whole string s is a valid substring, so the function should return the length of s.

Solution

Naive Approach

Algorithm

Generate all substrings of s.
For each substring, create a frequency map (dictionary) of characters.
Iterate through the frequency map. If any character has a frequency less than k, the substring is invalid.
If all characters in the substring have a frequency greater than or equal to k, update the maximum length.

Code (Python)

def longestSubstringNaive(s: str, k: int) -> int:
    n = len(s)
    max_len = 0
    for i in range(n):
        for j in range(i, n):
            sub = s[i:j+1]
            freq = {}
            for char in sub:
                freq[char] = freq.get(char, 0) + 1
            valid = True
            for char in freq:
                if freq[char] < k:
                    valid = False
                    break
            if valid:
                max_len = max(max_len, len(sub))
    return max_len

Time Complexity

Space Complexity

The space complexity is O(N), where N is the length of the string s, due to the space used by the frequency map.

Optimal Approach (Divide and Conquer)

Algorithm

Create a frequency map of characters in the string s.
Find a character c whose frequency is less than k.
If no such character exists, return the length of the string s.
If such a character c exists, split the string s into substrings using c as a delimiter.
Recursively call the function on each substring and return the maximum length among them.

Code (Python)

def longestSubstring(s: str, k: int) -> int:
    n = len(s)
    if n == 0:
        return 0
    
    freq = {}
    for char in s:
        freq[char] = freq.get(char, 0) + 1
    
    for char in freq:
        if freq[char] < k:
            max_len = 0
            for sub in s.split(char):
                max_len = max(max_len, longestSubstring(sub, k))
            return max_len
    
    return n

Time Complexity

Space Complexity

Edge Cases

If the input string s is empty, the function should return 0.
If k is greater than the length of s, and if any character exists, the result will be 0 as any character will have a length less than k.
If no character appears less than k times, the whole string s is a valid substring, so the function should return the length of s.