Taro Logo

Maximum Number of Occurrences of a Substring

Medium
Roblox logo
Roblox
1 view
Topics:
StringsSliding Windows

Given a string s, return the maximum number of occurrences of any substring under the following rules:

  • The number of unique characters in the substring must be less than or equal to maxLetters.
  • The substring size must be between minSize and maxSize inclusive.

Example 1:

Input: s = "aababcaab", maxLetters = 2, minSize = 3, maxSize = 4
Output: 2
Explanation: Substring "aab" has 2 occurrences in the original string.
It satisfies the conditions, 2 unique letters and size 3 (between minSize and maxSize).

Example 2:

Input: s = "aaaa", maxLetters = 1, minSize = 3, maxSize = 3
Output: 2
Explanation: Substring "aaa" occur 2 times in the string. It can overlap.

Constraints:

  • 1 <= s.length <= 105
  • 1 <= maxLetters <= 26
  • 1 <= minSize <= maxSize <= min(26, s.length)
  • s consists of only lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the length of the input string `s` and the values of `maxLetters`, `minSize`, and `maxSize`?
  2. If no substring satisfies the conditions (number of unique letters <= `maxLetters` and `minSize` <= length <= `maxSize`), what should the function return?
  3. By 'substring', do you mean a contiguous sequence of characters within the string `s`?
  4. If multiple substrings have the same maximum number of occurrences, should I return the count of any one of them, or is there a specific one I should prioritize?
  5. Can `minSize` ever be greater than `maxSize`? If so, what is the expected behavior?

Brute Force Solution

Approach

The brute force method in this problem involves checking every single possible substring within a given string. We look at all possible lengths and starting positions to find repeating substrings. This approach guarantees finding the maximum occurrences but isn't the fastest.

Here's how the algorithm would work step-by-step:

  1. Consider every possible substring length, starting from the shortest possible length up to the longest.
  2. For each possible length, examine every substring of that length that can be found in the main string.
  3. Count how many times each of these substrings appears within the main string.
  4. Keep track of the substring that appears the most frequently, updating the count whenever you find a substring that occurs more often than the current most frequent one.
  5. Also, make sure that each potential substring meets the criteria specified by the constraints, if any exist (e.g. it contains at most a certain number of distinct characters). If it does not meet the criteria, skip it.
  6. After checking all possible substrings and lengths, the substring with the highest occurrence count is your answer.

Code Implementation

def max_occurrences_substring_brute_force(text, min_length, max_length, max_letters):
    maximum_occurrences = 0
    most_frequent_substring = ""

    # Iterate through possible substring lengths
    for substring_length in range(min_length, max_length + 1):

        # Iterate through possible starting positions
        for starting_position in range(len(text) - substring_length + 1):
            substring = text[starting_position:starting_position + substring_length]

            # Skip substrings that exceed the maximum allowed distinct letters
            if len(set(substring)) > max_letters:
                continue

            current_occurrences = 0
            # Count occurrences of the substring in the text
            for i in range(len(text) - substring_length + 1):
                if text[i:i + substring_length] == substring:
                    current_occurrences += 1

            # Update the maximum occurrences if necessary
            if current_occurrences > maximum_occurrences:
                maximum_occurrences = current_occurrences
                most_frequent_substring = substring

    return maximum_occurrences

Big(O) Analysis

Time Complexity
O(n^3)The algorithm iterates through all possible substring lengths from 1 to n, where n is the length of the input string. For each length, it iterates through all possible starting positions of substrings of that length. Finally, for each substring, it counts its occurrences in the main string, which requires another iteration through the string. Therefore, the time complexity involves three nested loops, leading to O(n * n * n) = O(n^3) where n is the length of the original string.
Space Complexity
O(N)The brute force approach, as described, primarily uses a counter (or a dictionary) to store the frequency of each substring encountered. In the worst-case scenario, the number of distinct substrings can grow linearly with the length of the string, N. Therefore, the auxiliary space used to store these counts can be proportional to N. This leads to a space complexity of O(N).

Optimal Solution

Approach

To find the most frequent substring within a given string, we need a way to efficiently count occurrences. The optimal approach involves focusing on the possible substrings and smartly updating the count to avoid redundant calculations.

Here's how the algorithm would work step-by-step:

  1. Start by checking all the possible substrings within the main string that meet the given length and character limits.
  2. For each valid substring, count how many times it appears in the main string.
  3. Keep track of the maximum occurrence count seen so far.
  4. To avoid unnecessary counting, use a mechanism to remember previously counted substrings. If a substring has already been counted, simply retrieve its count instead of recalculating.
  5. Update the maximum occurrence count whenever a substring is found with a higher count.
  6. Return the final maximum occurrence count.

Code Implementation

def max_occurrence(text, min_length, max_length, max_letters):
    string_length = len(text)
    substring_counts = {}
    maximum_occurrences = 0

    for i in range(string_length):
        for current_length in range(min_length, min_length + 1):
            if i + current_length > string_length:
                break

            substring = text[i:i + current_length]

            unique_letters = len(set(substring))

            # Limit based on unique letters
            if unique_letters <= max_letters:
                if substring in substring_counts:
                    substring_counts[substring] += 1
                else:
                    # Count occurrences only if not previously counted
                    substring_counts[substring] = text.count(substring)

                maximum_occurrences = max(maximum_occurrences, substring_counts[substring])

    return maximum_occurrences

Big(O) Analysis

Time Complexity
O(n^2)The algorithm iterates through the string to extract all possible substrings meeting specified criteria. Let n be the length of the input string. Generating each substring takes O(1) time, but potentially n substrings can be checked. For each substring, we iterate through the original string to count its occurrences, which also takes O(n) time in the worst case. Therefore, in the worst-case scenario, we are performing an O(n) operation (counting occurrences) potentially n times, leading to a time complexity of O(n * n) or O(n^2).
Space Complexity
O(N)The provided solution uses a mechanism to remember previously counted substrings and their counts, which is typically implemented using a hash map or dictionary. In the worst case, where almost all possible substrings meeting the length and character limits are unique and appear at least once, the hash map could store counts for a significant portion of all possible substrings of the main string. This results in space complexity that grows linearly with the length of the string, denoted as N, because we might store a number of substring counts proportional to N. Therefore, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty input stringReturn 0 since no substrings can be formed.
maxLetters is 0Return 0 since no substring with any unique characters can be valid.
minSize is greater than the string lengthReturn 0 as no substring of the required size can exist.
maxSize is less than minSizeReturn 0 as no substring can simultaneously satisfy both size constraints.
String contains only one unique character and maxLetters is 1Count occurrences of the substring of length minSize if minSize is less or equal to the string length, otherwise return 0.
All characters in the string are unique and the string length equals minSizeThe maximum occurrence is 1 for the whole string if the string length is also less than or equal to maxSize and less than or equal to maxLetters.
Integer overflow possibility when counting occurrencesUse appropriate data type to store the counts, such as long or int depending on problem constraints, but always consider the constraints.
String with very long length to test for memory constraints and efficiencyEfficiently process the string using sliding window and a hashmap to count occurrences, ensuring memory usage stays within acceptable limits.