Taro Logo

Maximum Number of Non-Overlapping Substrings

Medium
1 views
a month ago

Given a string s of lowercase letters, you need to find the maximum number of non-empty substrings of s that meet the following conditions:

  1. The substrings do not overlap, that is for any two substrings s[i..j] and s[x..y], either j < x or i > y is true.
  2. A substring that contains a certain character c must also contain all occurrences of c.

Find the maximum number of substrings that meet the above conditions. If there are multiple solutions with the same number of substrings, return the one with minimum total length. It can be shown that there exists a unique solution of minimum total length.

Notice that you can return the substrings in any order.

Example 1:

Input: s = "adefaddaccc"
Output: ["e","f","ccc"]

Example 2:

Input: s = "abbaccd"
Output: ["d","bb","cc"]

Constraints:

  • 1 <= s.length <= 10^5
  • s contains only lowercase English letters.
Sample Answer
def max_substrings(s):
    """Finds the maximum number of non-overlapping substrings of s meeting the given conditions."""
    n = len(s)
    if n == 0:
        return []

    # 1. Find the first and last occurrence of each character.
    first_occurrence = {}
    last_occurrence = {}
    for i, char in enumerate(s):
        if char not in first_occurrence:
            first_occurrence[char] = i
        last_occurrence[char] = i

    # 2. Merge intervals.
    intervals = []
    for char in first_occurrence:
        intervals.append([first_occurrence[char], last_occurrence[char]])

    i = 0
    while i < len(intervals):
        j = 0
        while j < len(intervals):
            if i != j:
                # Check if intervals overlap
                if intervals[j][0] >= intervals[i][0] and intervals[j][1] <= intervals[i][1]:
                    # Check if all occurrences of characters in intervals[j] are within intervals[i]
                    valid_merge = True
                    for char in first_occurrence:
                        if first_occurrence[char] >= intervals[j][0] and first_occurrence[char] <= intervals[j][1]:
                            if first_occurrence[char] < intervals[i][0] or first_occurrence[char] > intervals[i][1] or \
                               last_occurrence[char] < intervals[i][0] or last_occurrence[char] > intervals[i][1]:
                                valid_merge = False
                                break

                    if valid_merge:
                        intervals.pop(j)
                        if j < i:
                            i -= 1
                        j = -1  # Restart the inner loop
            if j >= 0:
                j += 1
        i += 1

    # Sort intervals by end position.
    intervals.sort(key=lambda x: x[1])

    # Select non-overlapping intervals.
    result = []
    end = -1
    for start, current_end in intervals:
        if start > end:
            result.append(s[start:current_end + 1])
            end = current_end

    return result


# Example usage
s = "adefaddaccc"
print(max_substrings(s))

s = "abbaccd"
print(max_substrings(s))

Explanation:

The code implements a greedy algorithm to find the maximum number of non-overlapping substrings satisfying the given conditions.

  1. Find First and Last Occurrences: It first finds the first and last occurrence of each character in the input string s. This information is crucial for determining the boundaries of potential substrings.
  2. Merge Intervals: It then constructs intervals based on these first and last occurrences. It iterates through the intervals and merges any overlapping intervals that satisfy the condition that if a character is contained within the smaller interval, then all occurrences of that character must also be present in the larger interval. This merging process ensures that any substring containing a character also contains all occurrences of that character.
  3. Sort Intervals: The intervals are sorted by their end positions to facilitate the greedy selection of non-overlapping intervals.
  4. Select Non-Overlapping Intervals: The algorithm iterates through the sorted intervals and selects those that do not overlap with previously selected intervals. The end variable keeps track of the end position of the last selected interval, ensuring that only non-overlapping intervals are chosen.

Time Complexity: O(N*M), where N is the length of the input string and M is the number of unique characters.

  • Finding first and last occurrences: O(N)
  • Merging Intervals: O(M^2 * N), where M is the number of unique characters. The nested loops iterate through all pairs of intervals, and the inner loop checks the character occurrences, which can take O(N) in the worst case.
  • Sorting Intervals: O(M log M)
  • Selecting Non-Overlapping Intervals: O(M)

Space Complexity: O(M), where M is the number of unique characters.

  • First and last occurrence dictionaries: O(M)
  • Intervals list: O(M)
  • Result list: O(M) in the worst case

Edge Cases:

  1. Empty string: Returns an empty list.
  2. String with only one character: Returns a list containing the input string.
  3. String with no repeating characters: Returns a list of individual characters.
  4. String where one character encompasses others: Handles cases like "adefaddaccc" where "a" encompasses "d, e, f, c".
  5. Overlapping intervals: Handles cases where intervals might overlap but need to be merged before substring selection.