You are given a string word.
Return the maximum number of non-intersecting substrings of word that are at least four characters long and start and end with the same letter.
Example 1:
Input: word = "abcdeafdef"
Output: 2
Explanation:
The two substrings are "abcdea" and "fdef".
Example 2:
Input: word = "bcdaaaab"
Output: 1
Explanation:
The only substring is "aaaa". Note that we cannot also choose "bcdaaaab" since it intersects with the other substring.
Constraints:
1 <= word.length <= 2 * 105word consists only of lowercase English letters.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The basic idea is to be extremely thorough by checking every single possibility. We will first identify all the valid 'special' pieces of text, and then try every conceivable combination of these pieces to find the group that contains the most pieces without any of them overlapping.
Here's how the algorithm would work step-by-step:
def find_maximum_number_of_non_intersecting_substrings(input_string):
string_length = len(input_string)
all_substrings_with_indices = []
for start_index in range(string_length):
for end_index in range(start_index, string_length):
substring = input_string[start_index : end_index + 1]
all_substrings_with_indices.append((substring, start_index, end_index))
# Pre-calculating character bounds is crucial for efficiently checking the 'valid segment' rule.
character_first_occurrence = {}
character_last_occurrence = {}
for index, character in enumerate(input_string):
if character not in character_first_occurrence:
character_first_occurrence[character] = index
character_last_occurrence[character] = index
valid_segments = []
for substring, start_index, end_index in all_substrings_with_indices:
is_segment_valid = True
unique_chars_in_substring = set(substring)
for character in unique_chars_in_substring:
first_pos = character_first_occurrence[character]
last_pos = character_last_occurrence[character]
if not (first_pos >= start_index and last_pos <= end_index):
is_segment_valid = False
break
if is_segment_valid:
valid_segments.append((substring, start_index, end_index))
# To exhaustively check all groupings, we generate the powerset of all valid segments.
all_possible_combinations = []
number_of_valid_segments = len(valid_segments)
for combination_mask in range(1 << number_of_valid_segments):
current_combination = []
for item_index in range(number_of_valid_segments):
if (combination_mask >> item_index) & 1:
current_combination.append(valid_segments[item_index])
all_possible_combinations.append(current_combination)
best_combination_result = []
max_substring_count = 0
min_total_length_for_max_count = float('inf')
for combination in all_possible_combinations:
if not combination:
continue
# Any valid solution requires non-intersecting substrings, so we filter out all overlapping sets.
is_combination_overlapping = False
sorted_combination = sorted(combination, key=lambda segment: segment[1])
for segment_index in range(len(sorted_combination) - 1):
end_of_first_segment = sorted_combination[segment_index][2]
start_of_second_segment = sorted_combination[segment_index+1][1]
if end_of_first_segment >= start_of_second_segment:
is_combination_overlapping = True
break
if not is_combination_overlapping:
current_substring_count = len(combination)
# If two combinations have the same number of substrings, the tie is broken by total length.
if current_substring_count > max_substring_count:
max_substring_count = current_substring_count
current_total_length = sum(len(segment[0]) for segment in combination)
min_total_length_for_max_count = current_total_length
best_combination_result = [segment[0] for segment in combination]
elif current_substring_count == max_substring_count:
current_total_length = sum(len(segment[0]) for segment in combination)
if current_total_length < min_total_length_for_max_count:
min_total_length_for_max_count = current_total_length
best_combination_result = [segment[0] for segment in combination]
return best_combination_resultThe optimal strategy involves two main phases. First, we identify all possible 'self-contained' substrings, which are pieces of the string where every character within them appears only in that piece. Then, we cleverly select from these candidates by greedily picking the ones that finish the earliest, which ensures we can fit the maximum number of non-overlapping pieces.
Here's how the algorithm would work step-by-step:
def maxNumOfSubstrings(input_string: str) -> list[str]:
first_occurrence_map = {}
last_occurrence_map = {}
for index, character in enumerate(input_string):
if character not in first_occurrence_map:
first_occurrence_map[character] = index
last_occurrence_map[character] = index
# Pre-calculate the first and last appearance of each character to define substring boundaries.
candidate_intervals = []
string_length = len(input_string)
for start_index in range(string_length):
current_end_index = last_occurrence_map[input_string[start_index]]
is_valid_candidate = True
scan_index = start_index
while scan_index <= current_end_index:
if first_occurrence_map[input_string[scan_index]] < start_index:
is_valid_candidate = False
break
current_end_index = max(current_end_index, last_occurrence_map[input_string[scan_index]])
scan_index += 1
if is_valid_candidate:
# The loop verified this is a minimal 'self-contained' substring starting at this index.
candidate_intervals.append((start_index, current_end_index))
# Sorting by end position is the core of the greedy strategy to maximize remaining space.
candidate_intervals.sort(key=lambda interval: (interval[1], interval[1] - interval[0]))
result_substrings = []
last_chosen_end_index = -1
for start_of_interval, end_of_interval in candidate_intervals:
# Greedily select the earliest-ending interval that doesn't conflict with our last choice.
if start_of_interval > last_chosen_end_index:
result_substrings.append(input_string[start_of_interval : end_of_interval + 1])
last_chosen_end_index = end_of_interval
return result_substrings| Case | How to Handle |
|---|---|
| Input string is shorter than 4 characters | The solution returns 0 as no substring can satisfy the minimum length requirement of four. |
| Input string with all unique characters | The algorithm correctly returns 0 because no substring can be formed that starts and ends with the same character. |
| Input string consists of a single character repeated many times, e.g., 'aaaaaaaa' | The solution correctly identifies the maximum number of non-overlapping blocks of four, calculating a result of floor(length / 4). |
| Input containing nested valid substrings, such as 'abccba' where 'bccb' is inside 'abccba' | The dynamic programming state correctly ensures that intersecting substrings are not counted together, resulting in an output of 1. |
| Input with character repetitions too close to form a valid substring, e.g., 'abacaba' | The algorithm's length check correctly filters out these invalid short substrings, leading to a count of 0 if no valid ones exist. |
| Maximum length input of 2 * 10^5 characters | The linear time O(N) dynamic programming solution scales efficiently and avoids a timeout on large inputs. |
| Input where smaller substrings yield a better result than one large one, e.g., 'aaaabaaa' | The DP correctly maximizes the count by choosing two non-intersecting 'aaaa' substrings over the single longer intersecting one. |
| Input where no valid substrings can be formed despite meeting length and repetition criteria, e.g., 'aaabbb' | The solution correctly returns 0 as no pair of identical characters are at least 3 indices apart. |