Taro Logo

Maximum Repeating Substring

Easy
Amazon logo
Amazon
2 views
Topics:
Strings

For a string sequence, a string word is k-repeating if word concatenated k times is a substring of sequence. The word's maximum k-repeating value is the highest value k where word is k-repeating in sequence. If word is not a substring of sequence, word's maximum k-repeating value is 0.

Given strings sequence and word, return the maximum k-repeating value of word in sequence.

Example 1:

Input: sequence = "ababc", word = "ab"
Output: 2
Explanation: "abab" is a substring in "abab".

Example 2:

Input: sequence = "ababc", word = "ba"
Output: 1
Explanation: "ba" is a substring in "ababc". "baba" is not a substring in "ababc".

Example 3:

Input: sequence = "ababc", word = "ac"
Output: 0
Explanation: "ac" is not a substring in "ababc".

How would you approach solving this problem? Consider edge cases and optimize for time and space complexity.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the lengths of the 'sequence' and 'word' strings? Can they be empty or null?
  2. Are 'sequence' and 'word' case-sensitive, or should I perform a case-insensitive comparison?
  3. If the 'word' doesn't appear in 'sequence' at all, what should the return value be?
  4. Is 'word' guaranteed to be non-empty? What should I return if 'word' is an empty string?
  5. Are there any specific constraints on the characters allowed in 'sequence' and 'word' (e.g., only ASCII characters)?

Brute Force Solution

Approach

The brute force approach to finding the maximum repeating substring involves exhaustively checking all possible repetitions of the provided word within the given sequence. We'll examine every possible starting position and repetition count to see how many times the word fits consecutively. We keep track of the highest repetition count we find during this process.

Here's how the algorithm would work step-by-step:

  1. Start by checking if the word appears at the very beginning of the sequence.
  2. Then, see how many times the word can repeat consecutively from that starting point.
  3. Next, move one position forward in the sequence and repeat the process: check if the word appears starting there, and how many times it can repeat.
  4. Continue shifting the starting position one step at a time along the sequence, always checking for repetitions of the word.
  5. At each position, calculate the number of consecutive repetitions of the word.
  6. Compare the current number of repetitions with the highest number found so far.
  7. If the current count is higher, update the highest number of repetitions.
  8. After checking every possible starting position, the highest number of repetitions you've kept track of is your answer.

Code Implementation

def maximum_repeating_substring(sequence, word):
    maximum_repetitions = 0
    sequence_length = len(sequence)
    word_length = len(word)

    for start_index in range(sequence_length):
        # Iterate through each possible starting position in the sequence

        current_repetitions = 0
        current_index = start_index

        while current_index + word_length <= sequence_length:
            # Check if the word matches at the current position

            if sequence[current_index:current_index + word_length] == word:
                current_repetitions += 1
                current_index += word_length

            else:
                break

        # Update maximum_repetitions if the current repetitions are higher
        maximum_repetitions = max(maximum_repetitions, current_repetitions)

    return maximum_repetitions

Big(O) Analysis

Time Complexity
O(n*m)The outer loop iterates up to n - m + 1 times, where n is the length of the sequence and m is the length of the word. Inside the outer loop, we potentially compare the word to a substring of the sequence multiple times to find the maximum number of repetitions. The maximum number of comparisons in the inner loop is limited by n, which gives the length of the sequence. The time complexity approximates to (n-m+1)*k*m which is the same as O(n*m) because k is the number of times the word repeats, and k is bound by n/m. Therefore, we can remove all of the constants, which will give us O(n*m).
Space Complexity
O(1)The algorithm described only uses a few integer variables to keep track of the current position and the maximum repetition count. These variables consume a constant amount of memory regardless of the length of the sequence or the word. Therefore, the auxiliary space used by the algorithm is constant, independent of the input size N (where N could be the length of the sequence). This makes the space complexity O(1).

Optimal Solution

Approach

The key is to repeatedly search for the given word inside the larger text. Each time we find it, we increase our count. The search continues with the assumption that the substring could repeat multiple times consecutively.

Here's how the algorithm would work step-by-step:

  1. Start with a counter set to zero to keep track of how many times the word repeats.
  2. Create a string that initially contains just the repeating word.
  3. Check if the longer text contains this string.
  4. If it does, increase the counter by one and add another copy of the repeating word to the string.
  5. Repeat the check with the new, longer string until the longer text no longer contains the string or the string becomes too long to fit.
  6. The final counter value represents the maximum number of times the word repeats consecutively within the longer text.

Code Implementation

def maximum_repeating_substring(text, word):
    repeat_count = 0
    repeated_word = word

    # Increment the count if repeated_word is a substring of text
    while repeated_word in text:
        repeat_count += 1

        # Build a longer string by appending the word
        repeated_word += word

    return repeat_count

Big(O) Analysis

Time Complexity
O(n*m)The algorithm iteratively checks for the existence of increasingly larger repeating substrings within the given text. Each check involves a substring search, which can take O(n) time in the worst case, where n is the length of the text. The substring we are looking for grows by a factor of m (the length of the word). Since this happens repeatedly until we can't find the substring, the time complexity can approach O(n*m).
Space Complexity
O(M*K)The algorithm creates a string that initially contains the repeating word. In the worst case, the algorithm keeps appending the repeating word to this string until the string's length approaches that of the larger text. Let M be the length of the repeating word and K be the maximum number of times the repeating word is repeated consecutively within the longer text. The space used is therefore proportional to M*K. Hence, the auxiliary space complexity is O(M*K).

Edge Cases

CaseHow to Handle
Empty sequence or wordReturn 0, as no repeating substring exists in an empty string.
Word is null or sequence is nullThrow IllegalArgumentException or return -1 to indicate invalid input, depending on the function's specification.
Word is longer than the sequenceReturn 0 because the word cannot repeat within the sequence.
Sequence contains only the wordReturn 1, as the word appears once in the sequence.
The word is a single character and the sequence consists of the same character repeated many times.Return the length of the sequence since the single character word fully repeats.
The word does not appear in the sequence.Return 0, indicating that the word has a maximum repetition of zero.
Integer overflow if using multiplication in a naive search implementation (large sequence and word sizes).Use a method that avoids integer multiplication, such as a sliding window or KMP algorithm, to prevent overflow.
Sequence and word contain unicode charactersEnsure that the string comparison implementation correctly handles unicode characters by accounting for their multi-byte nature.