Taro Logo

Longest Repeating Substring

Medium
Coupang logo
Coupang
3 views
Topics:
StringsArrays

Given a string s, find the length of the longest repeating substring.

If there is no repeating substring, return 0.

Example 1:

Input: s = "banana"
Output: 3
Explanation: The longest repeating substrings are "ana" and "nan", which appear in the positions [1,3] and [2,4] respectively.

Example 2:

Input: s = "ababc"
Output: 2
Explanation: The longest repeating substring is "ab", which appears in the positions [0,2].

Example 3:

Input: s = "aa"
Output: 1

Constraints:

  • 1 <= s.length <= 2000
  • s consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected behavior if the input string is null or empty?
  2. Can the input string contain special characters, or is it limited to alphanumeric characters?
  3. If there are multiple repeating substrings of the same maximum length, which one should I return?
  4. What is the maximum possible length of the input string?
  5. By 'repeating substring', do you mean the substring must appear at least twice and those two appearances must not overlap?

Brute Force Solution

Approach

The brute force method for finding the longest repeating part of a word is like comparing every possible piece of the word with every other possible piece. We check all possible starting points and lengths to see if there's a repeat.

Here's how the algorithm would work step-by-step:

  1. Consider every possible piece of the word, starting with a piece of length one.
  2. For each piece, check if it appears again later in the word.
  3. If it does, remember its length.
  4. Now, consider pieces of length two, then three, and so on, checking for repeats each time.
  5. Each time you find a repeating piece, compare its length to the length of the longest repeating piece found so far.
  6. If the new repeating piece is longer, remember it.
  7. After checking all possible pieces and lengths, the longest repeating piece you remembered is the answer.

Code Implementation

def longest_repeating_substring_brute_force(input_string):
    longest_repeating_substring = ""
    string_length = len(input_string)

    for substring_length in range(1, string_length):
        for first_substring_start in range(string_length - substring_length + 1):
            first_substring = input_string[first_substring_start:first_substring_start + substring_length]

            # Start searching for repeats after the end of the first substring.
            for second_substring_start in range(first_substring_start + 1, string_length - substring_length + 1):
                second_substring = input_string[second_substring_start:second_substring_start + substring_length]

                # Check if the substrings match.
                if first_substring == second_substring:

                    # Update longest substring if current is longer.
                    if substring_length > len(longest_repeating_substring):
                        longest_repeating_substring = first_substring

    return longest_repeating_substring

Big(O) Analysis

Time Complexity
O(n^3)The outermost loop iterates through all possible substring lengths, from 1 up to n. The next nested loop iterates through all possible starting positions for a substring of a given length. The innermost operation compares the substring starting at the current position with all other possible substrings, which takes O(n) time in the worst case for each comparison. Therefore, we have nested loops resulting in n * n * n comparisons in the worst-case. This leads to approximately n^3 operations, which simplifies to O(n^3).
Space Complexity
O(1)The brute force approach, as described, primarily uses a few variables to keep track of the starting positions and lengths of substrings being compared. These variables (e.g., for loop indices, substring length) occupy a constant amount of space, irrespective of the input string's length (N). No auxiliary data structures that scale with the input size are employed. Therefore, the auxiliary space complexity remains constant.

Optimal Solution

Approach

The key is to avoid unnecessary comparisons by intelligently searching for matching substrings. We'll use a suffix tree or suffix array to efficiently organize all possible suffixes of the string. This allows us to quickly find the longest common prefix among the suffixes, which corresponds to the longest repeating substring.

Here's how the algorithm would work step-by-step:

  1. First, create a structured way to represent all possible endings of the string.
  2. Then, organize all these endings in a sorted manner.
  3. Next, compare adjacent endings to find the longest shared beginning sequence.
  4. The longest shared beginning sequence you found is your answer: the longest repeating substring.

Code Implementation

def longest_repeating_substring(input_string):
    string_length = len(input_string)
    suffixes = []
    for i in range(string_length):
        suffixes.append(input_string[i:])

    suffixes.sort()

    longest_substring = ""

    # Comparing adjacent suffixes to find the longest common prefix.
    for i in range(len(suffixes) - 1):
        current_substring = longest_common_prefix(suffixes[i], suffixes[i+1])
        if len(current_substring) > len(longest_substring):
            longest_substring = current_substring

    return longest_substring

def longest_common_prefix(string_one, string_two):
    prefix = ""
    minimum_length = min(len(string_one), len(string_two))
    # Finding common characters between two strings.
    for i in range(minimum_length):
        if string_one[i] == string_two[i]:
            prefix += string_one[i]
        else:
            break
    return prefix

# Creating test cases
input_string = "banana"

# Find the longest repeating substring
result = longest_repeating_substring(input_string)

Big(O) Analysis

Time Complexity
O(n log n)Constructing a suffix array typically takes O(n log n) time using algorithms like the prefix-doubling method or the DC3 algorithm, where n is the length of the input string. After the suffix array is built, comparing adjacent suffixes to find the longest common prefix involves iterating through the sorted suffix array, which takes O(n) time in the worst case, as we compare each adjacent pair of suffixes. The longest common prefix (LCP) computation for each adjacent pair can also be done efficiently in O(1) amortized time using LCP array construction techniques, also typically achieved during the suffix array construction. Thus, the dominant factor is the suffix array construction itself, leading to an overall time complexity of O(n log n).
Space Complexity
O(N)The suffix tree or suffix array, which is created to represent all possible endings of the string, requires auxiliary space proportional to the input string's length, N. Specifically, the suffix tree can, in the worst case, have a size proportional to the input string's length because each suffix needs to be stored. Therefore the space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty string inputReturn an empty string since no substring can exist.
String with only one characterReturn an empty string since a repeating substring requires at least two characters.
String with all identical characters (e.g., 'aaaaa')The longest repeating substring will be the entire string minus one character.
Very long string inputEnsure the algorithm's time and space complexity is efficient enough to handle large inputs, considering memory constraints.
String with no repeating substring (e.g., 'abcdefgh')Return an empty string when no repeating substring is found.
Overlapping repeating substrings (e.g., 'abababa')Ensure the algorithm correctly identifies the longest, potentially overlapping, substring.
Case sensitivity (e.g., 'aB' vs 'ab')Clarify if the comparison is case-sensitive or case-insensitive and handle accordingly, potentially using toLowerCase() or toUpperCase().
String contains special characters or Unicode charactersEnsure the algorithm handles special characters and Unicode characters correctly, considering character encoding.