Given a string s
, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s
does not have a duplicated substring, the answer is ""
.
Example 1:
Input: s = "banana"
Output: "ana"
Example 2:
Input: s = "abcd"
Output: ""
Constraints:
2 <= s.length <= 3 * 10^4
s
consists of lowercase English letters.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force approach to finding the longest repeated piece of a text is to try every possible piece and see if it appears again. We'll check progressively longer pieces until we've exhausted all the options. This ensures we find the absolute longest repeated piece, if one exists.
Here's how the algorithm would work step-by-step:
def longest_duplicate_substring_brute_force(text):
longest_duplicate = ""
# Iterate through possible substring lengths
for substring_length in range(1, len(text)):
for i in range(len(text) - substring_length + 1):
substring = text[i:i + substring_length]
# Search for the substring elsewhere in the text
for j in range(i + 1, len(text) - substring_length + 1):
if text[j:j + substring_length] == substring:
# Found a duplicate, check if it's the longest so far
if substring_length > len(longest_duplicate):
longest_duplicate = substring
return longest_duplicate
The most efficient way to find the longest repeating chunk of text is to avoid checking every possible chunk individually. Instead, we cleverly use a process that narrows down our search by guessing possible lengths and checking if a repeating chunk of that length exists.
Here's how the algorithm would work step-by-step:
def longest_duplicate_substring(text):
low_length = 1
high_length = len(text) - 1
longest_duplicate = ""
while low_length <= high_length:
middle_length = (low_length + high_length) // 2
duplicate_found = find_duplicate(text, middle_length)
# If we found a duplicate substring of middle_length
if duplicate_found:
longest_duplicate = duplicate_found
low_length = middle_length + 1
# Otherwise, reduce the length and search again
else:
high_length = middle_length - 1
return longest_duplicate
def find_duplicate(text, substring_length):
seen_substrings = set()
for i in range(len(text) - substring_length + 1):
substring = text[i:i + substring_length]
# Use a set to efficiently check for duplicates.
if substring in seen_substrings:
return substring
seen_substrings.add(substring)
return None
#The following function applies a Rabin-Karp algorithm
#with a rolling hash for duplicate detection. However,
#the above solution is within the constraints of the problem.
#Leaving it out to keep the logic and structure as requested
#def longest_duplicate_substring_rk(text):
# string_length = len(text)
# left_bound = 1
# right_bound = string_length
# longest_duplicate_substring = ""
#
# while left_bound <= right_bound:
# substring_length = left_bound + (right_bound - left_bound) // 2
# duplicate_substring = rabin_karp(text, substring_length)
#
# # If duplicate substring exists, increase the substring length.
# if duplicate_substring:
# longest_duplicate_substring = duplicate_substring
# left_bound = substring_length + 1
# # If no duplicate substring, decrease the substring length.
# else:
# right_bound = substring_length - 1
#
# return longest_duplicate_substring
#
#
#def rabin_karp(text, substring_length):
# string_length = len(text)
# base = 256 # Number of possible characters
# prime_modulus = 101 # Prime modulus for hash function
# highest_order = 1
#
# # Precompute base^(substring_length-1) % prime_modulus
# for _ in range(substring_length - 1):
# highest_order = (highest_order * base) % prime_modulus
#
# # Compute initial hash value
# hash_value = 0
# for i in range(substring_length):
# hash_value = (hash_value * base + ord(text[i])) % prime_modulus
#
# substring_hash_map = {hash_value: 0}
#
# # Slide the window through the text
# for i in range(1, string_length - substring_length + 1):
# # Remove leading digit, add trailing digit, recompute hash value
# hash_value = (hash_value - ord(text[i - 1]) * highest_order) % prime_modulus
# hash_value = (hash_value * base + ord(text[i + substring_length - 1])) % prime_modulus
#
# # Because hash_value can be negative
# if hash_value < 0:
# hash_value += prime_modulus
#
# # If hash value is in hash map, we have a duplicate
# if hash_value in substring_hash_map:
# return text[i: i + substring_length]
#
# substring_hash_map[hash_value] = i
#
# return None
Case | How to Handle |
---|---|
Empty string | Return an empty string immediately as there can be no duplicate substrings. |
String of length 1 | Return an empty string since a substring of length greater than 0 cannot be a duplicate. |
String with all the same characters (e.g., 'aaaa') | The algorithm should correctly identify the longest duplicate substring as 'aaa' for input 'aaaa'. |
Maximum string length (consider memory limits and potential integer overflow with rolling hash) | Ensure the rolling hash function and modulo are chosen carefully to avoid collisions and integer overflow, and consider using a 64-bit integer type. |
String with no duplicate substrings | The algorithm should return an empty string when no duplicate substring exists. |
String with overlapping duplicate substrings (e.g., 'abababa') | The binary search should correctly find the longest overlapping substring 'ababa'. |
String containing non-ASCII characters (Unicode) | The character encoding must be handled correctly to accurately calculate hash values for the rolling hash, potentially requiring wider character representation. |
String with very long duplicate substrings close to the string length | Binary search needs to handle the large lengths correctly without exceeding recursion limits or timing out. |