Taro Logo

Find Longest Special Substring That Occurs Thrice II

Medium
Rubrik logo
Rubrik
5 views
Topics:
ArraysTwo PointersSliding Windows

You are given a string s that consists of lowercase English letters.

A string is called special if it is made up of only a single character. For example, the string "abc" is not special, whereas the strings "ddd", "zz", and "f" are special.

Return the length of the longest special substring of s which occurs at least thrice, or -1 if no special substring occurs at least thrice.

A substring is a contiguous non-empty sequence of characters within a string.

Example 1:

Input: s = "aaaa"
Output: 2
Explanation: The longest special substring which occurs thrice is "aa": substrings "aaaa", "aaaa", and "aaaa".
It can be shown that the maximum length achievable is 2.

Example 2:

Input: s = "abcdef"
Output: -1
Explanation: There exists no special substring which occurs at least thrice. Hence return -1.

Example 3:

Input: s = "abcaba"
Output: 1
Explanation: The longest special substring which occurs thrice is "a": substrings "abcaba", "abcaba", and "abcaba".
It can be shown that the maximum length achievable is 1.

Constraints:

  • 3 <= s.length <= 5 * 105
  • s consists of only lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of the input string, and what characters can it contain?
  2. What should I return if no special substring occurs at least three times?
  3. Is overlapping allowed when counting the occurrences of the substring?
  4. Is the comparison case-sensitive, or should it be case-insensitive?
  5. If multiple special substrings of the same maximum length exist, which one should I return?

Brute Force Solution

Approach

The goal is to find the longest substring that appears at least three times within a larger string. The brute force approach checks every possible substring to see if it meets the criteria. It's like checking every grain of sand on a beach to see if it is gold.

Here's how the algorithm would work step-by-step:

  1. Consider every possible substring of the given string. Start with substrings of length one, then two, then three, and so on, until you reach the length of the entire string.
  2. For each substring, count how many times it appears in the original string. To do this, slide the substring across the original string, checking for a match at each position.
  3. If a substring appears at least three times, keep track of its length.
  4. After checking every possible substring, find the longest one that appeared at least three times.
  5. If no substring appears at least three times, then there's no answer.

Code Implementation

def find_longest_special_substring_thrice_brute_force(given_string):
    longest_substring_length = 0
    string_length = len(given_string)

    for substring_length in range(1, string_length + 1):
        for i in range(string_length - substring_length + 1):
            substring = given_string[i:i + substring_length]
            occurrence_count = 0

            # Count occurrences of substring
            for j in range(string_length - substring_length + 1):
                if given_string[j:j + substring_length] == substring:
                    occurrence_count += 1

            # Check for at least three occurrences
            if occurrence_count >= 3:

                #Update the length of the longest string
                longest_substring_length = max(longest_substring_length, substring_length)

    if longest_substring_length > 0:
        return longest_substring_length
    else:
        return -1

Big(O) Analysis

Time Complexity
O(n^3)The algorithm considers all possible substrings. There are approximately n^2 such substrings (n possible start positions and for each, on average, n/2 possible lengths). For each substring, we search the main string to count its occurrences. Searching for a substring of length 'k' within a string of length 'n' takes O(n*k) time in the worst case. Since 'k' can be up to 'n', in the worst case each substring occurrence count takes O(n^2) time. Thus, overall, the complexity becomes O(n^2) (for all substrings) * O(n) (for calculating the occurrences, considering average substring length is n/2), which simplifies to O(n^3).
Space Complexity
O(1)The provided solution iterates through substrings and counts their occurrences. The primary space usage comes from storing a count of how many times a particular substring appears and potentially the length of the longest substring found so far. These variables take constant space regardless of the input string's size N. Therefore, the auxiliary space complexity is O(1).

Optimal Solution

Approach

The core idea is to avoid checking every possible substring. Instead, we strategically search for and count valid substrings, focusing on those most likely to be the longest, using a sliding window approach and clever substring counting.

Here's how the algorithm would work step-by-step:

  1. Start by considering potential substring lengths, from longest possible to shortest. This is like guessing the answer and checking if it's right.
  2. For each length we're considering, slide a window of that size across the entire string.
  3. In each window, check if the substring within that window appears at least three times in the entire string, allowing overlaps.
  4. To efficiently check for occurrences, maintain a way to quickly compare the current window's substring with all other substrings of the same length in the string. Think of pre-computing and storing information to make comparisons fast.
  5. If a substring of a particular length appears three or more times, we have a potential solution. Immediately return this length since we are checking lengths from longest to shortest, we find the longest one this way.
  6. If no substring length yields at least three occurrences after checking all lengths, then there is no solution and return that no solution exists.

Code Implementation

def find_longest_special_substring(input_string):
    string_length = len(input_string)

    # Iterate through potential lengths, longest to shortest
    for substring_length in range(string_length, 0, -1):
        for i in range(string_length - substring_length + 1):
            substring = input_string[i:i + substring_length]
            occurrence_count = 0

            # Count occurrences of the substring, allowing overlaps
            for j in range(string_length - substring_length + 1):
                if input_string[j:j + substring_length] == substring:
                    occurrence_count += 1

            # If substring appears at least thrice, return its length
            if occurrence_count >= 3:
                return substring_length

    # No substring meets the criteria
    return -1

Big(O) Analysis

Time Complexity
O(n^2)The algorithm iterates through potential substring lengths in descending order, with the outer loop taking at most n iterations. For each substring length, a sliding window of that length is used to traverse the string. Within each window, the algorithm searches for occurrences of the current substring, which, in the worst-case, could involve comparing the current substring against all other substrings of the same length, taking O(n) time. Thus, in the worst case, for each length, we potentially scan the whole string, so the total time is roughly O(n*n), simplified to O(n^2).
Space Complexity
O(N)The algorithm pre-computes and stores information to efficiently compare substrings, implying the use of an auxiliary data structure such as a hash map or an array. The size of this data structure is directly proportional to the number of possible substrings, which can be up to N, where N is the length of the input string. Thus, the auxiliary space required grows linearly with the input size, resulting in a space complexity of O(N).

Edge Cases

CaseHow to Handle
Empty stringReturn an empty string since no substring can be found.
String with length less than the minimum required length of the substring occurring thriceReturn an empty string as it's impossible to find a substring occurring thrice with insufficient length.
String containing only one distinct character (e.g., 'aaaa')Handle this by carefully checking substring lengths and counts to avoid overcounting substrings that overlap.
String where no substring occurs three times.The algorithm should correctly identify this and return an empty string.
Very long string, potential for memory issues or performance bottlenecks.Use an efficient data structure like a sliding window with a frequency map to avoid generating and storing all possible substrings; also be mindful of potential memory overflows.
Overlapping occurrences of the same substringUse a non-overlapping approach by keeping track of where occurrences begin and end, such that if we find a match we jump forward to ensure new search does not contain same characters.
String contains special characters or Unicode charactersEnsure the substring comparison and frequency counting methods correctly handle these characters without errors.
Integer overflow when calculating substring hash values (if a hashing approach is used)Use appropriate modulo operations and/or larger data types (like long) to prevent integer overflow during hash calculations.