Taro Logo

Last Substring in Lexicographical Order

Hard
Amazon logo
Amazon
1 view
Topics:
Strings

Given a string s, return the last substring of s in lexicographical order.

For example:

Example 1:

Input: s = "abab"
Output: "bab"
Explanation: The substrings are ["a", "ab", "aba", "abab", "b", "ba", "bab"]. The lexicographically maximum substring is "bab".

Example 2:

Input: s = "leetcode"
Output: "tcode"

Constraints:

  • 1 <= s.length <= 4 * 10^5
  • s contains only lowercase English letters.

Can you provide an algorithm with optimal time and space complexity to solve this problem? Explain your solution and the time/space complexity analysis.

Solution


Brute Force Approach

A naive approach involves generating all possible substrings of the given string s and then comparing them lexicographically to find the last (largest) substring.

Algorithm:

  1. Generate all substrings of s.
  2. Initialize max_substring to an empty string.
  3. Iterate through the list of substrings:
    • If the current substring is lexicographically greater than max_substring, update max_substring.
  4. Return max_substring.

Code (Python):

def last_substring_brute_force(s):
    max_substring = ""
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            substring = s[i:j]
            if substring > max_substring:
                max_substring = substring
    return max_substring

Time Complexity: O(n^3), where n is the length of the string s. Generating all substrings takes O(n^2) time, and comparing each substring takes O(n) in the worst case.

Space Complexity: O(n^2), because, in the worst-case scenario, storing all the substrings would require O(n^2) space.

Optimal Approach (Suffix Comparison)

Instead of generating all possible substrings, we can compare suffixes of the string directly. This is more efficient.

Algorithm:

  1. Initialize max_index to 0.
  2. Iterate through the string s from index 1 to n-1:
    • Compare the suffix starting at max_index with the suffix starting at the current index i.
    • If the suffix at i is greater, update max_index to i.
  3. Return the suffix of s starting at max_index.

Code (Python):

def last_substring_optimal(s):
    max_index = 0
    for i in range(1, len(s)):
        if s[i:] > s[max_index:]:
            max_index = i
    return s[max_index:]

Time Complexity: O(n^2) in the worst case. The loop iterates n times, and the string comparison s[i:] > s[max_index:] can take up to O(n) time in the worst case (e.g., when the suffixes are very similar). However, on average, this is more efficient than the brute force because not all substrings are generated and compared. There is a linear solution as well, which is described in the next approach.

Space Complexity: O(1). We only use a constant amount of extra space.

Linear Time Solution (Suffix Array concept)

This method employs a two-pointer technique to determine the lexicographically largest suffix. It's inspired by the construction of suffix arrays but doesn't build the entire array.

Algorithm:

  1. Initialize two pointers, i and j, to 0 and 1 respectively.
  2. While j is within the bounds of the string:
    • Compare s[i + offset] with s[j + offset] where offset starts from 0.
    • If i + offset or j + offset reaches the end of the string, or s[i + offset] != s[j + offset]:
      • If s[i + offset] > s[j + offset] or j + offset reached end of string, advance j to j + offset + 1.
      • Else, set i to j, advance j to i + 1.
      • Reset offset to 0.
    • Else increment offset.
  3. Return the substring starting from i.

Code (Python):

def last_substring_linear(s):
    i, j, offset = 0, 1, 0
    n = len(s)
    while j + offset < n:
        if s[i + offset] == s[j + offset]:
            offset += 1
        else:
            if s[i + offset] > s[j + offset]:
                j += offset + 1
            else:
                i = j
                j = i + 1
            offset = 0
        if j >= n:
          break
    return s[i:]

Time Complexity: O(n). Although there is a nested while structure implicitly through the offset, in effect, each character is visited at most a constant number of times.

Space Complexity: O(1). We use only constant extra space.

Edge Cases

  • Empty String: The problem statement specifies that the string length is at least 1, so an empty string input is not possible.
  • String with one character: The algorithm should correctly return the single-character string.
  • String with all the same characters: e.g., "aaaa". The algorithm should return the entire string.
  • Lexicographically increasing or decreasing string: e.g., "abcde" or "edcba". The algorithm must handle these cases correctly.