Smallest Subsequence of Distinct Characters

Medium
7 days ago

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

Example 1: Input: s = "bcabc" Output: "abc"

Example 2: Input: s = "cbacdcbc" Output: "acdb"

Constraints: 1 <= s.length <= 1000 s consists of lowercase English letters.

Note: This question is the same as 316: https://leetcode.com/problems/remove-duplicate-letters/

Sample Answer
def smallestSubsequence(s: str) -> str:
    """Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once."""
    last_occurrence = {}
    for i, char in enumerate(s):
        last_occurrence[char] = i

    stack = []
    visited = set()

    for i, char in enumerate(s):
        if char not in visited:
            while stack and char < stack[-1] and i < last_occurrence[stack[-1]]:
                visited.remove(stack[-1])
                stack.pop()

            stack.append(char)
            visited.add(char)

    return "".join(stack)

# Example usage
s = "bcabc"
result = smallestSubsequence(s)
print(f"The lexicographically smallest subsequence of '{s}' is: {result}")

s = "cbacdcbc"
result = smallestSubsequence(s)
print(f"The lexicographically smallest subsequence of '{s}' is: {result}")

Explanation:

The function smallestSubsequence(s) finds the lexicographically smallest subsequence of a string s that contains all distinct characters exactly once. Here's a breakdown of how it works:

  1. Last Occurrence Dictionary:
  • The last_occurrence dictionary stores the index of the last occurrence of each character in the input string s. This is crucial for making informed decisions about whether to potentially remove a character from the stack later.
  1. Stack Initialization:
  • stack: A list used as a stack to build the subsequence.
  • visited: A set to keep track of characters currently in the stack.
  1. Iterating Through the String:
  • The code iterates through the input string s using a for loop, with i as the index and char as the character at that index.
  1. Checking for Visited Characters:
  • if char not in visited:: This condition ensures that each distinct character is added to the subsequence only once.
  1. Maintaining Lexicographical Order:
  • The while loop is the core of the algorithm. It checks three conditions:
    • stack: Ensures the stack is not empty.
    • char < stack[-1]: Checks if the current character char is lexicographically smaller than the last character in the stack.
    • i < last_occurrence[stack[-1]]: Checks if the last character in the stack appears again later in the string. If it does, it's safe to potentially remove it to see if a better (lexicographically smaller) subsequence can be formed.
  • If all three conditions are met, the last character from the stack is removed (popped), and it's also removed from the visited set because it's no longer in the current subsequence.
  1. Adding Characters to the Stack:
  • Once the while loop finishes (meaning no more characters need to be removed from the stack to maintain the lexicographical order), the current character char is added to the stack and marked as visited.
  1. Returning the Result:
  • Finally, the characters in the stack are joined into a string using "".join(stack), and this string (the lexicographically smallest subsequence) is returned.

Example:

Let's trace the execution with s = "bcabc"

  1. last_occurrence = {'b': 3, 'c': 4, 'a': 2}
  2. stack = [], visited = set()
  3. i = 0, char = 'b': char not in visited is true. Stack is empty, so 'b' is added. stack = ['b'], visited = {'b'}
  4. i = 1, char = 'c': char not in visited is true. char < stack[-1] ('c' < 'b') is false. 'c' is added. stack = ['b', 'c'], visited = {'b', 'c'}
  5. i = 2, char = 'a': char not in visited is true. char < stack[-1] ('a' < 'c') is true. i < last_occurrence[stack[-1]] (2 < 4) is true. 'c' is removed. stack = ['b'], visited = {'b'}. char < stack[-1] ('a' < 'b') is true. i < last_occurrence[stack[-1]] (2 < 3) is true. 'b' is removed. stack = [], visited = set(). 'a' is added. stack = ['a'], visited = {'a'}
  6. i = 3, char = 'b': char not in visited is true. Stack is empty, so 'b' is added. stack = ['a', 'b'], visited = {'a', 'b'}
  7. i = 4, char = 'c': char not in visited is true. char < stack[-1] ('c' < 'b') is false. 'c' is added. stack = ['a', 'b', 'c'], visited = {'a', 'b', 'c'}
  8. The loop finishes.
  9. The function returns "abc"

Time and Space Complexity:

Time Complexity

The time complexity is O(N), where N is the length of the string s. This is because the algorithm iterates through the string once to populate the last_occurrence dictionary and then iterates through the string again to build the subsequence using the stack. The while loop inside the second iteration might seem like it could increase the time complexity, but in reality, each character is added to and removed from the stack at most once, so the while loop operations are still bounded by O(N).

Space Complexity

The space complexity is O(K), where K is the number of distinct characters in the string s. In the worst case, K can be equal to N (if all characters are distinct), but in many cases, K will be smaller than N. The last_occurrence dictionary stores the last index of each distinct character, and the stack and visited set store at most K characters. Since K is bounded by the number of distinct characters (which in the case of lowercase English letters is at most 26), the space complexity can also be considered O(1) in terms of the size of the input string, as it does not scale linearly with the input size.