Smallest Subsequence of Distinct Characters

Medium
7 days ago

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

For example:

  • If s = "bcabc", the output should be "abc"
  • If s = "cbacdcbc", the output should be "acdb"

Constraints:

  • 1 <= s.length <= 1000
  • s consists of lowercase English letters.
Sample Answer
def smallestSubsequence(s: str) -> str:
    """Finds the lexicographically smallest subsequence of s containing all distinct characters exactly once."""

    # Step 1: Count the last occurrence of each character.
    last_occurrence = {}
    for i, char in enumerate(s):
        last_occurrence[char] = i

    # Step 2: Use a stack to maintain the subsequence.
    stack = []
    seen = set()

    for i, char in enumerate(s):
        # If the character is already in the stack, skip it.
        if char in seen:
            continue

        # Pop any larger characters from the stack that appear later in the string.
        while stack and char < stack[-1] and i < last_occurrence[stack[-1]]:
            removed_char = stack.pop()
            seen.remove(removed_char)

        # Add the current character to the stack and the seen set.
        stack.append(char)
        seen.add(char)

    # Step 3: Return the subsequence.
    return ''.join(stack)

# Example Usage
s = "bcabc"
result = smallestSubsequence(s)
print(f"Smallest subsequence of '{s}': {result}")

s = "cbacdcbc"
result = smallestSubsequence(s)
print(f"Smallest subsequence of '{s}': {result}")

Explanation:

The function smallestSubsequence(s) efficiently identifies and returns the lexicographically smallest subsequence of a given string s, ensuring that each distinct character appears exactly once. The approach combines a greedy algorithm with a stack data structure.

  1. Last Occurrence Mapping: The algorithm begins by mapping each character in the string to its last index using a dictionary called last_occurrence. This preprocessing step is crucial for making informed decisions later on.
  2. Stack-Based Subsequence Construction: It employs a stack stack to build the subsequence and a set seen to track characters currently in the stack. By iterating through the input string s, it checks if the current character char is already in seen. If so, it skips to the next character.
  3. Greedy Optimization: If char is not in seen, the algorithm greedily optimizes the subsequence by popping any characters from the stack that are larger than char and have later occurrences in s. This ensures the lexicographically smallest order.
  4. Adding Characters: The current character char is then added to the stack and the seen set.
  5. Final Subsequence: Finally, the stack is converted into a string, which is the desired smallest subsequence.

Example Walkthrough:

Let's walk through the example where s = "cbacdcbc":

  1. Initialization: last_occurrence becomes {'c': 7, 'b': 6, 'a': 2, 'd': 4}. The stack stack and set seen are initialized empty.
  2. First 'c' (index 0): 'c' is added to stack and seen. Stack: ['c'], Seen: {'c'}
  3. 'b' (index 1): 'b' < 'c' and 'c' appears later in the string. 'c' is popped. 'b' is added. Stack: ['b'], Seen: {'b'}
  4. 'a' (index 2): 'a' < 'b' and 'b' appears later. 'b' is popped. 'a' is added. Stack: ['a'], Seen: {'a'}
  5. First 'c' (index 3): 'c' > 'a'. 'c' is added. Stack: ['a', 'c'], Seen: {'a', 'c'}
  6. 'd' (index 4): 'd' > 'c'. 'd' is added. Stack: ['a', 'c', 'd'], Seen: {'a', 'c', 'd'}
  7. Second 'c' (index 5): 'c' is already in seen. Skipped.
  8. Second 'b' (index 6): 'b' < 'd' and 'd' appears later. 'd' is popped. 'b' < 'c' and 'c' appears later. 'c' is popped. 'b' is added. Stack: ['a', 'b'], Seen: {'a', 'b'}
  9. Third 'c' (index 7): 'c' > 'b'. 'c' is added. Stack: ['a', 'b', 'c'], Seen: {'a', 'b', 'c'}

The final result is 'acdb'.

Time Complexity

  • O(N) where N is the length of the string s. The algorithm iterates through the string a constant number of times (twice): once to construct the last occurrence map and once to construct the stack. The stack operations (push and pop) take O(1) time on average.

Space Complexity

  • O(K) where K is the number of distinct characters in the string s. In the worst-case scenario, the last_occurrence dictionary and the stack will store all distinct characters of the string. For lowercase English letters, K is at most 26, making it O(1) in the context of the problem constraints.