Taro Logo

Smallest Subsequence of Distinct Characters

Medium
Google logo
Google
0 views
Topics:
StringsStacksGreedy Algorithms

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

Example 1:

Input: s = "bcabc"
Output: "abc"

Example 2:

Input: s = "cbacdcbc"
Output: "acdb"

Constraints:

  • 1 <= s.length <= 1000
  • s consists of lowercase English letters.

Solution


Naive Approach

A brute force approach would involve generating all possible subsequences of the input string s, filtering out those that do not contain all distinct characters exactly once, and then finding the lexicographically smallest among the remaining subsequences. This approach is highly inefficient because the number of subsequences grows exponentially with the length of the string.

Complexity Analysis

  • Time Complexity: O(2^n * n), where n is the length of the string. Generating all subsequences takes O(2^n), and for each subsequence, we need to check if it contains all distinct characters exactly once, which takes O(n).
  • Space Complexity: O(n), where n is the length of the string, to store the subsequences.

Optimal Approach: Using Stack and Greedy Algorithm

A more efficient approach involves using a stack and a greedy algorithm. The idea is to iterate through the string and maintain a stack of characters that form the lexicographically smallest subsequence seen so far. For each character, we check if it's already in the stack. If not, we compare it with the top of the stack. If the current character is smaller than the top of the stack and the top of the stack appears later in the string, we pop the top of the stack and push the current character. We repeat this process until the stack is empty or the current character is not smaller than the top of the stack or the top of the stack does not appear later in the string.

Algorithm

  1. Create a last_occurrence map to store the last index of each character in the string.
  2. Create a seen set to keep track of characters currently in the stack.
  3. Initialize an empty stack stack.
  4. Iterate through the string s with index i and character char:
    • If char is in seen, skip it.
    • While stack is not empty, the top of the stack is greater than char, and last_occurrence[stack[-1]] > i:
      • Remove the top of the stack from seen.
      • Pop the top of the stack.
    • Add char to seen.
    • Push char onto the stack.
  5. Return the string formed by the characters in the stack.

Code

def smallestSubsequence(s: str) -> str:
    last_occurrence = {}
    for i, char in enumerate(s):
        last_occurrence[char] = i

    stack = []
    seen = set()

    for i, char in enumerate(s):
        if char in seen:
            continue

        while stack and stack[-1] > char and last_occurrence[stack[-1]] > i:
            removed_char = stack.pop()
            seen.remove(removed_char)

        stack.append(char)
        seen.add(char)

    return "".join(stack)

Example

Let's trace the execution with the input s = "cbacdcbc"

  1. last_occurrence = {'c': 7, 'b': 6, 'a': 2, 'd': 4}
  2. stack = [], seen = set()
  3. i = 0, char = 'c'. Stack is empty, so stack.append('c'), seen = {'c'}
  4. i = 1, char = 'b'. 'c' > 'b' and last_occurrence['c'] > 1, so pop 'c'. stack = [], seen = {}. stack.append('b'), seen = {'b'}
  5. i = 2, char = 'a'. 'b' > 'a' and last_occurrence['b'] > 2, so pop 'b'. stack = [], seen = {}. stack.append('a'), seen = {'a'}
  6. i = 3, char = 'c'. Stack is not empty but 'a' < 'c'. stack.append('c'), seen = {'a', 'c'}
  7. i = 4, char = 'd'. Stack is not empty but 'c' < 'd'. stack.append('d'), seen = {'a', 'c', 'd'}
  8. i = 5, char = 'c'. char in seen, skip.
  9. i = 6, char = 'b'. 'd' > 'b' and last_occurrence['d'] > 6, so pop 'd'. stack = ['a', 'c'], seen = {'a', 'c'}. 'c' > 'b' and last_occurrence['c'] > 6, so pop 'c'. stack = ['a'], seen = {'a'}. stack.append('b'), seen = {'a', 'b'}
  10. i = 7, char = 'c'. char in seen, skip.
  11. Return "".join(['a', 'b', 'c', 'd']) = "acdb"

Complexity Analysis

  • Time Complexity: O(n), where n is the length of the string. We iterate through the string once. The while loop inside the for loop runs at most n times in total because each character can be pushed and popped from the stack at most once.
  • Space Complexity: O(n), where n is the length of the string, to store the last_occurrence map, seen set, and the stack.

Edge Cases

  • Empty string: The algorithm will return an empty string.
  • String with only one distinct character: The algorithm will return the string with that character.
  • String with duplicate characters: The algorithm will correctly remove duplicate characters and return the lexicographically smallest subsequence.