Taro Logo

Smallest Subsequence of Distinct Characters

Medium
3 views
2 months ago

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

Example 1:

Input: s = "bcabc" Output: "abc"

Example 2:

Input: s = "cbacdcbc" Output: "acdb"

Constraints:

  • 1 <= s.length <= 1000
  • s consists of lowercase English letters.
Sample Answer
class Solution:
    def smallestSubsequence(self, s: str) -> str:
        last_occurrence = {}
        for i, char in enumerate(s):
            last_occurrence[char] = i
        
        stack = []
        visited = set()
        
        for i, char in enumerate(s):
            if char not in visited:
                while stack and char < stack[-1] and i < last_occurrence[stack[-1]]:
                    visited.remove(stack[-1])
                    stack.pop()
                
                stack.append(char)
                visited.add(char)
        
        return ''.join(stack)

Explanation:

The problem asks us to find the lexicographically smallest subsequence of a given string s that contains all distinct characters of s exactly once.

1. Naive Approach (Brute Force):

  • Generate all possible subsequences of the string.
  • Filter subsequences to retain only those that contain all distinct characters exactly once.
  • Find the lexicographically smallest among the filtered subsequences.
  • This approach has exponential time complexity and is not feasible for larger input strings.

2. Optimal Approach (Greedy with Stack):

  • The optimal solution employs a greedy approach along with a stack data structure to efficiently build the desired subsequence.
  • The algorithm iterates through the input string s, maintaining a stack that represents the potential subsequence.
  • A last_occurrence dictionary stores the last index of each character in s. This information is crucial for making informed decisions during the construction of the subsequence.
  • A visited set keeps track of characters that are already present in the stack, ensuring that each character appears only once in the subsequence.

Here's a step-by-step breakdown of the algorithm:

  1. Initialization:

    • Create a last_occurrence dictionary to store the last index of each character in s.
    • Initialize an empty stack stack to build the subsequence.
    • Initialize an empty set visited to keep track of visited characters.
  2. Iteration:

    • Iterate through the input string s with index i and character char.
  3. Character Visit Check:

    • If the current character char is not in the visited set, proceed with the following steps:
  4. Stack Optimization:

    • While the stack is not empty, the current character char is lexicographically smaller than the last character in the stack, and the current index i is less than the last occurrence index of the last character in the stack:
      • Remove the last character from the stack and also remove it from the visited set.
    • This step ensures that we maintain the lexicographically smallest subsequence by removing larger characters from the stack when we encounter a smaller character later in the string.
  5. Stack Update:

    • Append the current character char to the stack.
    • Add the current character char to the visited set.
  6. Result:

    • After iterating through the entire string s, the stack contains the lexicographically smallest subsequence. Join the characters in the stack to form the result string.
class Solution:
    def smallestSubsequence(self, s: str) -> str:
        last_occurrence = {}
        for i, char in enumerate(s):
            last_occurrence[char] = i
        
        stack = []
        visited = set()
        
        for i, char in enumerate(s):
            if char not in visited:
                while stack and char < stack[-1] and i < last_occurrence[stack[-1]]:
                    visited.remove(stack[-1])
                    stack.pop()
                
                stack.append(char)
                visited.add(char)
        
        return ''.join(stack)

Example

Let's walk through the s = "cbacdcbc" example.

  1. last_occurrence becomes {'c': 7, 'b': 6, 'a': 2, 'd': 4}.
  2. We iterate through s:
    • c: stack is empty, so we push c. stack = ['c'], visited = {'c'}.
    • b: b < c and 6 < 7, so we pop c. stack = ['b'], visited = {'b'}.
    • a: a < b and 2 < 6, so we pop b. stack = ['a'], visited = {'a'}.
    • c: c is not in visited. stack = ['a', 'c'], visited = {'a', 'c'}.
    • d: stack = ['a', 'c', 'd'], visited = {'a', 'c', 'd'}.
    • c: c is in visited.
    • b: b < d and 6 < 4 is false. stack = ['a', 'c', 'd', 'b'], visited = {'a', 'c', 'd', 'b'}.
    • c: b < c and 6 < 7. stack = ['a', 'c', 'd', 'b'], visited = {'a', 'c', 'd', 'b'}.
  3. Return ''.join(stack), which is acdb.

Big(O) Run-time Analysis:

  • The outer loop iterates through the string s once, which takes O(n) time, where n is the length of s.
  • The last_occurrence dictionary is constructed in O(n) time.
  • The while loop inside the outer loop can potentially iterate through the entire stack in the worst case. However, each element can be added to and removed from the stack at most once. Therefore, the amortized time complexity of the while loop is O(1).
  • Overall, the time complexity of the algorithm is O(n) because the outer loop dominates the run-time.

Big(O) Space Usage Analysis:

  • The last_occurrence dictionary stores the last index of each distinct character in s. In the worst case, all characters in s are distinct, so the dictionary takes O(k) space, where k is the number of distinct characters in s. Since the string s consists of lowercase English letters, k can be at most 26. Therefore, the space used by the dictionary is O(1).
  • The stack stores the characters in the subsequence. In the worst case, the stack can contain all distinct characters in s, so it takes O(k) space, which is O(1) in this case.
  • The visited set stores the characters that are currently in the stack. In the worst case, the set can contain all distinct characters in s, so it takes O(k) space, which is O(1) in this case.
  • Overall, the space complexity of the algorithm is O(1) because the space used by the dictionary, stack, and set are bounded by the number of distinct characters in s, which is at most 26.

Edge Cases:

  1. Empty String:
    • If the input string s is empty, the algorithm should return an empty string.
  2. String with One Character:
    • If the input string s contains only one character, the algorithm should return the same string.
  3. String with Duplicate Characters:
    • If the input string s contains duplicate characters, the algorithm should remove the duplicates while maintaining the lexicographically smallest order.
  4. String with Already Sorted Characters:
    • If the input string s is already sorted in lexicographical order, the algorithm should return the same string.