Given a string s
, return the lexicographically smallest subsequence of s
that contains all the distinct characters of s
exactly once.
Example 1: Input: s = "bcabc" Output: "abc"
Example 2: Input: s = "cbacdcbc" Output: "acdb"
Constraints: 1 <= s.length <= 1000 s consists of lowercase English letters.
Note: This question is the same as 316: https://leetcode.com/problems/remove-duplicate-letters/
def smallestSubsequence(s: str) -> str:
"""Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once."""
last_occurrence = {}
for i, char in enumerate(s):
last_occurrence[char] = i
stack = []
visited = set()
for i, char in enumerate(s):
if char not in visited:
while stack and char < stack[-1] and i < last_occurrence[stack[-1]]:
visited.remove(stack[-1])
stack.pop()
stack.append(char)
visited.add(char)
return "".join(stack)
# Example usage
s = "bcabc"
result = smallestSubsequence(s)
print(f"The lexicographically smallest subsequence of '{s}' is: {result}")
s = "cbacdcbc"
result = smallestSubsequence(s)
print(f"The lexicographically smallest subsequence of '{s}' is: {result}")
The function smallestSubsequence(s)
finds the lexicographically smallest subsequence of a string s
that contains all distinct characters exactly once. Here's a breakdown of how it works:
last_occurrence
dictionary stores the index of the last occurrence of each character in the input string s
. This is crucial for making informed decisions about whether to potentially remove a character from the stack later.stack
: A list used as a stack to build the subsequence.visited
: A set to keep track of characters currently in the stack.s
using a for
loop, with i
as the index and char
as the character at that index.if char not in visited:
: This condition ensures that each distinct character is added to the subsequence only once.while
loop is the core of the algorithm. It checks three conditions:
stack
: Ensures the stack is not empty.char < stack[-1]
: Checks if the current character char
is lexicographically smaller than the last character in the stack.i < last_occurrence[stack[-1]]
: Checks if the last character in the stack appears again later in the string. If it does, it's safe to potentially remove it to see if a better (lexicographically smaller) subsequence can be formed.visited
set because it's no longer in the current subsequence.while
loop finishes (meaning no more characters need to be removed from the stack to maintain the lexicographical order), the current character char
is added to the stack and marked as visited."".join(stack)
, and this string (the lexicographically smallest subsequence) is returned.Let's trace the execution with s = "bcabc"
last_occurrence = {'b': 3, 'c': 4, 'a': 2}
stack = []
, visited = set()
i = 0
, char = 'b'
: char not in visited
is true. Stack is empty, so 'b' is added. stack = ['b']
, visited = {'b'}
i = 1
, char = 'c'
: char not in visited
is true. char < stack[-1]
('c' < 'b') is false. 'c' is added. stack = ['b', 'c']
, visited = {'b', 'c'}
i = 2
, char = 'a'
: char not in visited
is true. char < stack[-1]
('a' < 'c') is true. i < last_occurrence[stack[-1]]
(2 < 4) is true. 'c' is removed. stack = ['b']
, visited = {'b'}
. char < stack[-1]
('a' < 'b') is true. i < last_occurrence[stack[-1]]
(2 < 3) is true. 'b' is removed. stack = []
, visited = set()
. 'a' is added. stack = ['a']
, visited = {'a'}
i = 3
, char = 'b'
: char not in visited
is true. Stack is empty, so 'b' is added. stack = ['a', 'b']
, visited = {'a', 'b'}
i = 4
, char = 'c'
: char not in visited
is true. char < stack[-1]
('c' < 'b') is false. 'c' is added. stack = ['a', 'b', 'c']
, visited = {'a', 'b', 'c'}
"abc"
The time complexity is O(N), where N is the length of the string s
. This is because the algorithm iterates through the string once to populate the last_occurrence
dictionary and then iterates through the string again to build the subsequence using the stack. The while loop inside the second iteration might seem like it could increase the time complexity, but in reality, each character is added to and removed from the stack at most once, so the while loop operations are still bounded by O(N).
The space complexity is O(K), where K is the number of distinct characters in the string s
. In the worst case, K can be equal to N (if all characters are distinct), but in many cases, K will be smaller than N. The last_occurrence
dictionary stores the last index of each distinct character, and the stack
and visited
set store at most K characters. Since K is bounded by the number of distinct characters (which in the case of lowercase English letters is at most 26), the space complexity can also be considered O(1) in terms of the size of the input string, as it does not scale linearly with the input size.