Smallest Subsequence of Distinct Characters

Medium

Google

0 views

Topics:

StringsStacksGreedy Algorithms

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

Example 1:

Input: s = "bcabc"
Output: "abc"

Example 2:

Input: s = "cbacdcbc"
Output: "acdb"

Constraints:

1 <= s.length <= 1000
s consists of lowercase English letters.

Smallest Subsequence of Distinct Characters

Medium

Google

0 views

Topics:

StringsStacksGreedy Algorithms

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

Example 1:

Input: s = "bcabc"
Output: "abc"

Example 2:

Input: s = "cbacdcbc"
Output: "acdb"

Constraints:

1 <= s.length <= 1000
s consists of lowercase English letters.

Solution

Naive Approach

A brute force approach would involve generating all possible subsequences of the input string s, filtering out those that do not contain all distinct characters exactly once, and then finding the lexicographically smallest among the remaining subsequences. This approach is highly inefficient because the number of subsequences grows exponentially with the length of the string.

Complexity Analysis

Time Complexity: O(2^n * n), where n is the length of the string. Generating all subsequences takes O(2^n), and for each subsequence, we need to check if it contains all distinct characters exactly once, which takes O(n).
Space Complexity: O(n), where n is the length of the string, to store the subsequences.

Optimal Approach: Using Stack and Greedy Algorithm

A more efficient approach involves using a stack and a greedy algorithm. The idea is to iterate through the string and maintain a stack of characters that form the lexicographically smallest subsequence seen so far. For each character, we check if it's already in the stack. If not, we compare it with the top of the stack. If the current character is smaller than the top of the stack and the top of the stack appears later in the string, we pop the top of the stack and push the current character. We repeat this process until the stack is empty or the current character is not smaller than the top of the stack or the top of the stack does not appear later in the string.

Algorithm

Create a last_occurrence map to store the last index of each character in the string.
Create a seen set to keep track of characters currently in the stack.
Initialize an empty stack stack.
Iterate through the string s with index i and character char:
- If char is in seen, skip it.
- While stack is not empty, the top of the stack is greater than char, and last_occurrence[stack[-1]] > i:
  - Remove the top of the stack from seen.
  - Pop the top of the stack.
- Add char to seen.
- Push char onto the stack.
Return the string formed by the characters in the stack.

Code

def smallestSubsequence(s: str) -> str:
    last_occurrence = {}
    for i, char in enumerate(s):
        last_occurrence[char] = i

    stack = []
    seen = set()

    for i, char in enumerate(s):
        if char in seen:
            continue

        while stack and stack[-1] > char and last_occurrence[stack[-1]] > i:
            removed_char = stack.pop()
            seen.remove(removed_char)

        stack.append(char)
        seen.add(char)

    return "".join(stack)

Example

Let's trace the execution with the input s = "cbacdcbc"

last_occurrence = {'c': 7, 'b': 6, 'a': 2, 'd': 4}
stack = [], seen = set()
i = 0, char = 'c'. Stack is empty, so stack.append('c'), seen = {'c'}
i = 1, char = 'b'. 'c' > 'b' and last_occurrence['c'] > 1, so pop 'c'. stack = [], seen = {}. stack.append('b'), seen = {'b'}
i = 2, char = 'a'. 'b' > 'a' and last_occurrence['b'] > 2, so pop 'b'. stack = [], seen = {}. stack.append('a'), seen = {'a'}
i = 3, char = 'c'. Stack is not empty but 'a' < 'c'. stack.append('c'), seen = {'a', 'c'}
i = 4, char = 'd'. Stack is not empty but 'c' < 'd'. stack.append('d'), seen = {'a', 'c', 'd'}
i = 5, char = 'c'. char in seen, skip.
i = 6, char = 'b'. 'd' > 'b' and last_occurrence['d'] > 6, so pop 'd'. stack = ['a', 'c'], seen = {'a', 'c'}. 'c' > 'b' and last_occurrence['c'] > 6, so pop 'c'. stack = ['a'], seen = {'a'}. stack.append('b'), seen = {'a', 'b'}
i = 7, char = 'c'. char in seen, skip.
Return "".join(['a', 'b', 'c', 'd']) = "acdb"

Complexity Analysis

Time Complexity: O(n), where n is the length of the string. We iterate through the string once. The while loop inside the for loop runs at most n times in total because each character can be pushed and popped from the stack at most once.
Space Complexity: O(n), where n is the length of the string, to store the last_occurrence map, seen set, and the stack.

Edge Cases

Empty string: The algorithm will return an empty string.
String with only one distinct character: The algorithm will return the string with that character.
String with duplicate characters: The algorithm will correctly remove duplicate characters and return the lexicographically smallest subsequence.

Solution

Naive Approach

Complexity Analysis

Time Complexity: O(2^n * n), where n is the length of the string. Generating all subsequences takes O(2^n), and for each subsequence, we need to check if it contains all distinct characters exactly once, which takes O(n).
Space Complexity: O(n), where n is the length of the string, to store the subsequences.

Optimal Approach: Using Stack and Greedy Algorithm

Algorithm

Create a last_occurrence map to store the last index of each character in the string.
Create a seen set to keep track of characters currently in the stack.
Initialize an empty stack stack.
Iterate through the string s with index i and character char:
- If char is in seen, skip it.
- While stack is not empty, the top of the stack is greater than char, and last_occurrence[stack[-1]] > i:
  - Remove the top of the stack from seen.
  - Pop the top of the stack.
- Add char to seen.
- Push char onto the stack.
Return the string formed by the characters in the stack.

Code

def smallestSubsequence(s: str) -> str:
    last_occurrence = {}
    for i, char in enumerate(s):
        last_occurrence[char] = i

    stack = []
    seen = set()

    for i, char in enumerate(s):
        if char in seen:
            continue

        while stack and stack[-1] > char and last_occurrence[stack[-1]] > i:
            removed_char = stack.pop()
            seen.remove(removed_char)

        stack.append(char)
        seen.add(char)

    return "".join(stack)

Example

Let's trace the execution with the input s = "cbacdcbc"

last_occurrence = {'c': 7, 'b': 6, 'a': 2, 'd': 4}
stack = [], seen = set()
i = 0, char = 'c'. Stack is empty, so stack.append('c'), seen = {'c'}
i = 1, char = 'b'. 'c' > 'b' and last_occurrence['c'] > 1, so pop 'c'. stack = [], seen = {}. stack.append('b'), seen = {'b'}
i = 2, char = 'a'. 'b' > 'a' and last_occurrence['b'] > 2, so pop 'b'. stack = [], seen = {}. stack.append('a'), seen = {'a'}
i = 3, char = 'c'. Stack is not empty but 'a' < 'c'. stack.append('c'), seen = {'a', 'c'}
i = 4, char = 'd'. Stack is not empty but 'c' < 'd'. stack.append('d'), seen = {'a', 'c', 'd'}
i = 5, char = 'c'. char in seen, skip.
i = 6, char = 'b'. 'd' > 'b' and last_occurrence['d'] > 6, so pop 'd'. stack = ['a', 'c'], seen = {'a', 'c'}. 'c' > 'b' and last_occurrence['c'] > 6, so pop 'c'. stack = ['a'], seen = {'a'}. stack.append('b'), seen = {'a', 'b'}
i = 7, char = 'c'. char in seen, skip.
Return "".join(['a', 'b', 'c', 'd']) = "acdb"

Complexity Analysis

Time Complexity: O(n), where n is the length of the string. We iterate through the string once. The while loop inside the for loop runs at most n times in total because each character can be pushed and popped from the stack at most once.
Space Complexity: O(n), where n is the length of the string, to store the last_occurrence map, seen set, and the stack.

Edge Cases

Empty string: The algorithm will return an empty string.
String with only one distinct character: The algorithm will return the string with that character.
String with duplicate characters: The algorithm will correctly remove duplicate characters and return the lexicographically smallest subsequence.