Taro Logo

Remove Duplicate Letters

Medium
1 views
2 months ago

Given a string s, remove duplicate letters so that every letter appears once and only once. You must make sure your result is the smallest in lexicographical order among all possible results.

Example 1:

Input: s = "bcabc" Output: "abc"

Example 2:

Input: s = "cbacdcbc" Output: "acdb"

Constraints:

  • 1 <= s.length <= 10^4
  • s consists of lowercase English letters.

Note: This question is the same as 1081: https://leetcode.com/problems/smallest-subsequence-of-distinct-characters/

Sample Answer
## Remove Duplicate Letters

This problem asks us to remove duplicate letters from a string such that each letter appears only once, and the resulting string is the smallest in lexicographical order among all possible results.

### 1. Brute Force Solution (and why it's not ideal)

A brute-force approach would involve generating all possible subsequences of the given string, filtering out those that contain duplicate letters, and then finding the smallest subsequence in lexicographical order. However, this approach is highly inefficient because the number of subsequences grows exponentially with the length of the input string. This makes it impractical for even moderately sized strings.

### 2. Optimal Solution: Greedy Approach with Stack

A more efficient solution involves using a greedy approach combined with a stack. The idea is to maintain a stack of characters that represent a candidate for the smallest lexicographical subsequence. As we iterate through the input string, we make the following decisions:

1.  If the current character is not in the stack, we check if it can replace any of the characters already in the stack.
2.  A character in the stack can be replaced if it is greater than the current character and it appears later in the string.
3.  We use a `last_occurrence` map to efficiently determine if a character appears later in the string.

Here's the Python code:

```python
def remove_duplicate_letters(s: str) -> str:
    last_occurrence = {}
    for i, char in enumerate(s):
        last_occurrence[char] = i

    stack = []
    seen = set()

    for i, char in enumerate(s):
        if char not in seen:
            while stack and char < stack[-1] and i < last_occurrence[stack[-1]]:
                seen.remove(stack.pop())
            stack.append(char)
            seen.add(char)

    return ''.join(stack)

# Example usage:
s = "bcabc"
result = remove_duplicate_letters(s)
print(f"Input: {s}, Output: {result}")  # Output: abc

s = "cbacdcbc"
result = remove_duplicate_letters(s)
print(f"Input: {s}, Output: {result}")  # Output: acdb

3. Big(O) Run-time Analysis

The run-time complexity of this solution is O(n), where n is the length of the input string s. This is because we iterate through the string once to populate the last_occurrence map and then iterate through it again to process each character. The stack operations (push and pop) take constant time. The while loop inside the main loop might seem like it would increase the complexity, but each character can only be added and removed from the stack once. Thus, the while loop contributes at most O(n) operations over the entire execution of the algorithm.

4. Big(O) Space Usage Analysis

The space complexity of this solution is O(1), because the last_occurrence map, the stack, and the seen set can contain at most 26 characters (the number of lowercase English letters). Therefore, the space used by these data structures is constant and does not depend on the size of the input string.

5. Edge Cases and Considerations

  • Empty String: If the input string is empty, the function should return an empty string. The code handles this case correctly.
  • String with only one distinct character: If the input string contains only one distinct character repeated multiple times (e.g., "aaaa"), the function should return that character once (e.g., "a"). The code handles this correctly due to the seen set.
  • String with already unique characters: If the input string already contains unique characters in lexicographical order (e.g., "abc"), the function should return the same string. The code handles this correctly.
  • Large Input String: The algorithm is efficient enough to handle large input strings (up to 104 as stated in the constraints) without significant performance issues.