
Smallest Subsequence of Distinct Characters

Medium
ByteDance
Topics:
Strings, Greedy Algorithms, Stacks

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

Example 1:

Input: s = "bcabc"
Output: "abc"

Example 2:

Input: s = "cbacdcbc"
Output: "acdb"
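It can help to pin down what counts as a valid answer before optimizing. The sketch below uses two hypothetical helpers, `is_subsequence` and `is_valid_answer` (names chosen here for illustration). They check that a candidate keeps the order of s and uses every distinct character of s exactly once. They do not check that the candidate is the smallest; that comparison is what the solutions below add.

```python
def is_subsequence(candidate: str, s: str) -> bool:
    # Consume s left to right; each character of candidate must be found in order.
    it = iter(s)
    return all(ch in it for ch in candidate)


def is_valid_answer(candidate: str, s: str) -> bool:
    # Valid (not necessarily smallest): keeps the order of s, has no repeated
    # characters, and covers every distinct character of s.
    return (
        is_subsequence(candidate, s)
        and len(candidate) == len(set(candidate))
        and set(candidate) == set(s)
    )


print(is_valid_answer("abc", "bcabc"))       # True
print(is_valid_answer("bca", "bcabc"))       # True: valid, but "abc" is smaller
print(is_valid_answer("acdb", "cbacdcbc"))   # True
print(is_valid_answer("abcd", "cbacdcbc"))   # False: not a subsequence
```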

Constraints:

  • 1 <= s.length <= 1000
  • s consists of lowercase English letters.
Note: This question is the same as 316: https://leetcode.com/problems/remove-duplicate-letters/

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of the input string s?
  2. If the input string `s` is empty, what should the function return?
  3. Is the input string `s` guaranteed to contain all lowercase English letters at least once?
  4. If there are multiple lexicographically smallest subsequences that satisfy the conditions, is any one of them acceptable, or should I return a specific one?
  5. Can you provide a few more examples with expected outputs, especially for edge cases or unusual inputs?

Brute Force Solution

Approach

The brute force approach to finding the smallest subsequence of distinct characters means we're going to try every single possibility. We'll generate every possible subsequence, check whether it meets our criteria (it contains every distinct character of s exactly once), and keep track of the lexicographically smallest valid one we've found so far.

Here's how the algorithm would work step-by-step:

  1. First, generate every possible subsequence of the original string, that is, every way of keeping or dropping each character.
  2. For each subsequence, check that it contains every distinct character of the original string and that no character is repeated.
  3. If it passes, compare it lexicographically with the best valid subsequence found so far.
  4. If it is smaller (or it is the first valid one), save it as the new best.
  5. After checking all subsequences, return the best one.

Code Implementation

def smallest_subsequence_distinct_brute_force(input_string):
    required_letters = set(input_string)
    best_subsequence = None

    # Generate all possible subsequences: bit j of mask decides whether
    # input_string[j] is included.
    for mask in range(1 << len(input_string)):
        subsequence = ""
        for j in range(len(input_string)):
            if (mask >> j) & 1:
                subsequence += input_string[j]

        # Keep only subsequences that use every distinct letter exactly once
        # (right length + right set of letters implies no repeats).
        if len(subsequence) == len(required_letters) and set(subsequence) == required_letters:
            # Keep the lexicographically smallest valid subsequence seen so far.
            if best_subsequence is None or subsequence < best_subsequence:
                best_subsequence = subsequence

    return best_subsequence if best_subsequence is not None else ""
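A valid answer always has exactly k characters, where k is the number of distinct letters in s, so a brute force only needs to examine index sets of size k rather than all 2^n masks. The sketch below (a hypothetical `smallest_subsequence_length_k`, not part of the original solution) uses `itertools.combinations` for that pruning. C(n, k) can still be enormous, so this remains a brute force and is useful mainly as a slow reference for testing.

```python
from itertools import combinations


def smallest_subsequence_length_k(input_string: str) -> str:
    # Only index sets of size k (number of distinct letters) can be valid answers.
    required_letters = set(input_string)
    k = len(required_letters)
    best = None
    for indices in combinations(range(len(input_string)), k):
        candidate = "".join(input_string[i] for i in indices)
        # k characters covering all k distinct letters means no letter repeats.
        if set(candidate) == required_letters and (best is None or candidate < best):
            best = candidate
    return best if best is not None else ""


print(smallest_subsequence_length_k("bcabc"))     # abc
print(smallest_subsequence_length_k("cbacdcbc"))  # acdb
```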

Big(O) Analysis

Time Complexity
O(2^n * n)
Generating all possible subsequences of a string of length n produces 2^n candidates, because each character is either included or excluded. Building each candidate, checking whether it contains every distinct character exactly once (using a set), and comparing it with the best candidate so far each take O(n) time. Thus the overall time complexity is O(2^n * n): 2^n for the candidates and n for the work done on each one.
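To put the O(2^n * n) bound against the constraints, the arithmetic below compares Example 2 (n = 8) with the largest allowed input (n = 1000). The small case is trivial, while 2^1000 is far beyond what any machine can enumerate.

```python
n = len("cbacdcbc")          # Example 2 has n = 8
print(2 ** n)                # 256 candidate subsequences
print(2 ** n * n)            # 2048: rough count of character operations

# At the upper limit of the constraints, n = 1000:
print(len(str(2 ** 1000)))   # 302 -> 2^1000 has 302 decimal digits
```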
Space Complexity
O(n)
Although there are 2^n subsequences, the bitmask loop builds and checks them one at a time and does not use recursion. At any moment it holds only the current subsequence, the best subsequence found so far, and the set of required letters, each of size at most n. Candidates are never stored together, so the extra space is O(n).

Optimal Solution

Approach

To find the lexicographically smallest subsequence that contains every distinct character exactly once, we build the answer one character at a time on a stack. This avoids exploring all possible subsequences: at each step, we greedily drop a larger letter from the end of the current answer whenever a smaller letter arrives and the larger letter is guaranteed to appear again later in the string.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each letter appears in the original string.
  2. Go through the string from left to right, decreasing the current letter's count so that the counts always reflect the occurrences still ahead. Keep track of which letters are already in our subsequence.
  3. If the current letter is already in our subsequence, skip it.
  4. Otherwise, while the last letter of our subsequence is larger than the current letter and still appears later in the string, remove it; it can be added back when it reappears.
  5. Append the current letter to the subsequence and mark it as used.
  6. After processing the entire string, the subsequence is the lexicographically smallest one containing every distinct character exactly once, in an order consistent with the original string.
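The steps above are easiest to follow on Example 2. The sketch below is a hypothetical tracing helper (`trace_smallest_subsequence`, named here for illustration) that applies the same rules and prints every push, pop, and skip.

```python
from collections import Counter


def trace_smallest_subsequence(s: str) -> str:
    remaining = Counter(s)  # occurrences of each letter not yet scanned
    stack, seen = [], set()
    for i, letter in enumerate(s):
        remaining[letter] -= 1
        if letter in seen:
            print(f"{i}: {letter!r} already kept -> skip, stack={''.join(stack)}")
            continue
        # Pop larger letters that will appear again later.
        while stack and letter < stack[-1] and remaining[stack[-1]] > 0:
            popped = stack.pop()
            seen.remove(popped)
            print(f"{i}: pop {popped!r} ({popped!r} appears again later)")
        stack.append(letter)
        seen.add(letter)
        print(f"{i}: push {letter!r}, stack={''.join(stack)}")
    return "".join(stack)


print(trace_smallest_subsequence("cbacdcbc"))  # acdb
```

For "cbacdcbc" the stack evolves as c, b, a, ac, acd, acd (skip), acdb, acdb (skip). The final 'b' at index 6 is pushed after 'd' because 'd' never appears again, so it cannot be popped.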

Code Implementation

def smallestSubsequence(input_string):
    letter_counts = {}
    for letter in input_string:
        letter_counts[letter] = letter_counts.get(letter, 0) + 1

    stack_of_letters = []
    seen_letters = set()

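    for letter in input_string:
        # After this decrement, letter_counts holds occurrences still ahead.
        letter_counts[letter] -= 1

        # Each letter is kept only once; skip later copies of a kept letter.
        if letter in seen_letters:
            continue

        # Pop larger letters off the end if they appear again later, so the
        # smaller current letter comes first (lexicographically smallest order).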
        while stack_of_letters and letter < stack_of_letters[-1] and letter_counts[stack_of_letters[-1]] > 0:
            seen_letters.remove(stack_of_letters[-1])
            stack_of_letters.pop()

        # Keep track of what's in our potential result
        stack_of_letters.append(letter)
        seen_letters.add(letter)

    return ''.join(stack_of_letters)
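A common equivalent formulation stores the last index of each letter instead of remaining counts: "the top letter appears again later" becomes "its last index is greater than the current position". The sketch below (hypothetical name `smallest_subsequence_last_index`) is one way to write that variant; it makes the same pop decisions as the counting version.

```python
def smallest_subsequence_last_index(s: str) -> str:
    # Later indices overwrite earlier ones, leaving each letter's final position.
    last_index = {letter: i for i, letter in enumerate(s)}
    stack, in_stack = [], set()
    for i, letter in enumerate(s):
        if letter in in_stack:
            continue
        # Pop the top if it is larger and will reappear after position i.
        while stack and letter < stack[-1] and last_index[stack[-1]] > i:
            in_stack.remove(stack.pop())
        stack.append(letter)
        in_stack.add(letter)
    return "".join(stack)


print(smallest_subsequence_last_index("bcabc"))     # abc
print(smallest_subsequence_last_index("cbacdcbc"))  # acdb
```

This avoids mutating a count table inside the loop, at the cost of one extra pass to build `last_index`.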

Big(O) Analysis

Time Complexity
O(n)
The algorithm makes one pass over the input string of length n to count each letter, then a second pass that handles each character once. In the second pass, checking membership in seen_letters is O(1) because it is a set. The inner while loop can pop several letters for one character, but every pop removes a letter that an earlier iteration pushed, and each iteration pushes at most one letter. So the total number of pops across the whole loop is at most the number of pushes, which is at most n. The second pass therefore does O(n) work in total (amortized O(1) per character), and the overall time complexity is O(n).
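The amortized argument can be checked empirically by instrumenting the stack loop. The sketch below (hypothetical `count_stack_operations`) repeats the solution's logic and returns the number of pushes and pops; for any input, pops never exceed pushes and pushes never exceed n.

```python
def count_stack_operations(s):
    remaining = {}
    for letter in s:
        remaining[letter] = remaining.get(letter, 0) + 1
    stack, seen = [], set()
    pushes = pops = 0
    for letter in s:
        remaining[letter] -= 1
        if letter in seen:
            continue
        while stack and letter < stack[-1] and remaining[stack[-1]] > 0:
            seen.remove(stack.pop())
            pops += 1  # each pop undoes exactly one earlier push
        stack.append(letter)
        seen.add(letter)
        pushes += 1  # at most one push per character of s
    return pushes, pops


s = "cbacdcbc"
print(len(s), *count_stack_operations(s))  # 8 6 2
```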
Space Complexity
O(1)
The extra space consists of letter_counts, seen_letters, and the stack. Each holds at most one entry per distinct letter, and since s contains only lowercase English letters, that is at most 26 entries. The space used is therefore bounded by the alphabet size rather than by the input length n: O(1) for a fixed alphabet, or O(k) for an alphabet of size k.
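To make the constant-space bound concrete, the dictionary and set can be replaced by two fixed 26-slot lists. This is a sketch that assumes the constraint that s contains only lowercase English letters; `smallest_subsequence_fixed_space` is a hypothetical name.

```python
def smallest_subsequence_fixed_space(s: str) -> str:
    # Assumes s contains only 'a'..'z', as the problem constraints state.
    remaining = [0] * 26      # remaining[c]: occurrences of letter c not yet scanned
    in_stack = [False] * 26   # in_stack[c]: whether letter c is currently kept
    for ch in s:
        remaining[ord(ch) - ord("a")] += 1

    stack = []                # never holds more than 26 letters
    for ch in s:
        idx = ord(ch) - ord("a")
        remaining[idx] -= 1
        if in_stack[idx]:
            continue
        # Pop larger letters that still appear later in s.
        while stack and ch < stack[-1] and remaining[ord(stack[-1]) - ord("a")] > 0:
            in_stack[ord(stack.pop()) - ord("a")] = False
        stack.append(ch)
        in_stack[idx] = True
    return "".join(stack)


print(smallest_subsequence_fixed_space("cbacdcbc"))  # acdb
```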

Edge Cases

Case | How to Handle
Empty input string | Return an empty string, as there are no characters to include in the subsequence.
Null input string | Treat it as an empty string or raise an error (for example, a ValueError when given None), depending on requirements.
Input string of length 1 (e.g., 'a') | Return the input string itself, as it already satisfies the conditions.
Input string with all the same characters (e.g., 'aaaa') | Return the single character, as only one instance is needed.
Input string with distinct characters in reverse lexicographical order (e.g., 'zyx') | A subsequence must keep the original order, and each letter appears only once, so the only valid answer is 'zyx' itself. The stack never pops here because no letter reappears later.
Input string with characters already in lexicographical order (e.g., 'abc') | Return the input string, as it is already the smallest arrangement.
Very long input string (e.g., length 1000) | The stack-based solution runs in O(n) time and needs no sorting or substring searches, so long inputs are not a concern; the O(2^n * n) brute force is infeasible at this size.
Input string containing all 26 lowercase letters | Every distinct character is kept; the output has exactly 26 characters, in the lexicographically smallest order that their positions in s allow.
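The edge cases above can be turned into a small test harness. This sketch re-declares the stack-based solution as a hypothetical `stack_solution` so the block runs on its own, and cross-checks it on random short strings against a slow, combinations-based `reference_solution` (also a hypothetical name).

```python
import random
from itertools import combinations


def stack_solution(s):
    remaining = {}
    for ch in s:
        remaining[ch] = remaining.get(ch, 0) + 1
    stack, seen = [], set()
    for ch in s:
        remaining[ch] -= 1
        if ch in seen:
            continue
        while stack and ch < stack[-1] and remaining[stack[-1]] > 0:
            seen.remove(stack.pop())
        stack.append(ch)
        seen.add(ch)
    return "".join(stack)


def reference_solution(s):
    # Slow reference: try every index set of size k (k = number of distinct letters).
    letters = set(s)
    candidates = ("".join(s[i] for i in idx) for idx in combinations(range(len(s)), len(letters)))
    return min(c for c in candidates if set(c) == letters)


edge_cases = {
    "": "",
    "a": "a",
    "aaaa": "a",
    "zyx": "zyx",
    "abc": "abc",
    "bcabc": "abc",
    "cbacdcbc": "acdb",
    "abcdefghijklmnopqrstuvwxyz" * 2: "abcdefghijklmnopqrstuvwxyz",
}
for s, expected in edge_cases.items():
    assert stack_solution(s) == expected, (s, stack_solution(s))

# Random cross-check on short strings over a small alphabet (keeps the reference fast).
rng = random.Random(0)
for _ in range(500):
    s = "".join(rng.choice("abcd") for _ in range(rng.randint(1, 10)))
    assert stack_solution(s) == reference_solution(s), s
print("all checks passed")
```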