Taro Logo

Remove Duplicate Letters

Medium
Bloomberg LP logo
Bloomberg LP
1 view
Topics:
StringsGreedy AlgorithmsStacks

Given a string s, remove duplicate letters so that every letter appears once and only once. You must make sure your result is the smallest in lexicographical order among all possible results.

Example 1:

Input: s = "bcabc"
Output: "abc"

Example 2:

Input: s = "cbacdcbc"
Output: "acdb"

Constraints:

  • 1 <= s.length <= 104
  • s consists of lowercase English letters.

Note: This question is the same as 1081: https://leetcode.com/problems/smallest-subsequence-of-distinct-characters/

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of characters that might appear in the input string? Is it limited to lowercase English letters, or could it include uppercase letters, numbers, or special characters?
  2. If the input string is empty, what should I return?
  3. If there are multiple possible lexicographically smallest subsequences, should I return any one of them, or is there a specific tie-breaking rule?
  4. Is the input string guaranteed to contain at least one of each character that I need to remove duplicates from?
  5. Can you provide a specific example where a greedy approach might fail, so I can better understand the nuances of the problem?

Brute Force Solution

Approach

The brute force strategy for removing duplicate letters involves exploring every single possible combination of letters from the input. We aim to find the lexicographically smallest string that contains each unique letter exactly once. This approach is essentially trying out all possibilities and seeing which one fits the criteria and is the best.

Here's how the algorithm would work step-by-step:

  1. First, determine the unique letters that exist in the original input.
  2. Then, generate all possible arrangements (permutations) of these unique letters.
  3. For each arrangement, check if it's a valid candidate. A valid candidate means that for every letter in the original input, the arrangement's letters appear in the same relative order.
  4. If an arrangement is valid, compare it to the best valid arrangement found so far. The goal is to find the arrangement that comes earliest in dictionary order (lexicographically smallest).
  5. Continue this process, considering every single permutation of unique characters.
  6. Once all possibilities have been checked, the smallest lexicographical string that satisfies the condition of maintaining relative order and only containing unique characters is the solution.

Code Implementation

def remove_duplicate_letters_brute_force(input_string):

    unique_characters = sorted(list(set(input_string)))

    import itertools
    all_permutations = list(itertools.permutations(unique_characters))

    best_valid_arrangement = None

    for permutation in all_permutations:
        arrangement_string = ''.join(permutation)
        is_valid = True

        # Check if current arrangement is valid
        input_index = 0
        arrangement_index = 0
        while input_index < len(input_string) and arrangement_index < len(arrangement_string):
            if input_string[input_index] == arrangement_string[arrangement_index]:
                arrangement_index += 1

            input_index += 1

        # If the permutation doesn't appear in order, it's not valid
        if arrangement_index != len(arrangement_string):
            is_valid = False

        if is_valid:
            #Find the lexicographically smallest permutation
            if best_valid_arrangement is None or arrangement_string < best_valid_arrangement:
                best_valid_arrangement = arrangement_string

    return best_valid_arrangement

Big(O) Analysis

Time Complexity
O(n!)Let 'n' be the length of the input string and 'k' be the number of unique characters in the input string. The algorithm first generates all possible permutations of these 'k' unique characters. Generating all permutations takes O(k!) time. Then, for each permutation, the algorithm checks if it is a valid candidate, which requires iterating through the original string of length 'n' to verify the relative order of characters; this takes O(n) time. Since we have to iterate through each of the permutations of 'k' elements, that makes the validation check take O(n * k!). Finally, the algorithm also has to perform a lexicographical comparison, which, in the worst case, would require comparing the lengths of the strings being checked, i.e., of 'k' length. In totality, the time complexity is dominated by the permutation generation step, so the overall time complexity is approximately O(n * k!), which simplifies to O(n!).
Space Complexity
O(N!)The brute force solution generates all possible permutations of the unique letters in the input string. If there are K unique letters, the number of permutations can be up to K!. In the worst case, where all letters are unique, K would equal N, the length of the input string. Generating and storing these permutations requires storing lists or strings in memory and each permutation will be of length K. Since the number of permutations is N! in the worst case, the auxiliary space is dominated by the storage of these permutations, resulting in O(N!).

Optimal Solution

Approach

The goal is to find the smallest string with all unique letters from the input, keeping the original order as much as possible. We achieve this by making 'greedy' decisions, prioritizing letters that appear later in the string if we have a choice. This ensures we don't prematurely remove letters needed later.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each letter appears in the string.
  2. Go through the string one letter at a time.
  3. For each letter, check if it's already in our result string.
  4. If it's not, compare it to the last letter in our result string.
  5. If the current letter is earlier in the alphabet than the last letter in the result AND the last letter appears again later in the string, remove the last letter from the result. Keep doing this until the condition isn't true anymore.
  6. Add the current letter to our result string.
  7. Decrease the count of the current letter because we've now used it.
  8. Repeat for all letters in the input string. The resulting string will have all unique letters in the best possible order.

Code Implementation

def remove_duplicate_letters(input_string):
    letter_counts = {}
    for letter in input_string:
        letter_counts[letter] = letter_counts.get(letter, 0) + 1

    result_stack = []
    seen_letters = set()

    for letter in input_string:
        # If letter not already in result, add it while maintaining order.
        if letter not in seen_letters:
            while result_stack and letter < result_stack[-1] and letter_counts[result_stack[-1]] > 0:
                # Remove larger letter if it appears later to obtain optimal ordering.

                removed_letter = result_stack.pop()
                seen_letters.remove(removed_letter)

            result_stack.append(letter)
            seen_letters.add(letter)

        letter_counts[letter] -= 1

    return ''.join(result_stack)

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input string once, which takes O(n) time, where n is the length of the string. Inside this loop, there's a while loop that potentially removes characters from the result string. However, each character is added to the result string at most once and removed at most once. Therefore, the while loop's total operations across all iterations of the outer loop is bounded by n. The character count update also takes O(1) time. Hence, the overall time complexity is dominated by the single pass through the string, resulting in O(n).
Space Complexity
O(1)The auxiliary space is determined by the 'count' of each letter, the 'result string', and the 'seen' set. The 'count' array stores the frequency of each character, which is at most 26 (number of lowercase letters in the alphabet), making it constant space. The 'seen' set also stores at most 26 letters. The 'result string' holds unique characters and thus also has at most 26 characters. Therefore, the space used is constant and independent of the input string length N.

Edge Cases

CaseHow to Handle
Empty string inputReturn an empty string immediately as there are no characters to process.
String with only one unique characterReturn the string as is since there are no duplicates to remove.
String with all the same characters (e.g., 'aaaa')Return the single character once, as all are duplicates.
String with characters in reverse lexicographical order (e.g., 'zyx')The algorithm should still produce a valid result by choosing the rightmost smallest character.
String with a large number of duplicate characters clustered togetherThe algorithm needs to efficiently handle these clusters without excessive backtracking.
String containing all lowercase letters of the alphabet in reverse order ('zyxwvutsrqponmlkjihgfedcba')This tests the algorithm's ability to handle the worst-case scenario, which might expose performance issues.
String containing only duplicate characters except for one at the endEnsure the unique character is not incorrectly skipped or excluded.
String of maximum allowed length (consider memory constraints and algorithm complexity)Verify the chosen data structures (e.g., stacks, frequency maps) can handle maximum input size without errors such as stack overflow or out of memory.