Taro Logo

Minimum Deletions to Make Character Frequencies Unique

Medium
Google logo
Google
1 view
Topics:
StringsGreedy Algorithms

A string s is called good if there are no two different characters in s that have the same frequency.

Given a string s, return the minimum number of characters you need to delete to make s good.

The frequency of a character in a string is the number of times it appears in the string. For example, in the string "aab", the frequency of 'a' is 2, while the frequency of 'b' is 1.

Example 1:

Input: s = "aab"
Output: 0
Explanation: s is already good.

Example 2:

Input: s = "aaabbbcc"
Output: 2
Explanation: You can delete two 'b's resulting in the good string "aaabcc".
Another way it to delete one 'b' and one 'c' resulting in the good string "aaabbc".

Example 3:

Input: s = "ceabaacb"
Output: 2
Explanation: You can delete both 'c's resulting in the good string "eabaab".
Note that we only care about characters that are still in the string at the end (i.e. frequency of 0 is ignored).

Constraints:

  • 1 <= s.length <= 105
  • s contains only lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of the input string `s`?
  2. Is the input string `s` guaranteed to only contain lowercase English letters?
  3. If the input string is empty, should I return 0?
  4. If multiple deletion strategies lead to the same minimum number of deletions, is any one of them acceptable?
  5. Can you provide a more concrete example, especially one where multiple characters have the same initial frequency?

Brute Force Solution

Approach

The brute force approach to making character frequencies unique involves trying every possible combination of deletions. We generate all possible deletion scenarios and then check if the resulting string has unique character frequencies.

Here's how the algorithm would work step-by-step:

  1. Start with the original string and calculate the frequency of each character.
  2. Check if all frequencies are already unique. If they are, we need zero deletions and are done.
  3. If the frequencies are not unique, try deleting one character at a time from every possible location in the string, recalculating the frequencies, and checking for uniqueness.
  4. If deleting one character doesn't lead to unique frequencies, try deleting two characters at a time in every possible combination, recalculating frequencies, and checking for uniqueness.
  5. Continue this process, deleting three characters, then four, and so on, considering all possible combinations of deletions at each step.
  6. Keep track of the minimum number of deletions required to achieve unique frequencies among all the combinations we try.
  7. Eventually, you will find a combination of deletions that results in unique frequencies. The number of deletions in that combination is your answer.

Code Implementation

def min_deletions_brute_force(input_string):
    minimum_deletions = float('inf')
    string_length = len(input_string)

    for number_of_deletions in range(string_length + 1):
        for indices_to_delete in combinations(range(string_length), number_of_deletions):
            modified_string = ''.join([char for index, char in enumerate(input_string) if index not in indices_to_delete])
            frequencies = {}
            for char in modified_string:
                frequencies[char] = frequencies.get(char, 0) + 1

            if len(set(frequencies.values())) == len(frequencies):
                # Check if frequencies are unique.

                minimum_deletions = min(minimum_deletions, number_of_deletions)

    return minimum_deletions

from itertools import combinations

Big(O) Analysis

Time Complexity
O(2^n * n)The brute force approach iterates through all possible subsets of the string to simulate deleting characters. There are 2^n possible subsets for a string of length n. For each subset, we need to recalculate the character frequencies, which takes O(n) time. Therefore, the overall time complexity is O(2^n * n), as we explore all possible deletion combinations and recalculate frequencies for each.
Space Complexity
O(N)The brute force approach requires storing the frequency of each character in the string, which can be done using a hash map or an array of size proportional to the number of possible characters. Also, since we are generating all possible combinations of deletions, we need to store intermediate strings. In the worst case, we may need to store frequencies and strings with sizes close to the input string's length (N). Therefore, the auxiliary space used scales linearly with the input size N.

Optimal Solution

Approach

The goal is to find the fewest removals to make all character counts different. The clever strategy is to count each character, then gradually reduce counts if needed, deleting only what's required to achieve uniqueness.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each letter appears in the word.
  2. Next, imagine organizing these counts from largest to smallest.
  3. Start with the biggest count. Check if any other count is the same.
  4. If there's a duplicate, reduce the larger of the duplicate counts by one, and note that you made a deletion.
  5. Repeat this process, comparing each count to all the counts that come after it. Continue reducing counts and tracking deletions until all counts are unique.
  6. The total number of reductions made is the fewest number of letters you need to delete to make the character counts unique.

Code Implementation

def minimum_deletions_to_make_character_frequencies_unique(input_string):
    character_counts = {}
    for char in input_string:
        character_counts[char] = character_counts.get(char, 0) + 1

    frequencies = sorted(character_counts.values(), reverse=True)
    deletion_count = 0

    # Iterate through the frequencies to ensure uniqueness.
    for i in range(len(frequencies)): 
        for j in range(i + 1, len(frequencies)): 

            # Reduce current frequency if it's not unique.
            while frequencies[i] != 0 and frequencies[i] == frequencies[j]:

                frequencies[i] -= 1
                deletion_count += 1

                #Re-sort to maintain order after decrement.
                frequencies.sort(reverse=True)
                break

    return deletion_count

Big(O) Analysis

Time Complexity
O(n^2)The initial character counting takes O(n) time, where n is the length of the input string. After counting, we iterate through the frequencies, which, in the worst case, could have up to n unique frequencies (e.g., 'a' repeated n times). For each frequency, we compare it with the remaining frequencies to check for duplicates. This nested comparison contributes the dominant factor to the overall runtime. Consequently, the repeated comparison results in approximately n * (n-1) / 2 operations, which simplifies to O(n^2).
Space Complexity
O(1)The algorithm primarily uses a count of each character, which can be stored in a fixed-size array or hash map since the character set is limited (e.g., lowercase English alphabet, 26 characters). The rest of the operations described involve modifying these counts and tracking a deletion count. Therefore, the auxiliary space needed does not scale with the input string's length, N, but depends on the fixed size of the character set. Consequently, the space complexity is constant.

Edge Cases

CaseHow to Handle
Empty input stringReturn 0 as no deletions are needed for an empty string.
String with a single characterReturn 0 as a single character string has unique frequency.
String where all characters are the same (e.g., 'aaaa')The algorithm should iteratively decrement the frequency until it becomes unique, accumulating deletions.
String with already unique character frequencies (e.g., 'abc')Return 0 as no deletions are needed.
Maximum length string (considering memory constraints and potential integer overflow when counting frequencies)Use appropriate data types (e.g., int or long) to prevent overflow and ensure the frequency array can handle the length.
String with a large number of distinct characters each with a high frequencyThe algorithm should handle numerous reductions in frequency without exceeding time limit (optimize to avoid O(n^2)).
String where the frequencies of some characters are already zero before any deletionsThe algorithm should still correctly process other characters, treating the zero frequencies as unique and not requiring deletion.
String requiring many deletions to achieve unique frequenciesEnsure the algorithm does not time out due to excessive iterations to achieve unique frequencies.