Taro Logo

Sort Characters By Frequency

Medium
Asked by:
Profile picture
Profile picture
Profile picture
Profile picture
+10
More companies
Profile picture
Profile picture
Profile picture
Profile picture
Profile picture
Profile picture
Profile picture
Profile picture
Profile picture
Profile picture
27 views
Topics:
StringsArrays

Given a string s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string.

Return the sorted string. If there are multiple answers, return any of them.

Example 1:

Input: s = "tree"
Output: "eert"
Explanation: 'e' appears twice while 'r' and 't' both appear once.
So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a valid answer.

Example 2:

Input: s = "cccaaa"
Output: "aaaccc"
Explanation: Both 'c' and 'a' appear three times, so both "cccaaa" and "aaaccc" are valid answers.
Note that "cacaca" is incorrect, as the same characters must be together.

Example 3:

Input: s = "Aabb"
Output: "bbAa"
Explanation: "bbaA" is also a valid answer, but "Aabb" is incorrect.
Note that 'A' and 'a' are treated as two different characters.

Constraints:

  • 1 <= s.length <= 5 * 105
  • s consists of uppercase and lowercase English letters and digits.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible characters in the string, and is it limited to ASCII or does it include Unicode characters?
  2. Is the case of the characters significant (i.e., is 'A' different from 'a')?
  3. If there are multiple characters with the same frequency, what is the desired order among them in the sorted string?
  4. What should the function return if the input string is null or empty?
  5. Are there any constraints on the length of the input string s?

Brute Force Solution

Approach

The brute force approach to sorting characters by frequency involves counting how often each character appears and then trying every possible arrangement of the characters. We generate all possible strings and pick the one that matches the frequency count and is sorted correctly. This is the most basic and computationally expensive way to solve the problem.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each unique character appears in the input.
  2. Then, generate every possible string using the characters from the input, making sure to use each character the correct number of times based on the counts from the first step.
  3. For each of these generated strings, check if it is sorted such that the characters that appear more often come before the characters that appear less often.
  4. If a generated string satisfies the frequency-based sorting condition, then that string is a possible solution.
  5. From all the possible solutions, arbitrarily pick one since the problem is expected to return any string sorted correctly.

Code Implementation

def sort_characters_by_frequency_brute_force(input_string):
    character_counts = {}
    for character in input_string:
        character_counts[character] = character_counts.get(character, 0) + 1

    unique_characters = list(character_counts.keys())
    all_possible_strings = []

    def generate_strings(current_string, remaining_counts):
        if not remaining_counts:
            all_possible_strings.append(current_string)
            return

        for character in unique_characters:
            if remaining_counts.get(character, 0) > 0:
                new_counts = remaining_counts.copy()
                new_counts[character] -= 1
                if new_counts[character] == 0:
                    del new_counts[character]
                
                generate_strings(current_string + character, new_counts)

    generate_strings("", character_counts)

    possible_solutions = []

    # We must check that the generated string is correctly sorted by frequency
    for possible_string in all_possible_strings:
        is_sorted_correctly = True
        for i in range(len(possible_string) - 1):
            first_character = possible_string[i]
            second_character = possible_string[i+1]
            if character_counts[first_character] < character_counts[second_character]:
                is_sorted_correctly = False
                break

        if is_sorted_correctly:
            possible_solutions.append(possible_string)

    # Any string that is sorted correctly is a valid response
    if possible_solutions:
        return possible_solutions[0]
    else:
        return ""

Big(O) Analysis

Time Complexity
O(n!)Counting character frequencies takes O(n) time, where n is the length of the input string. However, generating all possible strings from the input has a much larger cost. In the worst case, if all characters are distinct, we generate n! permutations. Checking each generated string for frequency-based sorting takes O(n) time. Therefore, the dominant operation is generating the permutations, leading to a time complexity of O(n!).
Space Complexity
O(N!)The brute force approach counts character frequencies, requiring O(K) space where K is the number of unique characters (K <= N). However, the algorithm's primary space consumer is generating all possible string permutations, which can grow up to N! in the worst case, requiring significant memory to store these intermediate strings. Checking if each generated string is sorted correctly does not add significant extra space, so the dominant factor is the storage of all possible string permutations. Thus, the space complexity is dominated by the permutations, leading to O(N!).

Optimal Solution

Approach

To sort characters by frequency, we first count how often each character appears. Then, we use this information to arrange the characters, placing the most frequent ones first.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each character appears in the given text.
  2. Next, make a list of the characters sorted from the most frequent to the least frequent, like arranging them in a popularity contest.
  3. Finally, build a new string by adding each character to the string as many times as it appeared in the original text, but now in the order from most frequent to least frequent.

Code Implementation

def sort_characters_by_frequency(input_string):
    character_frequencies = {}
    for char in input_string:
        character_frequencies[char] = character_frequencies.get(char, 0) + 1

    # Sort characters by frequency, most frequent first.
    sorted_characters = sorted(character_frequencies.items(), key=lambda item: item[1], reverse=True)

    result_string = ''
    # Build the result string based on the sorted character frequencies.
    for char, frequency in sorted_characters:
        result_string += char * frequency

    return result_string

Big(O) Analysis

Time Complexity
O(n log n)First, counting character frequencies involves iterating through the input string of length n, resulting in O(n) time complexity. Then, sorting the characters based on their frequencies requires a sorting algorithm. Using an efficient sorting algorithm like merge sort or heap sort results in O(k log k) time complexity, where k is the number of unique characters (at most n). Since k <= n, this step is O(n log n) in the worst case. Finally, constructing the sorted string takes O(n) time. Thus, the overall time complexity is dominated by the sorting step, giving us O(n log n).
Space Complexity
O(N)The algorithm first counts character frequencies, requiring a hash map (or array) to store these counts. In the worst case, all characters in the input string are unique, requiring space proportional to the input size N to store the character counts. Additionally, constructing the sorted string involves creating a new string whose length is also proportional to N. Therefore, the auxiliary space complexity is O(N).

Edge Cases

CaseHow to Handle
Null or empty input stringReturn an empty string or throw an IllegalArgumentException, depending on requirements, handling invalid inputs gracefully.
String with only one distinct characterThe algorithm should correctly return the input string, as it's already sorted by frequency.
String with maximum possible length (considering memory constraints)Ensure the frequency map and output string buffer can accommodate the maximum length without memory issues or integer overflows.
String with all identical charactersThe algorithm should correctly return the string, repeating the character n times where n is the length.
String with a wide range of ASCII/Unicode charactersEnsure the frequency map can handle the entire character set without collisions or out-of-bounds errors.
String with mixed case characters (e.g., 'A' and 'a')Consider whether case sensitivity is required; if not, normalize the case before counting frequencies.
Multiple characters with the same frequencyThe algorithm should sort them according to the original input's order, or document that order is undefined.
String containing whitespace charactersWhitespace characters should be treated like any other character, counted and sorted accordingly.