Taro Logo

Sort Characters By Frequency

Medium
Amazon logo
Amazon
2 views
Topics:
StringsArrays

Given a string s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string.

Return the sorted string. If there are multiple answers, return any of them.

For example:

  • If s = "tree", the output should be "eert". 'e' appears twice, while 'r' and 't' appear once. 'e' must appear before both 'r' and 't'. "eetr" is also a valid answer.
  • If s = "cccaaa", the output should be "aaaccc". Both 'c' and 'a' appear three times, so both "cccaaa" and "aaaccc" are valid. Note that "cacaca" is incorrect, as the same characters must be together.
  • If s = "Aabb", the output should be "bbAa". 'b' appears twice and 'A' and 'a' appear once. Note that 'A' and 'a' are treated as two different characters.

Write a function that takes a string as input and returns the frequency-sorted string as output.

What are the time and space complexities of your solution? How would you handle edge cases like an empty string or a string with only unique characters?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected character set (ASCII, extended ASCII, Unicode)?
  2. What should I return if the input string is null or empty?
  3. If multiple characters have the same frequency, is the order among them important in the output?
  4. Is the input case-sensitive, and should the output preserve the original case of the characters?
  5. What is the maximum length of the input string I should expect?

Brute Force Solution

Approach

The basic idea is to count how many times each letter appears, then try all possible arrangements of the letters based on their counts. Since that is inefficient, we will start by placing the most frequent characters first to build our output, followed by less frequent characters.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each letter shows up in the input.
  2. Then, find the letter that appears most often.
  3. Add that letter to the result as many times as it appeared.
  4. Next, find the letter that appears second most often.
  5. Add that letter to the result as many times as it appeared.
  6. Keep doing this, going from the most frequent letter to the least frequent letter.
  7. The final result is your sorted string.

Code Implementation

def sort_characters_by_frequency_brute_force(input_string):
    character_counts = {}
    for character in input_string:
        character_counts[character] = character_counts.get(character, 0) + 1

    result = ""
    while character_counts:
        most_frequent_character = None
        max_frequency = 0

        # Find the character with the highest frequency
        for character, frequency in character_counts.items():
            if frequency > max_frequency:
                most_frequent_character = character
                max_frequency = frequency

        # Append the most frequent char to result
        if most_frequent_character:
            result += most_frequent_character * character_counts[most_frequent_character]

            # Remove the most frequent char from counts
            del character_counts[most_frequent_character]

    return result

Big(O) Analysis

Time Complexity
O(n log n)Counting character frequencies involves iterating through the input string of length n, which takes O(n) time. Finding the most frequent character repeatedly, assuming we use a heap (priority queue) of size up to n (the number of distinct characters), takes O(log n) for each insertion or removal. Since we process each character's frequency, we might perform these heap operations up to n times. Building the result string involves appending characters, which takes O(n) time in total. Therefore, the dominant operation is repeatedly finding the most frequent character from the character counts which equates to O(n log n). Consequently, the overall time complexity becomes O(n log n).
Space Complexity
O(N)The algorithm utilizes a hash map (or similar data structure) to count the frequency of each character in the input string of length N. This hash map can potentially store each unique character in the input string along with its frequency, leading to a space usage proportional to the number of unique characters. In the worst-case scenario where all characters are unique, the hash map will store N character-frequency pairs. Therefore, the auxiliary space used is O(N).

Optimal Solution

Approach

The goal is to rearrange the letters in a word or phrase based on how often each letter appears. We want to end up with the most frequent letters appearing first, followed by the less frequent ones. The core idea is to count each letter's occurrences and then put them back together in the order of those counts, from highest to lowest.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each letter appears in the given word or phrase. It's like taking a survey of all the letters.
  2. Next, organize the letters based on their counts. The letter that appeared most often goes first, the next most frequent letter goes second, and so on.
  3. Finally, build the new word or phrase. Start by repeating the most frequent letter its corresponding number of times, then add the next most frequent letter its corresponding number of times, and keep going until you've used all the letters.
  4. The resulting word or phrase will have its letters sorted by frequency, with the most common letters at the beginning.

Code Implementation

def sort_characters_by_frequency(input_string):
    character_counts = {}
    for char in input_string:
        character_counts[char] = character_counts.get(char, 0) + 1

    # We sort character counts to build the result string correctly.
    sorted_character_counts = sorted(character_counts.items(), key=lambda item: item[1], reverse=True)

    result_string = ''

    # Now build the string based on frequency
    for char, count in sorted_character_counts:
        result_string += char * count

    return result_string

Big(O) Analysis

Time Complexity
O(n log n)First, counting the frequency of each character involves iterating through the input string of size n, resulting in O(n) time complexity. Next, sorting the characters based on their frequencies using an efficient sorting algorithm like mergesort or heapsort takes O(k log k) time, where k is the number of distinct characters (at most n). Building the final sorted string involves iterating through the sorted characters and appending them to the result, which takes O(n) time. Since k is bounded by n, the dominant term is O(n log n) from the sorting step.
Space Complexity
O(N)The algorithm first counts the frequency of each character using a hash map or similar data structure. In the worst case, all characters in the input string are unique, requiring storage for N character-count pairs, where N is the length of the input string. Subsequently, constructing the sorted string requires another auxiliary space proportional to N to store the rearranged characters. Therefore, the overall auxiliary space complexity is O(N).

Edge Cases

CaseHow to Handle
Null or empty input stringReturn an empty string or throw an IllegalArgumentException, depending on the requirements.
Input string with only one characterReturn the input string as is since it's already sorted by frequency.
Input string with all identical charactersReturn the input string as is since all characters have the same frequency.
Maximum input string length exceeding available memoryThe solution might need to be optimized for memory usage, potentially using streaming or external sorting techniques.
Input string with Unicode charactersEnsure the character counting and sorting logic correctly handles Unicode characters and their frequencies.
Input string with a large number of distinct charactersThe frequency map should scale efficiently to accommodate a wide range of characters without significant performance degradation.
Multiple characters with the same frequencyThe relative order of these characters with equal frequency is not specified, so any order is valid.
Integer overflow when counting character frequenciesUse a data type with sufficient range, such as long, or implement a check to prevent overflow.