Taro Logo

Sort Characters By Frequency

#140 Most AskedMedium
14 views
Topics:
StringsArrays

Given a string s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string.

Return the sorted string. If there are multiple answers, return any of them.

Example 1:

Input: s = "tree"
Output: "eert"
Explanation: 'e' appears twice while 'r' and 't' both appear once.
So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a valid answer.

Example 2:

Input: s = "cccaaa"
Output: "aaaccc"
Explanation: Both 'c' and 'a' appear three times, so both "cccaaa" and "aaaccc" are valid answers.
Note that "cacaca" is incorrect, as the same characters must be together.

Example 3:

Input: s = "Aabb"
Output: "bbAa"
Explanation: "bbaA" is also a valid answer, but "Aabb" is incorrect.
Note that 'A' and 'a' are treated as two different characters.

Constraints:

  • 1 <= s.length <= 5 * 105
  • s consists of uppercase and lowercase English letters and digits.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of characters in the input string? For example, is it limited to lowercase English letters, or can it include uppercase letters, numbers, and symbols?
  2. What should I do in the case of an empty string or a null input?
  3. Could you clarify the expected output format for characters with the same frequency? For instance, if 'a' and 'b' both appear twice, is 'aabb' and 'bbaa' equally valid?
  4. How large can the input string be? Are there any constraints on its length that I should be mindful of for performance?
  5. Does the character set include multi-byte characters, or can I assume we are working with standard ASCII or a similar single-byte character set?

Brute Force Solution

Approach

The brute force idea is to generate every single possible arrangement of the characters from the original text. Then, for each arrangement, we check if it's sorted correctly based on character counts and pick the one that is.

Here's how the algorithm would work step-by-step:

  1. First, create a list of every possible way the characters in the original text can be ordered.
  2. Go through these arrangements one by one.
  3. For each arrangement, check if it follows the sorting rule: are all identical characters grouped together?
  4. If they are not grouped together, discard this arrangement as it is not valid.
  5. If they are grouped together, count how many times each character appears in its group.
  6. Then, check if these groups are ordered from most frequent to least frequent.
  7. If an arrangement satisfies both the grouping and the frequency sorting rules, it is a potential solution.
  8. The first valid arrangement you find is your answer, since there might be multiple correct ways to order characters with the same frequency.

Code Implementation

import collections
import itertools

def sort_chars_by_frequency_brute_force(input_string):
    character_list = list(input_string)
    possible_arrangements = set(itertools.permutations(character_list))

    for arrangement_tuple in possible_arrangements:
        current_arrangement = "".join(arrangement_tuple)
        is_arrangement_valid = True
        last_seen_character = None
        last_seen_character_count = float('inf')
        current_group_count = 0
        current_group_character = None

        # First, iterate to check if characters are grouped and sorted by frequency.

        for character in current_arrangement:
            if character != current_group_character:
                if current_group_character is not None:
                    if current_group_count > last_seen_character_count:
                        is_arrangement_valid = False
                        break
                    last_seen_character_count = current_group_count
                current_group_character = character
                current_group_count = 1
            else:
                current_group_count += 1

        if not is_arrangement_valid:
            continue
        
        # This check handles the frequency comparison for the very last character group.

        if current_group_count > last_seen_character_count:
            is_arrangement_valid = False

        if not is_arrangement_valid:
            continue

        # A second pass is needed to confirm all instances of a character are in one group.

        all_characters_are_grouped = True
        seen_characters = set()
        for character in current_arrangement:
            if character in seen_characters and character != last_seen_character:
                all_characters_are_grouped = False
                break
            seen_characters.add(character)
            last_seen_character = character

        # If this arrangement satisfies all conditions, it's a valid answer.

        if all_characters_are_grouped:
            return current_arrangement

    return ""

Big(O) Analysis

Time Complexity
O(n! * n)The dominant cost comes from generating every possible arrangement, or permutation, of the input string of length n, which results in n! (n factorial) possibilities. For each of these n! permutations, we must then iterate through the string of length n to validate if it meets the sorting criteria, such as checking if characters are grouped and if those groups are sorted by frequency. This leads to a total number of operations proportional to n! * n, which simplifies to O(n! * n). This approach is computationally infeasible for all but the smallest input sizes.
Space Complexity
O(N * N!)The dominant factor in space complexity is step 1, which requires storing a list of every possible arrangement of the input string. For a string of length N, there are N! (N factorial) permutations, and each permutation itself has a length of N. Therefore, the space needed to hold all these arrangements is proportional to N multiplied by N!, which is represented as O(N * N!). The memory used for checking each individual arrangement is insignificant compared to this initial storage requirement.

Optimal Solution

Approach

The most efficient way to solve this is to first count how many times each character appears in the original text. Then, we can use these counts to arrange the characters, putting the most frequent ones first.

Here's how the algorithm would work step-by-step:

  1. First, go through the text and tally up the count for every unique character. It's like making a list of each character and how many times you saw it.
  2. Next, take these characters and sort them based on their counts, from highest to lowest. If two characters have the same count, their order doesn't matter.
  3. Finally, build the new, sorted text. For each character in your sorted list, add it to the result as many times as its count.
  4. Start with the character that appeared most often, add all of its occurrences, then move to the next most frequent, and so on until you've used all the characters.

Code Implementation

class Solution:
    def frequencySort(self, input_string: str) -> str:
        # First, we need a way to store the count of each character in the input string.
        character_frequency_map = {}
        for character in input_string:
            character_frequency_map[character] = character_frequency_map.get(character, 0) + 1

        # Sort the characters based on their frequency in descending order to prioritize common ones.
        sorted_characters = sorted(character_frequency_map, key=character_frequency_map.get, reverse=True)

        # Build the final string by appending each character repeated by its frequency count.
        result_string_builder = []
        for character in sorted_characters:
            count = character_frequency_map[character]
            result_string_builder.append(character * count)
        
        return "".join(result_string_builder)

Big(O) Analysis

Time Complexity
O(n log k)The time complexity is primarily driven by two main operations. First, we iterate through the input string of length n to count the frequency of each character, which takes O(n) time. Let k be the number of unique characters. The second and most significant step is sorting these k unique characters based on their frequencies, which typically takes O(k log k) time. Since k is at most n, the total time complexity is O(n + k log k), which simplifies to O(n log k) because in the worst case k can be proportional to n, making the sort dominant, or if k is a small constant, it simplifies to O(n).
Space Complexity
O(k)The primary auxiliary space is used for a hash map to store the frequency counts of each character. In the worst-case scenario, where all characters in the input string are unique, this map will require space proportional to the number of unique characters, which we'll denote as 'k'. The subsequent sorting step, if done in place or using a structure based on these unique characters, also depends on 'k', not the total length of the string 'N'. Therefore, the space complexity is determined by the size of the character set, resulting in O(k).

Edge Cases

Empty string
How to Handle:
The algorithm should return an empty string immediately as there are no characters to process.
Null input string
How to Handle:
The solution should handle this gracefully, typically by throwing an exception or returning an empty string, depending on language conventions.
String with a single character
How to Handle:
The algorithm will correctly return the original string since it is the only character and thus has the highest frequency.
String with all identical characters
How to Handle:
The algorithm will count a single character with a frequency equal to the string length and return the original string.
String with all unique characters
How to Handle:
All characters will have a frequency of 1, so any permutation of the original string is a valid output.
String with multiple characters having the same frequency
How to Handle:
The problem states relative order doesn't matter, so the algorithm can group these characters in any order, leading to multiple valid answers.
String with mixed-case characters like 'a' and 'A'
How to Handle:
The solution must treat uppercase and lowercase letters as distinct characters with their own separate frequencies.
String containing non-alphanumeric characters or numbers
How to Handle:
These characters should be treated just like any other, with their frequencies counted and sorted accordingly.
0/237 completed