Taro Logo

Sort Characters By Frequency

Medium
4 views
2 months ago

Given a string s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string.

Return the sorted string. If there are multiple answers, return any of them.

Example 1:

Input: s = "tree"
Output: "eert"
Explanation: 'e' appears twice while 'r' and 't' both appear once.
So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a valid answer.

Example 2:

Input: s = "cccaaa"
Output: "aaaccc"
Explanation: Both 'c' and 'a' appear three times, so both "cccaaa" and "aaaccc" are valid answers.
Note that "cacaca" is incorrect, as the same characters must be together.

Example 3:

Input: s = "Aabb"
Output: "bbAa"
Explanation: "bbaA" is also a valid answer, but "Aabb" is incorrect.
Note that 'A' and 'a' are treated as two different characters.

What data structures and algorithms would you use to implement this, and what is the time and space complexity of your solution? Be sure to consider edge cases and alternative approaches.

Sample Answer

Frequency Sort

Problem Description

Given a string s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string.

Return the sorted string. If there are multiple answers, return any of them.

Example 1:

Input: s = "tree"
Output: "eert"
Explanation: 'e' appears twice while 'r' and 't' both appear once.
So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a valid answer.

Example 2:

Input: s = "cccaaa"
Output: "aaaccc"
Explanation: Both 'c' and 'a' appear three times, so both "cccaaa" and "aaaccc" are valid answers.
Note that "cacaca" is incorrect, as the same characters must be together.

Example 3:

Input: s = "Aabb"
Output: "bbAa"
Explanation: "bbaA" is also a valid answer, but "Aabb" is incorrect.
Note that 'A' and 'a' are treated as two different characters.

Naive Solution

A naive solution would involve counting the frequency of each character in the string, then sorting the characters based on their frequencies and concatenating them to form the final string. This can be achieved using a hash map to store character frequencies, followed by sorting the entries of the hash map based on value (frequency). Finally, iterate through the sorted frequencies to build the output string.

import java.util.*;

class Solution {
    public String frequencySort(String s) {
        Map<Character, Integer> charFrequencyMap = new HashMap<>();
        for (char c : s.toCharArray()) {
            charFrequencyMap.put(c, charFrequencyMap.getOrDefault(c, 0) + 1);
        }

        List<Map.Entry<Character, Integer>> list = new ArrayList<>(charFrequencyMap.entrySet());
        list.sort((a, b) -> b.getValue() - a.getValue());

        StringBuilder sortedString = new StringBuilder();
        for (Map.Entry<Character, Integer> entry : list) {
            char character = entry.getKey();
            int frequency = entry.getValue();
            for (int i = 0; i < frequency; i++) {
                sortedString.append(character);
            }
        }

        return sortedString.toString();
    }
}

Optimal Solution

The optimal solution also involves counting the frequency of characters, but instead of sorting the entries using Collections.sort, we can use a bucket sort approach. This takes advantage of the fact that the frequencies are bounded by the length of the string. Create an array of lists where the index represents the frequency, and the list at that index stores the characters with that frequency. Then iterate through the array in reverse order to build the sorted string.

import java.util.*;

class Solution {
    public String frequencySort(String s) {
        Map<Character, Integer> charFrequencyMap = new HashMap<>();
        for (char c : s.toCharArray()) {
            charFrequencyMap.put(c, charFrequencyMap.getOrDefault(c, 0) + 1);
        }

        List<Character>[] bucket = new List[s.length() + 1];
        for (char c : charFrequencyMap.keySet()) {
            int frequency = charFrequencyMap.get(c);
            if (bucket[frequency] == null) {
                bucket[frequency] = new ArrayList<>();
            }
            bucket[frequency].add(c);
        }

        StringBuilder sortedString = new StringBuilder();
        for (int i = bucket.length - 1; i >= 0; i--) {
            if (bucket[i] != null) {
                for (char c : bucket[i]) {
                    for (int j = 0; j < i; j++) {
                        sortedString.append(c);
                    }
                }
            }
        }

        return sortedString.toString();
    }
}

Big(O) Run-time Analysis

  • Naive Solution:
    • Counting frequencies: O(n), where n is the length of the string.
    • Sorting the entries: O(k log k), where k is the number of distinct characters. In the worst case, k can be equal to n.
    • Building the sorted string: O(n)
    • Total: O(n + k log k + n) which simplifies to O(n + k log k). If k is close to n, it becomes O(n log n).
  • Optimal Solution:
    • Counting frequencies: O(n), where n is the length of the string.
    • Creating buckets: O(k), where k is the number of distinct characters. Again, in worst case it could be O(n).
    • Iterating through buckets and building the string: O(n)
    • Total: O(n + k + n) which simplifies to O(n).

Big(O) Space Usage Analysis

  • Naive Solution:
    • HashMap: O(k), where k is the number of distinct characters.
    • ArrayList: O(k)
    • StringBuilder: O(n)
    • Total: O(k + k + n) which simplifies to O(n), since k <= n.
  • Optimal Solution:
    • HashMap: O(k), where k is the number of distinct characters.
    • Bucket array: O(n+1) which is O(n)
    • StringBuilder: O(n)
    • Total: O(k + n + n) which simplifies to O(n), since k <= n.

Edge Cases

  1. Empty String: If the input string is empty, return an empty string.
  2. String with all same characters: For example, "aaaa". The output should be the same string.
  3. String with distinct characters: For example, "abc". The output can be any permutation of "abc".
  4. Large Input String: Ensure that the code handles very large input strings efficiently, particularly with regard to memory usage.
  5. String with Unicode characters: The code should be able to handle Unicode characters correctly. The given solutions should already handle this as the char type in Java can store unicode characters.