Taro Logo

Top K Frequent Elements #12 Most Asked

Medium
12 views
Topics:
ArraysStacksDynamic ProgrammingGreedy Algorithms

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Example 1:

Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]

Example 2:

Input: nums = [1], k = 1
Output: [1]

Constraints:

  • 1 <= nums.length <= 105
  • -104 <= nums[i] <= 104
  • k is in the range [1, the number of unique elements in the array].
  • It is guaranteed that the answer is unique.

Follow up: Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the size of the input array `nums` and the value of `k`?
  2. Can the input array `nums` contain negative numbers, zeros, or duplicate values?
  3. Is `k` always a valid positive integer, and is it guaranteed to be less than or equal to the number of unique elements in `nums`?
  4. If multiple elements have the same frequency and would qualify for the top k, how should I decide which ones to include?
  5. What should be returned if the input array is empty or null?

Brute Force Solution

Approach

The brute force method for finding the most frequent items involves counting how often each item appears in a list. Then, we simply pick out the items with the highest counts. This is done by comparing each item's count against every other item's count.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each unique item appears in the entire list.
  2. Next, make a list of all the unique items we found.
  3. For each of the unique items, go through the entire list of counts and see which items have counts that are greater than or equal to it.
  4. Keep track of the top K most frequent items you encounter in the prior step, and return that list.

Code Implementation

def top_k_frequent_elements_brute_force(numbers, k_most_frequent):
    item_counts = {}
    for number in numbers:
        if number in item_counts:
            item_counts[number] += 1
        else:
            item_counts[number] = 1

    unique_items = list(item_counts.keys())

    top_k_frequent = []

    for item_to_check in unique_items:
        frequency_count = item_counts[item_to_check]
        is_among_top_k = True

        # Check if the current item is frequent enough to be in the top k
        for other_item in unique_items:
            if item_counts[other_item] > frequency_count:
                is_among_top_k = False
                break

        if is_among_top_k:
            top_k_frequent.append(item_to_check)

    # Limit the result to the top k elements
    return top_k_frequent[:k_most_frequent]

Big(O) Analysis

Time Complexity
O(n²)Counting the frequency of each element in the input array of size 'n' takes O(n) time. Constructing the list of unique elements also takes O(n) time in the worst case where all elements are unique. The nested loop structure iterates through each unique element (at most 'n') and compares its count against all other elements' counts (again, at most 'n'). Therefore, the dominant cost is the nested comparison of counts, resulting in approximately n * n operations. This simplifies to O(n²).
Space Complexity
O(N)The algorithm first counts the frequency of each unique item, requiring a data structure (like a hash map or dictionary) to store these counts. In the worst case, all N elements in the input list are unique, leading to N entries in this frequency counter. A list of unique items is also created, which in the worst case could also contain all N elements. Therefore, the auxiliary space used is proportional to the number of elements in the input list, resulting in O(N) space complexity.

Optimal Solution

Approach

To find the most frequent elements efficiently, we use a clever method to count how often each element appears and then quickly identify the top ones. We organize elements based on their frequency, allowing us to grab the most frequent ones quickly.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each element appears in the input.
  2. Then, organize the elements into groups based on their frequency. Think of it like creating buckets where elements with the same frequency go into the same bucket.
  3. Start with the bucket containing the highest frequency elements, and add those elements to your result.
  4. Continue adding elements from buckets with decreasing frequencies until you have the required number of top frequent elements.

Code Implementation

def top_k_frequent(numbers, k_most_frequent):
    element_frequency = {}
    for number in numbers:
        element_frequency[number] = element_frequency.get(number, 0) + 1

    # Create buckets to store elements based on frequency.
    frequency_buckets = [[] for _ in range(len(numbers) + 1)]
    for number, frequency in element_frequency.items():
        frequency_buckets[frequency].append(number)

    result = []

    # Iterate from highest frequency to lowest.
    for frequency in range(len(numbers), 0, -1):
        if frequency_buckets[frequency]:

            # Add elements to result until we have k elements.
            for number in frequency_buckets[frequency]:
                result.append(number)
                if len(result) == k_most_frequent:
                    return result

    return result

Big(O) Analysis

Time Complexity
O(n)The first step involves counting the frequency of each element in the input array of size n, which takes O(n) time. The next step of organizing elements into buckets based on their frequency also takes O(n) time in the worst case, as each element could have a distinct frequency. Finally, iterating through the buckets to collect the top k elements is upper bounded by O(n) because in the worst case, we might need to examine all buckets. Therefore, the overall time complexity is dominated by the linear operations resulting in O(n).
Space Complexity
O(N)The algorithm uses a hash map to store the frequency of each element, which could potentially store all N elements of the input array. Additionally, it uses a list of lists (buckets) where the index represents the frequency and the value is a list of elements with that frequency. In the worst case, all elements are unique, leading to N buckets. Thus, the auxiliary space used is proportional to N, resulting in a space complexity of O(N).

Edge Cases

Input array is empty or null
How to Handle:
The solution should handle this gracefully by returning an empty list, as there are no elements to count.
The value of k is equal to the number of unique elements in the array
How to Handle:
The solution should return all the unique elements, as they are all part of the top k most frequent.
The value of k is 1
How to Handle:
The algorithm correctly identifies and returns the single most frequent element in the array.
All elements in the array are identical
How to Handle:
The frequency map will contain only one entry, and the solution will correctly return that single element.
All elements in the array are unique
How to Handle:
The solution must return any k elements from the input array, since all elements have a frequency of one.
Multiple elements have the same frequency, creating a tie for the kth spot
How to Handle:
The algorithm can return any of the tied elements to complete the set of k, as the problem allows any valid answer.
The input array contains negative numbers and zeros
How to Handle:
A hash map correctly handles any integer value as a key, so negative numbers and zeros are processed just like positive ones.
The input array is very large, potentially causing memory issues
How to Handle:
A streaming or partial processing approach might be needed if the frequency map becomes too large to fit in memory.
0/0 completed