Taro Logo

Top K Frequent Elements

Medium
Disney logo
Disney
0 views
Topics:
Arrays

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Example 1:

Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]

Example 2:

Input: nums = [1], k = 1
Output: [1]

Constraints:

  • 1 <= nums.length <= 105
  • -104 <= nums[i] <= 104
  • k is in the range [1, the number of unique elements in the array].
  • It is guaranteed that the answer is unique.

Follow up: Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the size of the input array `nums` and the value of `k`?
  2. Can the input array `nums` contain negative numbers, zeros, or duplicate values?
  3. Is `k` always a valid positive integer, and is it guaranteed to be less than or equal to the number of unique elements in `nums`?
  4. If multiple elements have the same frequency and would qualify for the top k, how should I decide which ones to include?
  5. What should be returned if the input array is empty or null?

Brute Force Solution

Approach

The brute force method for finding the most frequent items involves counting how often each item appears in a list. Then, we simply pick out the items with the highest counts. This is done by comparing each item's count against every other item's count.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each unique item appears in the entire list.
  2. Next, make a list of all the unique items we found.
  3. For each of the unique items, go through the entire list of counts and see which items have counts that are greater than or equal to it.
  4. Keep track of the top K most frequent items you encounter in the prior step, and return that list.

Code Implementation

def top_k_frequent_elements_brute_force(numbers, k_most_frequent):
    item_counts = {}
    for number in numbers:
        if number in item_counts:
            item_counts[number] += 1
        else:
            item_counts[number] = 1

    unique_items = list(item_counts.keys())

    top_k_frequent = []

    for item_to_check in unique_items:
        frequency_count = item_counts[item_to_check]
        is_among_top_k = True

        # Check if the current item is frequent enough to be in the top k
        for other_item in unique_items:
            if item_counts[other_item] > frequency_count:
                is_among_top_k = False
                break

        if is_among_top_k:
            top_k_frequent.append(item_to_check)

    # Limit the result to the top k elements
    return top_k_frequent[:k_most_frequent]

Big(O) Analysis

Time Complexity
O(n²)Counting the frequency of each element in the input array of size 'n' takes O(n) time. Constructing the list of unique elements also takes O(n) time in the worst case where all elements are unique. The nested loop structure iterates through each unique element (at most 'n') and compares its count against all other elements' counts (again, at most 'n'). Therefore, the dominant cost is the nested comparison of counts, resulting in approximately n * n operations. This simplifies to O(n²).
Space Complexity
O(N)The algorithm first counts the frequency of each unique item, requiring a data structure (like a hash map or dictionary) to store these counts. In the worst case, all N elements in the input list are unique, leading to N entries in this frequency counter. A list of unique items is also created, which in the worst case could also contain all N elements. Therefore, the auxiliary space used is proportional to the number of elements in the input list, resulting in O(N) space complexity.

Optimal Solution

Approach

To find the most frequent elements efficiently, we use a clever method to count how often each element appears and then quickly identify the top ones. We organize elements based on their frequency, allowing us to grab the most frequent ones quickly.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each element appears in the input.
  2. Then, organize the elements into groups based on their frequency. Think of it like creating buckets where elements with the same frequency go into the same bucket.
  3. Start with the bucket containing the highest frequency elements, and add those elements to your result.
  4. Continue adding elements from buckets with decreasing frequencies until you have the required number of top frequent elements.

Code Implementation

def top_k_frequent(numbers, k_most_frequent):
    element_frequency = {}
    for number in numbers:
        element_frequency[number] = element_frequency.get(number, 0) + 1

    # Create buckets to store elements based on frequency.
    frequency_buckets = [[] for _ in range(len(numbers) + 1)]
    for number, frequency in element_frequency.items():
        frequency_buckets[frequency].append(number)

    result = []

    # Iterate from highest frequency to lowest.
    for frequency in range(len(numbers), 0, -1):
        if frequency_buckets[frequency]:

            # Add elements to result until we have k elements.
            for number in frequency_buckets[frequency]:
                result.append(number)
                if len(result) == k_most_frequent:
                    return result

    return result

Big(O) Analysis

Time Complexity
O(n)The first step involves counting the frequency of each element in the input array of size n, which takes O(n) time. The next step of organizing elements into buckets based on their frequency also takes O(n) time in the worst case, as each element could have a distinct frequency. Finally, iterating through the buckets to collect the top k elements is upper bounded by O(n) because in the worst case, we might need to examine all buckets. Therefore, the overall time complexity is dominated by the linear operations resulting in O(n).
Space Complexity
O(N)The algorithm uses a hash map to store the frequency of each element, which could potentially store all N elements of the input array. Additionally, it uses a list of lists (buckets) where the index represents the frequency and the value is a list of elements with that frequency. In the worst case, all elements are unique, leading to N buckets. Thus, the auxiliary space used is proportional to N, resulting in a space complexity of O(N).

Edge Cases

CaseHow to Handle
Empty input arrayReturn an empty list as there are no elements to process.
k is zeroReturn an empty list as we need to return the top 0 elements.
k is larger than the number of unique elements in the input arrayReturn all unique elements in the array as the top k elements.
Input array contains only one elementReturn the single element in a list if k >= 1, or an empty list if k is zero.
All elements in the input array are the sameReturn the element in a list if k >= 1, or an empty list if k is zero.
Input array contains negative numbers, zeros, and positive numbersFrequency counting methods handle all integer values without issue.
Large input array with a small k, making priority queue inefficientBucket sort can be more efficient in this case, providing O(n) time complexity.
Duplicate frequencies exist: multiple elements have the same frequency and could qualify for the top kThe algorithm should return any k elements that satisfy the frequency requirement, as the problem states that the order doesn't matter.