Taro Logo

Top K Frequent Elements

Medium
Netflix logo
Netflix
6 views
Topics:
ArraysStacksGreedy Algorithms

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order. For example:

Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]
Input: nums = [1], k = 1
Output: [1]

Constraints:

  • 1 <= nums.length <= 10^5
  • -10^4 <= nums[i] <= 10^4
  • k is in the range [1, the number of unique elements in the array]. It is guaranteed that the answer is unique.

Can you design an algorithm that solves this problem with a time complexity better than O(n log n), where n is the array's size? Describe the algorithm, explain its time and space complexity, and handle potential edge cases. Provide example code with comments.

Solution


Top K Frequent Elements

Problem Description

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Naive Approach

  1. Count Frequencies: Iterate through the array and use a hash map (dictionary) to store the frequency of each element.
  2. Sort by Frequency: Sort the hash map entries (element-frequency pairs) based on the frequencies in descending order.
  3. Select Top K: Take the first k elements from the sorted list.

Code (Python)

def topKFrequent_naive(nums, k):
    counts = {}
    for n in nums:
        counts[n] = counts.get(n, 0) + 1
    
    sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    
    result = [item[0] for item in sorted_counts[:k]]
    return result

Time Complexity: O(n log n) due to sorting the frequencies.

Space Complexity: O(n) to store the frequencies in the hash map.

Optimal Approach: Using a Heap (Priority Queue)

  1. Count Frequencies: Same as the naive approach, use a hash map to store the frequency of each element.
  2. Heapify: Create a min-heap (priority queue) of size k. Iterate through the hash map entries. If the current element's frequency is greater than the smallest frequency in the heap, remove the smallest element and add the current element to the heap.
  3. Extract Top K: The heap now contains the k most frequent elements. Extract these elements from the heap.

Code (Python)

import heapq

def topKFrequent(nums, k):
    counts = {}
    for n in nums:
        counts[n] = counts.get(n, 0) + 1

    heap = []
    for num, freq in counts.items():
        if len(heap) == k:
            if freq > heap[0][0]:
                heapq.heapreplace(heap, (freq, num))
        else:
            heapq.heappush(heap, (freq, num))

    result = [num for freq, num in heap]
    return result

Time Complexity: O(n log k), where n is the number of elements in nums.

  • Counting frequencies: O(n)
  • Building and maintaining the heap: O(n log k)
  • Extracting elements: O(k). However, since this is dominated by O(n log k), we can ignore it.

Space Complexity: O(n) for the hash map to store the frequencies and O(k) for the heap.

Edge Cases

  • Empty Array: If the input array is empty, return an empty list.
  • k = 0: If k is 0, return an empty list.
  • k = Number of Unique Elements: If k is equal to the number of unique elements, return all unique elements.
  • All elements are same: Check for all elements are same and k = 1

Explanation

The heap-based approach is more efficient than the sorting-based approach when k is significantly smaller than n. The heap allows us to maintain the k most frequent elements seen so far without needing to sort all the frequencies. The heapq module in Python provides an efficient implementation of a min-heap.