
Top K Frequent Elements

Medium
eBay
Topics:
Arrays, Strings, Stacks, Greedy Algorithms

You are given an integer array nums and an integer k. Your task is to find and return the k most frequent elements in the array. The order of the returned elements does not matter.

Example 1:

  • Input: nums = [1,1,1,2,2,3], k = 2
  • Output: [1,2] (or [2,1], the order doesn't matter)
    • Explanation: The number 1 appears three times, the number 2 appears twice, and the number 3 appears once. The two most frequent elements are 1 and 2.

Example 2:

  • Input: nums = [1], k = 1
  • Output: [1]
    • Explanation: The only element in the array is 1, and since k=1, we return it.

Constraints:

  • 1 <= nums.length <= 10^5
  • -10^4 <= nums[i] <= 10^4
  • k is always valid, meaning 1 <= k <= (number of unique elements in the array).
  • The answer is guaranteed to be unique.

Follow-up: Can you design an algorithm with a time complexity better than O(n log n), where n is the size of the input array?

Solution


Clarifying Questions

When you're asked this question in a real interview, the prompt will often be ambiguous (especially at FAANG). Make sure to ask questions like these in that case:

  1. What is the range of values within the `nums` array? Can I assume a reasonable maximum value?
  2. Is `k` always a valid value within the range of 1 to the number of unique elements in `nums`? What should I return if `k` is larger than the number of unique elements?
  3. If multiple elements have the same frequency and including all of them would result in more than `k` elements, how should I decide which `k` elements to return?
  4. Can the input array `nums` be empty or null?
  5. What data type should I return the list of top K frequent elements as?

Brute Force Solution

Approach

The goal is to find the most frequently occurring items in a collection. The brute-force approach counts each unique item by scanning the entire collection from scratch, then sorts the items by how often they appeared and picks the top ones.

Here's how the algorithm would work step-by-step:

  1. First, we go through our entire collection of items.
  2. For each unique item we find, we go back through the entire collection and count how many times that specific item appears.
  3. We save that count alongside the item, so we know how frequent it is.
  4. Once we've done this for every unique item, we have a list of items and their frequencies.
  5. Now, we need to arrange this list from the most frequent to the least frequent.
  6. Finally, after arranging the list, we simply pick the top 'K' items from the beginning of the list. These are our most frequent elements.

Code Implementation

def top_k_frequent_elements_brute_force(numbers, k_most_frequent):
    element_counts = []
    unique_numbers = []

    # Iterate through the input to determine unique elements
    for number in numbers:
        if number not in unique_numbers:
            unique_numbers.append(number)

    # Count frequency of each unique element
    for unique_number in unique_numbers:
        frequency = 0

        # Iterate through the list to determine count
        for number in numbers:
            if number == unique_number:
                frequency += 1

        element_counts.append((unique_number, frequency))

    # Sort element counts based on frequency
    element_counts.sort(key=lambda item: item[1], reverse=True)

    # Return the top 'k' frequent elements
    top_k_elements = []

    # Limit based on k value
    for i in range(min(k_most_frequent, len(element_counts))):
        top_k_elements.append(element_counts[i][0])

    return top_k_elements
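
To make sure the brute-force version behaves as expected, here is a small sanity check against Example 1 (not part of the original write-up; it only assumes the function defined above is in scope). Since order does not matter, the result is compared as a set:

# Hypothetical sanity check for the brute-force implementation above.
result = top_k_frequent_elements_brute_force([1, 1, 1, 2, 2, 3], 2)
assert set(result) == {1, 2}
print("Brute-force check passed:", result)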

Big(O) Analysis

Time Complexity
O(n²). The algorithm iterates through the input array of size n to identify the unique elements, and for each unique element it scans the entire array again to count its occurrences. In the worst case there are close to n unique elements, so this nested scan performs roughly n * n operations, giving a time complexity of O(n²).
Space Complexity
O(n). The algorithm stores the unique elements and, for each one, its frequency; in the worst case all n items are unique, so this storage grows proportionally to n. Sorting the (element, count) pairs may also use temporary space proportional to n in most implementations. Therefore, the overall auxiliary space complexity is O(n).

Optimal Solution

Approach

To find the most frequent items, we count how many times each one appears and then pick the top few. Instead of fully sorting by count, we group numbers into buckets indexed by their frequency (a bucket sort), which lets us grab the most frequent ones without a comparison sort.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each number appears in the list. For example, if the number '5' appears three times, we record a count of 3 for it.
  2. Next, place each number into a bucket based on its count: bucket 3 holds every number that appeared exactly three times, and so on. Since a number can appear at most n times (the length of the list), at most n + 1 buckets are needed.
  3. Finally, walk through the buckets from the highest count down to the lowest and collect numbers until you have 'K' of them. These are your top 'K' most frequent numbers.

Code Implementation

def top_k_frequent(numbers, k_most_frequent):
    element_counts = {}
    for number in numbers:
        element_counts[number] = element_counts.get(number, 0) + 1

    # Bucket sort by frequency: bucket i holds the numbers that appear exactly i times.
    frequency_buckets = [[] for _ in range(len(numbers) + 1)]
    for number, count in element_counts.items():
        frequency_buckets[count].append(number)

    top_k_elements = []
    # Iterate backwards to get most frequent elements first.
    for count in range(len(numbers), 0, -1):
        for number in frequency_buckets[count]:
            top_k_elements.append(number)
            # Stop once we have the k most frequent elements.
            if len(top_k_elements) == k_most_frequent:
                return top_k_elements

    return top_k_elements
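
A quick usage check against both examples from the problem statement might look like the sketch below (it assumes the top_k_frequent function above is in scope; results are compared as sets because order does not matter):

# Example 1: nums = [1,1,1,2,2,3], k = 2 -> {1, 2}
assert set(top_k_frequent([1, 1, 1, 2, 2, 3], 2)) == {1, 2}

# Example 2: nums = [1], k = 1 -> {1}
assert set(top_k_frequent([1], 1)) == {1}

print("Bucket-sort checks passed")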

Big(O) Analysis

Time Complexity
O(n). Counting the frequency of each number takes O(n) time, where n is the size of the input array. Allocating the n + 1 frequency buckets and placing each of the (at most n) unique elements into its bucket is also O(n). Finally, walking the buckets from the highest count downward visits each bucket and each stored number at most once, which is O(n) in total. Since every step is linear, the overall time complexity is O(n), which answers the follow-up by beating the O(n log n) of a comparison-sort approach.
Space Complexity
O(n). The frequency map stores up to n key-value pairs when every number in the input is distinct. The bucket list holds n + 1 buckets, and all buckets together contain at most n numbers, while the output list holds at most k elements. Each of these structures is proportional to n, so the overall auxiliary space complexity is O(n).
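
If bucket sort feels awkward to derive in an interview, a common alternative (not covered in the write-up above) is a heap-based approach using only Python's standard library, which runs in O(n log k): count with collections.Counter, then take the k keys with the largest counts via heapq.nlargest. A minimal sketch under that assumption:

import heapq
from collections import Counter

def top_k_frequent_heap(numbers, k_most_frequent):
    # Count occurrences of each number in O(n).
    element_counts = Counter(numbers)

    # Keep the k keys with the largest counts using a size-k heap: O(n log k).
    return heapq.nlargest(k_most_frequent, element_counts.keys(),
                          key=element_counts.get)

Counter.most_common(k) does the same thing in one call, but being able to explain the heap is usually what the interviewer is after.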

Edge Cases

  • Empty input array: Return an empty list immediately, as there are no elements to count frequencies for.
  • k is zero: Return an empty list, since we want zero most frequent elements.
  • k is larger than the number of unique elements: Return all unique elements, as that is the maximum number of frequent elements available.
  • Array contains only one unique element: Return a list containing just that element if k >= 1; otherwise return an empty list if k = 0.
  • All elements in the array are the same: Return that single element in a list of size 1 if k >= 1; otherwise return an empty list if k = 0.
  • Large input array exceeding memory limits for the frequency map: Consider external memory or approximate frequency-counting algorithms if memory is a constraint.
  • Negative numbers present in the array: The solution handles negative numbers correctly, since a hash map can store negative numbers as keys.
  • Extreme boundary integer values in the array (e.g., Integer.MAX_VALUE, Integer.MIN_VALUE): The solution must handle extreme values without causing integer overflow in the frequency counting or comparison steps.
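
The constraints above guarantee a non-empty array and a valid k, but if the clarifying questions reveal looser inputs, a thin defensive wrapper around the optimal solution covers most of these cases. This is a sketch under that assumption; the wrapper name and its behavior for invalid inputs are illustrative, not from the original write-up:

def top_k_frequent_safe(numbers, k_most_frequent):
    # Empty or None input, or non-positive k: nothing to return.
    if not numbers or k_most_frequent <= 0:
        return []

    # Clamp k to the number of unique elements so we never over-ask.
    unique_count = len(set(numbers))
    k_most_frequent = min(k_most_frequent, unique_count)

    # Delegate to the bucket-sort solution defined above.
    return top_k_frequent(numbers, k_most_frequent)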