Taro Logo

Top K Frequent Words

Medium
Google logo
Google
3 views
Topics:
ArraysStringsGreedy Algorithms

Given an array of strings words and an integer k, return the k most frequent strings. Return the answer sorted by the frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.

For example:

  • words = ["i","love","leetcode","i","love","coding"], k = 2 should return ["i","love"]
  • words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4 should return ["the","is","sunny","day"]

Could you solve it in O(n log(k)) time and O(n) extra space?

Solution


Naive Solution

  1. Count the frequency of each word using a hash map.
  2. Sort the hash map by frequency in descending order. If frequencies are the same, sort lexicographically.
  3. Return the top k words from the sorted list.

Code (Python):

from collections import Counter

def topKFrequent(words, k):
    count = Counter(words)
    sorted_words = sorted(count.items(), key=lambda x: (-x[1], x[0]))
    return [word for word, freq in sorted_words[:k]]

Time Complexity: O(n log n) due to sorting, where n is the number of words.

Space Complexity: O(n) to store the counts in the hash map.

Optimal Solution

Using a min-heap (priority queue) to store the k most frequent words.

  1. Count the frequency of each word using a hash map.
  2. Create a min-heap of size k. The heap will store tuples of (frequency, word). We use a min-heap because we want to easily discard elements with lower frequency.
  3. Iterate through the hash map. For each word and its frequency:
    • If the heap size is less than k, add the word to the heap.
    • Otherwise, if the word's frequency is greater than the frequency of the word at the top of the heap, remove the top element and add the current word to the heap. If the frequencies are equal compare lexicographically and proceed only if the new word comes earlier.
  4. After iterating through all the words, the heap will contain the k most frequent words. Convert the heap to a list and sort it, then reverse it to get the words in the desired order.

Code (Python):

import heapq
from collections import Counter

def topKFrequent(words, k):
    count = Counter(words)
    heap = []
    for word, freq in count.items():
        heapq.heappush(heap, (freq, word))
        if len(heap) > k:
            heapq.heappop(heap)

    result = [word for freq, word in heap]
    result.sort(key=lambda w: (-count[w], w))

    return result[:k]

Code (Python - Improved):

import heapq
from collections import Counter

def topKFrequent(words, k):
    count = Counter(words)
    heap = []
    for word, freq in count.items():
        heapq.heappush(heap, (-freq, word))  # Negate frequency for max-heap behavior

    result = [heapq.heappop(heap)[1] for _ in range(k)]
    return result

Time Complexity: O(n log k), where n is the number of words. Building the counter takes O(n), and pushing/popping from the heap takes O(log k) and we do this at most n times.

Space Complexity: O(n) to store the counts in the hash map and O(k) for the heap.

Edge Cases

  • Empty input array: Return an empty list.
  • k is greater than the number of unique words: Return all unique words.
  • All words have the same frequency: Return the first k words in lexicographical order.