Given an array of strings words
and an integer k
, return the k
most frequent strings. Return the answer sorted by the frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.
For example:
words = ["i","love","leetcode","i","love","coding"], k = 2. The expected output is ["i","love"]. "i" and "love" are the two most frequent words. Note that "i" comes before "love" due to a lower alphabetical order.
words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4. The expected output is ["the","is","sunny","day"]. "the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively.
Consider edge cases such as:
words
is empty?k
is zero, negative, or larger than the number of unique words in words
?How would you optimize the solution to achieve O(n log k) time complexity?
A straightforward approach involves counting the frequency of each word and then sorting the words based on frequency and lexicographical order. This can be achieved using a hash map to store word frequencies and then sorting the map's entries.
k
words from the sorted list.from collections import Counter
def topKFrequent(words, k):
word_counts = Counter(words)
sorted_words = sorted(word_counts.items(), key=lambda x: (-x[1], x[0]))
return [word for word, count in sorted_words[:k]]
An optimized approach involves using a min-heap to keep track of the k
most frequent words. This approach can achieve O(n log k) time complexity.
k
. The heap will store (frequency, word) tuples. Since we want to prioritize higher frequency words, and break ties lexicographically, we'll store the negative frequency. Python's heapq
is a min-heap, so smaller values are prioritized.import heapq
from collections import Counter
def topKFrequent(words, k):
word_counts = Counter(words)
heap = []
for word, count in word_counts.items():
heapq.heappush(heap, (-count, word))
result = []
for _ in range(k):
count, word = heapq.heappop(heap)
result.append(word)
return result
k
.words
is empty, return an empty list.k
is greater than the number of unique words, return all unique words sorted by frequency and lexicographical order.