Given an array of strings words
and an integer k
, return the k
most frequent strings.
Return the answer sorted by the frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.
For example:
words = ["i","love","leetcode","i","love","coding"], k = 2
["i","love"]
words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4
["the","is","sunny","day"]
Could you solve it in O(n log(k))
time and O(n)
extra space?
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force method to find the most frequent words involves checking each word, counting its occurrences, and comparing those counts. We essentially look at all the words one by one to find the top few most frequent.
Here's how the algorithm would work step-by-step:
def top_k_frequent_words_brute_force(words, k_most_frequent):
word_counts = {}
for word in words:
if word in word_counts:
word_counts[word] += 1
else:
word_counts[word] = 1
# Convert the dictionary to a list of tuples
word_frequency_list = list(word_counts.items())
# Sort by frequency.
word_frequency_list.sort(key=lambda item: item[1], reverse=True)
# Extract the top k words.
top_k_words = [word for word, count in word_frequency_list[:k_most_frequent]]
return top_k_words
To find the most frequent words, we'll first count how often each word appears. Then, we'll use a special data structure to quickly retrieve the top K words based on their frequency, efficiently handling ties in frequency by comparing the words alphabetically.
Here's how the algorithm would work step-by-step:
import heapq
def top_k_frequent(words, k):
word_counts = {}
for word in words:
word_counts[word] = word_counts.get(word, 0) + 1
# Use a min-heap to keep track of the top k frequent words.
min_heap = []
for word, count in word_counts.items():
heapq.heappush(min_heap, (count, word))
# Maintain heap size of k
if len(min_heap) > k:
heapq.heappop(min_heap)
# Sort to handle ties, ensuring alphabetical order for equal frequencies
top_k_words = sorted([word for count, word in min_heap], key=lambda x: (word_counts[x], x), reverse=True)
# Sort the words based on frequency and then alphabetically
top_k_words.sort(key=lambda word: (-word_counts[word], word))
return top_k_words[:k]
Case | How to Handle |
---|---|
words is null or empty | Return an empty list immediately since there are no words to count. |
k is zero or negative | Return an empty list as we cannot return a negative or zero number of elements. |
k is larger than the number of unique words in words | Return all the unique words sorted by frequency and then lexicographically. |
words contains only one unique word repeated multiple times | The map will contain only one entry; sort will correctly handle this giving the single word. |
All words in words have the same frequency | The sort will rely entirely on lexicographical order in this case. |
words contains very long strings which could impact memory | The program's memory usage will increase, but the algorithm remains functionally correct assuming memory is available. |
Large input array size for words causing potential memory issues | Ensure efficient data structures are used (HashMap, PriorityQueue) and consider memory limitations. |
Integer overflow in frequency counts | Use a larger data type (e.g., long) for storing frequencies if needed to prevent integer overflow. |