Top K Frequent Words

Medium

Top K Frequent Words

Medium

Solution

Naive Approach

A straightforward approach involves counting the frequency of each word and then sorting the words based on frequency and lexicographical order. This can be achieved using a hash map to store word frequencies and then sorting the map's entries.

Count Frequencies: Use a hash map (dictionary) to store each word and its frequency.
Sort: Convert the map into a list of (word, frequency) pairs. Sort this list based on the frequency in descending order. If frequencies are the same, sort lexicographically in ascending order.
Extract Top K: Take the first k words from the sorted list.

Code (Python)

from collections import Counter

def topKFrequent(words, k):
    word_counts = Counter(words)
    sorted_words = sorted(word_counts.items(), key=lambda x: (-x[1], x[0]))
    return [word for word, count in sorted_words[:k]]

Time Complexity

O(n) to count word frequencies.
O(n log n) to sort the words.
Therefore, the overall time complexity is O(n log n).

Space Complexity

O(n) to store the word counts in the hash map.
O(n) to store the sorted list of words.
Overall, the space complexity is O(n).

Optimal Approach: Using a Heap

An optimized approach involves using a min-heap to keep track of the k most frequent words. This approach can achieve O(n log k) time complexity.

Count Frequencies: Use a hash map (dictionary) to store each word and its frequency.
Heapify: Create a min-heap of size k. The heap will store (frequency, word) tuples. Since we want to prioritize higher frequency words, and break ties lexicographically, we'll store the negative frequency. Python's heapq is a min-heap, so smaller values are prioritized.
Iterate and Maintain Heap: Iterate through the word counts. If the current word's frequency is greater than the smallest frequency in the heap, or if the frequency is equal but the word is lexicographically smaller, replace the smallest element in the heap with the current word.
Extract Top K: Pop elements from the heap to get the top k frequent words, reversing the order to obtain the desired sorted output.

Code (Python)

import heapq
from collections import Counter

def topKFrequent(words, k):
    word_counts = Counter(words)

    heap = []
    for word, count in word_counts.items():
        heapq.heappush(heap, (-count, word))

    result = []
    for _ in range(k):
        count, word = heapq.heappop(heap)
        result.append(word)

    return result

Time Complexity

O(n) to count word frequencies.
O(n log k) to maintain the heap of size k.
Therefore, the overall time complexity is O(n log k).

Space Complexity

O(n) to store the word counts in the hash map.
O(k) to store the heap.
Overall, the space complexity is O(n).

Edge Cases

Empty Input: If the input array words is empty, return an empty list.
k > Number of Unique Words: If k is greater than the number of unique words, return all unique words sorted by frequency and lexicographical order.
All Words Same Frequency: When all words have the same frequency, ensure the lexicographical order is correctly maintained.

Solution

Naive Approach

Count Frequencies: Use a hash map (dictionary) to store each word and its frequency.
Sort: Convert the map into a list of (word, frequency) pairs. Sort this list based on the frequency in descending order. If frequencies are the same, sort lexicographically in ascending order.
Extract Top K: Take the first k words from the sorted list.

Code (Python)

from collections import Counter

def topKFrequent(words, k):
    word_counts = Counter(words)
    sorted_words = sorted(word_counts.items(), key=lambda x: (-x[1], x[0]))
    return [word for word, count in sorted_words[:k]]

Time Complexity

O(n) to count word frequencies.
O(n log n) to sort the words.
Therefore, the overall time complexity is O(n log n).

Space Complexity

O(n) to store the word counts in the hash map.
O(n) to store the sorted list of words.
Overall, the space complexity is O(n).

Optimal Approach: Using a Heap

An optimized approach involves using a min-heap to keep track of the k most frequent words. This approach can achieve O(n log k) time complexity.

Count Frequencies: Use a hash map (dictionary) to store each word and its frequency.
Heapify: Create a min-heap of size k. The heap will store (frequency, word) tuples. Since we want to prioritize higher frequency words, and break ties lexicographically, we'll store the negative frequency. Python's heapq is a min-heap, so smaller values are prioritized.
Iterate and Maintain Heap: Iterate through the word counts. If the current word's frequency is greater than the smallest frequency in the heap, or if the frequency is equal but the word is lexicographically smaller, replace the smallest element in the heap with the current word.
Extract Top K: Pop elements from the heap to get the top k frequent words, reversing the order to obtain the desired sorted output.

Code (Python)

import heapq
from collections import Counter

def topKFrequent(words, k):
    word_counts = Counter(words)

    heap = []
    for word, count in word_counts.items():
        heapq.heappush(heap, (-count, word))

    result = []
    for _ in range(k):
        count, word = heapq.heappop(heap)
        result.append(word)

    return result

Time Complexity

O(n) to count word frequencies.
O(n log k) to maintain the heap of size k.
Therefore, the overall time complexity is O(n log k).

Space Complexity

O(n) to store the word counts in the hash map.
O(k) to store the heap.
Overall, the space complexity is O(n).

Edge Cases

Empty Input: If the input array words is empty, return an empty list.
k > Number of Unique Words: If k is greater than the number of unique words, return all unique words sorted by frequency and lexicographical order.
All Words Same Frequency: When all words have the same frequency, ensure the lexicographical order is correctly maintained.