Kth Largest Element in an Array

Medium

Google

2 views

Topics:

ArraysGreedy Algorithms

Given an integer array nums and an integer k, return the kth largest element in the array. Note that it is the kth largest element in the sorted order, not the kth distinct element. Can you solve it without sorting?

For example:

nums = [3,2,1,5,6,4], k = 2 Output: 5
nums = [3,2,3,1,2,4,5,5,6], k = 4 Output: 4

Write an efficient algorithm to solve this problem. Consider various approaches and their time and space complexities. What are the edge cases to consider?

Kth Largest Element in an Array

Medium

Google

2 views

Topics:

ArraysGreedy Algorithms

For example:

nums = [3,2,1,5,6,4], k = 2 Output: 5
nums = [3,2,3,1,2,4,5,5,6], k = 4 Output: 4

Write an efficient algorithm to solve this problem. Consider various approaches and their time and space complexities. What are the edge cases to consider?

Solution

Finding the Kth Largest Element in an Array

Here's how to find the kth largest element in an array. We'll start with a naive approach and then move to a more efficient solution.

1. Naive Approach: Sorting

The simplest approach is to sort the array in descending order and then return the element at index k-1. This is because the kth largest element will be at that position after sorting.

Example:

nums = [3, 2, 1, 5, 6, 4], k = 2

Sort nums in descending order: [6, 5, 4, 3, 2, 1]
Return the element at index k-1 = 2-1 = 1, which is 5.

Code (Python):

def find_kth_largest_naive(nums, k):
    nums.sort(reverse=True)
    return nums[k-1]

Time Complexity: O(n log n) due to sorting.

Space Complexity: O(1) or O(n), depending on the sorting algorithm implementation. In-place sorts like heapsort use O(1), while others like merge sort use O(n).

2. Optimal Approach: Using a Min-Heap

We can use a min-heap data structure to efficiently find the kth largest element. The idea is to maintain a min-heap of size k. Iterate through the array:

For the first k elements, add them to the min-heap.
For the remaining elements, if an element is greater than the root of the min-heap (the smallest element in the heap), remove the root and add the current element.

After processing all elements, the root of the min-heap will be the kth largest element.

Example:

nums = [3, 2, 1, 5, 6, 4], k = 2

Initialize a min-heap with the first k=2 elements: [3, 2]. The min-heap will be [2, 3].
Iterate through the remaining elements:
- 5: 5 > 2, so remove 2 and add 5. Heap: [3, 5].
- 6: 6 > 3, so remove 3 and add 6. Heap: [5, 6].
- 4: 4 < 5, so do nothing.
The root of the min-heap is 5, which is the 2nd largest element.

Code (Python):

import heapq

def find_kth_largest(nums, k):
    min_heap = []
    for num in nums:
        if len(min_heap) < k:
            heapq.heappush(min_heap, num)
        elif num > min_heap[0]:
            heapq.heapreplace(min_heap, num)
    return min_heap[0]

Time Complexity: O(n log k). Building the initial heap takes O(k log k). Iterating through the rest of the array takes O((n-k) log k) in the worst case, for a total of O(k log k + (n-k) log k) = O(n log k).

Space Complexity: O(k) to store the min-heap.

3. Edge Cases

Empty Array: If the input array is empty, we should throw an exception or return None, depending on the problem's requirements.
k Out of Bounds: If k is less than 1 or greater than the length of the array, we should throw an exception or return None.
Duplicate Elements: The algorithm should work correctly with duplicate elements in the array, as demonstrated in Example 2.

4. Quickselect (Average O(n) Time)

For the average case, the Quickselect algorithm provides an O(n) time solution. It's based on the partitioning logic of Quicksort, but instead of recursively sorting both sides of the partition, it only recurses into the side that contains the kth largest element. This avoids unnecessary sorting, leading to better performance on average. However, the worst-case time complexity is O(n^2), which occurs when the pivot is consistently the smallest or largest element.

import random

def find_kth_largest_quickselect(nums, k):
    def partition(l, r):
        pivot_index = random.randint(l, r)
        pivot = nums[pivot_index]
        nums[r], nums[pivot_index] = nums[pivot_index], nums[r]

        i = l
        for j in range(l, r):
            if nums[j] < pivot:
                nums[i], nums[j] = nums[j], nums[i]
                i += 1

        nums[i], nums[r] = nums[r], nums[i]
        return i

    def quickselect(l, r, k_smallest):
        if l == r:
            return nums[l]

        partition_index = partition(l, r)

        if k_smallest == partition_index:
            return nums[k_smallest]
        elif k_smallest < partition_index:
            return quickselect(l, partition_index - 1, k_smallest)
        else:
            return quickselect(partition_index + 1, r, k_smallest)

    return quickselect(0, len(nums) - 1, len(nums) - k)

Time Complexity: Average: O(n), Worst: O(n^2) Space Complexity: O(1) - in-place

Solution

Finding the Kth Largest Element in an Array

Here's how to find the kth largest element in an array. We'll start with a naive approach and then move to a more efficient solution.

1. Naive Approach: Sorting

The simplest approach is to sort the array in descending order and then return the element at index k-1. This is because the kth largest element will be at that position after sorting.

Example:

nums = [3, 2, 1, 5, 6, 4], k = 2

Sort nums in descending order: [6, 5, 4, 3, 2, 1]
Return the element at index k-1 = 2-1 = 1, which is 5.

Code (Python):

def find_kth_largest_naive(nums, k):
    nums.sort(reverse=True)
    return nums[k-1]

Time Complexity: O(n log n) due to sorting.

Space Complexity: O(1) or O(n), depending on the sorting algorithm implementation. In-place sorts like heapsort use O(1), while others like merge sort use O(n).

2. Optimal Approach: Using a Min-Heap

We can use a min-heap data structure to efficiently find the kth largest element. The idea is to maintain a min-heap of size k. Iterate through the array:

For the first k elements, add them to the min-heap.
For the remaining elements, if an element is greater than the root of the min-heap (the smallest element in the heap), remove the root and add the current element.

After processing all elements, the root of the min-heap will be the kth largest element.

Example:

nums = [3, 2, 1, 5, 6, 4], k = 2

Initialize a min-heap with the first k=2 elements: [3, 2]. The min-heap will be [2, 3].
Iterate through the remaining elements:
- 5: 5 > 2, so remove 2 and add 5. Heap: [3, 5].
- 6: 6 > 3, so remove 3 and add 6. Heap: [5, 6].
- 4: 4 < 5, so do nothing.
The root of the min-heap is 5, which is the 2nd largest element.

Code (Python):

import heapq

def find_kth_largest(nums, k):
    min_heap = []
    for num in nums:
        if len(min_heap) < k:
            heapq.heappush(min_heap, num)
        elif num > min_heap[0]:
            heapq.heapreplace(min_heap, num)
    return min_heap[0]

Space Complexity: O(k) to store the min-heap.

3. Edge Cases

Empty Array: If the input array is empty, we should throw an exception or return None, depending on the problem's requirements.
k Out of Bounds: If k is less than 1 or greater than the length of the array, we should throw an exception or return None.
Duplicate Elements: The algorithm should work correctly with duplicate elements in the array, as demonstrated in Example 2.

4. Quickselect (Average O(n) Time)

import random

def find_kth_largest_quickselect(nums, k):
    def partition(l, r):
        pivot_index = random.randint(l, r)
        pivot = nums[pivot_index]
        nums[r], nums[pivot_index] = nums[pivot_index], nums[r]

        i = l
        for j in range(l, r):
            if nums[j] < pivot:
                nums[i], nums[j] = nums[j], nums[i]
                i += 1

        nums[i], nums[r] = nums[r], nums[i]
        return i

    def quickselect(l, r, k_smallest):
        if l == r:
            return nums[l]

        partition_index = partition(l, r)

        if k_smallest == partition_index:
            return nums[k_smallest]
        elif k_smallest < partition_index:
            return quickselect(l, partition_index - 1, k_smallest)
        else:
            return quickselect(partition_index + 1, r, k_smallest)

    return quickselect(0, len(nums) - 1, len(nums) - k)

Time Complexity: Average: O(n), Worst: O(n^2) Space Complexity: O(1) - in-place