Kth Largest Element in a Stream

Easy
Topics:
Arrays, Greedy Algorithms

You are part of a university admissions office and need to keep track of the kth highest test score from applicants in real-time. This helps to determine cut-off marks for interviews and admissions dynamically as new applicants submit their scores.

Your task is to implement a class that, for a given integer k, maintains a stream of test scores and returns the kth highest test score each time a new score is submitted. More specifically, we are looking for the kth highest score in the sorted list of all scores, not the kth distinct score: duplicate scores each occupy their own position.

Implement the KthLargest class:

  • KthLargest(int k, int[] nums) Initializes the object with the integer k and the stream of test scores nums.
  • int add(int val) Adds a new test score val to the stream and returns the kth largest score among all scores seen so far.

Example 1:

Input:
["KthLargest", "add", "add", "add", "add", "add"]
[[3, [4, 5, 8, 2]], [3], [5], [10], [9], [4]]

Output: [null, 4, 5, 5, 8, 8]

Explanation:

KthLargest kthLargest = new KthLargest(3, [4, 5, 8, 2]);
kthLargest.add(3); // return 4
kthLargest.add(5); // return 5
kthLargest.add(10); // return 5
kthLargest.add(9); // return 8
kthLargest.add(4); // return 8

Example 2:

Input:
["KthLargest", "add", "add", "add", "add"]
[[4, [7, 7, 7, 7, 8, 3]], [2], [10], [9], [9]]

Output: [null, 7, 7, 7, 8]

Explanation:

KthLargest kthLargest = new KthLargest(4, [7, 7, 7, 7, 8, 3]);
kthLargest.add(2); // return 7
kthLargest.add(10); // return 7
kthLargest.add(9); // return 7
kthLargest.add(9); // return 8
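
Both examples can be replayed with a small reference routine. This is only a sketch for checking the expected outputs by hand: `trace_kth_largest` is a hypothetical helper name, not part of the required interface, and it re-sorts on every step purely for clarity.

```python
def trace_kth_largest(k, nums, adds):
    """Return the kth largest score after each value in `adds` is submitted."""
    scores = list(nums)  # copy so the caller's list is left untouched
    results = []
    for val in adds:
        scores.append(val)
        ranked = sorted(scores, reverse=True)  # highest score first
        results.append(ranked[k - 1])          # duplicates occupy separate ranks
    return results


print(trace_kth_largest(3, [4, 5, 8, 2], [3, 5, 10, 9, 4]))     # [4, 5, 5, 8, 8]
print(trace_kth_largest(4, [7, 7, 7, 7, 8, 3], [2, 10, 9, 9]))  # [7, 7, 7, 8]
```

In Example 2 the four 7s fill ranks 2 through 5 of the sorted list, which is why the answer stays at 7 until enough higher scores arrive.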

Constraints:

  • 0 <= nums.length <= 10^4
  • 1 <= k <= nums.length + 1
  • -10^4 <= nums[i] <= 10^4
  • -10^4 <= val <= 10^4
  • At most 10^4 calls will be made to add.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the value of 'k' relative to the size of the input array 'nums'? Specifically, is 'k' always guaranteed to be a valid value (i.e., 1 <= k <= length of the stream)?
  2. What is the range of possible values for the integers in the input array 'nums' and the input value 'val' passed to the add method?
  3. If the stream has fewer than 'k' elements, what should the 'add' method return?
  4. Can I assume that the input array 'nums' is already sorted, or do I need to handle unsorted input during initialization?
  5. Should I be concerned about memory usage, particularly if the stream of numbers grows very large over time?

Brute Force Solution

Approach

To find the kth largest element in a stream using a brute force approach, we essentially keep track of all the elements we've seen so far. Whenever a new element comes in, we consider all the elements seen so far and find the kth largest among them.

Here's how the algorithm would work step-by-step:

  1. Keep a list of all the numbers that have come in.
  2. When a new number arrives, add it to your list.
  3. Then, arrange all the numbers in your list from largest to smallest.
  4. Pick the number that is in the kth position from the beginning of the sorted list. That's your answer for that moment.

Code Implementation

class KthLargest:

    def __init__(self, k_value, initial_numbers):
        self.k_value = k_value
        # Copy the input so later add() calls do not mutate the caller's list.
        self.all_numbers = list(initial_numbers)

    def add(self, new_number):
        # Add the new number to the list of all numbers seen.
        self.all_numbers.append(new_number)

        # Sort the list to easily find the kth largest element
        self.all_numbers.sort(reverse=True)

        # Return the kth largest element.
        return self.all_numbers[self.k_value - 1]

Big(O) Analysis

Time Complexity
O(n log n) per call to add, where n is the number of scores seen so far. Each call appends the new score and then sorts the entire list. Sorting n elements takes O(n log n) time with a general comparison sort, and that step dominates everything else in the call. Over m calls the total cost is O(m · n log n).
Space Complexity
O(n) The algorithm keeps a list of every score seen so far. After n scores have arrived, the list holds all n of them, so memory grows linearly with the length of the stream.
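
Since the sort dominates each call, one way to see where the cost comes from is to keep the list sorted between calls and insert each new score in place. The sketch below is a variation on the brute force idea, not the approach described above: it uses the standard library's `bisect.insort`, and the class name `KthLargestSorted` is a hypothetical one chosen for illustration. Each insertion finds its position in O(log n) but still shifts up to n elements, so a call costs O(n) time and the structure still uses O(n) space.

```python
import bisect


class KthLargestSorted:
    """Brute-force variant that keeps every score in ascending order."""

    def __init__(self, k_value, initial_numbers):
        self.k_value = k_value
        # sorted() returns a new list, so the caller's input is not mutated.
        self.sorted_numbers = sorted(initial_numbers)

    def add(self, new_number):
        # Insert while preserving ascending order: O(log n) search, O(n) shift.
        bisect.insort(self.sorted_numbers, new_number)
        # In ascending order, the kth largest sits k positions from the end.
        return self.sorted_numbers[-self.k_value]


tracker = KthLargestSorted(3, [4, 5, 8, 2])
print([tracker.add(score) for score in [3, 5, 10, 9, 4]])  # [4, 5, 5, 8, 8]
```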

Optimal Solution

Approach

The challenge is to efficiently track the kth largest number as new numbers arrive. Instead of sorting all numbers every time, we maintain only the k largest values seen so far in a special ordered structure.

Here's how the algorithm would work step-by-step:

  1. Create a container (a min-heap) that gives quick access to its smallest element and makes it cheap to remove that element and add new ones.
  2. When a new number comes in and the container holds fewer than k elements, add it directly. Otherwise, compare it to the smallest number in the container, which is the current kth largest.
  3. If the new number is bigger than the current kth largest, add it to the container. Then, if the container has more than k elements, remove the smallest one. If the new number is not bigger, it cannot be among the k largest and is discarded.
  4. Once at least k numbers have been seen, the container holds exactly the k largest numbers seen so far. The smallest number in this container is the kth largest number overall.
  5. When asked for the kth largest number, simply return the smallest number in the container.

Code Implementation

import heapq

class KthLargest:

    def __init__(self, k_value: int, initial_numbers: list[int]):
        self.k_value = k_value
        # Copy the input so heapify and the pops below do not mutate the caller's list.
        self.min_heap = list(initial_numbers)
        heapq.heapify(self.min_heap)

        # Ensure heap contains only the k largest
        while len(self.min_heap) > k_value:
            heapq.heappop(self.min_heap)

    def add(self, new_number: int) -> int:
        # Add when the heap is not yet full or the new score beats the current kth largest
        if len(self.min_heap) < self.k_value or new_number > self.min_heap[0]:
            heapq.heappush(self.min_heap, new_number)

            # Maintain size of k
            if len(self.min_heap) > self.k_value:
                heapq.heappop(self.min_heap)

        # The heap root is the kth largest, whether or not the new score was kept
        return self.min_heap[0]

Big(O) Analysis

Time Complexity
O(n log k) for n calls to add, which is O(log k) per call. After construction the heap never holds more than k elements, so each push or pop inside add costs O(log k). The constructor is a separate, one-time cost: for m initial scores, heapify runs in O(m), and trimming the heap down to k elements takes m - k pops of up to O(log m) each.
Space Complexity
O(k) The heap holds at most k scores once the constructor has trimmed it, so steady-state memory depends on k and not on N, the total number of scores in the stream. During construction the heap briefly holds all of the initial scores before the extras are popped.
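
The same bounded-heap idea can be written so that the heap never grows past k elements, even during construction. The following is a minimal sketch under that assumption, not a replacement for the implementation above: the class name `KthLargestHeap` is hypothetical, `heapq.nlargest` selects the k largest initial scores without trimming a full-size heap, and `heapq.heapreplace` combines the pop and push into one operation.

```python
import heapq


class KthLargestHeap:
    """Min-heap of the k largest scores; the root is the kth largest."""

    def __init__(self, k_value, initial_numbers):
        self.k_value = k_value
        # nlargest returns a new list with at most k items (fewer if the
        # input is shorter), so the caller's list is left untouched.
        self.min_heap = heapq.nlargest(k_value, initial_numbers)
        heapq.heapify(self.min_heap)

    def add(self, new_number):
        if len(self.min_heap) < self.k_value:
            # Heap not full yet: every score belongs to the top k.
            heapq.heappush(self.min_heap, new_number)
        elif new_number > self.min_heap[0]:
            # Evict the current kth largest and insert the new score in one step.
            heapq.heapreplace(self.min_heap, new_number)
        return self.min_heap[0]


tracker = KthLargestHeap(3, [4, 5, 8, 2])
print([tracker.add(score) for score in [3, 5, 10, 9, 4]])  # [4, 5, 5, 8, 8]
```

Under the stated constraint k <= nums.length + 1, the heap holds at least k elements after the first call to add, so the root is always a valid answer.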

Edge Cases

k is zero or negative
How to Handle:
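Raise an error (for example, ValueError in Python or IllegalArgumentException in Java) or return null, since a kth largest element does not exist when k is non-positive. The stated constraints guarantee k >= 1, so this only matters for inputs outside the problem's contract.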
nums is null or empty and add is called multiple times before reaching k elements
How to Handle:
Maintain a priority queue and return null or the smallest element in the queue until the queue size equals k.
k is greater than the initial size of nums
How to Handle:
Initialize the priority queue with the available elements and proceed with the add operations, returning null or the smallest element in the queue if the number of added elements + initial elements is less than k.
Input stream contains large numbers, potentially leading to integer overflow if not handled carefully.
How to Handle:
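The heap approach only compares values and performs no arithmetic on them, so the algorithm itself cannot overflow. In languages with fixed-width integers, choose a type wide enough for the input range (a long if needed); the constraints here keep every value within ±10^4.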
All elements in nums and the stream are identical
How to Handle:
The min-heap priority queue will correctly maintain the k identical elements and return the kth largest one.
k is a very large number approaching the maximum possible stream size.
How to Handle:
Each add still costs only O(log k) with a min-heap, so running time degrades slowly as k increases. Memory is the real limit, because the heap stores k elements.
nums contains duplicate numbers with extreme boundary values (min and max integer values).
How to Handle:
The min-heap priority queue approach can handle duplicates, including extreme values, correctly since it relies on comparisons rather than special handling.
Memory usage becomes excessive with a very large stream and large k.
How to Handle:
Consider using an external sorting algorithm or techniques for handling very large data streams if memory becomes a bottleneck.
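
Two of the cases above (a non-positive k, and a stream with fewer than k elements) can be handled explicitly. The sketch below is one possible way to do so, under the assumption that raising `ValueError` and returning `None` are acceptable signals; the class name `CheckedKthLargest` and the helper `_offer` are hypothetical names introduced for illustration.

```python
import heapq
from typing import Optional


class CheckedKthLargest:
    """Heap-based tracker with explicit handling for invalid k and short streams."""

    def __init__(self, k_value: int, initial_numbers: Optional[list[int]] = None):
        if k_value <= 0:
            # Assumption: an invalid k is reported with an exception.
            raise ValueError("k must be a positive integer")
        self.k_value = k_value
        self.min_heap: list[int] = []
        for number in initial_numbers or []:  # treat None like an empty list
            self._offer(number)

    def _offer(self, number: int) -> None:
        # Keep at most k scores; the root is the smallest of them.
        if len(self.min_heap) < self.k_value:
            heapq.heappush(self.min_heap, number)
        elif number > self.min_heap[0]:
            heapq.heapreplace(self.min_heap, number)

    def add(self, new_number: int) -> Optional[int]:
        self._offer(new_number)
        if len(self.min_heap) < self.k_value:
            return None  # fewer than k scores so far: no kth largest exists yet
        return self.min_heap[0]


tracker = CheckedKthLargest(3, [])
print([tracker.add(score) for score in [5, 1, 7, 7]])  # [None, None, 1, 5]
```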