The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
arr = [2,3,4]
, the median is 3
.arr = [2,3]
, the median is (2 + 3) / 2 = 2.5
.Implement the MedianFinder class:
MedianFinder()
initializes the MedianFinder
object.void addNum(int num)
adds the integer num
from the data stream to the data structure.double findMedian()
returns the median of all elements so far. Answers within 10-5
of the actual answer will be accepted.Example 1:
Input ["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"] [[], [1], [2], [], [3], []] Output [null, null, null, 1.5, null, 2.0] Explanation MedianFinder medianFinder = new MedianFinder(); medianFinder.addNum(1); // arr = [1] medianFinder.addNum(2); // arr = [1, 2] medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2) medianFinder.addNum(3); // arr[1, 2, 3] medianFinder.findMedian(); // return 2.0
Constraints:
-105 <= num <= 105
findMedian
.5 * 104
calls will be made to addNum
and findMedian
.Follow up:
[0, 100]
, how would you optimize your solution?99%
of all integer numbers from the stream are in the range [0, 100]
, how would you optimize your solution?When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The core idea is to maintain a list of all numbers seen so far. To find the median, we will sort the list every time a new number is added and then pick the middle number (or the average of the two middle numbers).
Here's how the algorithm would work step-by-step:
class MedianFinder:
def __init__(self):
self.number_list = []
def addNum(self, number) -> None:
self.number_list.append(number)
def findMedian(self) -> float:
# Sort the list so that we can easily find the median
self.number_list.sort()
list_length = len(self.number_list)
# Handle odd length list
if list_length % 2 != 0:
middle_index = list_length // 2
return float(self.number_list[middle_index])
# Even length list, average two middle elements
else:
middle_index_right = list_length // 2
# The median is the average of the two middle numbers
middle_index_left = middle_index_right - 1
sum_of_middle_elements = self.number_list[middle_index_left] + self.number_list[middle_index_right]
return float(sum_of_middle_elements) / 2.0
To efficiently find the median of a constantly growing stream of numbers, we can use two special containers that keep track of the smaller half and larger half of the numbers. By ensuring these containers are always balanced, we can quickly find the middle number(s) needed to calculate the median.
Here's how the algorithm would work step-by-step:
import heapq
class MedianFinder:
def __init__(self):
self.smaller_numbers = [] # Max heap
self.larger_numbers = [] # Min heap
def addNum(self, number):
if not self.smaller_numbers or number <= -self.smaller_numbers[0]:
heapq.heappush(self.smaller_numbers, -number)
else:
heapq.heappush(self.larger_numbers, number)
# Maintain size balance between the two heaps.
if len(self.smaller_numbers) > len(self.larger_numbers) + 1:
heapq.heappush(self.larger_numbers, -heapq.heappop(self.smaller_numbers))
elif len(self.larger_numbers) > len(self.smaller_numbers):
heapq.heappush(self.smaller_numbers, -heapq.heappop(self.larger_numbers))
def findMedian(self):
# If heaps are of equal size, median is avg of two middle elements.
if len(self.smaller_numbers) == len(self.larger_numbers):
return (-self.smaller_numbers[0] + self.larger_numbers[0]) / 2.0
# If smaller_numbers is larger, it contains the median.
else:
return -self.smaller_numbers[0]
Case | How to Handle |
---|---|
Empty data stream (no numbers added yet) | Return null or throw an exception if findMedian is called before any numbers are added, as there is no median to compute. |
Stream with a single number | Return the single number itself as the median. |
Stream with two numbers | Return the average of the two numbers as the median. |
Large stream of numbers exceeding available memory | The solution should ideally use external memory or streaming algorithms if the stream is too large to fit in memory; otherwise, memory limits could be reached. |
Stream with all identical numbers | The solution should correctly return the identical number as the median, even if the internal data structure isn't perfectly balanced. |
Stream with a highly skewed distribution (e.g., mostly small numbers with a few very large numbers) | The priority queue implementation should maintain balance efficiently, preventing a large imbalance that would degrade performance. |
Stream containing negative numbers, zeros, and positive numbers | The solution needs to correctly handle all number types without special treatment. |
Stream with extreme boundary values (e.g., Integer.MAX_VALUE, Integer.MIN_VALUE) | The priority queue comparison logic must handle the extreme values without causing integer overflow errors. |