The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
arr = [2,3,4]
, the median is 3
.arr = [2,3]
, the median is (2 + 3) / 2 = 2.5
.Implement the MedianFinder
class:
MedianFinder()
initializes the MedianFinder
object.void addNum(int num)
adds the integer num
from the data stream to the data structure.double findMedian()
returns the median of all elements so far. Answers within 10-5
of the actual answer will be accepted.Example:
Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]
Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0
Constraints:
-105 <= num <= 105
findMedian
.5 * 104
calls will be made to addNum
and findMedian
.Follow up:
[0, 100]
, how would you optimize your solution?[0, 100]
, how would you optimize your solution?When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The goal is to continuously find the middle value from a flowing sequence of numbers. The basic brute force strategy involves keeping track of all the numbers we've seen so far and re-calculating the median every time a new number arrives.
Here's how the algorithm would work step-by-step:
class MedianFinder:
def __init__(self):
self.data_stream = []
def addNum(self, num):
# Simply add the new number to the data stream
self.data_stream.append(num)
def findMedian(self):
# Need to sort the stream to find the median
sorted_data_stream = sorted(self.data_stream)
number_of_elements = len(sorted_data_stream)
if number_of_elements % 2 != 0:
# Odd number of elements means median is middle element
middle_index = number_of_elements // 2
return float(sorted_data_stream[middle_index])
else:
# Even number of elements means median is average of two middle elements
middle_index_one = number_of_elements // 2 - 1
middle_index_two = number_of_elements // 2
# Need to cast to float to maintain precision
return (sorted_data_stream[middle_index_one] + sorted_data_stream[middle_index_two]) / 2.0
We can find the median of a constantly updating set of numbers efficiently by using two 'piles' that represent the lower and upper halves of the data. By keeping these piles balanced in size and order, the median can be quickly accessed from the top of the piles.
Here's how the algorithm would work step-by-step:
import heapq
class MedianFinder:
def __init__(self):
self.smaller_numbers = [] # Max-heap for smaller half
self.larger_numbers = [] # Min-heap for larger half
def addNum(self, number):
# Add to smaller or larger half based on current median
if not self.smaller_numbers or number <= -self.smaller_numbers[0]:
heapq.heappush(self.smaller_numbers, -number)
else:
heapq.heappush(self.larger_numbers, number)
# Rebalance the heaps to maintain size property
if len(self.smaller_numbers) > len(self.larger_numbers) + 1:
heapq.heappush(self.larger_numbers, -heapq.heappop(self.smaller_numbers))
elif len(self.larger_numbers) > len(self.smaller_numbers):
# Maintain the size difference by rebalancing
heapq.heappush(self.smaller_numbers, -heapq.heappop(self.larger_numbers))
def findMedian(self):
#Determine median from heaps
if len(self.smaller_numbers) == len(self.larger_numbers):
#Even Number of total elements
return (-self.smaller_numbers[0] + self.larger_numbers[0]) / 2.0
else:
#Odd number of total elements
return -self.smaller_numbers[0]
Case | How to Handle |
---|---|
Empty data stream (no numbers added yet) | The findMedian() method should return a defined value (e.g., NaN or throw an exception) if the stream is empty, as there's no median to calculate. |
Single number in the data stream | The findMedian() method should simply return the single number when only one number has been added. |
Large number of elements in the data stream | The solution must scale efficiently to handle millions or billions of numbers by utilizing an optimal data structure and algorithm (e.g., heaps). |
Data stream contains duplicate numbers | The solution must correctly calculate the median even when the data stream contains duplicate numbers. |
Data stream contains negative numbers | The solution should correctly handle both positive and negative numbers in the data stream without any bias. |
Data stream contains extreme values (very large or very small numbers) | The solution should be robust to extreme values and prevent potential integer overflow or floating-point precision issues. |
Data stream with highly skewed distribution (e.g., mostly small numbers with a few very large numbers) | The solution should still efficiently maintain the median even with skewed data distribution, ensuring that the heaps remain balanced. |
Floating-point precision errors. | Use double precision to avoid floating-point precision issues, and compare floating-point numbers with a tolerance value. |