Let's implement a MedianFinder
class to efficiently calculate the median of a stream of numbers.
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
arr = [2,3,4]
, the median is 3
.arr = [2,3]
, the median is (2 + 3) / 2 = 2.5
.Your MedianFinder
class should support the following methods:
MedianFinder()
: Initializes the MedianFinder
object.void addNum(int num)
: Adds the integer num
from the data stream to the data structure.double findMedian()
: Returns the median of all elements so far. Answers within 10-5
of the actual answer will be accepted.Example:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0
Constraints:
-10^5 <= num <= 10^5
findMedian
.5 * 10^4
calls will be made to addNum
and findMedian
.Follow-up questions:
[0, 100]
, how would you optimize your solution?[0, 100]
, how would you optimize your solution?A naive solution to find the median of a data stream involves storing all the numbers in a list and, for each call to findMedian()
, sorting the list and returning the median. This approach is straightforward but inefficient.
addNum(int num)
: Add the number to a list.findMedian()
: Sort the list and compute the median.class MedianFinder:
def __init__(self):
self.numbers = []
def addNum(self, num):
self.numbers.append(num)
def findMedian(self):
self.numbers.sort()
n = len(self.numbers)
if n % 2 == 0:
return (self.numbers[n // 2 - 1] + self.numbers[n // 2]) / 2.0
else:
return float(self.numbers[n // 2])
addNum()
: O(1)findMedian()
: O(n log n), due to sorting.A more efficient solution utilizes two heaps: a max-heap to store the smaller half of the numbers and a min-heap to store the larger half. This ensures that the median can be found in O(1) time.
addNum(int num)
:
findMedian()
:
import heapq
class MedianFinder:
def __init__(self):
self.small = [] # max-heap
self.large = [] # min-heap
def addNum(self, num):
heapq.heappush(self.small, -num)
if self.large and -self.small[0] > self.large[0]:
heapq.heappush(self.large, -heapq.heappop(self.small))
if len(self.small) > len(self.large) + 1:
heapq.heappush(self.large, -heapq.heappop(self.small))
elif len(self.large) > len(self.small):
heapq.heappush(self.small, -heapq.heappop(self.large))
def findMedian(self):
if len(self.small) == len(self.large):
return (-self.small[0] + self.large[0]) / 2.0
else:
return float(-self.small[0])
addNum()
: O(log n), due to heap operations.findMedian()
: O(1).findMedian
, so an empty stream won't be an issue.If all numbers are in the range [0, 100]
, we can use a frequency array (bucket sort). addNum
increments the count of the number in the array and findMedian
iterates through the array to find the median.
addNum
: O(1)findMedian
: O(100) = O(1)For the case where 99% of the numbers are in the range [0, 100]
, a hybrid approach would be best. Use the frequency array as described above for numbers in the range [0, 100]
. Use the heap-based approach for numbers outside that range. This keeps the memory and time costs low as long as the outlier numbers really are rare.