Taro Logo

Sliding Window Median

Medium
15 views
2 months ago

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle values.

  • For examples, if arr = [2,3,4], the median is 3.
  • For examples, if arr = [1,2,3,4], the median is (2 + 3) / 2 = 2.5.

You are given an integer array nums and an integer k. There is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position.

Return the median array for each window in the original array. Answers within 10-5 of the actual value will be accepted.

For example:

nums = [1, 3, -1, -3, 5, 3, 6, 7], k = 3 Output: [1.00000, -1.00000, -1.00000, 3.00000, 5.00000, 6.00000]

Explanation:

Window position                Median
---------------                -----
[1  3  -1] -3  5  3  6  7        1
1 [3  -1  -3] 5  3  6  7       -1
1  3 [-1  -3  5] 3  6  7       -1
1  3  -1 [-3  5  3] 6  7        3
1  3  -1  -3 [5  3  6] 7        5
1  3  -1  -3  5 [3  6  7]       6

What is the most efficient algorithm (in terms of time complexity) to solve this problem?

Sample Answer
import heapq

class SlidingWindowMedian:

    def __init__(self):
        self.small = []  # max heap
        self.large = []  # min heap
        self.removed = {}

    def add(self, num):
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)
        self.rebalance()

    def remove(self, num):
        if num in self.removed:
            self.removed[num] += 1
        else:
            self.removed[num] = 1
        self.rebalance()

    def rebalance(self):
        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))
        elif len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))
        self.prune()

    def prune(self):
        while self.small and -self.small[0] in self.removed and self.removed[-self.small[0]] > 0:
            self.removed[-self.small[0]] -= 1
            heapq.heappop(self.small)
        while self.large and self.large[0] in self.removed and self.removed[self.large[0]] > 0:
            self.removed[self.large[0]] -= 1
            heapq.heappop(self.large)

    def find_median(self):
        if len(self.small) == len(self.large):
            return (-self.small[0] + self.large[0]) / 2.0
        else:
            return -self.small[0]


def medianSlidingWindow(nums, k):
    result = []
    swm = SlidingWindowMedian()
    for i in range(len(nums)):
        swm.add(nums[i])
        if i >= k - 1:
            result.append(swm.find_median())
            swm.remove(nums[i - k + 1])
    return result


# Example usage:
nums = [1, 3, -1, -3, 5, 3, 6, 7]
k = 3
print(medianSlidingWindow(nums, k))

nums = [1,2,3,4,2,3,1,4,2]
k = 3
print(medianSlidingWindow(nums, k))

Explanation:

  1. Naive Approach (Brute Force):

    • For each sliding window, sort the elements and find the median.
    • Time Complexity: O(n * k * log(k)), where n is the length of nums and k is the window size. For each of the n windows, we sort k elements (O(klogk)).
    • Space Complexity: O(k) to store the window.
  2. Optimal Approach (Using Heaps):

    • Maintain two heaps: a max-heap (small) to store the smaller half of the elements, and a min-heap (large) to store the larger half.
    • The small heap stores elements in negated form to simulate a max-heap in Python.
    • When a new element comes, put it into small if it's smaller than the largest element in the small heap or if the heap is empty, otherwise put it in large.
    • After adding, balance the heaps so the small has either the same number of elements as large or one more element.
    • Keep track of removed elements in a removed dictionary, and lazily remove the elements when they are at the top of the heaps.

Time Complexity Analysis:

  • Adding an element: O(log k), as it involves heap push operations.
  • Removing an element: O(1) on average (amortized), as it involves just incrementing the count in the removed dictionary.
  • Rebalancing heaps: O(log k), as it involves heap push and pop operations.
  • Finding median: O(1), as it involves just peeking at the top elements of the heaps.
  • Overall: O(n log k), where n is the length of nums, and k is the window size. We iterate through nums (O(n)) and do heap operations (O(logk)).

Space Complexity Analysis:

  • O(k) to store the heaps (small and large), and O(k) in the worst case to store the removed elements in the hash map.
  • Overall, the space complexity is O(k).

Edge Cases:

  1. Empty input array: If nums is empty, return an empty array.
  2. k equals 1: The median is just the element itself.
  3. k is larger than the length of nums: Return an empty array because no sliding window can exist.
  4. Duplicate numbers: The heap-based approach correctly handles duplicate numbers.
  5. Negative numbers: The heap-based approach correctly handles negative numbers.