Taro Logo

Sliding Window Median

Hard
Amazon logo
Amazon
Topics:
ArraysSliding WindowsBinary SearchTwo PointersStacksGreedy Algorithms

You are given an integer array nums and an integer k. There is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position.

Return the median array for each window in the original array. Answers within 10-5 of the actual value will be accepted.

For example, if nums = [1,3,-1,-3,5,3,6,7] and k = 3, the output should be [1.00000,-1.00000,-1.00000,3.00000,5.00000,6.00000]

Explanation:

Window position                Median
---------------                -----
[1  3  -1] -3  5  3  6  7        1
1 [3  -1  -3] 5  3  6  7       -1
1  3 [-1  -3  5] 3  6  7       -1
1  3  -1 [-3  5  3] 6  7        3
1  3  -1  -3 [5  3  6] 7        5
1  3  -1  -3  5 [3  6  7]       6

As another example, if nums = [1,2,3,4,2,3,1,4,2] and k = 3, the output should be [2.00000,3.00000,3.00000,3.00000,2.00000,3.00000,2.00000]

How would you efficiently solve this problem?

Solution


Brute Force Solution

A naive approach would involve iterating through the array and, for each window of size k, sorting the window and then finding the median. This is straightforward to implement but not efficient.

Code

def medianSlidingWindow_brute_force(nums, k):
    result = []
    for i in range(len(nums) - k + 1):
        window = sorted(nums[i:i+k])
        if k % 2 == 0:
            median = (window[k//2 - 1] + window[k//2]) / 2
        else:
            median = window[k//2]
        result.append(median)
    return result

Time Complexity

O(n * k * log(k)), where n is the length of the input array nums. For each of the n-k+1 windows, we sort the window of size k, which takes O(k log k) time.

Space Complexity

O(k) because we are storing each window of k elements in a new sorted list.

Optimal Solution: Using Two Heaps

To improve efficiency, we can use two heaps: a max-heap to store the smaller half of the window elements and a min-heap to store the larger half.

Algorithm

  1. Initialize two heaps: small (max-heap) and large (min-heap).
  2. For the first k elements, add them to the heaps, ensuring the heaps are balanced (i.e., the size difference is at most 1).
  3. Calculate the median from the heaps.
  4. Slide the window by one position. Remove the element that left the window from the heaps and add the new element to the heaps. Rebalance the heaps.
  5. Repeat steps 3 and 4 until the end of the array.

Code

import heapq

def medianSlidingWindow(nums, k):
    small, large = [], []
    result = []

    def balance():
        if len(small) > len(large) + 1:
            heapq.heappush(large, -heapq.heappop(small))
        elif len(large) > len(small):
            heapq.heappush(small, -heapq.heappop(large))

    def add(num):
        if not small or num <= -small[0]:
            heapq.heappush(small, -num)
        else:
            heapq.heappush(large, num)
        balance()

    def remove(num):
        if num <= -small[0]:
            small.remove(-num)
            heapq.heapify(small)
        else:
            large.remove(num)
            heapq.heapify(large)
        balance()

    for i in range(k):
        add(nums[i])

    for i in range(len(nums) - k + 1):
        if k % 2 == 0:
            result.append((-small[0] + large[0]) / 2.0)
        else:
            result.append(-small[0] * 1.0)

        if i < len(nums) - k:
            remove(nums[i])
            add(nums[i + k])

    return result

Time Complexity

O(n * log(k)), where n is the length of the input array nums. Adding and removing elements from the heaps take O(log k) time, and we do this for each of the n-k+1 windows.

Space Complexity

O(k) because we are storing at most k elements in the heaps.

Edge Cases

  • Empty input array: Return an empty array.
  • k is larger than the array size: Return an empty array.
  • k is 1: The median is simply the element itself.
  • Duplicate numbers in the array: The algorithm should handle duplicates correctly.