Median of Two Sorted Arrays Interview Question for Amazon

You are given two sorted arrays nums1 and nums2 of size m and n respectively. Your task is to implement an efficient algorithm to find the median of the two sorted arrays.

Requirements:

The algorithm must have a time complexity of O(log (m+n)). This is a crucial performance constraint.
You should handle edge cases such as empty arrays or arrays of very different lengths.
The arrays contain integers, and the median may be a floating-point number.

Clarifications:

Both arrays are already sorted in non-decreasing order.
The arrays can contain duplicate numbers.
Assume that 1 <= m + n <= 2000.

Example:

Let's say you have nums1 = [1, 3] and nums2 = [2]. The merged and sorted array would be [1, 2, 3], and the median is 2.0.

Another example: nums1 = [1, 2] and nums2 = [3, 4]. The merged and sorted array would be [1, 2, 3, 4], and the median is (2 + 3) / 2 = 2.5.

Describe your algorithm, analyze its time and space complexity, and provide code to demonstrate your solution. How would you optimize your approach to achieve the required logarithmic time complexity?

Naive Solution: Merge and Find Median

The most straightforward approach is to merge the two sorted arrays into a single sorted array and then find the median. This involves creating a new array of size m + n, copying elements from both input arrays while maintaining the sorted order, and finally, calculating the median based on whether the combined length is even or odd.

Code (Python)

def find_median_sorted_arrays_naive(nums1, nums2):
    merged = []
    i, j = 0, 0
    while i < len(nums1) and j < len(nums2):
        if nums1[i] < nums2[j]:
            merged.append(nums1[i])
            i += 1
        else:
            merged.append(nums2[j])
            j += 1
    merged.extend(nums1[i:])
    merged.extend(nums2[j:])
    n = len(merged)
    if n % 2 == 0:
        return (merged[n // 2 - 1] + merged[n // 2]) / 2.0
    else:
        return float(merged[n // 2])

Time Complexity

O(m + n): Merging the two arrays takes linear time with respect to the total number of elements.

Space Complexity

O(m + n): We create a new array to store the merged result.

Optimal Solution: Binary Search

To achieve the desired O(log (m+n)) runtime complexity, we can use a binary search approach. The main idea is to find the correct partition in both arrays such that elements on the left side of the partitions are smaller than the elements on the right side. The median can then be determined from these partition boundaries.

Ensure nums1 is the smaller array: If nums1 is longer than nums2, swap them. This simplifies the logic and ensures m <= n.
Binary Search: Perform binary search on the smaller array. Let low = 0 and high = m. In each iteration, calculate partitionX and partitionY such that: partitionX + partitionY = (m + n + 1) // 2
Check Partition Validity:
- maxLeftX = nums1[partitionX - 1] (if partitionX > 0, otherwise -infinity)
- minRightX = nums1[partitionX] (if partitionX < m, otherwise infinity)
- maxLeftY = nums2[partitionY - 1] (if partitionY > 0, otherwise -infinity)
- minRightY = nums2[partitionY] (if partitionY < n, otherwise infinity)
If maxLeftX <= minRightY and maxLeftY <= minRightX, we have found the correct partitions. The median can be calculated as:
- If (m + n) is even: (max(maxLeftX, maxLeftY) + min(minRightX, minRightY)) / 2.0
- If (m + n) is odd: max(maxLeftX, maxLeftY)
Adjust Binary Search Range:
- If maxLeftX > minRightY, move left in nums1 (high = partitionX - 1)
- If maxLeftX < minRightY, move right in nums1 (low = partitionX + 1)

Code (Python)

import sys

def find_median_sorted_arrays(nums1, nums2):
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    low, high = 0, m
    while low <= high:
        partitionX = (low + high) // 2
        partitionY = (m + n + 1) // 2 - partitionX

        maxLeftX = nums1[partitionX - 1] if partitionX > 0 else -sys.maxsize
        minRightX = nums1[partitionX] if partitionX < m else sys.maxsize

        maxLeftY = nums2[partitionY - 1] if partitionY > 0 else -sys.maxsize
        minRightY = nums2[partitionY] if partitionY < n else sys.maxsize

        if maxLeftX <= minRightY and maxLeftY <= minRightX:
            if (m + n) % 2 == 0:
                return (max(maxLeftX, maxLeftY) + min(minRightX, minRightY)) / 2.0
            else:
                return float(max(maxLeftX, maxLeftY))
        elif maxLeftX > minRightY:
            high = partitionX - 1
        else:
            low = partitionX + 1
    return -1  # Should not reach here

Time Complexity

O(log(min(m, n))): We perform binary search on the smaller array.

Space Complexity

O(1): We use constant extra space.

Edge Cases

Empty Arrays: The code handles cases where one of the arrays might be empty.
Unequal Array Lengths: The algorithm is designed to work efficiently even when the arrays have significantly different lengths.
All elements in one array smaller than the other: The binary search will still correctly converge to the right partitions.