Taro Logo

Median of Two Sorted Arrays

Hard
Meta logo
Meta
1 view
Topics:
ArraysBinary Search

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

The overall run time complexity should be O(log (m+n)). Here are a couple of examples:

Example 1:

Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.

Example 2:

Input: nums1 = [1,2], nums2 = [3,4]
Output: 2.50000
Explanation: merged array = [1,2,3,4] and median is (2 + 3) / 2 = 2.5.

Constraints:

  • nums1.length == m
  • nums2.length == n
  • 0 <= m <= 1000
  • 0 <= n <= 1000
  • 1 <= m + n <= 2000
  • -10^6 <= nums1[i], nums2[i] <= 10^6

How would you approach this problem and what would be the time and space complexity of your solution?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input arrays, nums1 and nums2, be empty, and if so, how should I handle that case?
  2. What is the expected data type of the numbers in the arrays? Are they integers, floating-point numbers, or some other numeric type?
  3. What is the range of possible values for the numbers in the input arrays? Are negative values allowed?
  4. In the case of an even combined length of the two arrays, should the median be the average of the two middle elements, and should I round the result if it's a float?
  5. Are duplicate numbers allowed within the input arrays, and if so, how should they be handled when determining the median?

Brute Force Solution

Approach

The brute force approach for finding the median combines two sorted lists and then finds the middle element. It's like merging two decks of cards and then picking the middle card. This approach doesn't try to be efficient; it just does the most obvious thing.

Here's how the algorithm would work step-by-step:

  1. First, take all the numbers from both lists and put them into a single new list.
  2. Then, sort the combined list from smallest to largest.
  3. Next, find the middle of the sorted list. If the list has an odd number of elements, the middle element is the median.
  4. If the list has an even number of elements, the median is the average of the two elements in the middle.

Code Implementation

def find_median_sorted_arrays_brute_force(first_array, second_array):
    # Combine the two input arrays into a single array
    combined_array = first_array + second_array

    # Sort the combined array
    combined_array.sort()

    combined_array_length = len(combined_array)

    # Determine if the array has an even or odd number of elements
    if combined_array_length % 2 == 0:
        # Calculate indices of the two middle elements for even length array
        middle_index_one = combined_array_length // 2 - 1
        middle_index_two = combined_array_length // 2

        #  Calculate the median as average of the two middle elements
        median = (combined_array[middle_index_one] + combined_array[middle_index_two]) / 2

    else:
        # Calculate the index of the middle element for odd length array
        middle_index = combined_array_length // 2

        # Median is simply the middle element
        median = combined_array[middle_index]

    return median

Big(O) Analysis

Time Complexity
O((n+m)log(n+m))The first step involves combining the two input arrays, nums1 and nums2, into a single array. If the lengths of nums1 and nums2 are n and m respectively, this takes O(n+m) time. The next step is sorting this combined array, which has a length of n+m. The time complexity of a typical sorting algorithm (like merge sort or quicksort, often used in built-in sort functions) is O(k log k) where k is the number of elements being sorted. In this case, k = n+m. Therefore, the sorting step takes O((n+m)log(n+m)) time. Finding the median after sorting takes constant time, O(1). The dominant operation is the sorting step, so the overall time complexity is O((n+m)log(n+m)).
Space Complexity
O(N)The brute force approach described creates a new list to store all elements from the two input arrays. If the total number of elements in the two input arrays is N, then the combined list will also have a size of N. Therefore, the algorithm requires auxiliary space proportional to the total number of elements from the inputs. This results in a space complexity of O(N).

Optimal Solution

Approach

The fastest way to find the middle value of two sorted lists combined is to efficiently narrow down the search area. We strategically eliminate halves of the lists that cannot contain the median, similar to a guessing game where you adjust your range with each guess.

Here's how the algorithm would work step-by-step:

  1. First, figure out which list is shorter, and focus on that list.
  2. Imagine dividing the shorter list into two parts. We're trying to find the perfect dividing point.
  3. If we make a cut in the shorter list, we can figure out where to make a corresponding cut in the longer list to ensure we have the same number of elements on either side of both cuts.
  4. Now, check if the elements near the cuts are in the correct order. Specifically, is the largest value on the left side of the cut smaller than the smallest value on the right side of the cut? If so, we've found the right cuts.
  5. If not, adjust the cut in the shorter list and repeat the process. If the values are too big on the left, move the cut to the left. If the values are too small, move the cut to the right.
  6. Once the perfect cuts are found, the median can be easily calculated from the values closest to those cuts. It might be the average of two numbers if we have an even number of total elements, or simply one number if we have an odd number of elements.

Code Implementation

def find_median_sorted_arrays(first_array, second_array):
    if len(first_array) > len(second_array):
        first_array, second_array = second_array, first_array

    array_x_length = len(first_array)
    array_y_length = len(second_array)

    low_index = 0
    high_index = array_x_length

    while low_index <= high_index:
        partition_x = (low_index + high_index) // 2

        # Calculate partition of the second array
        partition_y = (array_x_length + array_y_length + 1) // 2 - partition_x

        max_left_x = first_array[partition_x - 1] if partition_x > 0 else float('-inf')
        min_right_x = first_array[partition_x] if partition_x < array_x_length else float('inf')

        max_left_y = second_array[partition_y - 1] if partition_y > 0 else float('-inf')
        min_right_y = second_array[partition_y] if partition_y < array_y_length else float('inf')

        # Check if partitions are correct
        if max_left_x <= min_right_y and max_left_y <= min_right_x:

            # We found the correct partition
            if (array_x_length + array_y_length) % 2 == 0:
                return (max(max_left_x, max_left_y) + min(min_right_x, min_right_y)) / 2
            else:
                return max(max_left_x, max_left_y)

        # Adjust partitions based on comparisons
        elif max_left_x > min_right_y:
            # Move towards left in first_array
            high_index = partition_x - 1
        else:
            # Move towards right in first_array
            low_index = partition_x + 1

    return None # Should never reach here, input arrays are sorted

Big(O) Analysis

Time Complexity
O(log(min(m, n)))The algorithm performs a binary search on the shorter of the two arrays. Let m and n represent the lengths of the two arrays. In each step of the binary search, the search space is halved. The cost is driven by the binary search on the shorter array which results in log(min(m, n)) iterations. Therefore, the time complexity is O(log(min(m, n))).
Space Complexity
O(1)The provided algorithm iteratively narrows down the search range using comparisons and adjustments of cut points (index variables). It does not create any auxiliary data structures like lists, arrays, or hash maps to store intermediate results or track visited elements. The operations involve only a few integer variables to keep track of the cut positions. Therefore, the auxiliary space used is constant and independent of the input size, resulting in O(1) space complexity.

Edge Cases

CaseHow to Handle
Both arrays are emptyReturn an appropriate error value or throw an exception, as the median is undefined for empty arrays.
One array is empty, the other is notThe median is simply the median of the non-empty array, which can be calculated directly.
Arrays of size 1 and size > 1The algorithm should correctly merge a very small array with a significantly larger one and identify the median element(s).
Arrays contain duplicate numbersThe binary search partitioning logic should correctly handle duplicates without entering infinite loops or miscalculating the median.
Arrays contain negative numbers, zero, and positive numbersThe comparison logic should work correctly for all number types including negative numbers and zero.
Arrays with large numbers that may cause integer overflow during calculations (e.g., sums)Ensure that the calculations used for the median (especially when averaging two values) are performed using a data type that can accommodate large numbers, such as long or double, to avoid overflow.
Arrays with all identical valuesBinary search must handle identical values within the array without getting stuck in an infinite loop.
Arrays are highly skewed (e.g., one array has very small values and the other has very large values)Binary search should still converge to the correct partition even with disparate value ranges.