Taro Logo

Find K Pairs with Smallest Sums

Medium
Meta logo
Meta
4 views
Topics:
ArraysGreedy AlgorithmsDynamic Programming

You are given two integer arrays nums1 and nums2 sorted in non-decreasing order and an integer k.

Define a pair (u, v) which consists of one element from the first array and one element from the second array.

Return the k pairs (u1, v1), (u2, v2), ..., (uk, vk) with the smallest sums.

Example 1:

Input: nums1 = [1,7,11], nums2 = [2,4,6], k = 3 Output: [[1,2],[1,4],[1,6]] Explanation: The first 3 pairs are returned from the sequence: [1,2],[1,4],[1,6],[7,2],[7,4],[11,2],[7,6],[11,4],[11,6]

Example 2:

Input: nums1 = [1,1,2], nums2 = [1,2,3], k = 2 Output: [[1,1],[1,1]] Explanation: The first 2 pairs are returned from the sequence: [1,1],[1,1],[1,2],[2,1],[1,2],[2,2],[1,3],[1,3],[2,3]

Constraints:

  • 1 <= nums1.length, nums2.length <= 10^5
  • -10^9 <= nums1[i], nums2[i] <= 10^9
  • nums1 and nums2 both are sorted in non-decreasing order.
  • 1 <= k <= 10^4
  • k <= nums1.length * nums2.length

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible ranges and data types for the numbers within `nums1` and `nums2`? Can they include negative numbers, zero, or floating-point numbers?
  2. Can `nums1` or `nums2` be null or empty? What should I return in that case?
  3. Is `K` guaranteed to be non-negative and less than or equal to the product of the lengths of `nums1` and `nums2`? What should happen if K is zero?
  4. If there are multiple pairs with the same sum, is the order among them important in the returned list? Should I return the pairs in any specific order?
  5. Are duplicate values allowed within `nums1` and `nums2`, and if so, how does that affect the output? Should the pairs in the result be distinct?

Brute Force Solution

Approach

The goal is to find the pairs with the smallest sums from two lists of numbers. A brute force approach means we consider every possible pair we can make by picking one number from each list. Then, we calculate the sum for each of these pairs and find the K pairs with the smallest sums.

Here's how the algorithm would work step-by-step:

  1. Take the first number from the first list and pair it with the first number from the second list.
  2. Calculate the sum of that pair and store it.
  3. Now, take the first number from the first list and pair it with the second number from the second list.
  4. Calculate the sum of this new pair and store it.
  5. Continue pairing the first number from the first list with every number in the second list, each time calculating and storing the sum.
  6. Once you've paired the first number with all the numbers in the second list, move on to the second number in the first list.
  7. Pair the second number in the first list with every number in the second list, calculating and storing all those sums too.
  8. Keep doing this until you have paired every number from the first list with every number from the second list, and you have calculated and stored all the possible sums.
  9. Finally, look at all the sums you calculated and find the K smallest ones. The pairs that generated those K smallest sums are your answer.

Code Implementation

def find_k_pairs_with_smallest_sums_brute_force(list_one, list_two, k_smallest):
    pair_sums = []

    # Iterate through all possible pairs
    for first_list_index in range(len(list_one)):
        for second_list_index in range(len(list_two)):

            # Calculate sum for the pair
            current_sum = list_one[first_list_index] + list_two[second_list_index]
            pair_sums.append((current_sum, (list_one[first_list_index], list_two[second_list_index])))

    #Sort to find the K smallest sums
    pair_sums.sort()

    smallest_k_pairs = []
    for i in range(min(k_smallest, len(pair_sums))):

        # Extract pairs
        smallest_k_pairs.append(pair_sums[i][1])

    return smallest_k_pairs

Big(O) Analysis

Time Complexity
O(m*n + klog(m*n))The provided solution involves iterating through all possible pairs of elements from the two input lists. Let 'm' be the size of the first list and 'n' be the size of the second list. The process of generating all pairs and calculating their sums takes O(m*n) time. After generating all possible sums, we need to find the K smallest sums. Sorting all m*n sums would take O(m*n log(m*n)) time, and then selecting the first K would take O(K) time. However, a more efficient approach would be to use a min-heap of size K to keep track of the K smallest sums encountered so far. After generating all m*n sums, we would need to iterate through them and insert them into the min-heap if they are smaller than the largest element in the heap, which has a runtime of O(klog(m*n)). Therefore, sorting dominates the K selection. Hence, the overall time complexity is dominated by the initial pair generation O(m*n) and the final K selection using sorting or a heap O(klog(m*n)) making the complexity O(m*n + klog(m*n)).
Space Complexity
O(N*M)The plain English explanation describes calculating and storing the sum of every possible pair. This means we need to store sums for each combination of elements from the first list (let's say it has N elements) and the second list (let's say it has M elements). Therefore, we are storing N * M sums. This requires auxiliary space proportional to N*M, where N and M are the sizes of the two input lists respectively.

Optimal Solution

Approach

The challenge involves finding the 'k' smallest sums formed by pairing numbers from two lists. Instead of checking all possible pairs, we'll focus on the pairs that are most likely to have small sums and efficiently find the top 'k'.

Here's how the algorithm would work step-by-step:

  1. Imagine arranging all possible pairs in a grid, with one list across the top and the other down the side. The smallest sums will likely be near the top-left corner of this grid.
  2. Start by looking at the very first pair, which combines the first number from each list. This is probably among the smallest sums.
  3. Then, consider the pairs that are directly next to this first pair (either across or down). These are also likely to have small sums.
  4. We'll use a method to keep track of the pairs we've already considered and the pairs we know are potentially small but haven't looked at yet.
  5. Continue exploring outward from the top-left corner, always picking the smallest sum from the pairs we haven't fully investigated yet.
  6. Keep adding these smallest sums to our list until we have found 'k' of them.
  7. Since we always look at the smallest available sums first, we can be sure that after finding 'k' pairs, they are indeed the 'k' pairs with the smallest sums.

Code Implementation

import heapq

def find_k_smallest_pairs(nums1, nums2, k):
    result = []
    visited = set()
    heap = [(nums1[0] + nums2[0], 0, 0)]
    visited.add((0, 0))

    # We only need k smallest pairs
    while heap and len(result) < k:
        sum_of_pairs, index1, index2 = heapq.heappop(heap)
        result.append((nums1[index1], nums2[index2]))

        # Explore the neighbor to the right
        if index1 + 1 < len(nums1) and (index1 + 1, index2) not in visited:
            heapq.heappush(heap, (nums1[index1 + 1] + nums2[index2], index1 + 1, index2))
            visited.add((index1 + 1, index2))

        # Explore the neighbor below
        if index2 + 1 < len(nums2) and (index1, index2 + 1) not in visited:
            heapq.heappush(heap, (nums1[index1] + nums2[index2 + 1], index1, index2 + 1))
            visited.add((index1, index2 + 1))

    return result

Big(O) Analysis

Time Complexity
O(k log k)The algorithm maintains a min-heap (priority queue) to track potential pairs with small sums. In the worst case, the heap can grow up to size k, as we are looking for the k smallest pairs. Each time we extract the smallest element from the heap (which takes O(log k) time), we may add up to two new elements to the heap. We repeat this process k times to find the k smallest pairs. Therefore, the overall time complexity is dominated by k heap operations, each taking O(log k) time, resulting in O(k log k).
Space Complexity
O(K)The algorithm uses a heap or priority queue to keep track of potentially small sums, and this heap can grow up to a size of K, where K is the number of pairs to find. We also store the 'k' smallest pairs in a result list, contributing O(K) space. The algorithm also 'keeps track of the pairs we've already considered', which we can assume is stored in a set or similar data structure with the maximum size of K. Therefore, the overall auxiliary space complexity is dominated by the size of the heap and result list, which scales linearly with K.

Edge Cases

CaseHow to Handle
One or both input arrays are null or emptyReturn an empty list if either input array is null or has zero length.
k is zero or negativeIf k is zero or negative, return an empty list, as no pairs are requested.
k is larger than the total number of possible pairs (len(nums1) * len(nums2))Return all possible pairs up to the maximum number of pairs instead of crashing or causing an error.
Input arrays contain large negative numbersHandle potential integer overflow issues when calculating sums, perhaps using a larger data type.
Input arrays contain duplicate numbersThe algorithm should correctly process duplicate numbers and consider all possible pair combinations, not skipping any because of duplicates.
One array is significantly larger than the other (e.g., one has 10 elements, the other has 100,000)The algorithm should still perform efficiently and not result in unnecessary iterations or memory usage by optimizing heap size.
Arrays are sorted in descending order instead of ascendingThe algorithm needs to work correctly whether input arrays are sorted ascending or descending because problem description does not specify sort order requirement.
Input arrays contain very large numbers, potentially leading to integer overflow when summingUtilize data types capable of handling large sums or implement checks to prevent integer overflow during the sum calculation.