Taro Logo

Find K-th Smallest Pair Distance

Hard
Asked by:
Profile picture
Profile picture
Profile picture
Profile picture
+4
More companies
Profile picture
Profile picture
Profile picture
Profile picture
61 views
Topics:
ArraysBinary SearchTwo Pointers

The distance of a pair of integers a and b is defined as the absolute difference between a and b.

Given an integer array nums and an integer k, return the kth smallest distance among all the pairs nums[i] and nums[j] where 0 <= i < j < nums.length.

Example 1:

Input: nums = [1,3,1], k = 1
Output: 0
Explanation: Here are all the pairs:
(1,3) -> 2
(1,1) -> 0
(3,1) -> 2
Then the 1st smallest distance pair is (1,1), and its distance is 0.

Example 2:

Input: nums = [1,1,1], k = 2
Output: 0

Example 3:

Input: nums = [1,6,1], k = 3
Output: 5

Constraints:

  • n == nums.length
  • 2 <= n <= 104
  • 0 <= nums[i] <= 106
  • 1 <= k <= n * (n - 1) / 2

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the size of the input array `nums` and the value of `k`?
  2. Can the input array `nums` contain negative numbers, zero, or duplicate values?
  3. If `k` is larger than the total number of possible pairs, what value should I return?
  4. Is the input array `nums` guaranteed to have at least two elements?
  5. Are we looking for the k-th smallest distance (1-indexed) or the k-th smallest distance (0-indexed)?

Brute Force Solution

Approach

The brute force way to find the K-th smallest distance between pairs involves calculating every single possible distance. We then organize these distances from smallest to largest. Finally, we pick out the distance that's in the K-th spot.

Here's how the algorithm would work step-by-step:

  1. First, figure out all the unique pairs you can make from the given set of numbers.
  2. For each pair, calculate the distance between the two numbers (the absolute difference).
  3. Now that you have all the distances, sort them from smallest to largest.
  4. The distance that's at the K-th position in this sorted list is your answer.

Code Implementation

def find_kth_smallest_pair_distance_brute_force(numbers, k_value):
    distances = []
    number_of_elements = len(numbers)

    # Calculate all possible pair distances
    for first_index in range(number_of_elements):

        for second_index in range(first_index + 1, number_of_elements):

            distance = abs(numbers[first_index] - numbers[second_index])
            distances.append(distance)

    # Sorting is crucial to find the K-th smallest distance
    distances.sort()

    # Return the K-th smallest distance
    return distances[k_value - 1]

Big(O) Analysis

Time Complexity
O(n^2 log n)Generating all unique pairs from an array of size n requires examining each element and comparing it to every other element, resulting in approximately n * (n-1) / 2 pairs, which is O(n^2). Calculating the absolute difference for each pair takes constant time, so this part remains O(n^2). The sorting operation on these calculated distances then dominates the complexity, taking O(n^2 log n^2) time, where n^2 is the number of distances. Since log n^2 = 2 log n, we can simplify O(n^2 log n^2) to O(n^2 log n).
Space Complexity
O(N^2)The brute force approach generates all possible pairs, which results in approximately N^2 distances where N is the number of elements in the input array. These distances are stored in a list (or similar data structure) to be sorted, thus requiring O(N^2) auxiliary space. The sorting operation itself may require additional space depending on the algorithm used, but commonly sorts in place or O(log N) extra space, which doesn't exceed O(N^2). Therefore, the overall auxiliary space is dominated by the storage of the distances.

Optimal Solution

Approach

The core idea is to avoid checking every single pair of numbers. We use a technique similar to searching in a phone book combined with counting to efficiently find the distance that appears at the K-th position if we were to list all possible distances in order.

Here's how the algorithm would work step-by-step:

  1. First, sort the given list of numbers to make it easier to work with.
  2. Realize that the K-th smallest distance will be between some minimum and maximum possible distance. Find these minimum and maximum distances.
  3. Now, instead of checking every single distance, use a guessing game. Pick a distance somewhere in the middle of the minimum and maximum.
  4. Count how many pairs have a distance that is less than or equal to our guessed distance.
  5. If the number of pairs is less than K, it means our guessed distance is too small, so we need to guess a larger distance in the range above. If the number of pairs is greater than or equal to K, it means our guessed distance might be the answer, or we could find a smaller one in the range below.
  6. Repeat the guessing and counting process, narrowing down the range of possible distances each time, until we find the smallest distance where the number of pairs less than or equal to it is exactly K.

Code Implementation

def find_kth_smallest_pair_distance(numbers, k):
    numbers.sort()
    list_length = len(numbers)
    
    def count_pairs_with_distance(distance_threshold):
        count = 0
        left_index = 0
        for right_index in range(list_length):
            while numbers[right_index] - numbers[left_index] > distance_threshold:
                left_index += 1
            count += right_index - left_index
        return count

    # Define search space for binary search
    low_distance = 0
    high_distance = numbers[-1] - numbers[0]

    while low_distance < high_distance:
        middle_distance = (low_distance + high_distance) // 2
        
        # Key step: count pairs with distances less than or equal to mid
        pair_count = count_pairs_with_distance(middle_distance)

        if pair_count < k:
            # Guessed distance too small, increase lower bound
            low_distance = middle_distance + 1
        else:
            # Guessed distance might be the answer, try smaller
            high_distance = middle_distance

    # At end of loop, low == high is the Kth smallest distance
    return low_distance

Big(O) Analysis

Time Complexity
O(n log n + n log W)The algorithm first sorts the input array of size n, which takes O(n log n) time. Then, a binary search is performed on the range of possible distances, where W represents the maximum possible distance. Inside the binary search, for each guessed distance, we iterate through the sorted array to count the number of pairs with distance less than or equal to the guessed distance. This counting step takes O(n) time. Since the binary search iterates at most log W times, the overall time complexity of the binary search and counting is O(n log W). The total time complexity is therefore O(n log n + n log W).
Space Complexity
O(1)The algorithm primarily uses a binary search approach, focusing on manipulating the input array in place after sorting. It stores a few variables for the binary search boundaries (min_distance, max_distance, mid_distance), and a counter during the counting phase. The space used by these variables is constant and independent of the input size N, so the auxiliary space complexity is O(1).

Edge Cases

Empty or null input array
How to Handle:
Return 0 or throw an exception, as there are no pairs and thus no distances.
Input array with a single element
How to Handle:
Return 0 or throw an exception, as there are no pairs to calculate distances.
k is less than or equal to 0
How to Handle:
Return an error or throw an exception since k must be a positive integer representing the k-th smallest distance.
k is larger than the number of possible pairs (n * (n-1) / 2)
How to Handle:
Return the largest possible distance (max(nums) - min(nums)) or throw an exception, indicating k is out of bounds.
Input array contains duplicate numbers
How to Handle:
The algorithm should handle duplicates correctly by considering all possible pairs, including those formed by identical numbers (distance of 0).
Input array contains large numbers that could lead to integer overflow when calculating differences
How to Handle:
Use long data type to store differences, or consider the absolute values within the range of integer type during distance calculation.
Input array contains negative numbers
How to Handle:
The solution should correctly handle negative numbers since the absolute difference is used.
Input array with all identical numbers
How to Handle:
All pair distances will be 0, so return 0 if k is within the valid range (1 to n*(n-1)/2), or handle out-of-bounds k accordingly.