Find K Closest Elements

Medium
DoorDash
Topics:
Arrays, Binary Search, Two Pointers

Given a sorted integer array arr, two integers k and x, return the k closest integers to x in the array. The result should also be sorted in ascending order.

An integer a is closer to x than an integer b if:

  • |a - x| < |b - x|, or
  • |a - x| == |b - x| and a < b
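
The definition above maps directly onto a sort key. As a minimal sketch (the function name `k_closest_by_sorting` is hypothetical, and this is meant as a reference check rather than an efficient solution), ordering by the pair (distance to x, value) encodes both rules: the smaller distance wins, and on equal distance the smaller value wins.

```python
def k_closest_by_sorting(arr, k, x):
    """Reference answer: rank by (|a - x|, a), keep k, return ascending."""
    # Tuples compare element by element, so distance is compared first
    # and the value itself breaks ties in favor of the smaller integer.
    ranked = sorted(arr, key=lambda a: (abs(a - x), a))
    return sorted(ranked[:k])


print(k_closest_by_sorting([1, 2, 3, 4, 5], 4, 3))      # [1, 2, 3, 4]
print(k_closest_by_sorting([1, 1, 2, 3, 4, 5], 4, -1))  # [1, 1, 2, 3]
```

This runs in O(n log n) time, so it is mainly useful for checking the output of faster approaches.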

Example 1:

Input: arr = [1,2,3,4,5], k = 4, x = 3

Output: [1,2,3,4]

Example 2:

Input: arr = [1,1,2,3,4,5], k = 4, x = -1

Output: [1,1,2,3]

Constraints:

  • 1 <= k <= arr.length
  • 1 <= arr.length <= 10^4
  • arr is sorted in ascending order.
  • -10^4 <= arr[i], x <= 10^4

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible ranges for the values in the array `arr`, the integer `x`, and the integer `k`? Can they be negative?
  2. If the array `arr` is empty or if `k` is zero, what should I return?
  3. If there are multiple sets of k closest elements with the same closeness, is there a specific set I should return (e.g., the one with the smallest elements)?
  4. Is `k` always less than or equal to the length of the array `arr`?
  5. Are duplicate values allowed in the array `arr`, and if so, how should they be handled when determining the closest elements?

Brute Force Solution

Approach

The brute force strategy checks every candidate group of K numbers and keeps the one that fits the criteria best. Because the array is sorted, the K closest numbers always sit next to each other, so only contiguous groups need to be checked.

Here's how the algorithm would work step-by-step:

  1. Consider every contiguous run of numbers that is exactly the size we need (K numbers).
  2. For each run, calculate how far each number is from the target value.
  3. Add up those distances to get a total distance for that run.
  4. Compare the total distance of each run with the smallest total seen so far.
  5. The run with the smallest total distance from the target is the answer. If two runs tie, keep the earlier one, since it holds the smaller values.

Code Implementation

def find_k_closest_elements_brute_force(array, k_closest, target):
    number_of_elements = len(array)
    min_distance = float('inf')
    result = []
    # Iterate through all possible subarrays of size k_closest.
    for start_index in range(number_of_elements - k_closest + 1):

        current_subarray = array[start_index:start_index + k_closest]
        total_distance = 0

        # Calculate the total distance of the current subarray from the target
        for element in current_subarray:
            total_distance += abs(element - target)

        # We want to keep track of the minimum distance seen thus far
        if total_distance < min_distance:

            min_distance = total_distance
            result = current_subarray

    return result

Big(O) Analysis

Time Complexity
O((n - k + 1) * k)
The implementation above checks every contiguous window of size k in the input array of size n, and there are n - k + 1 such windows. For each window it sums the k distances to the target, which takes O(k) time, so the total is O((n - k + 1) * k), which is at most O(n * k). Enumerating every k-element subset instead would cost O(nCk * k) and is unnecessary: because the array is sorted, the k closest elements always form a contiguous window.
Space Complexity
O(K)
Each candidate window is copied into a temporary list of K elements (the slice current_subarray), and the best window found so far is kept as the result. No other storage grows with the input, so the auxiliary space is O(K).

Optimal Solution

Approach

The goal is to find the elements closest to a given value in a sorted collection. The best way to do this is by focusing on the area near the target value and shrinking it to find the best set of elements.

Here's how the algorithm would work step-by-step:

  1. First, use binary search to find the position in the collection closest to the target value: the index of an exact match, or the insertion point if the target is absent.
  2. Then, consider a window of candidates around that position, extending up to K elements on each side, so it holds at most 2K elements and is guaranteed to contain the answer.
  3. Next, shrink this window from the left or the right, dropping whichever end element is farther from the target value. On a tie, drop the right element so that the smaller value is kept.
  4. Continue shrinking the window until it contains exactly the number of elements that you need.
  5. The elements that remain in the window are the K closest elements to the target value.

Code Implementation

def find_k_closest_elements(sorted_array, number_of_closest_elements, target):
    left_index = 0
    right_index = len(sorted_array) - 1
    closest_index = -1

    while left_index <= right_index:
        middle_index = (left_index + right_index) // 2
        if sorted_array[middle_index] == target:
            closest_index = middle_index
            break
        elif sorted_array[middle_index] < target:
            left_index = middle_index + 1
        else:
            right_index = middle_index - 1

    if closest_index == -1:
        closest_index = left_index

    # Define the initial window: up to k candidates on each side of the
    # closest position. The k closest elements always lie in this range.
    # (A window of exactly k elements would leave nothing to shrink.)
    window_start = max(0, closest_index - number_of_closest_elements)
    window_end = min(len(sorted_array) - 1, closest_index + number_of_closest_elements - 1)

    # Shrink the window until it has the exact number of elements.
    while window_end - window_start + 1 > number_of_closest_elements:
        # Drop the end that is farther from target; on a tie drop the
        # right end so the smaller value is kept.
        if abs(sorted_array[window_start] - target) <= abs(sorted_array[window_end] - target):
            window_end -= 1
        else:
            window_start += 1

    # The remaining window holds the k closest elements, already sorted.
    return sorted_array[window_start:window_end + 1]
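
For comparison, the same binary-search-then-shrink idea can be sketched with the standard library's `bisect_left` in place of the hand-written search. This is an illustrative variant rather than the listing above: the name `k_closest_with_bisect` is hypothetical, and the starting window of up to 2k elements (k on each side of the insertion point) is an assumption chosen so that the shrink loop has candidates to discard.

```python
from bisect import bisect_left


def k_closest_with_bisect(sorted_array, k, target):
    # Index of the first element >= target (len(sorted_array) if none).
    pivot = bisect_left(sorted_array, target)

    # Assumed starting window: up to k candidates on each side of pivot.
    window_start = max(0, pivot - k)
    window_end = min(len(sorted_array) - 1, pivot + k - 1)

    # Drop the farther end until k elements remain; on a tie drop the
    # right end so the smaller value is kept.
    while window_end - window_start + 1 > k:
        left_gap = abs(sorted_array[window_start] - target)
        right_gap = abs(sorted_array[window_end] - target)
        if left_gap <= right_gap:
            window_end -= 1
        else:
            window_start += 1

    return sorted_array[window_start:window_end + 1]


print(k_closest_with_bisect([1, 2, 3, 4, 5], 4, 3))      # [1, 2, 3, 4]
print(k_closest_with_bisect([1, 1, 2, 3, 4, 5], 4, -1))  # [1, 1, 2, 3]
```

Because the window never holds more than 2k elements, the loop runs at most k times after the O(log n) search.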

Big(O) Analysis

Time Complexity
O(log n + k)
The algorithm first uses binary search to find the element closest to the target value in the sorted array of size n. This binary search operation takes O(log n) time. After finding the closest element, the window shrinking process involves comparing elements and adjusting the window boundaries. This shrinking process continues until the window contains k elements, leading to a maximum of k comparisons in the worst case. Therefore, the overall time complexity is O(log n + k).
Space Complexity
O(1)
The window is tracked with two indices, and shrinking it only updates those indices; no auxiliary data structure scales with the input size n or with k. The auxiliary space is therefore O(1), not counting the returned list of k elements, which the final slice copies.

Edge Cases

Case: How to Handle

  • Empty input array: Return an empty list immediately, as there are no elements to compare.
  • k is greater than the array length: The constraints rule this out, but if it can occur, return the entire sorted array, since k exceeds the number of available elements.
  • Array has only one element: Return the array as is if k is 1 and an empty list if k is 0. If k > 1, the constraint k <= arr.length is violated, so either throw an exception or return the whole array, consistent with the previous case.
  • x is smaller than the smallest element in arr: Return the first k elements of the array, as these are the closest.
  • x is larger than the largest element in arr: Return the last k elements of the array, as these are the closest.
  • Array contains duplicate elements and x is one of the duplicates: The binary search may land on any of the duplicates; the window step must still handle many elements equally close to x, preferring the smaller values.
  • Large input array where k is also large but still much smaller than the array: Use binary search to find the initial closest position, then examine only a window of about 2k elements around it; avoid quadratic-time comparisons.
  • Integer overflow when calculating the absolute difference if arr[i] or x is Integer.MIN_VALUE or Integer.MAX_VALUE: In languages with fixed-width integers such as Java, use the long data type for the absolute difference calculation. Python integers do not overflow, and the stated constraints keep values far from those limits.
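
Several of these cases can be handled by guards before any search runs. The sketch below is one possible arrangement, not a canonical one: the name `k_closest_guarded` is hypothetical, and returning an empty list for `k <= 0` and the whole array for `k >= len(arr)` are assumptions taken from the cases above rather than requirements of the problem statement. The fallback shrinks a window over the whole array, which takes O(n - k) steps and is used here only for brevity.

```python
def k_closest_guarded(arr, k, x):
    # Empty input or a non-positive k: nothing to return (assumed policy).
    if not arr or k <= 0:
        return []
    # k covers the whole array (or more): every element qualifies.
    if k >= len(arr):
        return list(arr)
    # x at or below the minimum: the first k elements are closest.
    if x <= arr[0]:
        return arr[:k]
    # x at or above the maximum: the last k elements are closest.
    if x >= arr[-1]:
        return arr[-k:]

    # General case: drop the farther end until k elements remain.
    # Ties drop the right end, which keeps the smaller value.
    left, right = 0, len(arr) - 1
    while right - left + 1 > k:
        if abs(arr[left] - x) <= abs(arr[right] - x):
            right -= 1
        else:
            left += 1
    return arr[left:right + 1]


print(k_closest_guarded([], 3, 1))               # []
print(k_closest_guarded([1, 2, 3, 4, 5], 2, 10))  # [4, 5]
print(k_closest_guarded([1, 2, 2, 2, 3], 2, 2))   # [2, 2]
```

In a production version the general case would be replaced by the binary search approach; the guards stay the same.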