Taro Logo

K Closest Points to Origin

Medium
LinkedIn logo
LinkedIn
0 views
Topics:
Arrays

Given an array of points where points[i] = [xi, yi] represents a point on the X-Y plane and an integer k, return the k closest points to the origin (0, 0).

The distance between two points on the X-Y plane is the Euclidean distance (i.e., √(x1 - x2)2 + (y1 - y2)2).

You may return the answer in any order. The answer is guaranteed to be unique (except for the order that it is in).

Example 1:

Input: points = [[1,3],[-2,2]], k = 1
Output: [[-2,2]]
Explanation:
The distance between (1, 3) and the origin is sqrt(10).
The distance between (-2, 2) and the origin is sqrt(8).
Since sqrt(8) < sqrt(10), (-2, 2) is closer to the origin.
We only want the closest k = 1 points from the origin, so the answer is just [[-2,2]].

Example 2:

Input: points = [[3,3],[5,-1],[-2,4]], k = 2
Output: [[3,3],[-2,4]]
Explanation: The answer [[-2,4],[3,3]] would also be accepted.

Constraints:

  • 1 <= k <= points.length <= 104
  • -104 <= xi, yi <= 104

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected data type for the coordinates? Are they integers or floating-point numbers?
  2. What are the bounds on the coordinate values for the points?
  3. Is k guaranteed to be less than or equal to the number of points in the input array?
  4. Should the returned array of points be sorted in any particular order, such as by increasing distance from the origin?
  5. If two points have the same distance from the origin, how should I determine which one comes first in the output?

Brute Force Solution

Approach

The brute force approach to finding the K closest points is like checking the distance of every single point to the origin. Then, we pick the K points with the smallest distances. It's straightforward but not the most efficient way.

Here's how the algorithm would work step-by-step:

  1. For each point, calculate its distance from the origin (the point (0, 0)).
  2. Store the calculated distances along with the corresponding points.
  3. Sort all the points based on their distances from the origin, from smallest to largest.
  4. Select the first K points from the sorted list. These are the K closest points to the origin.

Code Implementation

def k_closest_points_brute_force(points, k):
    distances_with_points = []

    # Calculate distance for each point and store it
    for point in points:
        distance_to_origin = (point[0]**2) + (point[1]**2)
        distances_with_points.append((distance_to_origin, point))

    # Sort points by distance, essential for picking closest
    distances_with_points.sort()

    closest_points = []
    # Select the k closest points after sorting
    for i in range(k):
        closest_points.append(distances_with_points[i][1])

    return closest_points

Big(O) Analysis

Time Complexity
O(n log n)The brute force approach involves calculating the distance for each of the n points. Then, we sort all n points based on their calculated distances. The distance calculation for each point is O(1), contributing O(n) overall. The sorting step takes O(n log n) time. Since sorting dominates the distance calculation, the overall time complexity is O(n log n).
Space Complexity
O(N)The space complexity is determined by the auxiliary space used to store the calculated distances and corresponding points. We store the distance along with each of the N points. Specifically, we create a list to store these distances and corresponding points before sorting them. Therefore, the space complexity is directly proportional to the number of input points, which results in O(N) space complexity.

Optimal Solution

Approach

The goal is to find the closest points without sorting all of them. We achieve this by focusing on efficiently finding the 'k-th' closest point and then gathering all points closer than or equal to it. This is much faster than sorting everything.

Here's how the algorithm would work step-by-step:

  1. First, figure out a way to quickly determine how far each point is from the origin (the center).
  2. Next, imagine the points are scattered around. Instead of sorting *all* of them by distance, pick a random point and partition the other points: points closer than it, and points further than it.
  3. If the number of points closer to the origin than our chosen point is exactly what we need (k-1 points since we count our current partition point), we are done. Our chosen point is the kth closest, so everything at that distance or smaller should be returned.
  4. If there are fewer than k points closer, we know the k-th closest point is among those further away. We need to look at the points further away and adjust our 'k' value by how many items we already added.
  5. If there are more than k points closer, the k-th closest is among those closer to the origin. Discard the further ones.
  6. Repeat the process of partitioning the points around a new random point from the remaining pool until you've found the k-th closest point.
  7. Finally, gather all the points whose distances are less than or equal to the distance of the k-th closest point. This is our final answer.

Code Implementation

import random

def k_closest_points_to_origin(points, k):
    def calculate_distance_to_origin(point):
        return point[0]**2 + point[1]**2

    def partition(points, left_index, right_index, pivot_index):
        pivot_distance = calculate_distance_to_origin(points[pivot_index])
        points[pivot_index], points[right_index] = points[right_index], points[pivot_index]
        store_index = left_index

        for i in range(left_index, right_index):
            if calculate_distance_to_origin(points[i]) <= pivot_distance:
                points[store_index], points[i] = points[i], points[store_index]
                store_index += 1

        points[right_index], points[store_index] = points[store_index], points[right_index]
        return store_index

    def quickselect(points, left_index, right_index, k_smallest_index):
        # Only one element, return it.
        if left_index == right_index:
            return points[left_index]

        while True:
            pivot_index = random.randint(left_index, right_index)
            pivot_index = partition(points, left_index, right_index, pivot_index)

            # Found the k-th smallest
            if k_smallest_index == pivot_index:
                return points[k_smallest_index]
            elif k_smallest_index < pivot_index:
                right_index = pivot_index - 1
            else:
                left_index = pivot_index + 1

    kth_closest_point = quickselect(points, 0, len(points) - 1, k - 1)
    kth_closest_distance = calculate_distance_to_origin(kth_closest_point)

    # Gather all points within the k-th distance.
    result = []
    for point in points:
        if calculate_distance_to_origin(point) <= kth_closest_distance:
            result.append(point)

    # Return the result
    return result

Big(O) Analysis

Time Complexity
O(n)The algorithm uses a quickselect approach, which partitions the array around a pivot. On average, each partition step reduces the problem size. While the worst-case scenario could be O(n^2), a good pivot selection (e.g., random) ensures that the average time complexity is linear, O(n), because each step eliminates a significant portion of the input. Finding all points within the k-th distance once it is found also takes O(n) time but does not affect the overall Big O notation as it is not nested within the quickselect.
Space Complexity
O(1)The algorithm primarily operates in-place by partitioning the input array. While recursion is used, the plain English description doesn't imply an uncontrolled recursion depth that scales directly with N, where N is the number of points. We are adjusting the search space in each recursive call. The auxiliary space used consists of a few variables for indexing and distance calculations, whose memory footprint remains constant regardless of the input size. Therefore, the auxiliary space complexity is O(1).

Edge Cases

CaseHow to Handle
Empty input list of pointsReturn an empty list if the input list is null or empty.
k is zeroIf k is zero, return an empty list as no points are requested.
k is larger than the number of pointsReturn all points if k exceeds the number of points.
Points with identical distances to the originThe algorithm must handle ties appropriately; it should return any k points with the smallest distances.
Points with very large x and y coordinates (potential integer overflow)Use long or double for distance calculation to prevent integer overflow if coordinates are very large.
Points with negative x and y coordinatesThe distance calculation should correctly handle negative coordinates (squaring eliminates the negative sign).
k equals the total number of pointsReturn the entire input list without any filtering.
Very large input list (performance considerations)A heap-based solution (priority queue) is preferred for optimal performance in selecting the k closest points, avoiding sorting the entire list.