K Closest Points to Origin

Medium
7 days ago

Given an array of points where points[i] = [x_i, y_i] represents a point on the X-Y plane and an integer k, return the k closest points to the origin (0, 0). The distance between two points on the X-Y plane is the Euclidean distance (i.e., √(x_1 - x_2)² + (y_1 - y_2)²). You may return the answer in any order. The answer is guaranteed to be unique (except for the order that it is in).

For example:

  • points = [[1,3],[-2,2]], k = 1 should return [[-2,2]] because the distance between (1, 3) and the origin is sqrt(10), the distance between (-2, 2) and the origin is sqrt(8), and sqrt(8) < sqrt(10).
  • points = [[3,3],[5,-1],[-2,4]], k = 2 should return [[3,3],[-2,4]] (or [[-2,4],[3,3]]).
Sample Answer
import heapq
import math

class Solution:
    def kClosest(self, points: list[list[int]], k: int) -> list[list[int]]:
        """ 
        Finds the k closest points to the origin (0, 0) in a list of points.
        
        Args:
            points (list[list[int]]): A list of points, where each point is a list [x, y].
            k (int): The number of closest points to return.
            
        Returns:
            list[list[int]]: A list containing the k closest points to the origin.
        """
        # Use a max heap to store the k closest points seen so far.
        # The heap will store tuples of (distance, point).
        max_heap = []
        
        for x, y in points:
            distance = -(x**2 + y**2)  # Negate for max heap
            if len(max_heap) < k:
                heapq.heappush(max_heap, (distance, (x, y)))
            elif distance > max_heap[0][0]:
                heapq.heapreplace(max_heap, (distance, (x, y)))
        
        # Extract the points from the heap and return them.
        return [point for distance, point in max_heap]

# Example Usage
if __name__ == '__main__':
    points1 = [[1, 3], [-2, 2]]
    k1 = 1
    result1 = Solution().kClosest(points1, k1)
    print(f"Example 1: Closest {k1} points to the origin are: {result1}")

    points2 = [[3, 3], [5, -1], [-2, 4]]
    k2 = 2
    result2 = Solution().kClosest(points2, k2)
    print(f"Example 2: Closest {k2} points to the origin are: {result2}")

Naive Solution

A naive solution would involve calculating the distance of each point from the origin, sorting all the points based on their distances, and then selecting the first k points. This approach is simple to understand but not the most efficient, especially for large datasets.

import math

def k_closest_naive(points: list[list[int]], k: int) -> list[list[int]]:
    """Naive solution to find the k closest points to the origin."""
    distances = []
    for x, y in points:
        distance = math.sqrt(x**2 + y**2)
        distances.append((distance, [x, y]))
    
    distances.sort(key=lambda item: item[0])
    
    result = [point for distance, point in distances[:k]]
    return result

Optimal Solution

The optimal solution involves using a max heap (priority queue). We iterate through the points, maintaining a heap of size k that contains the k closest points seen so far. For each new point, we compare its distance to the distance of the farthest point in the heap (i.e., the root of the max heap). If the new point is closer, we replace the farthest point with the new point. This ensures that the heap always contains the k closest points.

Big(O) Run-time Analysis

The optimal solution using a max heap has a time complexity of O(N log K), where N is the number of points and K is the number of closest points to find. Here's why:

  • For each point, we calculate its distance from the origin in O(1) time.
  • We potentially insert the point into the max heap. Heap insertion takes O(log K) time because we need to maintain the heap property, and the heap size is limited to K.
  • We iterate through all N points, so the overall time complexity is O(N log K).

The naive solution of sorting all distances has a time complexity of O(N log N), where N is the number of points. Therefore, the heap-based solution is more efficient when K is significantly smaller than N.

Big(O) Space Usage Analysis

The space complexity of the optimal solution is O(K), where K is the number of closest points to find. This is because the max heap stores at most K points at any given time. The space used by the max heap dominates the space complexity of the algorithm.

The naive solution requires O(N) space to store all the distances and points, so the heap-based approach is more space-efficient when k < N.

Edge Cases and How to Handle Them

  1. Empty Input: If the input list of points is empty, return an empty list.
  2. k = 0: If k is 0, return an empty list.
  3. k > len(points): If k is greater than the number of points, return all the points.
  4. Duplicate Points: If there are duplicate points, the algorithm will still work correctly, as it compares distances. Duplicate points closer to the origin will be included in the result.
  5. Points with Equal Distance: If there are points with equal distance, the algorithm may return any combination of those points as long as the total number of points returned is k. The order is not guaranteed.

These edge cases can be handled with appropriate conditional checks at the beginning of the function to ensure correct and robust behavior.