Taro Logo

K Closest Points to Origin

Medium
a month ago

Given an array of points where points[i] = [x<sub>i</sub>, y<sub>i</sub>] represents a point on the X-Y plane and an integer k, return the k closest points to the origin (0, 0). The distance between two points on the X-Y plane is the Euclidean distance (i.e., &radic;(x<sub>1</sub> - x<sub>2</sub>)<sup>2</sup> + (y<sub>1</sub> - y<sub>2</sub>)<sup>2</sup>).

You may return the answer in any order. The answer is guaranteed to be unique (except for the order that it is in).

Example 1: points = [[1,3],[-2,2]], k = 1 Output: [[-2,2]]

Example 2: points = [[3,3],[5,-1],[-2,4]], k = 2 Output: [[3,3],[-2,4]]

Explain a solution with optimal time and space complexity. Also analyze the time and space complexity of your proposed solution. What are the edge cases to consider?

Sample Answer
import heapq
import math


# Naive Solution: Calculate distances and sort
def k_closest_naive(points, k):
    distances = []
    for x, y in points:
        dist = math.sqrt(x**2 + y**2)
        distances.append((dist, [x, y]))

    distances.sort()
    result = []
    for i in range(k):
        result.append(distances[i][1])
    return result


# Optimal Solution: Using a Max Heap (Priority Queue)
def k_closest_optimal(points, k):
    heap = []
    for x, y in points:
        dist_sq = x**2 + y**2  # Avoid sqrt for efficiency
        if len(heap) < k:
            heapq.heappush(heap, (-dist_sq, [x, y]))  # Negate distance for max-heap
        else:
            if dist_sq < -heap[0][0]:  # Compare with the largest distance in the heap
                heapq.heappop(heap)
                heapq.heappush(heap, (-dist_sq, [x, y]))

    result = [point for _, point in heap]
    return result


# Example Usage
points1 = [[1, 3], [-2, 2]]
k1 = 1
print("Naive Output 1:", k_closest_naive(points1, k1))
print("Optimal Output 1:", k_closest_optimal(points1, k1))

points2 = [[3, 3], [5, -1], [-2, 4]]
k2 = 2
print("Naive Output 2:", k_closest_naive(points2, k2))
print("Optimal Output 2:", k_closest_optimal(points2, k2))


# Big O Runtime Analysis for Naive Solution
# Calculating the distance for each point takes O(1) time, and we do this for n points, so it's O(n).
# Sorting the distances takes O(n log n) time.
# Therefore, the overall runtime complexity of the naive solution is O(n log n).

# Big O Space Usage Analysis for Naive Solution
# We store the distances and points in a list, which takes O(n) space.
# Therefore, the space complexity of the naive solution is O(n).


# Big O Runtime Analysis for Optimal Solution
# Building the heap takes O(n log k) time because for each of the n points, we potentially perform a heap push or pop operation, each costing O(log k).
# Therefore, the overall runtime complexity of the optimal solution is O(n log k).

# Big O Space Usage Analysis for Optimal Solution
# We maintain a heap of size k, so the space complexity is O(k).


# Edge Cases and Considerations:
# 1. Empty Input: If the input list of points is empty, return an empty list.
# 2. k = 0: If k is 0, return an empty list.
# 3. k > number of points: If k is greater than the number of points, return all the points.
# 4. Duplicate Points: If there are duplicate points, the algorithm should handle them correctly and return the k closest points, including duplicates if they fall within the k closest.
# 5. Large Coordinates: The algorithm should handle large coordinate values without overflow issues (using integers to represent squared distances helps).
# 6. Floating Point Precision: When comparing distances, be mindful of potential floating-point precision issues.  However, in this case, we avoid using sqrt() which mitigates the problem.

def k_closest_empty_check(points, k):
    if not points or k == 0:
        return []
    if k > len(points):
        return points
    return k_closest_optimal(points, k)


points3 = []
k3 = 1
print("Empty Check Output 1:", k_closest_empty_check(points3, k3))

points4 = [[1, 3], [-2, 2]]
k4 = 0
print("Empty Check Output 2:", k_closest_empty_check(points4, k4))

points5 = [[1, 3], [-2, 2]]
k5 = 3
print("Empty Check Output 3:", k_closest_empty_check(points5, k5))

Code Explanation:

The problem asks to find the k closest points to the origin in a given array of points. The distance is calculated using the Euclidean distance formula.

1. Naive Solution:

  • Calculate the Euclidean distance for each point from the origin.
  • Store the distances along with their corresponding points in a list.
  • Sort the list of distances.
  • Return the first k points from the sorted list.

2. Optimal Solution (Using Max Heap):

  • Use a max heap (priority queue) to keep track of the k closest points seen so far.
  • Iterate through the points, calculating the squared Euclidean distance (to avoid the square root operation for efficiency).
  • If the heap has fewer than k elements, add the point to the heap.
  • If the heap is full (has k elements), compare the current point's distance with the largest distance in the heap (the root of the max heap).
  • If the current point is closer than the farthest point in the heap, remove the farthest point and add the current point to the heap.
  • After processing all points, the heap contains the k closest points.

3. Big O Runtime Analysis (Naive):

  • Calculating distances: O(n)
  • Sorting distances: O(n log n)
  • Total: O(n log n)

4. Big O Space Usage Analysis (Naive):

  • O(n) to store distances and points.

5. Big O Runtime Analysis (Optimal):

  • Building heap: O(n log k)

6. Big O Space Usage Analysis (Optimal):

  • O(k) to maintain the heap.

7. Edge Cases and Considerations:

  • Empty input, k = 0, k > number of points are handled gracefully by returning empty lists or the original list.
  • Duplicate points are handled correctly.
  • Large coordinates are handled using squared distances (integers), avoiding potential overflow.
  • Floating point precision issues are mitigated by using squared distances, avoiding the sqrt() function.