Taro Logo

Min Cost to Connect All Points

Medium
1 views
2 months ago

You are given an array points representing integer coordinates of some points on a 2D-plane, where points[i] = [x<sub>i</sub>, y<sub>i</sub>]. The cost of connecting two points [x<sub>i</sub>, y<sub>i</sub>] and [x<sub>j</sub>, y<sub>j</sub>] is the manhattan distance between them: |x<sub>i</sub> - x<sub>j</sub>| + |y<sub>i</sub> - y<sub>j</sub>|, where |val| denotes the absolute value of val. Return the minimum cost to make all points connected. All points are connected if there is exactly one simple path between any two points. For example, given the input points = [[0,0],[2,2],[3,10],[5,2],[7,0]], the expected output is 20. As another example, if the input is points = [[3,12],[-2,5],[-4,1]], the expected output is 18. How would you approach solving this problem efficiently?

Sample Answer
# Minimum Cost to Connect All Points

This problem asks us to find the minimum cost to connect all points in a 2D plane, where the cost between two points is the Manhattan distance.  The goal is to connect all points with exactly one simple path between any two points, implying we need to find a minimum spanning tree.

## 1. Brute Force Solution (Conceptual)

A brute-force approach would involve considering all possible sets of edges that connect all points and finding the set with the minimum total cost. This is highly inefficient.

*   Generate all possible spanning trees.
*   Calculate the cost of each spanning tree.
*   Return the minimum cost.

This approach has exponential time complexity, making it impractical for larger inputs.

## 2. Optimal Solution: Prim's Algorithm

Prim's algorithm is a greedy algorithm that can efficiently find the minimum spanning tree. It works by iteratively adding the minimum-cost edge that connects a visited node to an unvisited node.

Here's a breakdown of the approach:

1.  **Initialization:**
    *   Start with an arbitrary point.
    *   Maintain a set of visited points and a priority queue (min-heap) of edges.
2.  **Iteration:**
    *   While there are unvisited points:
        *   Extract the minimum-cost edge from the priority queue.
        *   If the destination point of the edge is unvisited:
            *   Add the cost of the edge to the total cost.
            *   Mark the destination point as visited.
            *   Add all edges from the newly visited point to unvisited points to the priority queue.
3.  **Termination:**
    *   When all points are visited, the total cost is the minimum cost to connect all points.

```python
import heapq

def minCostConnectPoints(points):
    n = len(points)
    visited = [False] * n
    min_cost = 0
    pq = []  # (cost, node)

    # Start from the first point
    visited[0] = True
    for i in range(1, n):
        cost = abs(points[0][0] - points[i][0]) + abs(points[0][1] - points[i][1])
        heapq.heappush(pq, (cost, i))

    visited_count = 1
    while visited_count < n:
        cost, u = heapq.heappop(pq)
        if visited[u]:
            continue

        visited[u] = True
        min_cost += cost
        visited_count += 1

        for v in range(n):
            if not visited[v]:
                new_cost = abs(points[u][0] - points[v][0]) + abs(points[u][1] - points[v][1])
                heapq.heappush(pq, (new_cost, v))

    return min_cost

# Example usage:
points = [[0,0],[2,2],[3,10],[5,2],[7,0]]
print(f"Minimum cost to connect all points: {minCostConnectPoints(points)}")

points = [[3,12],[-2,5],[-4,1]]
print(f"Minimum cost to connect all points: {minCostConnectPoints(points)}")

3. Big(O) Time Complexity Analysis

  • Initialization: O(N) to mark the starting node as visited and initially push the edges from the starting node to the priority queue.
  • Main Loop: O(N) iterations because we add n-1 edges.
  • Heap Operations: Each iteration extracts the min-cost edge from the priority queue (O(log N)). Adding all the vertices to the heap has a time complexity of O(N log N).
  • Edge Calculation: For each visited node, we calculate the cost to every other unvisited node to update the priority queue. Since there are n nodes, this step has a complexity of O(N^2).

Therefore, the overall time complexity is O(N^2 log N).

4. Big(O) Space Complexity Analysis

  • Visited Array: O(N) to keep track of visited nodes.
  • Priority Queue: In the worst case, the priority queue can contain edges between all pairs of nodes (n * (n-1) /2). Thus the size is O(N^2)

Therefore, the overall space complexity is O(N^2).

5. Edge Cases and Handling

  • Empty Input: If the input points array is empty, return 0.
  • Single Point: If there's only one point, the minimum cost is 0.
  • Duplicate Points: The problem statement specifies that all pairs are distinct, so we don't need to explicitly handle duplicate points.
  • Large Coordinates: The coordinates can be as large as 10^6. Use appropriate data types (e.g., int) to avoid overflow issues.