Taro Logo

Max Points on a Line

Hard
Google logo
Google
1 view
Topics:
Arrays

Given an array of points where points[i] = [x<sub>i</sub>, y<sub>i</sub>] represents a point on the X-Y plane, return the maximum number of points that lie on the same straight line.

For example:

Input: points = [[1,1],[2,2],[3,3]]
Output: 3

In this case, all three points lie on the same line (y = x), so the answer is 3.

Another example:

Input: points = [[1,1],[3,2],[5,3],[4,1],[2,3],[1,4]]
Output: 4

In this case, there is no line that contains more than 4 points. A line containing 4 points is possible.

Here are the constraints:

  • 1 <= points.length <= 300
  • points[i].length == 2
  • -10<sup>4</sup> <= x<sub>i</sub>, y<sub>i</sub> <= 10<sup>4</sup>
  • All the points are unique.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of values for the x and y coordinates of the points? Are they integers or floating-point numbers?
  2. Can there be duplicate points in the input? If so, how should they be handled?
  3. If all points are on the same line, should I return the number of points, or is there a specific line that needs to be identified?
  4. If the input array is empty or null, what should I return?
  5. What is the expected behavior if the points are very close together but not exactly on the same line, potentially due to floating-point precision issues?

Brute Force Solution

Approach

We need to find the most points that lie on the same straight line. The brute force approach involves checking every possible line that can be formed by any two points, and then counting how many other points fall on that same line.

Here's how the algorithm would work step-by-step:

  1. Take the first point and the second point. Imagine a line going through both of them.
  2. Now, go through all the other points one by one and see if they also fall on this imaginary line.
  3. Count how many points are on this line.
  4. Remember this count. Now, repeat this process for all possible pairs of points.
  5. After checking all possible lines, find the largest count we kept track of. That is the answer.

Code Implementation

def max_points_on_a_line(points):
    number_of_points = len(points)
    if number_of_points <= 2:
        return number_of_points

    max_points = 0

    for i in range(number_of_points):
        for j in range(i + 1, number_of_points):
            # Consider each pair of points to form a line
            points_on_line = 2

            for k in range(j + 1, number_of_points):
                # Check if any other points lie on the same line
                point_a = points[i]
                point_b = points[j]
                point_c = points[k]

                # Check if the cross product is 0 to see if collinear
                if (point_b[1] - point_a[1]) * (point_c[0] - point_b[0]) == (point_c[1] - point_b[1]) * (point_b[0] - point_a[0]):
                    points_on_line += 1

            # Update the maximum points found so far
            if points_on_line > max_points:
                max_points = points_on_line

    return max_points

Big(O) Analysis

Time Complexity
O(n^3)The algorithm iterates through all possible pairs of points to define a line. This involves a nested loop, resulting in approximately n * (n-1) / 2 pairs, which is O(n^2). For each pair (i.e., for each line), the algorithm then iterates through all the remaining points to check if they lie on the same line. This check takes O(n) time. Therefore, the overall time complexity is O(n^2) * O(n) = O(n^3).
Space Complexity
O(1)The described brute force approach primarily iterates through pairs of points and checks if other points lie on the line formed by each pair. This involves keeping track of a current count of points on a line and the maximum count seen so far. These counts, as well as the loop index variables, require constant extra space, independent of the number of points, N. Therefore, the auxiliary space complexity is O(1).

Optimal Solution

Approach

The key is to consider each point as a potential origin and see how many other points fall on the same line emanating from that origin. Instead of exhaustively checking every possible line, we cleverly group points by their slopes relative to our chosen origin point.

Here's how the algorithm would work step-by-step:

  1. Pick one point from the group of points; we'll treat this as our origin.
  2. For every other point, calculate the slope between it and the origin point.
  3. Count how many points share the same slope from the origin. These points lie on the same line.
  4. Keep track of the maximum number of points found on a line from that origin.
  5. Do this for every point in the group, considering each as a potential origin.
  6. Also, handle the special case where multiple points are exactly the same (duplicate points) from the origin. Handle vertical line cases to avoid division by zero issues.
  7. The largest number of points found on a line from any origin is your answer.

Code Implementation

def max_points_on_a_line(points):
    max_number_of_points = 0
    number_of_points = len(points)

    for i in range(number_of_points):
        slopes = {}
        duplicate_points = 1
        vertical_points = 0
        current_max = 0

        for j in range(i + 1, number_of_points):
            # Check for duplicate points.
            if points[i][0] == points[j][0] and points[i][1] == points[j][1]:
                duplicate_points += 1
                continue

            # Handle vertical lines (division by zero).
            if points[i][0] == points[j][0]:
                vertical_points += 1
                current_max = max(current_max, vertical_points)
                continue

            slope = (points[j][1] - points[i][1]) / (points[j][0] - points[i][0])
            slopes[slope] = slopes.get(slope, 0) + 1
            current_max = max(current_max, slopes[slope])

        # Update the maximum points on a line from this origin.
        max_number_of_points = max(max_number_of_points, current_max + duplicate_points)

    return max_number_of_points

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through each of the n points, considering each as a potential origin. For each origin point, it iterates through the remaining points (at most n-1 points) to calculate the slope and determine how many points lie on the same line. The slope calculation and comparison take constant time. Therefore, the dominant operation involves comparing each point to every other point, resulting in a nested loop structure with approximately n * (n-1) operations. This simplifies to O(n²).
Space Complexity
O(N)The algorithm iterates through each point as a potential origin. For each origin point, a hash map (or dictionary) is used to store the slopes between that origin point and all other points, along with the count of points sharing that slope. In the worst-case scenario, where no three points are collinear, the hash map could potentially store N-1 slope-count pairs, where N is the total number of points. Therefore, the auxiliary space required is proportional to N, resulting in O(N) space complexity.

Edge Cases

CaseHow to Handle
Empty input list of pointsReturn 0, as no points exist to form a line.
Input list contains only one pointReturn 1, as a single point lies on a line by itself.
Input list contains two pointsReturn 2, as two points always form a line.
All points are the same (duplicates)The slope calculation will result in undefined or NaN, but the equal points should be counted correctly within the same position/duplicate point logic.
Vertical line (infinite slope)Handle vertical lines as a special case in the slope calculation to avoid division by zero.
Floating-point precision issues when calculating slopeUse a tolerance value (epsilon) when comparing slopes to account for potential floating-point errors.
Large input dataset causing potential memory issuesEnsure the chosen data structures (e.g., hash maps) are efficient in terms of memory usage.
Integer overflow when calculating slope with very large coordinatesUse a larger data type (e.g., long) to store the numerators and denominators of the slopes to prevent integer overflow.