Taro Logo

Find the Duplicate Number

Medium
Asked by:
Profile picture
Profile picture
Profile picture
Profile picture
+13
17 views
Topics:
ArraysTwo Pointers

Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive.

There is only one repeated number in nums, return this repeated number.

You must solve the problem without modifying the array nums and using only constant extra space.

Example 1:

Input: nums = [1,3,4,2,2]
Output: 2

Example 2:

Input: nums = [3,1,3,4,2]
Output: 3

Example 3:

Input: nums = [3,3,3,3,3]
Output: 3

Constraints:

  • 1 <= n <= 105
  • nums.length == n + 1
  • 1 <= nums[i] <= n
  • All the integers in nums appear only once except for precisely one integer which appears two or more times.

Follow up:

  • How can we prove that at least one duplicate number must exist in nums?
  • Can you solve the problem in linear runtime complexity?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum size of the input array `nums`? This will help me understand the scale of the problem.
  2. Is the input array guaranteed to contain only positive integers within the range [1, n], where n is the length of the array?
  3. The problem states there is exactly one duplicate number. Is it possible for a number to appear more than twice, or is it guaranteed to appear exactly twice?
  4. If, hypothetically, the input array did *not* contain a duplicate (which contradicts the problem statement but I want to clarify error handling), what should my function return?
  5. Can I modify the input array `nums` in place, or should I treat it as read-only?

Brute Force Solution

Approach

The brute force approach to finding a duplicate in a list of numbers is like manually comparing each number to every other number in the list. We systematically check all possible pairs until we find a match, meaning a number appears more than once.

Here's how the algorithm would work step-by-step:

  1. Take the first number in the list.
  2. Compare it to every other number in the list.
  3. If you find a match (another number that's the same), you've found the duplicate.
  4. If you don't find a match after checking all the other numbers, move on to the second number in the list.
  5. Compare the second number to every number after it in the list (you've already compared it to the first).
  6. Keep repeating this process, taking each number in turn and comparing it to all the numbers that come after it.
  7. As soon as you find any two numbers that are the same, that's the duplicate number.

Code Implementation

def find_duplicate_number_brute_force(numbers):
    list_length = len(numbers)

    for first_index in range(list_length):
        # Compare current number to the rest of the list

        for second_index in range(first_index + 1, list_length):
            # If a duplicate is found, we return it
            if numbers[first_index] == numbers[second_index]:

                return numbers[first_index]

    # Return -1 if no duplicate is found
    return -1

Big(O) Analysis

Time Complexity
O(n²)The brute force approach involves nested iterations. The outer loop iterates through each of the n elements in the array. For each element in the outer loop, the inner loop compares it to the remaining elements. In the worst case, we are comparing each element to almost every other element leading to roughly n * (n-1)/2 comparisons, which simplifies to O(n²).
Space Complexity
O(1)The brute force algorithm, as described, only uses a few variables for indexing into the input array. Specifically, it appears to use a variable to iterate through the array and another to compare each element to the rest of the array. The number of index variables remains constant regardless of the input size N. Therefore, the space complexity is constant.

Optimal Solution

Approach

The best way to find the duplicate number is by cleverly using the properties of the numbers themselves. We'll treat the problem like finding a loop in a linked list, where each number points to another number.

Here's how the algorithm would work step-by-step:

  1. Imagine each number in the group is a pointer to another number in the group.
  2. Start at the beginning and follow the 'pointers' (numbers) one at a time. Then, start at the beginning again but follow the 'pointers' two at a time.
  3. Eventually, the two 'pointers' will meet inside the loop created by the duplicate number. Find where they meet.
  4. Once they meet, reset one of the 'pointers' back to the beginning.
  5. Move both 'pointers' one at a time. The point where they meet again is the duplicate number.

Code Implementation

def find_duplicate_number(numbers):
    slow_pointer = numbers[0]
    fast_pointer = numbers[0]

    while True:
        slow_pointer = numbers[slow_pointer]
        fast_pointer = numbers[numbers[fast_pointer]]
        # Detect the intersection point of the two pointers
        if slow_pointer == fast_pointer:
            break

    pointer_to_beginning = numbers[0]

    # One pointer goes back to the beginning
    while pointer_to_beginning != slow_pointer:
        slow_pointer = numbers[slow_pointer]
        pointer_to_beginning = numbers[pointer_to_beginning]

    # The meeting point is the duplicate number
    return slow_pointer

Big(O) Analysis

Time Complexity
O(n)The algorithm involves two pointers traversing the array, where 'n' is the length of the input array. The first phase (finding the meeting point) involves each pointer moving at most 'n' steps. The second phase (finding the duplicate) also involves both pointers moving at most 'n' steps combined. Thus, the total number of operations is proportional to 'n', resulting in a time complexity of O(n).
Space Complexity
O(1)The algorithm uses only two pointers (variables), one moving one step at a time and another moving two steps at a time. Then, after they meet and one pointer is reset to the beginning, both pointers are moved one step at a time. The memory required for these pointers is independent of the input size N (where N is the number of elements in the input array). Therefore, the algorithm uses constant auxiliary space.

Edge Cases

CaseHow to Handle
Null or empty input arrayThrow an IllegalArgumentException or return -1 to indicate invalid input.
Array with only two elements, where one is the duplicateThe algorithm should correctly identify the duplicate in a size 2 array.
Array where the duplicate is the first elementThe algorithm should correctly identify the duplicate even when it's at the beginning.
Array where the duplicate is the last elementThe algorithm should correctly identify the duplicate even when it's at the end.
Large input array (close to memory limit)Consider using a space-efficient algorithm like the Floyd's Tortoise and Hare (cycle detection) method to avoid memory issues.
Array with all elements being the same (all are duplicates)The algorithm should correctly identify the single duplicate number in this scenario.
Array contains numbers outside the range [1, n]Throw an IllegalArgumentException or return -1 to indicate invalid input, as the problem statement defines the range.
Array where n is very large, potentially leading to integer overflow in calculationsChoose appropriate data types (e.g., long) or algorithm to avoid integer overflow.