Taro Logo

Find All Duplicates in an Array

Medium
Apple logo
Apple
3 views
Topics:
Arrays

Given an integer array nums of length n where all the integers of nums are in the range [1, n] and each integer appears at most twice, return an array of all the integers that appears twice.

You must write an algorithm that runs in O(n) time and uses only constant auxiliary space, excluding the space needed to store the output

Example 1:

Input: nums = [4,3,2,7,8,2,3,1] Output: [2,3]

Example 2:

Input: nums = [1,1,2] Output: [1]

Example 3:

Input: nums = [1] Output: []

Describe the optimal algorithm to solve this problem and include code. Explain the time and space complexity. Finally, discuss any edge cases.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can I assume that all numbers in the input array `nums` are within the range [1, n], where n is the length of the array?
  2. What should I return if there are no duplicate numbers in the input array?
  3. Are the integers in the input array guaranteed to be whole numbers, or could they be floats?
  4. If a number appears more than twice, should I include it multiple times in the output array, or just once?
  5. Is the order of the returned duplicate numbers significant, or can I return them in any order?

Brute Force Solution

Approach

The brute force method for finding duplicates is like comparing each item with every other item. We essentially check every possible pair to see if they match. If we find a match, we know we've found a duplicate.

Here's how the algorithm would work step-by-step:

  1. Take the first item in the collection.
  2. Compare it to every other item in the collection.
  3. If you find an item that is the same as the first one, record it as a duplicate.
  4. Now, take the second item in the collection.
  5. Compare it to every other item in the collection (you may or may not include comparisons to items already checked).
  6. Again, if you find an item that is the same, record it as a duplicate.
  7. Repeat this process for each item in the collection until you have compared every possible pair.
  8. At the end, you will have a list of all the duplicate items.

Code Implementation

def find_all_duplicates_brute_force(input_numbers):
    duplicate_numbers = []

    # Iterate through each number in the input list
    for current_index in range(len(input_numbers)):
        for comparison_index in range(len(input_numbers)):

            # Avoid comparing the number to itself
            if current_index != comparison_index:
                if input_numbers[current_index] == input_numbers[comparison_index]:

                    # Prevent adding the same duplicate multiple times
                    if input_numbers[current_index] not in duplicate_numbers:
                        duplicate_numbers.append(input_numbers[current_index])

    return duplicate_numbers

Big(O) Analysis

Time Complexity
O(n²)The brute force approach iterates through the array of n elements. For each element, it compares it with all the other elements in the array, which takes n comparisons in the worst case. Since this comparison happens for each of the n elements, the overall time complexity becomes proportional to n * n. Thus the Big O time complexity is O(n²).
Space Complexity
O(1)The brute force method, as described, iterates through the input array and compares elements. It only needs to store a few variables such as the current element being compared and potentially some index variables to keep track of positions within the array. The number of these variables does not scale with the input size N (where N is the number of elements in the array). Thus, the space required remains constant regardless of the input array's size, resulting in O(1) space complexity.

Optimal Solution

Approach

The trick here is to use the array itself as a kind of tracker. We exploit the fact that the array contains numbers within a specific range to mark seen numbers in place.

Here's how the algorithm would work step-by-step:

  1. Go through the array, one number at a time.
  2. For each number, use its value to find a corresponding position within the array.
  3. Change the number at that position to indicate we've seen the original number.
  4. If, when we go to change a number, it's already been changed before, that means we've found a duplicate.
  5. Keep track of all the numbers we find that are duplicates.
  6. Once we've gone through the entire array, report the numbers we identified as duplicates.

Code Implementation

def find_duplicates(numbers):
    duplicates = []
    for index in range(len(numbers)):
        current_number = abs(numbers[index])
        corresponding_index = current_number - 1

        # If the value at corresponding_index is negative, it's a duplicate
        if numbers[corresponding_index] < 0:
            duplicates.append(current_number)

        # Mark the number as seen by negating the value at corresponding_index.
        else:
            numbers[corresponding_index] *= -1

    # Restore original array values
    for index in range(len(numbers)):
        numbers[index] = abs(numbers[index])

    return duplicates

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input array of size n exactly once. Inside the loop, it performs a constant number of operations: calculating an index, accessing an array element, and potentially modifying an array element or adding to a list of duplicates. Since the number of operations within the loop is constant and the loop runs n times, the time complexity is directly proportional to n, resulting in O(n).
Space Complexity
O(1)The algorithm primarily modifies the input array in place. The only auxiliary space used is for storing the list of duplicates. However, the plain English explanation states that we only need to 'keep track' of duplicates and report them at the end. There is no explicit constraint that duplicates must be stored in a separate data structure before reporting them. If duplicates are reported without storage, then only a constant amount of extra space is required for a few variables, independent of the input array size N. Thus the auxiliary space complexity is O(1).

Edge Cases

CaseHow to Handle
Null or empty input arrayReturn an empty list to indicate no duplicates are found in an invalid input.
Input array with size 1Return an empty list because a single element cannot be a duplicate.
Input array where all elements are the sameThe algorithm should correctly identify this single element as duplicated n-1 times if n is the size of array, returning it once.
Input array where all numbers from 1 to n are present exactly once (no duplicates)Return an empty list because there are no duplicates as requested.
Input array contains the number n multiple timesThe algorithm handles repeated occurrences of n correctly because it's still within the [1,n] range.
Large input array (close to memory limits)Choose an algorithm with O(n) space complexity or less to avoid memory overflow, such as using in-place modification.
Array with only two elements that are the same.The algorithm should correctly identify this as a single duplicate to return.
Input array contains Integer.MAX_VALUEEnsure the algorithm doesn't cause integer overflow when performing calculations or comparisons involving MAX_VALUE.