Taro Logo

Find All Duplicates in an Array

Medium
Microsoft logo
Microsoft
2 views
Topics:
Arrays

Given an integer array nums of length n where all the integers of nums are in the range [1, n] and each integer appears at most twice, return an array of all the integers that appears twice.

You must write an algorithm that runs in O(n) time and uses only constant auxiliary space, excluding the space needed to store the output

Example 1:

Input: nums = [4,3,2,7,8,2,3,1]
Output: [2,3]

Example 2:

Input: nums = [1,1,2]
Output: [1]

Example 3:

Input: nums = [1]
Output: []

Constraints:

  • n == nums.length
  • 1 <= n <= 105
  • 1 <= nums[i] <= n
  • Each element in nums appears once or twice.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected return if no duplicates exist in the input array?
  2. Can I modify the input array in place, or do I need to preserve its original state?
  3. Is the array guaranteed to contain only positive integers, specifically within the range [1, n] where n is the length of the array?
  4. Are there any constraints on the time or space complexity of the solution?
  5. If an element appears more than twice, should it be included multiple times in the result, or only once?

Brute Force Solution

Approach

The brute force method to find duplicates in a collection of numbers is like comparing each number with every other number. We go through the numbers one by one and, for each, check if it appears anywhere else in the collection.

Here's how the algorithm would work step-by-step:

  1. Take the first number in the collection.
  2. Compare this first number with all the other numbers in the collection.
  3. If you find an identical number, mark it as a duplicate.
  4. Move to the second number in the collection.
  5. Compare this second number with all the remaining numbers in the collection (including the first).
  6. If you find an identical number that you haven't already marked, mark it as a duplicate.
  7. Repeat this process for each number in the collection, comparing it with all the other numbers.
  8. At the end, you will have identified all the duplicate numbers.

Code Implementation

def find_all_duplicates_brute_force(numbers):
    duplicate_numbers = []
    number_of_elements = len(numbers)

    for first_index in range(number_of_elements):
        first_number = numbers[first_index]

        # Compare the current number with the rest
        for second_index in range(number_of_elements):
            # Avoid comparing the number to itself
            if first_index != second_index:
                second_number = numbers[second_index]

                # Mark as duplicate if a match is found
                if first_number == second_number:

                    # Ensures no redundant addition of same element
                    if first_number not in duplicate_numbers:
                        duplicate_numbers.append(first_number)

    return duplicate_numbers

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through each of the n elements in the input array. For each element, it compares it with all other elements in the array to check for duplicates. This results in a nested loop structure where, for each element, the inner loop iterates approximately n times. Thus, the total number of comparisons grows proportionally to n multiplied by n, resulting in approximately n*n/2 operations. Therefore, the time complexity is O(n²).
Space Complexity
O(1)The provided plain English explanation details a brute-force approach that primarily uses comparisons within the input array itself. Although it identifies duplicates, it doesn't explicitly mention the creation of any auxiliary data structures like a separate list or a hash map to store the duplicates or track visited elements. Therefore, the only extra memory used is for loop indices and temporary variables used during comparisons, which takes up constant space regardless of the input size N. Thus, the space complexity is O(1).

Optimal Solution

Approach

The efficient way to find duplicates takes advantage of the array's structure itself to track what's been seen. We cleverly use the numbers in the array as pointers to spots within the same array. By modifying the array based on what we find, we can easily identify repeats.

Here's how the algorithm would work step-by-step:

  1. Go through each number in the array, one at a time.
  2. For each number, imagine it points to a specific location in the array (you may need to do a simple math trick to translate the number into a valid location).
  3. Go to that location and change the number stored there in a noticeable way (like making it negative). This marks that location as visited.
  4. If, when you go to the location, you find it's ALREADY been changed (negative, in our example), then you know the current number is a duplicate because its location has been visited before.
  5. Keep a separate list of the numbers that caused you to find an already-changed location; those are your duplicates.
  6. After going through all the numbers, you'll have a list of all the duplicates, without needing to compare every number against all the others.

Code Implementation

def find_duplicates(nums):
    duplicate_numbers = []

    for index in range(len(nums)):
        current_number = abs(nums[index])
        corresponding_index = current_number - 1

        # Use array indices as a hash key.
        if nums[corresponding_index] < 0:

            duplicate_numbers.append(current_number)

        # Mark each number as seen by negating value at index.
        else:

            nums[corresponding_index] *= -1

    # Restore the array to its original state.
    for index in range(len(nums)):
        nums[index] = abs(nums[index])

    return duplicate_numbers

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input array nums of size n exactly once. In each iteration, it performs a constant amount of work: calculating an index, accessing an array element, and potentially modifying that array element or adding the current number to the duplicates list. Since the number of operations within the loop is constant and the loop runs n times, the time complexity is directly proportional to n, resulting in a linear time complexity of O(n).
Space Complexity
O(1)The algorithm uses a separate list to store the duplicates. In the worst-case scenario, all numbers in the input array of size N could be duplicates (e.g., [1, 1, 1, 1]). However, since the numbers in the array range from 1 to N, the duplicate list will contain at most N distinct numbers from the input array, so the additional space is proportional to the number of actual duplicate values and is bounded by the size of the original array. Therefore, the space used for storing the duplicates is O(1) because it is dependent on the input values but is stored as part of our output so is not considered part of the auxiliary space.

Edge Cases

CaseHow to Handle
Null input arrayCheck for null input and return an empty list to avoid NullPointerException.
Empty arrayReturn an empty list immediately as there are no duplicates.
Array with only one elementReturn an empty list because a single element cannot be a duplicate.
Array with all elements being the sameThe in-place modification approach correctly identifies the repeating element.
Array with no duplicatesReturn an empty list as specified in problem description
Maximum sized input array close to integer limits (n elements)Ensure algorithm has O(n) time complexity and reasonable space complexity (ideally O(1) or O(log n)) to avoid resource exhaustion.
Array contains values outside the range [1, n]Handle invalid input, by either ignoring, throwing an exception or filtering the array, depending on the requirements.
Integer overflow during calculations (if using arithmetic tricks)Use appropriate data types (e.g., long) or bit manipulation to prevent integer overflows during calculations if applicable.