Taro Logo

Remove Duplicates from Sorted Array

Easy
Uber logo
Uber
1 view
Topics:
ArraysTwo Pointers

Given an integer array nums sorted in non-decreasing order, remove the duplicates in-place such that each unique element appears only once. The relative order of the elements should be kept the same. Then return the number of unique elements in nums. You must modify the input array in-place.

Consider the number of unique elements of nums to be k, to get accepted, you need to do the following things:

  1. Change the array nums such that the first k elements of nums contain the unique elements in the order they were present in nums initially. The remaining elements of nums are not important as well as the size of nums.
  2. Return k.

For example:

  • Input: nums = [1,1,2]

  • Output: 2, and nums becomes [1,2,_] (where _ represents irrelevant values).

  • Input: nums = [0,0,1,1,1,2,2,3,3,4]

  • Output: 5, and nums becomes [0,1,2,3,4,_,_,_,_,_]

Constraints:

  • 1 <= nums.length <= 3 * 10^4
  • -100 <= nums[i] <= 100
  • nums is sorted in non-decreasing order.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of integer values within the `nums` array? Are negative numbers, zeros, or large positive numbers possible?
  2. What should I return if the input array `nums` is null or empty?
  3. Can I assume the input array `nums` is sorted in ascending order?
  4. Does the provided array `nums` have enough allocated capacity to accommodate the unique elements in-place, or do I need to consider the array's capacity?
  5. Are there any space complexity constraints beyond the in-place modification of the input array?

Brute Force Solution

Approach

The brute force method to remove duplicates means we examine each number one by one, comparing it to all other numbers to find and eliminate the copies. It's like manually checking every item against every other item to ensure uniqueness. It's straightforward but can be slow when dealing with many numbers.

Here's how the algorithm would work step-by-step:

  1. Start with the very first number.
  2. Compare it to every other number in the list.
  3. If you find a number that's the same as the first one, mark it as a duplicate.
  4. Move to the second number, and repeat the process of comparing it to all the remaining numbers.
  5. Keep doing this for each number in the list, one at a time.
  6. Once you've gone through the entire list, create a new list containing only the numbers that were not marked as duplicates.

Code Implementation

def remove_duplicates_brute_force(numbers):
    duplicate_indices = []

    for current_index in range(len(numbers)):
        # Compare the current number with all subsequent numbers
        for comparison_index in range(current_index + 1, len(numbers)):

            if numbers[current_index] == numbers[comparison_index]:
                # Mark the comparison number as a duplicate
                duplicate_indices.append(comparison_index)

    # Create a new list to store non-duplicate numbers
    unique_numbers = []

    for index in range(len(numbers)):
        # Append only non-duplicate numbers to the new list
        if index not in duplicate_indices:

            # Only add if it's not a duplicate
            unique_numbers.append(numbers[index])

    return unique_numbers

Big(O) Analysis

Time Complexity
O(n²)The brute force approach involves iterating through the array of n elements, and for each element, comparing it to all the remaining elements in the array to identify duplicates. This means for the first element, we potentially perform n-1 comparisons, for the second n-2, and so on. This results in roughly n * (n-1)/2 comparisons. Therefore, the time complexity is O(n²).
Space Complexity
O(N)The brute force approach, as described, creates a new list to store the unique numbers. In the worst-case scenario, where all elements in the input array are unique, this new list will need to store all N elements. Therefore, the auxiliary space used is proportional to the size of the input array, N. This results in a space complexity of O(N).

Optimal Solution

Approach

The key insight is that since the list is already sorted, identical values will be right next to each other. We can take advantage of this order to quickly identify and replace the duplicates. We essentially have two 'pointers': one to track the next available spot for a unique element, and another to scan the list.

Here's how the algorithm would work step-by-step:

  1. Imagine you have two markers. One marker points to where the next unique number should go, starting at the beginning of the list.
  2. The other marker steps through the entire list, one number at a time.
  3. As the second marker moves, if it finds a number that is different from the number just before the first marker, that means it's a unique number.
  4. When you find a unique number, you copy it to the spot pointed to by the first marker, and then move the first marker forward to the next available spot.
  5. Keep doing this until the second marker has gone through the entire list.
  6. In the end, everything from the beginning of the list up to the first marker's position will be unique, in sorted order.

Code Implementation

def remove_duplicates_from_sorted_array(numbers):
    if not numbers:
        return 0

    # Index to track the next unique element's position
    next_unique_index = 1

    # Iterate through the array to find unique elements
    for current_index in range(1, len(numbers)):

        # Compare current element with the element before next_unique_index
        if numbers[current_index] != numbers[next_unique_index - 1]:

            # Found a new unique number.
            numbers[next_unique_index] = numbers[current_index]

            # Move the next unique index forward.
            next_unique_index += 1

    # The first 'next_unique_index' elements are now unique.
    return next_unique_index

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input array of size n using a single pointer to scan all the elements. For each element, a constant time comparison is performed to check if it is a duplicate. The other pointer tracks the position for the next unique element, which happens at most n times. Therefore, the time complexity is directly proportional to the input size n, resulting in O(n).
Space Complexity
O(1)The provided solution operates in-place, meaning it modifies the input array directly without creating any significant auxiliary data structures. It uses two index variables (representing the two markers) to traverse and modify the array. The space required for these variables remains constant regardless of the input array's size N. Therefore, the auxiliary space complexity is O(1).

Edge Cases

CaseHow to Handle
Null or empty input arrayReturn 0 indicating no unique elements or throw an IllegalArgumentException if null is not permitted by the problem.
Array with a single elementReturn 1 since the single element is inherently unique.
Array with all identical elementsThe algorithm should correctly reduce the array to a single instance of that element and return 1.
Array with a large range of integer values (including negative and positive)Ensure the algorithm handles extreme integer values without causing overflow or underflow issues.
Array with maximum possible sizeVerify the algorithm's space complexity doesn't exceed allowed memory constraints for large arrays.
Array already containing only unique elementsThe algorithm should correctly identify that all elements are unique and return the original array's length.
Array with duplicates clustered together at the beginning, middle, or endThe algorithm should correctly handle different distributions of duplicates within the array.
Array containing only negative numbersVerify that the algorithm correctly handles negative numbers without issues related to sign comparison.