Missing Element in Sorted Array

Medium
Google
Topics: Arrays, Binary Search

Given a sorted array of positive integers arr and an integer k, find the kth positive integer that is missing from this array.

For example:

  • arr = [2, 3, 4, 7, 11], k = 5. The missing positive integers are [1, 5, 6, 8, 9, 10, 12, 13, ...]. The 5th missing positive integer is 9, so the function should return 9.
  • arr = [1, 2, 3, 4], k = 2. The missing positive integers are [5, 6, 7, ...]. The 2nd missing positive integer is 6, so the function should return 6.
  • arr = [5, 6, 7, 8, 9], k = 5. The missing positive integers are [1, 2, 3, 4, 10, 11, ...]. The 5th missing positive integer is 10, so the function should return 10.

Could you provide an algorithm to solve this problem efficiently? What is the time and space complexity of your solution? Can you handle edge cases such as an empty array or when k is larger than the total number of missing integers in the array? Implement the algorithm in Python.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of values within the sorted array? Can I expect negative numbers or just positive integers?
  2. Can the input array be empty or null? If so, what should I return?
  3. Is the 'missing' element guaranteed to be within the range of the array's existing values, or could it be smaller than the first element or larger than the last?
  4. Are the elements in the array guaranteed to be distinct, or can there be duplicate values?
  5. What should I return if there is no missing element (i.e., if the array contains all expected values in the sorted order)?

Brute Force Solution

Approach

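The brute-force method walks the sorted array from left to right and compares each element with the value we would expect if no number had been skipped. The first mismatch marks a missing number. As written, this finds the first gap after the array's first element, which is the simplest form of the problem; it does not count gaps up to the kth one, and it does not consider positive integers smaller than the first element.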

Here's how the algorithm would work step-by-step:

  1. We begin with the first number in the sorted list.
  2. We calculate what the next number *should* be based on the fact that the list is sorted, usually just by adding one.
  3. We compare this calculated number with the actual next number in the list.
  4. If they are different, the calculated number is the missing one, and we are done.
  5. If they are the same, we continue to the next number in the list, and repeat the calculation and comparison until we find the missing one.

Code Implementation

def find_missing_element_brute_force(sorted_array):
    # An empty array has no first element to start counting from.
    if not sorted_array:
        raise ValueError("sorted_array must not be empty")

    expected_number = sorted_array[0]

    for current_number in sorted_array:
        # A mismatch means expected_number was skipped in the array.
        if current_number != expected_number:
            return expected_number

        # In a gap-free run, the next value is one more than the current one.
        expected_number = current_number + 1

    # All elements were consecutive, so the missing element is after the last.
    return expected_number
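
For the problem as stated (the kth missing positive integer, counted from 1), the same left-to-right idea can be adapted by measuring the gap in front of each element. The following is a minimal sketch, assuming arr is strictly increasing, contains only positive integers, and k is at least 1. The name find_kth_missing_linear is illustrative and not part of the original solution.

```python
def find_kth_missing_linear(arr, k):
    """Return the kth positive integer absent from the sorted list arr.

    Assumes arr is strictly increasing, all values are >= 1, and k >= 1.
    """
    previous = 0  # Pretend a 0 sits before the array so gaps are counted from 1.

    for value in arr:
        # Integers strictly between previous and value are missing.
        gap = value - previous - 1
        if k <= gap:
            # The answer falls inside this gap.
            return previous + k
        k -= gap
        previous = value

    # Ran off the end: the answer lies past the last element
    # (or is simply k when arr is empty).
    return previous + k


print(find_kth_missing_linear([2, 3, 4, 7, 11], 5))  # 9
print(find_kth_missing_linear([1, 2, 3, 4], 2))      # 6
```

This sketch runs in O(n) time and O(1) extra space, and an empty array naturally yields k.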

Big(O) Analysis

Time Complexity
O(n): The algorithm makes a single pass over the sorted array and does constant work per element. In the worst case, when the gap is at or after the last element, it visits all n elements.
Space Complexity
O(1): The algorithm stores only the expected value and the current loop position. It creates no arrays, hash maps, or recursion frames that grow with n.

Optimal Solution

Approach

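Because the array is sorted, we do not have to check each number individually. The difference between the value at a position and the value expected there never shrinks as we move right, so binary search can home in on the first position where a number has been skipped. Note that the implementation below measures expected values from the first element with a fixed step, so it finds the first gap in an otherwise regular sequence rather than the kth missing positive integer.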

Here's how the algorithm would work step-by-step:

  1. Imagine the sorted list as a complete sequence, and think about how many numbers you'd expect to find before a certain point.
  2. Check the middle of the list. Is the actual number there what you'd expect based on its position?
  3. If the number is bigger than expected, the missing number is somewhere earlier in the list. Focus your search on the first half.
  4. If the number is as expected or smaller, the missing number is somewhere later in the list. Focus your search on the second half.
  5. Repeat this process of checking the middle and narrowing down the search until you've found the exact spot where the missing number belongs.
  6. The missing number is one step past the last number that is still in its expected position. Because expected values are measured from the first element, this approach cannot detect a number that is missing before the start of the list.
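
For the kth-missing variant in the problem statement, "what you'd expect based on its position" has a closed form: if nothing were missing, index i would hold i + 1, so arr[i] - (i + 1) positive integers are missing before arr[i]. A small sketch (the helper name missing_before is hypothetical) makes those counts visible:

```python
def missing_before(arr, index):
    """Number of positive integers missing to the left of arr[index].

    Assumes arr is strictly increasing and contains only positive integers.
    """
    return arr[index] - (index + 1)


arr = [2, 3, 4, 7, 11]
counts = [missing_before(arr, i) for i in range(len(arr))]
print(counts)  # [1, 1, 1, 3, 6]
```

The counts never decrease from left to right, which is the property that makes a binary search over them valid.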

Code Implementation

def find_missing_element(sorted_array, missing_element_difference):
    # An empty array has no first element to anchor the expected sequence.
    if not sorted_array:
        raise ValueError("sorted_array must not be empty")

    left_index = 0
    right_index = len(sorted_array) - 1

    while left_index <= right_index:
        middle_index = (left_index + right_index) // 2

        # Value this position would hold if nothing before it were missing.
        expected_number = sorted_array[0] + middle_index * missing_element_difference

        if sorted_array[middle_index] > expected_number:
            # A value was skipped at or before middle_index; search the left half.
            right_index = middle_index - 1
        else:
            # Everything up to middle_index is in sequence; search the right half.
            left_index = middle_index + 1

    # right_index is the last position still in sequence (index 0 always is),
    # so the first missing value is one step after it.
    return sorted_array[right_index] + missing_element_difference
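
The same halving idea can be applied directly to the kth missing positive integer from the problem statement. The sketch below is one possible formulation, not the original author's code: it assumes arr is strictly increasing with values of at least 1 and k of at least 1, and the name find_kth_missing is illustrative. It searches for the first index whose missing count, arr[i] - (i + 1), reaches k.

```python
def find_kth_missing(arr, k):
    """Return the kth positive integer absent from the sorted list arr.

    Assumes arr is strictly increasing, all values are >= 1, and k >= 1.
    """
    left, right = 0, len(arr)  # Half-open search range [left, right).

    while left < right:
        middle = (left + right) // 2
        # Count of positive integers missing before arr[middle].
        if arr[middle] - (middle + 1) < k:
            # Fewer than k are missing so far; the answer lies to the right.
            left = middle + 1
        else:
            right = middle

    # Exactly `left` array elements are smaller than the answer, so the
    # answer is k shifted up by that many present values.
    return left + k


print(find_kth_missing([2, 3, 4, 7, 11], 5))  # 9
print(find_kth_missing([1, 2, 3, 4], 2))      # 6
```

With the half-open range, an empty array skips the loop and returns k, and a k larger than every gap inside the array returns k plus the array length.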

Big(O) Analysis

Time Complexity
O(log n): Each iteration compares the middle element with its expected value and discards half of the remaining range, so the loop runs at most about log2(n) + 1 times for an array of n elements.
Space Complexity
O(1): The search is iterative and works directly on the input array. It stores only the left, right, and middle indices plus the expected value, none of which grow with n.

Edge Cases

| Case | How to Handle |
| --- | --- |
| Empty input array | Every positive integer is missing, so the kth missing integer is k itself; return k. |
| k is larger than the number of integers missing within the array's range | The answer lies past the last element: return the last element plus (k minus the number of integers missing up to the last element), which equals k plus the array length. |
| The kth missing integer is smaller than the first element | Gaps are counted from 1, not from the first element. For arr = [5, 6, 7] and k = 2 the answer is 2. |
| The kth missing integer is larger than the last element | When the array has no internal gaps (for example [1, 2, 3, 4]), the search should run off the right end and return the last element plus k. |
| Input array contains duplicate numbers | Position-based counting assumes strictly increasing values. If duplicates are possible, remove them first or confirm with the interviewer that the input is distinct. |
| Integer overflow when calculating the expected value | Python integers have arbitrary precision, so overflow is not a concern there. In fixed-width languages, use a 64-bit type for the sums and compute the midpoint as left + (right - left) / 2. |
| Large array size with large k value | Binary search stays O(log n) regardless of k, because k appears only in comparisons and in the final addition. |
| k is zero | The problem counts missing integers from 1, so k = 0 (or a negative k) is invalid input. Raise an error or agree on a sentinel value with the interviewer. |
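
These edge cases can be exercised with a defensive wrapper. The following is a sketch under stated assumptions rather than a required interface: it takes the problem's 1-based counting at face value (so an empty array yields k and k < 1 is rejected), treats None like an empty array, and removes duplicates before counting. The name kth_missing_checked is hypothetical.

```python
def kth_missing_checked(arr, k):
    """Validate the input, then return the kth missing positive integer."""
    if k < 1:
        raise ValueError("k must be a positive integer")

    # Treat None like an empty array; sorted(set(...)) drops duplicates
    # and also tolerates unsorted input.
    values = sorted(set(arr or []))
    if values and values[0] < 1:
        raise ValueError("arr must contain only positive integers")

    # Every present value at or below the running answer pushes the
    # answer up by one; stop at the first value above it.
    for value in values:
        if value <= k:
            k += 1
        else:
            break
    return k


print(kth_missing_checked([], 3))         # 3
print(kth_missing_checked([1, 1, 2], 1))  # 3
```

Sorting makes this wrapper O(n log n), a deliberate trade of speed for robustness. When the input is already known to be sorted and distinct, the validation can be skipped.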