Identify the Largest Outlier in an Array

Medium

Amazon

8 views

Topics:

Arrays

You are given an integer array nums. This array contains n elements, where exactly n - 2 elements are special numbers. One of the remaining two elements is the sum of these special numbers, and the other is an outlier.

An outlier is defined as a number that is neither one of the original special numbers nor the element representing the sum of those numbers.

Note that special numbers, the sum element, and the outlier must have distinct indices, but may share the same value.

Return the largest potential outlier in nums.

For example:

nums = [2,3,5,10]

The special numbers could be 2 and 3, thus making their sum 5 and the outlier 10.
nums = [-2,-1,-3,-6,4]

The special numbers could be -2, -1, and -3, thus making their sum -6 and the outlier 4.
nums = [1,1,1,1,1,5,5]

The special numbers could be 1, 1, 1, 1, and 1, thus making their sum 5 and the other 5 as the outlier.

How would you go about finding the largest potential outlier, optimizing for both time and space complexity?

Identify the Largest Outlier in an Array

Medium

Amazon

8 views

Topics:

Arrays

You are given an integer array nums. This array contains n elements, where exactly n - 2 elements are special numbers. One of the remaining two elements is the sum of these special numbers, and the other is an outlier.

An outlier is defined as a number that is neither one of the original special numbers nor the element representing the sum of those numbers.

Note that special numbers, the sum element, and the outlier must have distinct indices, but may share the same value.

Return the largest potential outlier in nums.

For example:

nums = [2,3,5,10]

The special numbers could be 2 and 3, thus making their sum 5 and the outlier 10.
nums = [-2,-1,-3,-6,4]

The special numbers could be -2, -1, and -3, thus making their sum -6 and the outlier 4.
nums = [1,1,1,1,1,5,5]

The special numbers could be 1, 1, 1, 1, and 1, thus making their sum 5 and the other 5 as the outlier.

How would you go about finding the largest potential outlier, optimizing for both time and space complexity?

Solution

Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

What should I return if the input array is empty or contains only one element?
What data type will the array contain (e.g., integers, floats)? What is the range of possible values in the array?
How is 'largest outlier' defined? Specifically, what statistical measure (e.g., standard deviation, IQR) should I use to determine if a value is an outlier?
If there are multiple elements that are equally the largest outliers, should I return all of them, or is returning any one acceptable?
Can I modify the input array, or should I operate on a copy?

Brute Force Solution

Approach

The basic idea is to check each number in the list one by one to see if it's an outlier. We compare each number to all the other numbers to see if it sticks out unusually.

Here's how the algorithm would work step-by-step:

Take the first number in the list.
Compare this number to every other number in the list to see how different it is from the rest.
If this number is significantly different from most of the other numbers, mark it as a potential outlier.
Repeat this process for every number in the list.
After checking all the numbers, look at the ones marked as potential outliers.
If only one number was marked as an outlier, then that is the largest outlier.
If more than one was found, compare those and determine the largest one, using some pre-defined logic such as average difference.

Code Implementation

def find_largest_outlier_brute_force(number_list):
    potential_outliers = []
    average_differences = []

    for current_number in number_list:
        total_difference = 0

        # Compare the current number to every other number in the list
        for other_number in number_list:
            if current_number != other_number:
                total_difference += abs(current_number - other_number)

        average_difference = total_difference / (len(number_list) - 1)
        potential_outliers.append(current_number)
        average_differences.append(average_difference)

    # If no outliers were found, return None
    if not potential_outliers:
        return None

    #If we have only one outlier return that
    if len(potential_outliers) == 1:
        return potential_outliers[0]

    largest_outlier = potential_outliers[0]
    largest_difference = average_differences[0]

    # Choose the largest outlier by comparing average differences
    for index in range(1, len(potential_outliers)):

        if average_differences[index] > largest_difference:
            largest_outlier = potential_outliers[index]
            largest_difference = average_differences[index]

    return largest_outlier

Big(O) Analysis

Time Complexity

O(n²) – The algorithm iterates through each of the n elements in the array. For each element, it compares it to every other element in the array to determine if it's an outlier. This comparison process involves another loop that, in the worst case, iterates through approximately n elements. Therefore, the nested loops result in a time complexity of approximately n * n operations. This simplifies to O(n²).

Space Complexity

O(1) – The provided algorithm's space complexity is O(1) because it only stores a few variables regardless of the input array size. Specifically, it keeps track of potential outliers and potentially the largest outlier found so far, which requires a constant amount of extra memory. It does not create any auxiliary data structures like lists or hash maps that would scale with the input size, N, where N is the number of elements in the array. Therefore, the auxiliary space used is constant and independent of the input.

Optimal Solution

Approach

The smartest way to find the largest outlier is to understand the typical range of the data first. Once we have that range, finding the item farthest away from the typical values is straightforward. This method avoids comparing every single value to every other value.

Here's how the algorithm would work step-by-step:

First, sort the values from smallest to largest.
Next, find the 'middle' part of the data. A good way to do this is to calculate the values at the 25% mark and 75% mark.
Calculate the 'normal range' by finding the difference between those two middle values, and multiplying that difference by 1.5. Add this range to the 75% value and subtract it from the 25% value to establish the upper and lower limits of the normal range.
Now, examine the values. Find the value that is greater than the upper limit, or smaller than the lower limit, and is the furthest from the typical values. This is the largest outlier.

Code Implementation

def find_largest_outlier(data_values):
    sorted_values = sorted(data_values)

    list_length = len(sorted_values)
    q1_index = int(0.25 * list_length)
    q3_index = int(0.75 * list_length)
    q1_value = sorted_values[q1_index]
    q3_value = sorted_values[q3_index]

    iqr = q3_value - q1_value
    
    # Calculate the outlier range based on the IQR.
    outlier_range = 1.5 * iqr

    upper_limit = q3_value + outlier_range
    lower_limit = q1_value - outlier_range

    largest_outlier = None
    max_distance = 0

    # Find the value that exceeds the limits by the greatest margin.
    for value in sorted_values:
        if value > upper_limit or value < lower_limit:

            #Determine distance from normal values
            if value > upper_limit:
                distance = value - q3_value
            else:
                distance = q1_value - value

            # Track the largest outlier based on its distance from typical values.
            if distance > max_distance:
                max_distance = distance
                largest_outlier = value

    return largest_outlier

Big(O) Analysis

Time Complexity

O(n log n) – The initial step sorts the input array of size n, which takes O(n log n) time using an efficient sorting algorithm like merge sort or quicksort. Calculating the 25th and 75th percentiles involves accessing elements at specific indices after sorting, taking O(1) time. Determining the normal range and identifying the largest outlier involves iterating through the sorted array once (O(n)) to compare each element to the calculated limits. Therefore, the dominant operation is the sorting step, resulting in a total time complexity of O(n log n).

Space Complexity

O(1) – The algorithm sorts the input array in place, modifying the original array directly and thus not using extra space for sorting. It calculates the 25th and 75th percentile values using a constant number of variables. The normal range, upper and lower limits, and outlier are also stored in a fixed number of variables, independent of the input size N. Therefore, the auxiliary space complexity is constant.

Edge Cases

Case	How to Handle
Null or undefined input array	Throw an IllegalArgumentException or return a default value (e.g., null or empty list) to indicate invalid input.
Array with only one element	Return null or an empty list because an outlier requires comparison to other elements.
Array with all identical elements	Return null or an empty list, as no single element can be considered an outlier.
Array contains Integer.MAX_VALUE or Integer.MIN_VALUE	Consider potential overflow issues when performing calculations involving these extreme values and use appropriate data types (e.g., long).
Array contains only negative numbers	The solution should correctly identify the largest negative outlier based on its distance from the other numbers in the array.
Array with extremely large size (memory limitations)	Ensure the solution has reasonable time and space complexity, possibly using an iterative approach instead of recursion for large arrays, or consider using an approximate algorithm if exactness is not essential.
Array with floating point numbers and precision issues	Avoid direct equality comparisons with floating-point numbers; use a tolerance or consider scaling the numbers to integers if possible to improve accuracy.
Multiple elements are equally the largest outliers	Return all elements that are considered equally the largest outliers according to the definition, or return only one based on a predefined rule (e.g. first occurrence).