Taro Logo

Find the Median of the Uniqueness Array

Hard
Amazon logo
Amazon
41 views
Topics:
Arrays

You are given an integer array nums. The uniqueness array of nums is the sorted array that contains the number of distinct elements of all the subarrays of nums. In other words, it is a sorted array consisting of distinct(nums[i..j]), for all 0 <= i <= j < nums.length.

Here, distinct(nums[i..j]) denotes the number of distinct elements in the subarray that starts at index i and ends at index j.

Return the median of the uniqueness array of nums.

Note that the median of an array is defined as the middle element of the array when it is sorted in non-decreasing order. If there are two choices for a median, the smaller of the two values is taken.

Example 1:

Input: nums = [1,2,3] Output: 1

Explanation: The uniqueness array of nums is [distinct(nums[0..0]), distinct(nums[1..1]), distinct(nums[2..2]), distinct(nums[0..1]), distinct(nums[1..2]), distinct(nums[0..2])] which is equal to [1, 1, 1, 2, 2, 3]. The uniqueness array has a median of 1. Therefore, the answer is 1.

Example 2:

Input: nums = [3,4,3,4,5] Output: 2

Explanation: The uniqueness array of nums is [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3]. The uniqueness array has a median of 2. Therefore, the answer is 2.

Example 3:

Input: nums = [4,3,5,4] Output: 2

Explanation: The uniqueness array of nums is [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]. The uniqueness array has a median of 2. Therefore, the answer is 2.

Constraints:

  • 1 <= nums.length <= 10^5
  • 1 <= nums[i] <= 10^5

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of values in the input array, and can it contain negative numbers, zeros, or floating-point numbers?
  2. If the 'uniqueness array' (presumably the array of unique elements) is empty, what should the function return?
  3. What data type should the function return for the median (e.g., integer, float)? If the number of unique elements is even, should I return the lower median, higher median, or the average of the two?
  4. Are there any memory constraints I should be aware of, given that I'll be creating a 'uniqueness array'?
  5. Can the input array be null or empty?

Brute Force Solution

Approach

The brute force method for this problem involves identifying all the unique numbers from the initial set of numbers, figuring out how many times each unique number appears, and then finding the middle number after arranging these counts in order. It essentially looks at every possibility to guarantee finding the solution, even if it takes a while.

Here's how the algorithm would work step-by-step:

  1. First, go through all the numbers you have and create a list of only the unique numbers. This means that if a number appears more than once, you only keep it once in this new list.
  2. Next, for each unique number on your list, count how many times that number appears in the original set of numbers.
  3. Now that you have a count for each unique number, put those counts into a new list.
  4. Arrange the counts in this new list from smallest to largest.
  5. Finally, find the middle number in the sorted list of counts. If there's an even number of counts, find the average of the two middle numbers. This middle number (or average) is your answer.

Code Implementation

def find_median_of_uniqueness_array(numbers):
    unique_numbers = []
    for number in numbers:
        if number not in unique_numbers:
            unique_numbers.append(number)

    number_counts = []
    # Count occurrences of each unique number.
    for unique_number in unique_numbers:

        count = 0
        for number in numbers:
            if number == unique_number:
                count += 1
        number_counts.append(count)

    # Sorting is crucial for finding the median.
    number_counts.sort()

    counts_length = len(number_counts)
    # Handle even and odd length lists differently
    if counts_length % 2 == 0:
        middle_index_2 = counts_length // 2
        middle_index_1 = middle_index_2 - 1
        median = (number_counts[middle_index_1] + number_counts[middle_index_2]) / 2
    else:
        middle_index = counts_length // 2
        median = number_counts[middle_index]

    return median

Big(O) Analysis

Time Complexity
O(n^2)Finding the unique numbers in the input array of size n requires iterating through the array and, for each number, checking if it already exists in the unique list. In the worst case, for each of the n numbers, we might have to compare it against all other numbers already added to the unique list, resulting in O(n^2) time complexity. Counting the frequency of each unique number involves another loop through the original n elements for each unique number, also contributing to the O(n^2) complexity in the worst case where most elements are unique. Sorting the frequencies takes O(k log k) where k is the number of unique elements, and in the worst case k is n, thus O(n log n) which is dominated by the O(n^2). Therefore, the overall time complexity is dominated by the uniqueness check and frequency counting which approximate n * n operations.
Space Complexity
O(N)The algorithm uses a list to store unique numbers, which in the worst case, could contain all N numbers from the original input. It also uses a list to store the counts of each unique number, also potentially of size N in the worst case where all input numbers are unique. Sorting this list of counts is done in-place, but the lists dominate space usage. Thus, the auxiliary space complexity is O(N).

Optimal Solution

Approach

First, we identify the unique numbers in the original collection. Then, we need to figure out the middle number once those unique numbers are arranged in order. The efficient trick is to avoid sorting the whole list; instead, use a technique to find just the middle number, which saves time.

Here's how the algorithm would work step-by-step:

  1. Go through the initial set of numbers and keep only the ones that appear exactly once.
  2. Find the count of these unique numbers.
  3. Determine if the number of unique items is even or odd to decide what constitutes the middle.
  4. Use a method to quickly discover the number that would be in the middle without needing to completely arrange all of them in order.
  5. If the count is odd, the middle number you found is the median.
  6. If the count is even, you will need to find the two middle numbers and calculate their average. Use the same quick middle number finding method to get them.
  7. That average is the median.

Code Implementation

def find_median_of_uniqueness_array(numbers):
    counts = {}
    for number in numbers:
        counts[number] = counts.get(number, 0) + 1

    unique_numbers = [number for number, count in counts.items() if count == 1]
    unique_count = len(unique_numbers)

    # Need to handle the edge case of no unique numbers.
    if unique_count == 0:
        return None

    def find_kth_smallest(numbers_list, k):
        if not numbers_list:
            return None

        pivot = numbers_list[0]
        less = [number for number in numbers_list if number < pivot]
        equal = [number for number in numbers_list if number == pivot]
        greater = [number for number in numbers_list if number > pivot]

        if k <= len(less):
            return find_kth_smallest(less, k)
        elif k <= len(less) + len(equal):
            return pivot
        else:
            return find_kth_smallest(greater, k - len(less) - len(equal))

    # Determine if the count of unique numbers is even or odd.
    if unique_count % 2 == 1:
        # If odd, find the middle element directly.
        median = find_kth_smallest(unique_numbers, (unique_count + 1) // 2)
        return median
    else:
        # If even, find the two middle elements and average them.
        middle_right = find_kth_smallest(unique_numbers, unique_count // 2 + 1)
        middle_left = find_kth_smallest(unique_numbers, unique_count // 2)
        # Calculate the average of the two middle elements.
        median = (middle_left + middle_right) / 2
        return median

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input array of size n to identify unique numbers, requiring O(n) time. Finding the median of the unique numbers involves at most two calls to a selection algorithm with linear time complexity in the number of unique elements, which is at most n. Thus, the overall time complexity is dominated by the initial uniqueness check and at most two linear time selection operations, giving a total runtime of O(n).
Space Complexity
O(N)The algorithm utilizes a data structure (likely a hash set or similar) to store the unique numbers encountered in the input collection. In the worst case, all N numbers in the input are unique, requiring the storage of N elements. The space required for this data structure scales linearly with the number of unique elements, leading to a space complexity of O(N). No significant additional space is used beyond this uniqueness tracking structure.

Edge Cases

CaseHow to Handle
Null or Empty Input ArrayReturn null or throw IllegalArgumentException to indicate invalid input.
Array with only one elementReturn an empty array since a uniqueness array requires at least two elements.
Array with two identical elementsThe uniqueness array will be empty, and the median will be undefined, so return an empty array.
Array with all identical elementsThe uniqueness array will be empty, and the median will be undefined, so return an empty array.
Array with large number of distinct elements, approaching system memory limitConsider the memory footprint of the data structures used to store uniqueness and ensure it scales reasonably.
Integer overflow when calculating differences if input array contains extreme valuesUse a data type with a larger range (e.g., long) for intermediate calculations or perform checks to prevent overflow.
The uniqueness array is empty because all elements are identical or occur more than once.Return an empty array or a specific value like null or -1 to indicate no median exists.
Input array contains duplicate values and the uniqueness array has an even number of elementsCalculate the median as the average of the two middle elements after sorting the uniqueness array.