Distinct Numbers in Each Subarray

Medium
Topics:
Arrays, Sliding Windows

Given an integer array nums and an integer k, you are asked to construct an array ans of length nums.length - k + 1, where ans[i] is the number of distinct numbers in the subarray nums[i..i+k-1].

Return the array ans.

A subarray is a contiguous non-empty sequence of elements within an array.

Example 1:

Input: nums = [1,5,1,3,5], k = 2
Output: [2,2,2,2]
Explanation: Here are all the subarrays of length 2 from the array nums, in the order they appear:
[1,5], [5,1], [1,3], [3,5]
Each of them contains 2 distinct numbers.

Example 2:

Input: nums = [1,4,2,3,5], k = 4
Output: [4,4]
Explanation: The subarrays [1,4,2,3] and [4,2,3,5] each contain 4 distinct numbers.

Constraints:

  • 1 <= k <= nums.length <= 1000
  • 1 <= nums[i] <= 10^5

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible value ranges for the numbers within the array, and are negative numbers, zeros, or non-integer values possible?
  2. Could you clarify the expected behavior when the input array is empty or when the subarray length, `k`, is larger than the input array's length?
  3. Are the subarrays we consider contiguous, and should the output be a list of the number of distinct elements for each subarray, or something else?
  4. Does the order of the distinct numbers within each subarray matter when counting them?
  5. Are there any specific data type requirements for the returned value (e.g., an array of integers, a set, etc.)?

Brute Force Solution

Approach

To find the number of unique elements in every chunk of a certain size, the brute force method checks every single possible chunk. It's like looking at every single piece of a puzzle individually. This approach makes sure no possibility is missed, but it might take a while.

Here's how the algorithm would work step-by-step:

  1. First, focus on the very first set of numbers, grabbing as many as the chunk size requires.
  2. Then, carefully examine these numbers, and count only those which appear for the first time.
  3. Next, shift your focus one number forward, creating the next chunk of numbers of the same size.
  4. Again, count only the numbers that are unique within this new chunk.
  5. Keep sliding your focus, one number at a time, creating and examining each chunk until you reach the end of the entire group of numbers.
  6. Every time you examine a chunk, store the count of unique numbers you find.
  7. Finally, report all the unique number counts for each chunk that you calculated along the way.

Code Implementation

def distinct_numbers_in_each_subarray_brute_force(numbers, subarray_size):
    unique_counts = []
    number_of_subarrays = len(numbers) - subarray_size + 1

    for start_index in range(number_of_subarrays):
        subarray = numbers[start_index:start_index + subarray_size]
        unique_count = 0
        seen_numbers = set()

        # Iterate through the subarray to count distinct numbers.
        for number in subarray:
            # Only count the number if it's not already seen in this subarray.
            if number not in seen_numbers:
                unique_count += 1
                seen_numbers.add(number)

        unique_counts.append(unique_count)

    return unique_counts
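
The same brute-force idea can be written more compactly by letting Python's built-in set do the deduplication for each window. This is only a sketch of an equivalent formulation; the function name distinct_counts_brute_force is chosen here for illustration and is not part of the original solution.

```python
def distinct_counts_brute_force(numbers, subarray_size):
    """Return the distinct count of every contiguous window of the given size."""
    window_count = len(numbers) - subarray_size + 1
    # len(set(...)) deduplicates one window; each window costs O(k) time.
    # If window_count is not positive, range() is empty and the result is [].
    return [
        len(set(numbers[start:start + subarray_size]))
        for start in range(window_count)
    ]


print(distinct_counts_brute_force([1, 2, 3, 2, 2, 1, 3], 3))  # [3, 2, 2, 2, 3]
```

The cost is unchanged: every window is still rebuilt from scratch, so this remains O(n*k).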

Big(O) Analysis

Time Complexity
O(n*k): There are n - k + 1 subarrays of size k. For each one, the algorithm builds the slice and examines every element to count the distinct values, which is O(k) work given average constant-time set operations. The total is O((n - k + 1) * k), which simplifies to O(n*k).
Space Complexity
O(k): For each subarray, the solution builds a slice of k elements and a set of the distinct numbers seen so far in that subarray. In the worst case all k numbers are distinct, so the set holds k entries. Both structures are discarded before the next subarray is processed, so the auxiliary space is O(k), not counting the output list of n - k + 1 counts.

Optimal Solution

Approach

Instead of recalculating the distinct count for every possible subarray, we maintain a 'window' that slides across the data. As the window moves, we efficiently update the count of distinct numbers based on what enters and leaves the window.

Here's how the algorithm would work step-by-step:

  1. Begin by examining the initial sequence of numbers, which forms the first subarray.
  2. Count the unique numbers within this initial sequence and store this count.
  3. Now, imagine sliding this window forward by one position. A new number enters the sequence, and an old number leaves.
  4. Check if the newly entered number was already present in the sequence. If not, increment the distinct count.
  5. Also check if the number that is leaving the sequence was the only one of its kind. If so, decrement the distinct count because this number is no longer in the window.
  6. Repeat this process of sliding the window and updating the count as you move along the entire dataset.
  7. Store the distinct count for each subarray (each window position). This way, you find the number of distinct elements in each subarray without doing redundant calculations.

Code Implementation

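def distinct_numbers_in_each_subarray(numbers, subarray_length):
    # No window of this size exists, so there are no counts to report.
    if subarray_length <= 0 or subarray_length > len(numbers):
        return []

    results = []
    number_counts = {}
    distinct_count = 0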

    # Initialize the sliding window and count distinct elements.
    for i in range(subarray_length):
        if numbers[i] not in number_counts:
            number_counts[numbers[i]] = 0
            distinct_count += 1
        number_counts[numbers[i]] += 1

    results.append(distinct_count)

    # Slide the window through the array.
    for i in range(subarray_length, len(numbers)):
        # Decrement the count of the outgoing number.
        outgoing_number = numbers[i - subarray_length]
        number_counts[outgoing_number] -= 1

        # If that was its last occurrence, it is no longer in the window:
        # drop the key so the map never holds more than k entries.
        if number_counts[outgoing_number] == 0:
            del number_counts[outgoing_number]
            distinct_count -= 1

        # If the incoming number is new to the window, it adds a distinct value.
        incoming_number = numbers[i]
        if incoming_number not in number_counts:
            number_counts[incoming_number] = 0
            distinct_count += 1
        number_counts[incoming_number] += 1
        results.append(distinct_count)

    return results
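
As a cross-check on the sliding-window logic, here is a minimal alternative sketch built on collections.Counter from the standard library. It assumes that an invalid window size should yield an empty list; the function name distinct_counts_sliding is illustrative only.

```python
from collections import Counter


def distinct_counts_sliding(numbers, subarray_length):
    """Sliding-window distinct counts using a Counter of the current window."""
    if subarray_length <= 0 or subarray_length > len(numbers):
        return []  # assumption: no valid window means an empty result

    window = Counter(numbers[:subarray_length])
    results = [len(window)]

    for right in range(subarray_length, len(numbers)):
        outgoing = numbers[right - subarray_length]
        window[outgoing] -= 1
        if window[outgoing] == 0:
            # Drop exhausted keys so len(window) is the distinct count.
            del window[outgoing]
        window[numbers[right]] += 1
        results.append(len(window))

    return results


print(distinct_counts_sliding([1, 2, 3, 2, 2, 1, 3], 3))  # [3, 2, 2, 2, 3]
```

Because exhausted keys are deleted, the number of keys in the Counter is always the distinct count of the current window, so no separate counter variable is needed.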

Big(O) Analysis

Time Complexity
O(n): The algorithm iterates through the input array of size n once using a sliding window. Inside the loop, it only updates the counts of the entering and leaving elements in a hash map. Since hash map operations (insertion, deletion, and lookup) take constant time on average, the running time is dominated by the single pass over the array. Therefore, the overall time complexity is O(n).
Space Complexity
O(k): Two structures use memory. The output array holds one count per window, which is n - k + 1 entries; as with most analyses, this required output is not counted as auxiliary space. The hash map tracks how often each number occurs in the current window. It holds at most k keys as long as entries whose count drops to zero are removed; if they are left in place, it can grow to the number of distinct values in the whole array, which is O(n) in the worst case. With removal, the auxiliary space is O(k). If the output array is counted as well, the total is O(n - k + 1 + k) = O(n).
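
The bound on the hash map depends on how exhausted entries are handled. The following sketch measures the peak number of keys with and without removing entries whose count reaches zero. The helper peak_map_size and its remove_zero_entries flag are hypothetical and exist only for this illustration.

```python
def peak_map_size(numbers, k, remove_zero_entries):
    """Return the largest number of keys the count map holds during one pass."""
    counts = {}
    peak = 0
    for index, value in enumerate(numbers):
        counts[value] = counts.get(value, 0) + 1
        if index >= k:
            # The element k positions back has just left the window.
            outgoing = numbers[index - k]
            counts[outgoing] -= 1
            if counts[outgoing] == 0 and remove_zero_entries:
                del counts[outgoing]
        peak = max(peak, len(counts))
    return peak


values = list(range(1000))  # all values distinct
print(peak_map_size(values, 3, True))   # 3
print(peak_map_size(values, 3, False))  # 1000
```

With removal the map never exceeds k keys; without it, stale zero-count keys accumulate for every value ever seen.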

Edge Cases

Null or empty input array
How to Handle:
Return an empty list or raise an exception (for example, ValueError in Python or IllegalArgumentException in Java), depending on the problem requirements.
k is zero or negative
How to Handle:
Return an empty list or raise an exception (for example, ValueError in Python or IllegalArgumentException in Java), because a subarray size must be positive.
k is larger than the input array size
How to Handle:
Return an empty list because no subarray of size k can be formed.
Input array contains duplicate numbers.
How to Handle:
The sliding window and hash map approach accurately counts distinct elements regardless of duplicates.
Input array contains negative numbers, zeros, or a mix of both.
How to Handle:
Hash map handles any integer value without special treatment since keys can be any integer value.
Large input array size with a small k.
How to Handle:
The sliding window approach maintains O(k) space complexity, leading to reasonable efficiency even with a large input array.
Large input array size with a large k (k approaching the array size).
How to Handle:
The sliding window approach is still efficient as it iterates through the array once, with the hash map operations taking near constant time on average.
Integer overflow when calculating the count of distinct elements if using naive approach
How to Handle:
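Overflow is not a practical concern here: every frequency and every distinct count is bounded by k, which is at most the array length, so the values fit comfortably in a 32-bit integer under the stated constraints. In Python, integers are arbitrary precision, so the question does not arise at all.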
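
Several of the cases above can be folded into one guarded entry point. The sketch below is one possible policy, not the only reasonable one: it assumes that None input and non-positive k should raise ValueError, and that k larger than the array (including an empty array) should return an empty list. The name distinct_counts_checked is hypothetical.

```python
def distinct_counts_checked(numbers, k):
    """Sliding-window distinct counts with explicit edge-case handling."""
    if numbers is None:
        raise ValueError("numbers must not be None")
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if k > len(numbers):
        return []  # also covers the empty array, since k >= 1 here

    counts = {}
    results = []
    for index, value in enumerate(numbers):
        counts[value] = counts.get(value, 0) + 1
        if index >= k:
            # Remove the element that just slid out of the window.
            outgoing = numbers[index - k]
            counts[outgoing] -= 1
            if counts[outgoing] == 0:
                del counts[outgoing]
        if index >= k - 1:
            # A full window ends at this index; record its distinct count.
            results.append(len(counts))
    return results


print(distinct_counts_checked([-1, 0, -1, 0, 5], 3))  # [2, 2, 3]
print(distinct_counts_checked([], 2))                 # []
```

Negative numbers and zeros need no special treatment because they are ordinary dictionary keys.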