Taro Logo

Subarrays with K Different Integers

Hard
Uber logo
Uber
1 view
Topics:
ArraysSliding Windows

Given an integer array nums and an integer k, return the number of good subarrays of nums.

A good array is an array where the number of different integers in that array is exactly k.

  • For example, [1,2,3,1,2] has 3 different integers: 1, 2, and 3.

A subarray is a contiguous part of an array.

Example 1:

Input: nums = [1,2,1,2,3], k = 2
Output: 7
Explanation: Subarrays formed with exactly 2 different integers: [1,2], [2,1], [1,2], [2,3], [1,2,1], [2,1,2], [1,2,1,2]

Example 2:

Input: nums = [1,2,1,3,4], k = 3
Output: 3
Explanation: Subarrays formed with exactly 3 different integers: [1,2,1,3], [2,1,3], [1,3,4].

Constraints:

  • 1 <= nums.length <= 2 * 10^4
  • 1 <= nums[i], k <= nums.length

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of values within the input array, and can I expect negative numbers or zero?
  2. What is the maximum possible size of the input array?
  3. If no subarrays with exactly K different integers exist, what should I return?
  4. Are duplicate integers allowed in the input array, and how should they be handled when counting distinct integers within a subarray?
  5. Is the order of the elements within the subarrays important? Should I return a specific subarray if multiple solutions exist?

Brute Force Solution

Approach

The brute force method to solve this problem is simple: we will consider every possible group of numbers within the given list. For each of these groups, we check if it fulfills the requirement of having exactly K distinct numbers.

Here's how the algorithm would work step-by-step:

  1. First, we select the starting point of our group.
  2. Then, we extend this group one number at a time, going through all numbers to the right of our starting point.
  3. For each of these groups, we count how many different numbers it contains.
  4. If the group has exactly K different numbers, we count it as a valid group.
  5. We repeat this process, starting from each number in the list as the starting point.
  6. Finally, we return the total count of all valid groups we found.

Code Implementation

def subarrays_with_k_distinct_integers_brute_force(numbers, k_distinct):
    number_of_valid_subarrays = 0

    for start_index in range(len(numbers)):

        for end_index in range(start_index, len(numbers)):

            sub_array = numbers[start_index:end_index + 1]

            # To count the distinct numbers in the subarray.
            distinct_numbers = set()

            for number in sub_array:
                distinct_numbers.add(number)

            # Check if the current subarray has exactly k distinct integers
            if len(distinct_numbers) == k_distinct:

                # Only increment counter when sub array has correct number of distinct values
                number_of_valid_subarrays += 1

    return number_of_valid_subarrays

Big(O) Analysis

Time Complexity
O(n³)The brute force approach iterates through all possible subarrays. The outer loop selects the starting index, iterating n times. The inner loop extends the subarray to the right, iterating up to n times. Inside the inner loop, counting the distinct integers in the current subarray takes O(n) time in the worst case (if we use a set or similar data structure each time, or iterate through the subarray), making the complexity of the innermost operation O(n). Therefore, the overall time complexity is O(n * n * n) = O(n³).
Space Complexity
O(N)The provided brute force algorithm implicitly uses a data structure to count the distinct numbers within each subarray. In the worst-case scenario, each subarray could contain up to N distinct numbers, where N is the size of the input list. To determine if a subarray has exactly K distinct numbers, the algorithm must 'keep track' of the distinct elements encountered, which requires a hash set or similar data structure. The hash set may potentially store up to N distinct elements, leading to O(N) auxiliary space.

Optimal Solution

Approach

The most efficient way to count subarrays with exactly K distinct integers involves using a clever sliding window approach. Instead of brute-force checking all subarrays, we use inclusion-exclusion to count subarrays with at most K distinct integers minus those with at most K-1 distinct integers.

Here's how the algorithm would work step-by-step:

  1. Define a function to count the number of subarrays having at most K distinct integers.
  2. Use a sliding window to expand the subarray as long as the number of distinct integers is less than or equal to K.
  3. If adding a new integer keeps the count of distinct integers at or below K, increase the window size.
  4. If adding a new integer makes the number of distinct integers exceed K, shrink the window from the left until the count is back within the limit.
  5. The number of valid subarrays ending at the right edge of the window is the size of the current window.
  6. To find the subarrays with exactly K distinct integers, calculate the count of subarrays with at most K distinct integers, and then subtract the count of subarrays with at most K-1 distinct integers. The difference gives the desired result.

Code Implementation

def subarraysWithKDistinct(nums, k):
    def atMostK(nums, k):
        if k < 0:
            return 0
        window_start = 0
        window_end = 0
        distinct_count = 0
        frequency_map = {}
        result = 0

        while window_end < len(nums):
            right_element = nums[window_end]
            if right_element not in frequency_map:
                frequency_map[right_element] = 0
                distinct_count += 1
            frequency_map[right_element] += 1

            # Shrink the window if distinct count exceeds k.
            while distinct_count > k:
                left_element = nums[window_start]
                frequency_map[left_element] -= 1
                if frequency_map[left_element] == 0:
                    del frequency_map[left_element]
                    distinct_count -= 1
                window_start += 1

            # Add the number of valid subarrays ending at window_end.
            result += window_end - window_start + 1
            window_end += 1
        return result

    # Inclusion-exclusion principle.
    return atMostK(nums, k) - atMostK(nums, k - 1)

Big(O) Analysis

Time Complexity
O(n)The algorithm utilizes a sliding window technique to iterate through the input array of size n. The 'atMostK' function iterates through the array once, expanding and shrinking the window. While there's a while loop inside the for loop, the left pointer only moves forward, ensuring each element is visited at most a constant number of times. Therefore, the dominant operation is a single pass through the array, resulting in O(n) time complexity for the 'atMostK' function. Since the main function calls 'atMostK' twice, the overall time complexity remains O(n).
Space Complexity
O(K)The algorithm uses a hash map (or dictionary) to store the frequency of numbers within the sliding window. In the worst-case scenario, where all elements in the subarray are distinct, the hash map will store up to K distinct integers, where K is the input parameter representing the maximum number of distinct integers allowed. Thus the space required for the hash map scales linearly with K. Therefore, the space complexity is O(K).

Edge Cases

CaseHow to Handle
Null or empty input arrayReturn 0 immediately, as no subarrays exist.
k = 0Return 0, as a subarray must have at least one distinct integer.
k is greater than the number of distinct elements in the input arrayThe maximum number of distinct elements possible is the number of distinct elements in the array, so the result will be the same if k exceeds the number of distinct elements, which the sliding window algorithm naturally handles.
Array contains only one distinct element, and k > 1Return 0, as it's impossible to find a subarray with more than one distinct element when the input array has only one.
Array contains duplicate numbers, and k = 1The solution should correctly count all subarrays with only the repeated number.
Large array size with a small k valueSliding window approach handles this efficiently, adjusting window size as required.
Integer overflow potential during window size calculation with large array indicesEnsure calculations use appropriate data types (e.g., long) to prevent overflow.
Array contains negative numbersThe hashmap approach works fine since numbers are stored as keys, allowing for negative and zero input.