Taro Logo

Subarrays with K Different Integers

Hard
ServiceNow logo
ServiceNow
2 views
Topics:
ArraysTwo PointersSliding Windows

Given an integer array nums and an integer k, return the number of good subarrays of nums.

A good array is an array where the number of different integers in that array is exactly k.

  • For example, [1,2,3,1,2] has 3 different integers: 1, 2, and 3.

A subarray is a contiguous part of an array.

Example 1:

Input: nums = [1,2,1,2,3], k = 2
Output: 7
Explanation: Subarrays formed with exactly 2 different integers: [1,2], [2,1], [1,2], [2,3], [1,2,1], [2,1,2], [1,2,1,2]

Example 2:

Input: nums = [1,2,1,3,4], k = 3
Output: 3
Explanation: Subarrays formed with exactly 3 different integers: [1,2,1,3], [2,1,3], [1,3,4].

Constraints:

  • 1 <= nums.length <= 2 * 104
  • 1 <= nums[i], k <= nums.length

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the size of the input array `nums` and the value of `k`?
  2. Can the elements in the `nums` array be negative, zero, or only positive integers?
  3. If there are no subarrays with exactly `k` distinct integers, what should I return?
  4. Are we concerned about the order of elements within a subarray, or just the count of good subarrays?
  5. If `k` is zero, should I return the number of empty subarrays, or is a different behavior expected?

Brute Force Solution

Approach

The brute force strategy for this problem is all about checking every possible group of numbers within our list. We'll examine each group to see if it meets our criteria of having exactly K different numbers. If it does, we count it.

Here's how the algorithm would work step-by-step:

  1. Consider every possible starting point for a group of numbers.
  2. For each starting point, build a group by adding one number at a time.
  3. As you build each group, check if it contains exactly K different numbers.
  4. If the group has exactly K different numbers, count it as a valid group.
  5. Repeat this process until you've considered all possible starting points and group sizes.

Code Implementation

def subarrays_with_k_different_integers_brute_force(numbers, k_different):
    number_of_valid_subarrays = 0

    for start_index in range(len(numbers)):
        for end_index in range(start_index, len(numbers)):
            subarray = numbers[start_index:end_index + 1]

            # Create a set of unique integers
            unique_integers = set()

            for number in subarray:
                unique_integers.add(number)

            # Check if the number of unique integers is equal to k_different
            if len(unique_integers) == k_different:
                number_of_valid_subarrays += 1

    return number_of_valid_subarrays

Big(O) Analysis

Time Complexity
O(n^3)The algorithm iterates through all possible subarrays of the input array of size n. The outer loop selects the starting point, which takes n iterations. The inner loop expands the subarray from the starting point, also taking up to n iterations. Inside the inner loop, we need to count the number of distinct integers in the current subarray, which in the worst case requires iterating through all elements of the subarray, taking up to n operations. Thus, the total number of operations is proportional to n * n * n, giving a time complexity of O(n^3).
Space Complexity
O(K)The described brute force approach needs a way to keep track of the distinct integers within each subarray it examines. This could be achieved by using a hash set or dictionary to store the integers encountered in the current subarray, which at most will hold K distinct elements to satisfy the problem condition. Therefore, the maximum space required is proportional to K, where K is the target number of distinct integers. Thus, the space complexity is O(K).

Optimal Solution

Approach

The optimal approach cleverly leverages the principle of inclusion-exclusion. It cleverly transforms the problem into finding the difference between the number of subarrays with *at most* K distinct elements and the number of subarrays with *at most* K-1 distinct elements.

Here's how the algorithm would work step-by-step:

  1. Create a helper tool that efficiently finds the number of acceptable groups with a *limit* on the variety of items they can contain.
  2. Using this tool, first, find all the groups that contain at most K different kinds of items.
  3. Then, again using this tool, find all the groups that contain at most K-1 different kinds of items.
  4. Finally, subtract the number of groups with at most K-1 kinds of items from the number of groups with at most K kinds of items. This difference is the number of groups with exactly K kinds of items.

Code Implementation

def subarraysWithKDistinct(numbers, k):
    def atMostKDistinct(numbers, k):
        window_start = 0
        distinct_count = 0
        frequency_map = {}
        result = 0

        for window_end in range(len(numbers)):
            right_number = numbers[window_end]
            if right_number not in frequency_map:
                frequency_map[right_number] = 0
            frequency_map[right_number] += 1

            if frequency_map[right_number] == 1:
                distinct_count += 1

            # Shrink the window if distinct count exceeds k.
            while distinct_count > k:
                left_number = numbers[window_start]
                frequency_map[left_number] -= 1

                if frequency_map[left_number] == 0:
                    distinct_count -= 1

                window_start += 1

            result += window_end - window_start + 1

        return result

    # Inclusion-exclusion principle: K - (K-1)
    return atMostKDistinct(numbers, k) - atMostKDistinct(numbers, k - 1)

Big(O) Analysis

Time Complexity
O(n)The solution calculates the number of subarrays with at most K distinct integers and the number of subarrays with at most K-1 distinct integers. The helper function to count subarrays with at most X distinct integers uses a sliding window approach, iterating through the array of size n with two pointers (left and right) that advance linearly. Therefore, this helper function takes O(n) time. Since the helper function is called twice, the overall time complexity remains O(n).
Space Complexity
O(K)The algorithm utilizes a helper function that maintains a sliding window and a frequency map (or similar data structure) to count distinct elements within the window. In the worst case, the frequency map could store up to K distinct elements, where K is the given number of allowed distinct integers. The space used by this frequency map determines the auxiliary space complexity. Therefore, the space complexity is proportional to K, and we ignore constant factors and lower order terms.

Edge Cases

CaseHow to Handle
Empty input array (nums is null or has length 0)Return 0 if the input array is null or empty since no subarrays can be formed.
k is zero or negativeReturn 0 if k is zero or negative because a subarray cannot have a non-positive number of distinct integers.
k is greater than the length of the input arrayReturn 0 if k is greater than the length of nums since no subarray can contain more distinct integers than the array's size.
Array contains only one element and k is 1Return 1, which is the number of subarrays if the array has only one element and k is equal to 1.
Array contains duplicate numbers, and k is 1The sliding window logic correctly handles duplicate numbers and ensures the distinct count is accurate when k is 1.
Array contains all identical numbers and k is greater than 1Return 0 because an array with all identical numbers cannot have more than 1 distinct number.
Large input array with many duplicate values and k is a small numberThe sliding window approach with frequency counting efficiently handles large input sizes and duplicate values as it only needs to iterate through the array once.
Integer overflow in frequency counting or calculating window sizesUse appropriate data types (e.g., long) for frequency counts and window sizes to avoid integer overflow issues.