Taro Logo

Count the Number of Good Subarrays

Medium
Asked by:
Profile picture
Profile picture
Profile picture
Profile picture
+2
More companies
Profile picture
Profile picture
88 views
Topics:
ArraysSliding WindowsTwo Pointers

Given an integer array nums and an integer k, return the number of good subarrays of nums.

A subarray arr is good if there are at least k pairs of indices (i, j) such that i < j and arr[i] == arr[j].

A subarray is a contiguous non-empty sequence of elements within an array.

Example 1:

Input: nums = [1,1,1,1,1], k = 10
Output: 1
Explanation: The only good subarray is the array nums itself.

Example 2:

Input: nums = [3,1,4,3,2,2,4], k = 2
Output: 4
Explanation: There are 4 different good subarrays:
- [3,1,4,3,2,2] that has 2 pairs.
- [3,1,4,3,2,2,4] that has 3 pairs.
- [1,4,3,2,2,4] that has 2 pairs.
- [4,3,2,2,4] that has 2 pairs.

Constraints:

  • 1 <= nums.length <= 105
  • 1 <= nums[i], k <= 109

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What defines a 'good' subarray? Is there a specific property or condition the elements within the subarray must satisfy?
  2. What are the possible data types and range of values for the numbers in the input array? Could there be negative numbers, zero, or very large numbers?
  3. Is the input array guaranteed to be non-empty, and what should I return if the input array is null or empty?
  4. Are overlapping subarrays counted multiple times, or should I count each 'good' subarray only once?
  5. Can you provide a few examples of input arrays and their corresponding 'good' subarray counts to illustrate the definition clearly?

Brute Force Solution

Approach

The brute force approach to counting good subarrays involves examining every possible subarray. We will go through each starting position and then explore all possible ending positions from that start. For each subarray, we will determine if the subarray is a 'good' subarray, and then count them.

Here's how the algorithm would work step-by-step:

  1. Start with the first element; consider it a subarray of length one.
  2. Then, consider the first two elements as a subarray.
  3. Continue adding elements to this initial subarray, one at a time, until you reach the end of the entire collection.
  4. Next, start at the second element and repeat the process of creating all possible subarrays starting from there.
  5. Keep doing this, moving the starting point one element at a time, until you've considered all possible starting positions.
  6. For each of these subarrays you created, check if it meets the criteria to be considered 'good'.
  7. Finally, count all the subarrays that you determined to be 'good'.

Code Implementation

def count_good_subarrays_brute_force(numbers): 
    total_subarrays = len(numbers) 
    good_subarray_count = 0

    for start_index in range(total_subarrays):
        # Iterate through all possible starting indices

        for end_index in range(start_index, total_subarrays):
            # Iterate through all possible ending indices for each starting index

            current_subarray = numbers[start_index : end_index+1]
            subarray_length = len(current_subarray)

            # Good Subarray is defined as a subarray whose length is an even number
            if subarray_length % 2 == 0:

                good_subarray_count += 1

    return good_subarray_count

Big(O) Analysis

Time Complexity
O(n²)The brute force approach iterates through all possible subarrays. For an input array of size n, there are n possible starting positions. For each starting position, there are up to n possible ending positions, defining a subarray. Thus, we have nested loops where the outer loop iterates n times and the inner loop iterates up to n times in the worst case. The total number of operations is proportional to the sum 1 + 2 + ... + n, which is approximately n * (n+1) / 2, simplifying to O(n²).
Space Complexity
O(1)The described brute force approach iterates through all possible subarrays using nested loops. While it generates subarrays, it doesn't explicitly store all of them simultaneously in a separate data structure. The only extra space used is for a few scalar variables, such as loop counters or a variable to store the count of 'good' subarrays, and potentially a variable used to determine if a subarray is 'good'. The amount of space required for these variables does not scale with the input size N, where N is the size of the input array. Therefore, the auxiliary space complexity is constant, O(1).

Optimal Solution

Approach

The goal is to count how many sections of a sequence meet a specific requirement. Instead of checking every possible section individually, we'll keep track of the counts we've seen and use that information to efficiently calculate if a new section meets the requirement.

Here's how the algorithm would work step-by-step:

  1. Start by going through the sequence and calculating a running sum or count based on the sequence values (e.g., the number of even numbers seen so far).
  2. Keep track of how many times each sum or count has appeared up to each point in the sequence.
  3. For each position in the sequence, determine the value needed to create a valid section starting from an earlier position.
  4. Look up how many times that 'needed' value appeared earlier in the sequence. This tells you how many valid sections end at the current position.
  5. Add these counts to your total. This approach avoids re-calculating information for overlapping sections, making it much faster.

Code Implementation

def count_good_subarrays(numbers, k):
    count_of_good_subarrays = 0
    current_sum = 0
    sum_frequency = {0: 1}

    for index in range(len(numbers)):
        current_sum += numbers[index]

        # This determines value needed for a valid subarray
        needed_sum = current_sum - k

        if needed_sum in sum_frequency:
            count_of_good_subarrays += sum_frequency[needed_sum]

        # Keep track of sum frequencies to avoid recalculation
        if current_sum in sum_frequency:
            sum_frequency[current_sum] += 1

        else:
            sum_frequency[current_sum] = 1

    return count_of_good_subarrays

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the sequence of n elements once to calculate running sums or counts. Inside this loop, it performs a constant-time lookup in a hash map (or similar data structure) to find the number of times a 'needed' value appeared earlier in the sequence. Therefore, the overall time complexity is dominated by the single loop, resulting in O(n).
Space Complexity
O(N)The algorithm uses a data structure (likely a hash map or dictionary) to keep track of how many times each running sum or count has appeared. In the worst-case scenario, where all running sums or counts are distinct, this data structure could store up to N key-value pairs, where N is the length of the input sequence. Thus, the auxiliary space used is proportional to N. Therefore, the space complexity is O(N).

Edge Cases

Empty array as input
How to Handle:
Return 0 as there are no subarrays.
Array with a single element
How to Handle:
Return 0 since a 'good' subarray requires a minimum length of 2.
Array with all elements equal to 0
How to Handle:
The number of 'good' subarrays depends on the definition of 'good', but the code needs to correctly count pairs/subarrays that sum to 0 if 'good' implies a sum of 0.
Array with very large positive numbers that could cause integer overflow when summing
How to Handle:
Use long long (C++) or equivalent to prevent integer overflow during sum calculation or return 0 if over flow is detected.
Array with very large negative numbers that could cause integer underflow when summing
How to Handle:
Use long long (C++) or equivalent to prevent integer underflow during sum calculation or return 0 if under flow is detected.
Array with alternating large positive and negative numbers
How to Handle:
Ensure the cumulative sum calculation is accurate and doesn't overflow/underflow due to alternating signs.
Maximum size array (limited by memory)
How to Handle:
The solution needs to be efficient in terms of memory usage to avoid exceeding memory limits; consider using O(1) auxillary space if possible.
All numbers in array are the same (e.g., all 1s or all -1s)
How to Handle:
Correctly count the number of subarrays depending on what the sum would equal, for example n*(n+1)/2 if you are looking for all numbers that add up.