Taro Logo

Count Subarrays With Median K #8 Most Asked

Hard
1 view
Topics:
Arrays

You are given an array nums of size n consisting of distinct integers from 1 to n and a positive integer k.

Return the number of non-empty subarrays in nums that have a median equal to k.

Note:

  • The median of an array is the middle element after sorting the array in ascending order. If the array is of even length, the median is the left middle element.
    • For example, the median of [2,3,1,4] is 2, and the median of [8,4,3,5,1] is 4.
  • A subarray is a contiguous part of an array.

Example 1:

Input: nums = [3,2,1,4,5], k = 4
Output: 3
Explanation: The subarrays that have a median equal to 4 are: [4], [4,5] and [1,4,5].

Example 2:

Input: nums = [2,3,1], k = 3
Output: 1
Explanation: [3] is the only subarray that has a median equal to 3.

Constraints:

  • n == nums.length
  • 1 <= n <= 105
  • 1 <= nums[i], k <= n
  • The integers in nums are distinct.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the size of the input array `nums` and the range of values for its elements and for `k`?
  2. The problem defines the median for an even-sized array as the 'left-of-center' element. Could you confirm this with an example? For an array like `[1, 2, 3, 4]`, would the median be `2`?
  3. Is it guaranteed that the integer `k` will always be present in the input array `nums`?
  4. Can the input array `nums` contain duplicate values, including duplicates of `k`?
  5. Should I be concerned about integer overflow for the final count, or can I assume it will fit within a standard integer type?

Brute Force Solution

Approach

The brute force strategy is to look at every single possible continuous group of numbers within the main list. For each of these groups, we'll figure out what its middle number is and check if it's the special number 'K' we're looking for.

Here's how the algorithm would work step-by-step:

  1. First, consider all possible continuous groups of numbers from our original list.
  2. Start with the shortest possible groups, which are just single numbers, then look at all groups of two, then all groups of three, and so on, until you've considered the entire list as one group.
  3. For every single group you've identified, take all its numbers and arrange them in order from smallest to largest.
  4. Once a group's numbers are sorted, find the number that sits exactly in the middle.
  5. Check if this middle number is the special number 'K' we are interested in.
  6. Keep a running count. Each time the middle number of a group matches 'K', increase your count by one.
  7. After checking every possible group, the final count is your answer.

Code Implementation

def count_subarrays_with_median_k(numbers, special_k_value):
    subarray_count = 0
    list_length = len(numbers)

    # The first loop defines the starting point of our continuous group (subarray).
    for start_index in range(list_length):

        # The second loop defines the ending point, thus creating all possible subarrays.
        for end_index in range(start_index, list_length):
            current_subarray = numbers[start_index:end_index + 1]
            
            # For each subarray, we must sort it to find the middle element, which is the median.
            sorted_subarray = sorted(current_subarray)
            subarray_length = len(sorted_subarray)
            
            # The median index depends on whether the subarray length is odd or even.
            median_index = (subarray_length - 1) // 2
            median_value = sorted_subarray[median_index]

            if median_value == special_k_value:
                subarray_count += 1

    return subarray_count

Big(O) Analysis

Time Complexity
O(n³ log n)The dominant cost comes from iterating through all possible subarrays and then sorting each one. There are approximately n²/2 subarrays. The first loop selects a starting index (n possibilities), and a nested loop selects an ending index (up to n possibilities), defining a subarray. For each subarray of length L, we sort it, which takes O(L log L) time. In the worst case, L can be up to n, making the sorting step O(n log n). Since we perform this expensive sort for each of the O(n²) subarrays, the total operations are roughly n² * n log n, which simplifies to a time complexity of O(n³ log n).
Space Complexity
O(N)The dominant factor in space complexity comes from the process of sorting each subarray to find its median. For each subarray considered, a temporary copy of its elements is created for sorting. The largest possible subarray is the original list itself, which has N elements, thus requiring an auxiliary list of size N. Since this is the maximum space needed at any one time, the space complexity is proportional to the size of the input list.

Optimal Solution

Approach

The core idea is to re-imagine the numbers in the list based on their relationship to the target number, K. This simplifies the problem into finding groups of numbers where the count of larger numbers either equals or is one more than the count of smaller numbers, which is a much easier problem to solve.

Here's how the algorithm would work step-by-step:

  1. First, locate the special number K within the sequence. Any valid group of numbers must contain K.
  2. Now, let's simplify the entire sequence. Replace every number greater than K with a positive one (+1), every number less than K with a negative one (-1), and K itself with a zero.
  3. The median of a group of these new simplified numbers will be zero if the number of positive ones and negative ones are roughly balanced. This corresponds to the original group having K as its median.
  4. Think of the original sequence as being split into two parts: the part before K and the part after K.
  5. We start at the position of K and move leftwards, one number at a time, keeping a running total of the simplified positive and negative ones we encounter. We track how many times each running total appears.
  6. Next, we start again from K and move rightwards. For each step to the right, we calculate a new running total.
  7. For each running total from the right side, we look at the totals we recorded from the left side. We can pair up totals from the left and right that cancel each other out (or nearly cancel out) to form a valid group with a median of K.
  8. By adding up all the valid pairings found, we get the total count of all such groups without ever having to check each group individually.

Code Implementation

from collections import defaultdict

def count_subarrays_with_median_k(numbers, k):
    # Step 1: Find the index of k. Any valid subarray must contain this element.
    try:
        k_index = numbers.index(k)
    except ValueError:
        return 0

    # Step 5: Calculate prefix sums for the left part of the array relative to k.
    left_balance_counts = defaultdict(int)
    left_balance_counts[0] = 1
    current_balance = 0
    for index in range(k_index - 1, -1, -1):
        if numbers[index] < k:
            current_balance -= 1
        else:
            current_balance += 1
        left_balance_counts[current_balance] += 1

    total_valid_subarrays = 0
    current_balance = 0

    # The subarray containing just k is a valid starting point.
    total_valid_subarrays += left_balance_counts[0] + left_balance_counts[1]

    # Step 6 & 7: Iterate rightward from k, calculating balance and pairing with left balances.
    for index in range(k_index + 1, len(numbers)):
        if numbers[index] < k:
            current_balance -= 1
        else:
            current_balance += 1
        
        # A balance of 0 or 1 makes k the median. We need the opposite balance from the left part.
        # To get a total balance of 0, we need -current_balance from the left.
        total_valid_subarrays += left_balance_counts[-current_balance]
        # To get a total balance of 1, we need 1-current_balance from the left.
        total_valid_subarrays += left_balance_counts[1 - current_balance]

    return total_valid_subarrays

Big(O) Analysis

Time Complexity
O(n)The time complexity is determined by three main sequential steps, each of which scales linearly with the input size, n. First, we iterate through the array once to find the index of K. Next, we traverse the left side of K, from its index down to the beginning, calculating running sums and storing their frequencies in a hash map; this takes at most n operations. Finally, we traverse the right side of K, from its index to the end, and for each position, we perform a constant time hash map lookup to find matching sums. Since these are three separate, non-nested passes over the array or its parts, the total operations are proportional to n + n + n, which simplifies to O(n).
Space Complexity
O(N)The primary data structure contributing to space complexity is used to track the running totals from the left side of K, as described in step 5. In the worst-case scenario, where all prefixes have a unique running total, this tracking mechanism, typically a hash map, could store up to N entries. Therefore, the auxiliary space required grows linearly with the size of the input array, N.

Edge Cases

Input array is empty or null
How to Handle:
The solution should return 0 as no subarrays can be formed.
The integer k does not exist in the input array
How to Handle:
No subarray can have k as its median, so the function must return 0.
Input array has only one element
How to Handle:
The solution correctly counts 1 if the single element is k, and 0 otherwise.
All elements in the array are equal to k
How to Handle:
Every possible subarray will have k as its median, so the solution should correctly count all n*(n+1)/2 subarrays.
The integer k is the smallest or largest value in the array
How to Handle:
The relative balance logic correctly handles cases where elements are only greater than or only less than k.
Input array contains negative numbers and zeros
How to Handle:
The solution is robust as it only cares about the relative values of elements compared to k, not their absolute values.
Large input array size (e.g., up to 10^5 elements)
How to Handle:
An O(N) solution using prefix sums and a hash map is required to avoid a timeout from a brute-force O(N^2) approach.
Array contains multiple instances of k
How to Handle:
The solution must correctly calculate subarrays centered around each occurrence of k, which the prefix sum approach naturally handles.
0/0 completed