Taro Logo

Mean of Array After Removing Some Elements

Easy
Google logo
Google
2 views
Topics:
Arrays

You are given an integer array arr. Your task is to calculate the mean of the remaining integers after removing the smallest 5% and the largest 5% of the elements. The answer should be within 10^-5 of the actual answer to be considered correct.

Constraints:

  • 20 <= arr.length <= 1000
  • arr.length is a multiple of 20.
  • 0 <= arr[i] <= 10^5

Examples:

  1. Input: arr = [1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3] Output: 2.00000 Explanation: After removing the minimum (1) and the maximum (3) values, all elements are equal to 2, so the mean is 2.

  2. Input: arr = [6,2,7,5,1,2,0,3,10,2,5,0,5,5,0,8,7,6,8,0] Output: 4.00000 Explanation: 5% of 20 is 1. So we remove the smallest (0) and the largest (10) numbers. The remaining numbers sum to 80, and there are 19 numbers left. So 80/19 is approximately 4.00.

  3. Input: arr = [6,0,7,0,7,5,7,8,3,4,0,7,8,1,6,8,1,1,2,4,8,1,9,5,4,3,8,5,10,8,6,6,1,0,6,10,8,2,3,4] Output: 4.77778

How would you efficiently calculate the trimmed mean given these constraints and examples?

Solution


Naive Solution

The most straightforward approach is to sort the array and then calculate the mean of the remaining elements after removing the smallest 5% and the largest 5%.

  1. Sort the input array arr.
  2. Calculate the number of elements to remove from each end: n = arr.length, k = n * 5 / 100.
  3. Sum the elements from index k to n - k - 1.
  4. Divide the sum by n - 2 * k to get the mean.
import java.util.Arrays;

class Solution {
    public double trimMean(int[] arr) {
        Arrays.sort(arr);
        int n = arr.length;
        int k = n * 5 / 100;
        double sum = 0;
        for (int i = k; i < n - k; i++) {
            sum += arr[i];
        }
        return sum / (n - 2 * k);
    }
}
  • Time Complexity: O(n log n) due to the sorting step.
  • Space Complexity: O(1) or O(n) depending on the sorting algorithm implementation (in-place or creating a copy).

Optimal Solution

An optimized approach avoids full sorting by using a counting sort (since the problem states 0 <= arr[i] <= 10^5). This is efficient because we can count the occurrences of each number and skip the smallest and largest 5% when calculating the sum. Since the range of numbers are limited, this is preferable to sorting the array.

  1. Create a counting array count of size 100001 (as 0 <= arr[i] <= 10^5).
  2. Populate the count array by iterating through arr.
  3. Calculate the number of elements to remove from each end: n = arr.length, k = n * 5 / 100.
  4. Iterate from 0 to 100000, adding each number i to the sum count[i] times, while keeping track of how many numbers have been added to the sum. Skip the smallest and largest 5% using the counter k.
  5. Return the mean.
class Solution {
    public double trimMean(int[] arr) {
        int n = arr.length;
        int k = n * 5 / 100;
        int[] count = new int[100001];
        for (int x : arr) {
            count[x]++;
        }

        double sum = 0;
        int numCounted = 0;

        // Remove smallest 5%
        int removed = 0;
        for (int i = 0; i < count.length && removed < k; i++) {
            int toRemove = Math.min(count[i], k - removed);
            removed += toRemove;
            count[i] -= toRemove;
        }

        // Sum remaining numbers
        for (int i = 0; i < count.length; i++) {
            sum += (double) i * count[i];
            numCounted += count[i];
        }

        // Remove largest 5%
        removed = 0;
        for (int i = count.length - 1; i >= 0 && removed < k; i--) {
            int toRemove = Math.min(count[i], k - removed);
            removed += toRemove;
            count[i] -= toRemove;
        }

        

        return sum / (n - 2 * k);
    }
}
  • Time Complexity: O(n + m), where n is the length of the input array, and m is the range of numbers (100001). In many cases, n will be greater than m. So O(n) would also be a good estimate.
  • Space Complexity: O(m), where m is the range of numbers (100001).

Edge Cases

  • Empty Array: The problem statement says 20 <= arr.length <= 1000, so we don't need to consider this.
  • All elements are the same: Should work fine.
  • Large range of numbers: The constraint 0 <= arr[i] <= 10^5 is important; if the range was unbounded, sorting would be better.
  • Zero elements to remove (k = 0): This is possible if arr.length = 20. The division should still give the correct mean.