
Find Missing Observations

#990 Most Asked, Medium
Topics: Arrays, Greedy Algorithms

You have observations of n + m 6-sided dice rolls with each face numbered from 1 to 6. n of the observations went missing, and you only have the observations of m rolls. Fortunately, you have also calculated the average value of the n + m rolls.

You are given an integer array rolls of length m where rolls[i] is the value of the ith observation. You are also given the two integers mean and n.

Return an array of length n containing the missing observations such that the average value of the n + m rolls is exactly mean. If there are multiple valid answers, return any of them. If no such array exists, return an empty array.

The average value of a set of k numbers is the sum of the numbers divided by k.

Note that mean is an integer, so the sum of the n + m rolls should be divisible by n + m.

Example 1:

Input: rolls = [3,2,4,3], mean = 4, n = 2
Output: [6,6]
Explanation: The mean of all n + m rolls is (3 + 2 + 4 + 3 + 6 + 6) / 6 = 4.

Example 2:

Input: rolls = [1,5,6], mean = 3, n = 4
Output: [2,3,2,2]
Explanation: The mean of all n + m rolls is (1 + 5 + 6 + 2 + 3 + 2 + 2) / 7 = 3.

Example 3:

Input: rolls = [1,2,3,4], mean = 6, n = 4
Output: []
Explanation: It is impossible for the mean to be 6 no matter what the 4 missing rolls are.

Constraints:

  • m == rolls.length
  • 1 <= n, m <= 10^5
  • 1 <= rolls[i], mean <= 6
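Because any valid answer is accepted, it can help to have a small checker when comparing outputs against the examples. The sketch below is not part of the original problem; `is_valid_completion` is a hypothetical helper name, and it compares sums rather than dividing so that no floating point is involved.

```python
def is_valid_completion(rolls, mean, n, missing):
    """Return True if `missing` is an acceptable answer for the given input."""
    # The answer must contain exactly n values, each a legal die face.
    if len(missing) != n or any(not 1 <= value <= 6 for value in missing):
        return False
    # Compare total sums: sum == mean * count is the same as average == mean.
    return sum(rolls) + sum(missing) == mean * (len(rolls) + n)


print(is_valid_completion([3, 2, 4, 3], 4, 2, [6, 6]))     # Example 1
print(is_valid_completion([1, 5, 6], 3, 4, [2, 3, 2, 2]))  # Example 2
print(is_valid_completion([1, 5, 6], 3, 4, [3, 2, 2, 2]))  # same values, different order
```

All three calls should report a valid completion, since the order of the missing rolls does not affect the mean.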

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible values for each observation in `rolls` and for `mean`? Can they be negative, zero, or floating point numbers?
  2. What are the valid ranges for `n` and the length of the `rolls` array?
  3. If it's impossible to produce the required `mean` with `n` observations between 1 and 6, what should the function return?
  4. Is the given `mean` guaranteed to be attainable with integer values between 1 and 6 given the constraints?
  5. Are `n`, `mean` and the elements in the `rolls` array always integers?

Brute Force Solution

Approach

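The brute force method for this problem involves testing every possible value that could be missing. The version below is simplified to the case where exactly one observation is missing (n = 1): we check whether adding a specific value fulfills the required average by recalculating the sum and comparing it to the expected sum. With n missing observations, a true brute force would have to try all 6^n combinations of values.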

Here's how the algorithm would work step-by-step:

  1. Consider each possible number within the given range (1 to 6 for a six-sided die) as the missing observation.
  2. For each of these possibilities, add the number to the total sum of the known observations.
  3. Calculate the expected total sum by multiplying the desired average by the total number of observations (including the missing one).
  4. Compare the recalculated total sum with the expected total sum.
  5. If they match, then the number we added is a valid missing observation. Keep track of all valid missing observations.
  6. Repeat this process for all the numbers in the given range.
  7. If no number resulted in a matching sum, then it means that no solution exists.
  8. Finally, return all the valid missing observations found.

Code Implementation

def find_missing_observations_brute_force(rolls, mean, number_of_missing):
    # This simplified search tries one value at a time, so it only applies
    # when exactly one observation is missing.
    if number_of_missing != 1:
        raise ValueError("this brute force only handles number_of_missing == 1")

    missing_observations = []
    number_of_rolls = len(rolls)

    # Both sums are independent of the candidate, so compute them once.
    known_sum = sum(rolls)
    expected_sum = mean * (number_of_rolls + number_of_missing)

    for possible_missing_value in range(1, 7):
        recalculated_sum = known_sum + possible_missing_value

        # If the sums are the same, we have found a valid missing value
        if recalculated_sum == expected_sum:
            missing_observations.append(possible_missing_value)

    # Empty if no face value works; otherwise holds the single valid value
    return missing_observations
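
For more than one missing observation, an exhaustive search has to consider every combination of face values rather than a single value. The following is a minimal sketch of that idea, not the original author's code: `find_missing_exhaustive` is a hypothetical name, and the search visits up to 6^n candidates, so it is only usable for very small n.

```python
from itertools import product


def find_missing_exhaustive(rolls, mean, n):
    """Try every combination of n die faces; only practical for tiny n."""
    target_sum = mean * (len(rolls) + n)
    known_sum = sum(rolls)  # computed once, outside the search loop

    # product(range(1, 7), repeat=n) yields all 6**n tuples of faces.
    for candidate in product(range(1, 7), repeat=n):
        if known_sum + sum(candidate) == target_sum:
            return list(candidate)

    return []  # no combination reaches the target sum
```

The search stops at the first combination that matches, which is acceptable because the problem allows any valid answer.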

Big(O) Analysis

Time Complexity
O(m). The loop runs over the six possible face values, which is a constant. The cost is dominated by summing the m known rolls (once if the sum is computed before the loop, six times if it is recomputed on every iteration), where m is the length of rolls. Everything else is constant-time arithmetic and comparison, so the time complexity is O(m).
Space Complexity
O(1). At most one face value can make the recalculated sum equal the expected sum, so the result list never holds more than one element. The other variables used for looping and calculation are scalars. The space usage therefore stays constant regardless of the size of the input array, and the space complexity is O(1).

Optimal Solution

Approach

The goal is to figure out the values that are missing from a set of observations, given their average. We can determine the total sum of all the observations and then subtract the known values to determine what the sum of the missing values should be.

Here's how the algorithm would work step-by-step:

  1. First, calculate the total sum of all observations (both known and missing) by multiplying the desired average by the total number of observations.
  2. Next, sum up all the known observations that we are given.
  3. Subtract the sum of the known observations from the total sum to find the sum of the missing observations.
  4. If the sum of the missing observations is less than n (every roll is at least 1) or greater than 6 * n (every roll is at most 6), return an empty list. This indicates the provided average is not achievable with the given constraints.
  5. Otherwise, divide the sum of the missing observations by n to get a base value and a remainder. The remainder is always smaller than n.
  6. Construct the result by giving `remainder` of the missing observations the value base + 1 and the rest the value base. Every value stays between 1 and 6, and together they add up to the required sum.
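
The missing sum does not always divide evenly by n, as Example 2 shows (9 spread over 4 rolls). One way to handle this, sketched here under the assumption that the caller has already confirmed n <= missing sum <= 6 * n, is to split the sum with `divmod` and give the remainder out one point at a time. `split_evenly` is a hypothetical helper name used only for illustration.

```python
def split_evenly(missing_sum, n):
    """Spread missing_sum over n rolls whose values differ by at most 1."""
    base, extra = divmod(missing_sum, n)
    # `extra` rolls get base + 1, the remaining n - extra rolls get base.
    return [base + 1] * extra + [base] * (n - extra)


# Example 2: total = 3 * 7 = 21, known = 1 + 5 + 6 = 12, missing sum = 9.
print(split_evenly(9, 4))   # [3, 2, 2, 2]
# Example 1: total = 4 * 6 = 24, known = 12, missing sum = 12.
print(split_evenly(12, 2))  # [6, 6]
```

The result for Example 2 differs in order from the sample output [2,3,2,2], which is fine because any valid arrangement is accepted.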

Code Implementation

def find_missing_observations(rolls, mean, number_of_missing):
    total_number_of_rolls = len(rolls) + number_of_missing
    total_sum = mean * total_number_of_rolls

    existing_rolls_sum = sum(rolls)

    # The missing rolls must add up to whatever the known rolls leave over.
    missing_rolls_sum = total_sum - existing_rolls_sum

    # Each missing roll is between 1 and 6, so the missing sum must lie
    # between number_of_missing and 6 * number_of_missing.
    if not (number_of_missing <= missing_rolls_sum <= 6 * number_of_missing):
        return []

    # Split the sum as evenly as possible. The remainder is smaller than
    # number_of_missing, so that many rolls get one extra point.
    base_value, remainder = divmod(missing_rolls_sum, number_of_missing)

    # For Example 2 this yields [3, 2, 2, 2]; any valid order is accepted.
    return [base_value + 1] * remainder + [base_value] * (number_of_missing - remainder)

Big(O) Analysis

Time Complexity
O(n + m). The algorithm first calculates the total sum using the average in O(1) time. Then, it iterates through the input array of known observations once to compute their sum, which takes O(m) time, where m is the number of known observations. Building the result list takes O(n) time, where n is the number of missing observations. The remaining steps are constant-time operations like subtraction and division, resulting in a time complexity of O(n + m).
Space Complexity
O(n). The algorithm's space complexity is determined by the size of the list containing the missing observations, where n is the number of missing observations. The intermediate sums and the per-roll values are held in a constant number of scalar variables, so apart from the returned list the auxiliary space is O(1). Counting the output, the space complexity is O(n).

Edge Cases

rolls is null or empty
How to Handle:
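The constraints guarantee m >= 1, so an empty rolls should not occur. Return an empty array for null input; if an empty rolls must be supported, treat the known sum as 0 and solve for the n missing rolls as usual.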
n is zero or negative
How to Handle:
Return an empty array since the number of missing rolls cannot be non-positive.
mean is zero or negative
How to Handle:
Return an empty array, as the mean of dice rolls between 1 and 6 can never be below 1.
(m + n) * mean < n + sum(rolls)
How to Handle:
Return an empty array, as the missing rolls would have to sum to less than n, which is impossible when every roll is at least 1.
(m + n) * mean > 6 * n + sum(rolls)
How to Handle:
Return an empty array, as the required sum is greater than the maximum possible sum based on number of missing rolls.
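n is very large, raising the question of integer overflow when calculating (m+n)*mean
How to Handle:
Under the stated constraints the product is at most 2 * 10^5 * 6 = 1,200,000, which fits in a 32-bit integer. Python integers do not overflow; in a fixed-width language, use a long data type for intermediate calculations if the limits could be larger.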
The average value required of the missing rolls (missing sum / n) is not a valid dice roll (not an integer, or outside range 1-6)
How to Handle:
If the average lies outside 1-6, return an empty array. If it lies inside the range but is not an integer, spread the remainder by making some rolls one higher than the others, so that all values stay between 1 and 6.
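rolls contains out-of-range values or a huge number of rolls, raising overflow concerns during summation
How to Handle:
The constraints cap each roll at 6 and m at 10^5, so sum(rolls) is at most 600,000 and fits in a 32-bit integer. If the input may violate the constraints, validate that each roll is between 1 and 6 and use a long data type for the summation in fixed-width languages.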
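
Several of the cases above reduce to one range check on the sum that the missing rolls must reach. The sketch below illustrates that check in isolation; `is_feasible` is a hypothetical helper name, and treating non-positive n as infeasible is an assumption taken from the edge case list rather than from the problem constraints.

```python
def is_feasible(rolls, mean, n):
    """Return True if n rolls in 1..6 can bring the overall mean to `mean`."""
    if n <= 0:
        return False  # assumption: a non-positive count of missing rolls is rejected
    missing_sum = mean * (len(rolls) + n) - sum(rolls)
    # Every missing roll is at least 1 and at most 6.
    return n <= missing_sum <= 6 * n


print(is_feasible([1, 2, 3, 4], 6, 4))  # False: missing sum 38 exceeds 6 * 4
print(is_feasible([6, 6], 4, 1))        # False: missing sum 0 is below n = 1
print(is_feasible([1, 5, 6], 3, 4))     # True: missing sum 9 lies in [4, 24]
```

The second call shows why the lower bound is n rather than 0: the required total is not smaller than the known sum, yet a single roll cannot contribute 0.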