Taro Logo

Moving Average from Data Stream

Easy
Meta logo
Meta
4 views
Topics:
ArraysSliding Windows

Given a stream of integers and a window size, calculate the moving average of all integers in the sliding window.

Implement the MovingAverage class:

  • MovingAverage(int size) Initializes the object with the size of the window size.
  • double next(int val) Returns the moving average of the last size values of the stream.

Example 1:


MovingAverage m = new MovingAverage(3);
m.next(1); // return 1.0
m.next(10); // return (1 + 10) / 2 = 5.5
m.next(3); // return (1 + 10 + 3) / 3 = 4.66667
m.next(5); // return (10 + 3 + 5) / 3 = 6.0

Constraints:

  • 1 <= size <= 1000
  • -105 <= val <= 105
  • At most 104 calls will be made to next.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected data type for the incoming stream of integers, and what is the range of possible values for each integer?
  2. Can the provided 'size' be zero or negative? If so, what should the `next()` method return?
  3. If the number of integers added to the stream is less than 'size', should the moving average be calculated over the integers added so far, or should I return an error?
  4. Is there a maximum number of calls to the `next()` method I should anticipate?
  5. What should I return if the input stream is empty and `next()` is called?

Brute Force Solution

Approach

The goal is to find the average of the last few numbers we've seen. The brute force approach calculates the average every time we receive a new number by re-examining all the previous numbers within the specified window.

Here's how the algorithm would work step-by-step:

  1. Every time a new number comes in, we need to figure out the average.
  2. To find this average, we add up the last 'x' numbers where 'x' is the specified window size.
  3. If we haven't seen 'x' numbers yet, we just add up all the numbers we've seen so far.
  4. Then, we divide the sum by the number of values added to get the average.
  5. We repeat this process every time a new number arrives to get the moving average.

Code Implementation

class MovingAverage:

    def __init__(self, size_of_window):
        self.size_of_window = size_of_window
        self.numbers_in_window = []

    def next(self, value_to_add):
        self.numbers_in_window.append(value_to_add)
        
        # Ensure the size of the window does not exceed the specified size
        if len(self.numbers_in_window) > self.size_of_window:

            self.numbers_in_window.pop(0)

        # Calculate the current sum of numbers in the window
        current_sum = 0

        for number in self.numbers_in_window:
            current_sum += number

        # Calculating average based on elements in the window
        return current_sum / len(self.numbers_in_window)

Big(O) Analysis

Time Complexity
O(n)Each time a new number arrives, we iterate through a window of at most 'size' elements to calculate the sum. In the worst-case scenario, we might have to iterate through all 'n' numbers we have seen so far if n is smaller than or equal to the specified size. Therefore, for each incoming number, we perform an operation proportional to the number of elements processed. This implies that processing 'n' numbers takes a total of O(n) time since each number is effectively only being added to the sum once before it potentially exits the moving average calculation window.
Space Complexity
O(k)The algorithm needs to store the last 'k' numbers (where 'k' is the window size) to calculate the moving average. This is typically done using a queue or a list that holds these numbers. Therefore, the auxiliary space grows linearly with the window size 'k'. The space used does not depend on the total number of numbers seen 'N', but solely on the window size.

Optimal Solution

Approach

We need to keep track of a running average as new numbers come in. The trick is to avoid recalculating the entire average each time. We only need to adjust the average based on the new number and the oldest number that is no longer part of the window.

Here's how the algorithm would work step-by-step:

  1. When a new number arrives, add it to a running sum.
  2. Also, keep track of all the numbers in the order they arrived.
  3. If we have more numbers than the specified window size, subtract the oldest number from the running sum.
  4. After adding the new number and potentially subtracting the oldest, divide the running sum by the current number of items (up to the window size) to get the average.
  5. Return the calculated average.

Code Implementation

class MovingAverage:

    def __init__(self, size):
        self.window_size = size
        self.number_queue = []
        self.current_sum = 0.0

    def next(self, value):
        self.number_queue.append(value)
        self.current_sum += value

        # Remove the oldest element when exceeding window size.
        if len(self.number_queue) > self.window_size:

            self.current_sum -= self.number_queue.pop(0)

        # Calculate average based on elements in the window.
        return self.current_sum / len(self.number_queue)

Big(O) Analysis

Time Complexity
O(1)The `next` method processes each new number in constant time. Adding to the running sum, updating the queue of numbers, and calculating the average all take a fixed amount of time regardless of how many numbers have been processed. The window size is fixed and does not depend on the overall input size, so even managing the queue is O(1). Therefore, the time complexity is O(1).
Space Complexity
O(W)The algorithm keeps track of all numbers in the order they arrived. This implies storing the numbers in a queue or list. In the worst case, the data structure will contain at most W numbers, where W is the specified window size. Therefore, the auxiliary space used is proportional to the window size W. Other than this list, only a few constant space variables (like the running sum) are used, which do not depend on the input size.

Edge Cases

CaseHow to Handle
Size is zeroReturn 0 or throw an exception, as a moving average with a window of size 0 is undefined.
Size is negativeThrow an IllegalArgumentException since a negative window size is not meaningful.
Stream contains very large positive and negative integers, causing overflow of the sumUse long data type to store the sum or consider using a different approach that avoids storing the entire sum, like Kahan summation if high precision is required.
Stream contains floating point numbers that result in floating point precision errors when calculating averageUse double for calculations, be aware of potential precision loss, and handle comparisons carefully by using tolerances if needed.
The data stream contains only a few values such that the number of elements seen is less than sizeCalculate the average based on the number of elements seen so far, returning the sum of elements seen / the number of elements seen.
Maximum integer values in the stream are close to Integer.MAX_VALUECheck for overflow during summation and either throw an exception or use long to avoid it.
The input stream has a very large number of integers, and the size is also very large, potentially leading to high memory usageUse a data structure with fixed memory size like a circular queue or a linked list where oldest data is removed.
size is larger than the number of elements in streamReturn the average of all available elements or throw an exception.