Taro Logo

Design Hit Counter

Medium
Reddit logo
Reddit
0 views
Topics:
ArraysTwo Pointers

Design a hit counter which counts the number of hits received in the past 5 minutes (i.e., the past 300 seconds). Each hit is recorded with a timestamp that is an integer representing seconds elapsed since the beginning of time.

You should implement the following methods:

  • HitCounter() Initializes the object of the hit counter system.
  • void hit(int timestamp) Records a hit that happened at timestamp (in seconds). Several hits may occur at the same same timestamp.
  • int getHits(int timestamp) Returns the number of hits in the past 5 minutes from timestamp (i.e., the past 300 seconds).

Example 1:

Input
["HitCounter", "hit", "hit", "hit", "getHits", "hit", "getHits", "getHits"]
[[], [1], [2], [3], [4], [300], [300], [301]]
Output
[null, null, null, null, 3, null, 4, 3]

Explanation
HitCounter hitCounter = new HitCounter();
hitCounter.hit(1);       // hit at timestamp 1.
hitCounter.hit(2);       // hit at timestamp 2.
hitCounter.hit(3);       // hit at timestamp 3.
hitCounter.getHits(4);   // get hits at timestamp 4, return 3.
hitCounter.hit(300);     // hit at timestamp 300.
hitCounter.getHits(300);  // get hits at timestamp 300, return 4.
hitCounter.getHits(301);  // get hits at timestamp 301, return 3.

Constraints:

  • 1 <= timestamp <= 2 * 109
  • All the calls are being made to the methods in chronological order (i.e., timestamp is monotonically increasing).
  • At most 300 calls will be made to hit and getHits.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected frequency of calls to the `hit` and `getHits` methods? Should I optimize for more frequent hits or more frequent getHits?
  2. Are the timestamps strictly increasing, or can they be out of order, or potentially the same?
  3. What should `getHits` return if there are no hits within the past 5 minutes? Should it return 0, null, or throw an exception?
  4. What is the data type of the timestamp? Can I assume it's an integer?
  5. Is there a limit on the maximum timestamp value? This might influence the choice of data structure.

Brute Force Solution

Approach

The brute force method for the hit counter involves storing every hit's timestamp. When we need to count the hits in the last five minutes, we simply check every stored timestamp to see if it falls within the desired range.

Here's how the algorithm would work step-by-step:

  1. Every time a hit happens, write down the exact second it occurred.
  2. When someone asks how many hits occurred in the last five minutes, go through the entire list of recorded hit times.
  3. For each recorded hit time, check if it's within the last five minutes from the current time.
  4. Count up all the hit times that fall within that five-minute window.
  5. The total count is the answer.

Code Implementation

import time

class HitCounter:
    def __init__(self):
        self.hit_timestamps = []

    def hit(self, timestamp):
        # Record the timestamp of each hit.
        self.hit_timestamps.append(timestamp)

    def get_hits(self, timestamp):
        hits_in_last_five_minutes = 0

        # We only care about hits in the last 5 minutes.
        cutoff_time = timestamp - 300

        for hit_timestamp in self.hit_timestamps:
            # Check if the hit falls within the 5 min window
            if hit_timestamp > cutoff_time:
                hits_in_last_five_minutes += 1

        return hits_in_last_five_minutes

Big(O) Analysis

Time Complexity
O(n)The brute force approach stores every hit's timestamp. When count is called, it iterates through all stored timestamps to check if they fall within the last five minutes. Thus, the time complexity of the count operation is directly proportional to the number of hits (n) stored in the hit counter. Therefore, the time complexity is O(n).
Space Complexity
O(N)The brute force method stores every hit's timestamp. This implies an auxiliary data structure, most likely a list or array, is used to store these timestamps. Where N is the number of hits recorded, the space used by this list grows linearly with the number of hits. Therefore, the space complexity is O(N).

Optimal Solution

Approach

The hit counter needs to keep track of events (hits) and efficiently report the number of hits within the last five minutes. The key idea is to store the timestamps of the hits and quickly remove old hits that are no longer relevant, avoiding unnecessary counting.

Here's how the algorithm would work step-by-step:

  1. When a hit comes in, record its timestamp.
  2. To get the number of hits within the last five minutes, first clean out any old timestamps that are more than five minutes old. This prevents counting hits that are no longer relevant.
  3. After cleaning, count the remaining timestamps. The number of remaining timestamps represents the number of hits within the last five minutes.

Code Implementation

class HitCounter:
    def __init__(self):
        self.timestamps = []

    def hit(self, timestamp: int) -> None:
        self.timestamps.append(timestamp)

    def getHits(self, timestamp: int) -> int:
        # Remove hits older than 5 minutes
        while self.timestamps and self.timestamps[0] <= timestamp - 300:
            self.timestamps.pop(0)

        # Count remaining hits after cleaning
        number_of_hits = len(self.timestamps)

        return number_of_hits

Big(O) Analysis

Time Complexity
O(n)The hit() function has O(1) time complexity as it simply appends a timestamp to the list. The getHits() function first iterates through the list of timestamps to remove those older than 5 minutes. In the worst-case scenario, it iterates through all n timestamps. After cleaning, it counts the remaining timestamps, which takes O(k) time where k is the number of hits in the last 5 minutes. Since k <= n, and the timestamp removal loop dominates, the overall time complexity of getHits() is O(n), where n is the number of total hits received so far. Therefore, the overall complexity is O(n).
Space Complexity
O(N)The space complexity is determined by the auxiliary space used to store the timestamps of the hits. In the worst-case scenario, all N hits occur within the last five minutes, requiring us to store all N timestamps. Thus, the space required grows linearly with the number of hits N. Therefore, the auxiliary space complexity is O(N).

Edge Cases

CaseHow to Handle
Timestamp is null or undefinedThrow an IllegalArgumentException or return an error code as timestamps must be valid integers.
Timestamps are not strictly increasingEnsure the data structure can handle out-of-order timestamps or explicitly sort them upon insertion, potentially overwriting older entries to represent most recent count.
Timestamp is a very large number approaching integer limitConsider potential integer overflow issues in calculations (timestamp - 300 + 1) and use long if necessary or module to the max possible value.
Many hits occur at exactly the same timestampEnsure the solution correctly counts all hits at the same timestamp within the 5-minute window.
getHits() is called frequently with no new hit() callsThe solution should efficiently return the count without unnecessary computation when no new hits have been recorded.
The system receives a very high volume of hits, exceeding available memoryConsider using a more space-efficient data structure like a circular buffer or sampling techniques if memory is a constraint.
Timestamp is significantly earlier than previous timestamps (rewinding in time)Handle older timestamps gracefully by correctly updating the hit counts based on the new timestamp.
getHits() is called with a timestamp that is before any recorded hits.Return 0 as no hits are recorded before the initial timestamp.