Taro Logo

Design Hit Counter

Medium
Meta logo
Meta
5 views
Topics:
ArraysDatabase Problems

Let's design a hit counter system. Imagine you have a website and you want to track the number of hits (requests) it receives within the last 5 minutes.

Your task is to implement a HitCounter class with the following methods:

  1. HitCounter(): Initializes the object with an empty hit counter.
  2. void hit(int timestamp): Records a hit (request) at a specific timestamp. The timestamp represents the time in seconds from the start of the system (e.g., the Unix epoch). You can assume that timestamps are always increasing.
  3. int getHits(int timestamp): Returns the number of hits recorded within the last 5 minutes (300 seconds) from the given timestamp. In other words, it should return the number of hits with timestamps in the range [timestamp - 300, timestamp].

Example:

HitCounter hitCounter = new HitCounter();

// a hit at timestamp = 1.
hitCounter.hit(1);

// a hit at timestamp = 2.
hitCounter.hit(2);

// a hit at timestamp = 3.
hitCounter.hit(3);

// getHits(4) should return 3 (All hits are within the last 5 minutes)
hitCounter.getHits(4);

// a hit at timestamp = 300.
hitCounter.hit(300);

// getHits(300) should return 4.
hitCounter.getHits(300);

// a hit at timestamp = 301.
hitCounter.hit(301);

// getHits(301) should return 4.
hitCounter.getHits(301);

// getHits(600) should return 1, as only the hit at timestamp 301 is within the last 5 minutes.
hitCounter.getHits(600);

How would you implement the HitCounter class to efficiently handle a large number of hits and frequent calls to the getHits method? Consider the time and space complexity of your solution. What edge cases might arise, and how would you address them?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected range of the timestamp values, and how should I handle potentially out-of-order timestamps in the `hit` function?
  2. If `getHits` is called with a timestamp that is less than or equal to the timestamp of a previous hit, should I still only count hits within the last 5 minutes relative to the input timestamp?
  3. What should I return from `getHits` if there are no hits recorded within the last 5 minutes (or at all)? Should I return 0, null, or throw an exception?
  4. Can I assume that the timestamp is given in seconds (or some other specific unit), and that 5 minutes corresponds to exactly 300 seconds?
  5. How often will `hit` and `getHits` be called relative to each other? Is `getHits` called much more frequently than `hit`, or vice versa, as this might influence data structure choices?

Brute Force Solution

Approach

The goal is to track website hits over time. The brute force way is to simply remember every single hit that happens, regardless of how old it is, and then when asked for the hit count, go through all the hits and filter them.

Here's how the algorithm would work step-by-step:

  1. Each time a hit comes in, remember the exact second it happened.
  2. When someone asks for the number of hits in the last five minutes (300 seconds), look at your list of all the hits.
  3. Go through the entire list of hits, one by one.
  4. For each hit, check if it happened within the last 300 seconds.
  5. If it did, count it. If not, ignore it.
  6. After checking every single hit, tell them the total number of hits that happened in the last 300 seconds.

Code Implementation

class HitCounter:

    def __init__(self):
        self.hit_timestamps = []

    def hit(self, timestamp: int) -> None:
        # Store the timestamp of the new hit.
        self.hit_timestamps.append(timestamp)


    def getHits(self, timestamp: int) -> int:
        hits_in_last_300_seconds = 0

        # Iterate through all recorded timestamps.
        for hit_timestamp in self.hit_timestamps:
            # Check if the hit occurred within the last 300 seconds.
            if timestamp - hit_timestamp < 300:
                hits_in_last_300_seconds += 1


        return hits_in_last_300_seconds

Big(O) Analysis

Time Complexity
O(n)The plain English explanation describes iterating through all recorded hits to count those within the last 300 seconds. The input size, n, is the number of recorded hits. The hit() function is O(1), adding to the end of a list which has constant time complexity. The getHits() function iterates through this entire list of hits once to count the recent ones. Thus, the time complexity of getHits() is O(n), where n is the total number of hits recorded.
Space Complexity
O(N)The provided solution stores every single hit along with its timestamp. This implies using a data structure, likely a list or array, to store all hit timestamps. As the number of hits increases, the size of this data structure grows linearly. Therefore, the auxiliary space required is proportional to the number of hits, N, where N is the total number of hits recorded. This results in a space complexity of O(N).

Optimal Solution

Approach

The best way to count recent hits is to use a data structure that automatically forgets old information. We want to quickly add new hits and efficiently count only the hits that have occurred within the last five minutes.

Here's how the algorithm would work step-by-step:

  1. Keep a list of timestamps of each hit. Each time a hit happens, record the exact time.
  2. When someone asks how many hits there have been recently, go through the list of timestamps.
  3. Remove all the timestamps that are older than five minutes ago. Get rid of the outdated information.
  4. Count how many timestamps are left. That's how many hits have happened in the last five minutes.
  5. The list naturally 'rolls over' time, always only keeping the most recent hits.

Code Implementation

class HitCounter:
    def __init__(self):
        self.hit_timestamps = []

    def hit(self, timestamp: int) -> None:
        self.hit_timestamps.append(timestamp)

    def getHits(self, timestamp: int) -> int:
        # Remove outdated timestamps from the list.
        while self.hit_timestamps and self.hit_timestamps[0] <= timestamp - 300:
            self.hit_timestamps.pop(0)

        # The remaining timestamps represent hits in the last 5 minutes.
        return len(self.hit_timestamps)


# Your HitCounter object will be instantiated and called as such:
# hit_counter = HitCounter()
# hit_counter.hit(timestamp)
# hits = hit_counter.getHits(timestamp)

Big(O) Analysis

Time Complexity
O(n)The hit function has a time complexity of O(1) as it simply appends a timestamp to the list. The getHits function iterates through the list of timestamps. Let n be the number of hits recorded. In the worst-case scenario, all the hits occurred within the last five minutes, so we iterate through all n timestamps to count them. Therefore, the getHits function has a time complexity of O(n).
Space Complexity
O(N)The primary auxiliary space used in this approach is the list of timestamps. In the worst-case scenario, all hits in the last five minutes could have occurred at different timestamps, leading to the storage of N timestamps, where N is the number of hits in the last five minutes. Therefore, the auxiliary space grows linearly with the number of hits in the last five minutes. This results in a space complexity of O(N).

Edge Cases

CaseHow to Handle
Timestamp is non-positive (zero or negative)Treat non-positive timestamps as invalid inputs and either throw an exception, ignore the hit, or assume it's a very old event (timestamp 0), depending on requirements.
Timestamp is smaller than the previous timestamp in hit()Handle out-of-order timestamps gracefully by either ignoring the hit, throwing an exception to indicate invalid input, or adjusting the internal data structure to maintain chronological order as best as possible.
Large number of hits within a very short time frame, potentially exceeding memory or computational limitsEmploy a data structure optimized for space (e.g., a circular buffer) and consider aggregation techniques or sampling if necessary to manage high hit rates.
getHits() is called with a timestamp that is much earlier than any recorded hitsReturn 0 in `getHits()` when the provided timestamp is earlier than any recorded hits, as no hits could have occurred within the last 5 minutes from that point.
Integer overflow in timestamp representation or hit countsUse appropriate data types (e.g., long) to prevent integer overflows when storing timestamps and hit counts, and consider using a big integer library if necessary for extremely large values.
getHits() and hit() are called concurrently from multiple threadsUse thread-safe data structures and synchronization mechanisms (e.g., mutexes, locks) to ensure data consistency and prevent race conditions when `getHits()` and `hit()` are accessed concurrently.
The range of valid timestamps is very large leading to huge memory consumptionUse circular queue or optimized data structure such as hash map with expiration to limit the data held in memory for better scalability.
No hits have been recorded yetReturn 0 in `getHits()` when no hits have been recorded, which is the initial state of the HitCounter.