Let's design a hit counter system. This system will be used to track the number of hits to a popular website. You need to implement the following:
HitCounter()
: Initialize the object with an empty hit record.hit(int timestamp)
: Records a hit (can assume that each timestamp is unique). Each hit is done at a specific timestamp (in seconds granularity). It is guaranteed that timestamp is always increasing.getHits(int timestamp)
: Returns the number of hits in the past 300 seconds. getHits(timestamp) will return the number of hits that have occurred in the inclusive range [timestamp - 300, timestamp]
.Your solution should be efficient in terms of both time and space complexity. The timestamp is in seconds. Can you implement this efficiently?
For example:
HitCounter counter = new HitCounter();
counter.hit(1); // hit at timestamp 1
counter.hit(2); // hit at timestamp 2
counter.hit(3); // hit at timestamp 3
counter.getHits(4); // returns 3
counter.hit(300); // hit at timestamp 300
counter.getHits(300); // returns 4
counter.getHits(301); // returns 4
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The goal is to track website hits over time. The brute force way is to simply remember every single hit that happens, regardless of how old it is, and then when asked for the hit count, go through all the hits and filter them.
Here's how the algorithm would work step-by-step:
class HitCounter:
def __init__(self):
self.hit_timestamps = []
def hit(self, timestamp: int) -> None:
# Store the timestamp of the new hit.
self.hit_timestamps.append(timestamp)
def getHits(self, timestamp: int) -> int:
hits_in_last_300_seconds = 0
# Iterate through all recorded timestamps.
for hit_timestamp in self.hit_timestamps:
# Check if the hit occurred within the last 300 seconds.
if timestamp - hit_timestamp < 300:
hits_in_last_300_seconds += 1
return hits_in_last_300_seconds
The best way to count recent hits is to use a data structure that automatically forgets old information. We want to quickly add new hits and efficiently count only the hits that have occurred within the last five minutes.
Here's how the algorithm would work step-by-step:
class HitCounter:
def __init__(self):
self.hit_timestamps = []
def hit(self, timestamp: int) -> None:
self.hit_timestamps.append(timestamp)
def getHits(self, timestamp: int) -> int:
# Remove outdated timestamps from the list.
while self.hit_timestamps and self.hit_timestamps[0] <= timestamp - 300:
self.hit_timestamps.pop(0)
# The remaining timestamps represent hits in the last 5 minutes.
return len(self.hit_timestamps)
# Your HitCounter object will be instantiated and called as such:
# hit_counter = HitCounter()
# hit_counter.hit(timestamp)
# hits = hit_counter.getHits(timestamp)
Case | How to Handle |
---|---|
Timestamp is non-positive (zero or negative) | Treat non-positive timestamps as invalid inputs and either throw an exception, ignore the hit, or assume it's a very old event (timestamp 0), depending on requirements. |
Timestamp is smaller than the previous timestamp in hit() | Handle out-of-order timestamps gracefully by either ignoring the hit, throwing an exception to indicate invalid input, or adjusting the internal data structure to maintain chronological order as best as possible. |
Large number of hits within a very short time frame, potentially exceeding memory or computational limits | Employ a data structure optimized for space (e.g., a circular buffer) and consider aggregation techniques or sampling if necessary to manage high hit rates. |
getHits() is called with a timestamp that is much earlier than any recorded hits | Return 0 in `getHits()` when the provided timestamp is earlier than any recorded hits, as no hits could have occurred within the last 5 minutes from that point. |
Integer overflow in timestamp representation or hit counts | Use appropriate data types (e.g., long) to prevent integer overflows when storing timestamps and hit counts, and consider using a big integer library if necessary for extremely large values. |
getHits() and hit() are called concurrently from multiple threads | Use thread-safe data structures and synchronization mechanisms (e.g., mutexes, locks) to ensure data consistency and prevent race conditions when `getHits()` and `hit()` are accessed concurrently. |
The range of valid timestamps is very large leading to huge memory consumption | Use circular queue or optimized data structure such as hash map with expiration to limit the data held in memory for better scalability. |
No hits have been recorded yet | Return 0 in `getHits()` when no hits have been recorded, which is the initial state of the HitCounter. |