Let's design a hit counter system. Imagine you have a website and you want to track the number of hits (requests) it receives within the last 5 minutes.
Your task is to implement a HitCounter
class with the following methods:
HitCounter()
: Initializes the object with an empty hit counter.void hit(int timestamp)
: Records a hit (request) at a specific timestamp
. The timestamp
represents the time in seconds from the start of the system (e.g., the Unix epoch). You can assume that timestamps are always increasing.int getHits(int timestamp)
: Returns the number of hits recorded within the last 5 minutes (300 seconds) from the given timestamp
. In other words, it should return the number of hits with timestamps in the range [timestamp - 300, timestamp]
.Example:
HitCounter hitCounter = new HitCounter();
// a hit at timestamp = 1.
hitCounter.hit(1);
// a hit at timestamp = 2.
hitCounter.hit(2);
// a hit at timestamp = 3.
hitCounter.hit(3);
// getHits(4) should return 3 (All hits are within the last 5 minutes)
hitCounter.getHits(4);
// a hit at timestamp = 300.
hitCounter.hit(300);
// getHits(300) should return 4.
hitCounter.getHits(300);
// a hit at timestamp = 301.
hitCounter.hit(301);
// getHits(301) should return 4.
hitCounter.getHits(301);
// getHits(600) should return 1, as only the hit at timestamp 301 is within the last 5 minutes.
hitCounter.getHits(600);
How would you implement the HitCounter
class to efficiently handle a large number of hits and frequent calls to the getHits
method? Consider the time and space complexity of your solution. What edge cases might arise, and how would you address them?
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The goal is to track website hits over time. The brute force way is to simply remember every single hit that happens, regardless of how old it is, and then when asked for the hit count, go through all the hits and filter them.
Here's how the algorithm would work step-by-step:
class HitCounter:
def __init__(self):
self.hit_timestamps = []
def hit(self, timestamp: int) -> None:
# Store the timestamp of the new hit.
self.hit_timestamps.append(timestamp)
def getHits(self, timestamp: int) -> int:
hits_in_last_300_seconds = 0
# Iterate through all recorded timestamps.
for hit_timestamp in self.hit_timestamps:
# Check if the hit occurred within the last 300 seconds.
if timestamp - hit_timestamp < 300:
hits_in_last_300_seconds += 1
return hits_in_last_300_seconds
The best way to count recent hits is to use a data structure that automatically forgets old information. We want to quickly add new hits and efficiently count only the hits that have occurred within the last five minutes.
Here's how the algorithm would work step-by-step:
class HitCounter:
def __init__(self):
self.hit_timestamps = []
def hit(self, timestamp: int) -> None:
self.hit_timestamps.append(timestamp)
def getHits(self, timestamp: int) -> int:
# Remove outdated timestamps from the list.
while self.hit_timestamps and self.hit_timestamps[0] <= timestamp - 300:
self.hit_timestamps.pop(0)
# The remaining timestamps represent hits in the last 5 minutes.
return len(self.hit_timestamps)
# Your HitCounter object will be instantiated and called as such:
# hit_counter = HitCounter()
# hit_counter.hit(timestamp)
# hits = hit_counter.getHits(timestamp)
Case | How to Handle |
---|---|
Timestamp is non-positive (zero or negative) | Treat non-positive timestamps as invalid inputs and either throw an exception, ignore the hit, or assume it's a very old event (timestamp 0), depending on requirements. |
Timestamp is smaller than the previous timestamp in hit() | Handle out-of-order timestamps gracefully by either ignoring the hit, throwing an exception to indicate invalid input, or adjusting the internal data structure to maintain chronological order as best as possible. |
Large number of hits within a very short time frame, potentially exceeding memory or computational limits | Employ a data structure optimized for space (e.g., a circular buffer) and consider aggregation techniques or sampling if necessary to manage high hit rates. |
getHits() is called with a timestamp that is much earlier than any recorded hits | Return 0 in `getHits()` when the provided timestamp is earlier than any recorded hits, as no hits could have occurred within the last 5 minutes from that point. |
Integer overflow in timestamp representation or hit counts | Use appropriate data types (e.g., long) to prevent integer overflows when storing timestamps and hit counts, and consider using a big integer library if necessary for extremely large values. |
getHits() and hit() are called concurrently from multiple threads | Use thread-safe data structures and synchronization mechanisms (e.g., mutexes, locks) to ensure data consistency and prevent race conditions when `getHits()` and `hit()` are accessed concurrently. |
The range of valid timestamps is very large leading to huge memory consumption | Use circular queue or optimized data structure such as hash map with expiration to limit the data held in memory for better scalability. |
No hits have been recorded yet | Return 0 in `getHits()` when no hits have been recorded, which is the initial state of the HitCounter. |