You are given several logs, each containing a unique id and a timestamp. The timestamp is a string in the format `Year:Month:Day:Hour:Minute:Second`, for example `2017:01:01:23:59:59`. All fields are zero-padded decimal numbers.
Design a log storage system to implement the following functions:
`void put(int id, string timestamp)`: Stores the given log along with its timestamp.
`int[] retrieve(String start, String end, String granularity)`: Returns the ids of logs whose timestamps fall within the range from `start` to `end`. Both `start` and `end` have the same format as `timestamp`, and `granularity` specifies the time level to consider. For example, with `start = "2017:01:01:23:59:59"`, `end = "2017:01:02:23:59:59"`, and `granularity = "Day"`, we need to find the logs within the inclusive range from `2017:01:01` to `2017:01:02`; the hour, minute, and second fields are ignored.
There are six possible values for `granularity`:

- `Year`
- `Month`
- `Day`
- `Hour`
- `Minute`
- `Second`
Example:

```java
LogSystem logSystem = new LogSystem();
logSystem.put(1, "2017:01:01:23:59:59");
logSystem.put(2, "2017:01:01:22:59:59");
logSystem.put(3, "2016:01:01:00:00:00");

// returns [1,2,3], because you need to return all logs within years 2016 and 2017.
logSystem.retrieve("2016:01:01:01:01:01", "2017:01:01:23:00:00", "Year");

// returns [1,2], because you need to return all logs within the range from
// 2016:01:01:01 to 2017:01:01:23; both endpoints are inclusive.
logSystem.retrieve("2016:01:01:01:01:01", "2017:01:01:23:00:00", "Hour");
```
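A minimal Python sketch of this interface, assuming the inclusive prefix-comparison semantics described above (since timestamps are zero-padded, string comparison matches chronological order):

```python
class LogSystem:
    # Character index (exclusive) where each granularity's prefix ends
    # in a "Year:Month:Day:Hour:Minute:Second" timestamp string.
    GRANULARITY_INDEX = {'Year': 4, 'Month': 7, 'Day': 10,
                         'Hour': 13, 'Minute': 16, 'Second': 19}

    def __init__(self):
        self.logs = []  # List of (id, timestamp) pairs.

    def put(self, log_id, timestamp):
        self.logs.append((log_id, timestamp))

    def retrieve(self, start, end, granularity):
        # Truncate every timestamp at the granularity boundary, then
        # compare the prefixes as plain strings (both endpoints inclusive).
        prefix = self.GRANULARITY_INDEX[granularity]
        return [log_id for log_id, timestamp in self.logs
                if start[:prefix] <= timestamp[:prefix] <= end[:prefix]]
```

With the example inputs above, `retrieve(..., "Year")` returns `[1, 2, 3]` and `retrieve(..., "Hour")` returns `[1, 2]`.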
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to clarify the requirements and expected behavior with your interviewer before you start coding.
The most straightforward way to store and retrieve logs is to keep everything in one place and look through it whenever we need something. The brute force approach does exactly that – it avoids any fancy organization and searches all logs directly.
Here's how the algorithm would work step-by-step:
```python
# Character index (exclusive) where each granularity's prefix ends.
GRANULARITY_INDEX = {'Year': 4, 'Month': 7, 'Day': 10,
                     'Hour': 13, 'Minute': 16, 'Second': 19}

def retrieve_logs_brute_force(logs, start_time, end_time, granularity):
    retrieved_logs = []
    # Compare only the timestamp prefix up to the requested granularity.
    prefix_length = GRANULARITY_INDEX[granularity]
    # Iterate through each log entry.
    for log_entry in logs:
        # Check if the truncated timestamp falls within the truncated range.
        if start_time[:prefix_length] <= log_entry['timestamp'][:prefix_length] \
                <= end_time[:prefix_length]:
            # Append matching logs to the results.
            retrieved_logs.append(log_entry)
    return retrieved_logs
```
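The brute force relies on one key property: zero-padded timestamps in `Year:Month:Day:Hour:Minute:Second` format sort lexicographically in the same order as chronologically, and truncating two such strings at the same field boundary preserves that order. A quick illustration:

```python
# Zero-padded timestamps compare correctly as plain strings, so no
# date parsing is needed for the range check.
earlier = "2016:12:31:23:59:59"
later = "2017:01:01:00:00:00"
assert earlier < later

# Truncating both strings at the same field boundary (here, the first
# 10 characters, i.e. Day granularity) preserves the ordering.
assert earlier[:10] < later[:10]  # "2016:12:31" < "2017:01:01"
```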
The best way to store and quickly search logs is to organize them based on time. We use a system of breaking logs into time-based chunks so we can skip over large portions of logs that are irrelevant to our search.
Here's how the algorithm would work step-by-step:
```python
class LogStorageSystem:
    def __init__(self, chunk_size_seconds=3600):
        self.chunk_size_seconds = chunk_size_seconds
        self.log_chunks = []    # Each chunk is a list of log dicts.
        self.log_metadata = []  # Time range covered by each chunk.

    def store_log(self, timestamp, log_message):
        # Assumes timestamps arrive in non-decreasing order.
        log_entry = {"timestamp": timestamp, "message": log_message}
        if not self.log_metadata or self.log_metadata[-1]['end_timestamp'] < timestamp:
            # Open a new chunk when the timestamp falls past the current range.
            self.log_metadata.append({
                'start_timestamp': timestamp,
                'end_timestamp': timestamp + self.chunk_size_seconds - 1,
            })
            self.log_chunks.append([log_entry])
        else:
            # Append the log to the most recent chunk.
            self.log_chunks[-1].append(log_entry)

    def search_logs(self, start_timestamp, end_timestamp):
        results = []
        # Optimization: only scan chunks whose time range overlaps the query.
        for chunk_index in self._find_relevant_chunks(start_timestamp, end_timestamp):
            for log_entry in self.log_chunks[chunk_index]:
                if start_timestamp <= log_entry['timestamp'] <= end_timestamp:
                    results.append(log_entry)
        return results

    def _find_relevant_chunks(self, start_timestamp, end_timestamp):
        relevant_chunks = []
        # Linear scan for simplicity; can be optimized with binary search.
        for index, metadata in enumerate(self.log_metadata):
            if (start_timestamp <= metadata['end_timestamp']
                    and end_timestamp >= metadata['start_timestamp']):
                relevant_chunks.append(index)
        return relevant_chunks
```
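As the comment notes, the linear scan over chunk metadata can be replaced with binary search. A standalone sketch using Python's `bisect` module, assuming chunks are stored in increasing, non-overlapping time order (the function name and metadata shape are illustrative, matching the `log_metadata` dicts above):

```python
from bisect import bisect_left, bisect_right

def find_relevant_chunks(chunk_metadata, start_timestamp, end_timestamp):
    """Binary-search sorted chunk metadata instead of scanning linearly."""
    starts = [m['start_timestamp'] for m in chunk_metadata]
    ends = [m['end_timestamp'] for m in chunk_metadata]
    # First chunk whose end reaches the query start...
    first = bisect_left(ends, start_timestamp)
    # ...and one past the last chunk whose start is within the query end.
    last = bisect_right(starts, end_timestamp)
    return list(range(first, last))
```

This drops chunk lookup from O(n) to O(log n) in the number of chunks, which matters once logs span long time periods.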
| Case | How to Handle |
|---|---|
| Null or empty log ID | Reject null or empty log IDs and return an error or exception to maintain system integrity. |
| Null or empty timestamp string | Reject null or empty timestamp strings and return an error or exception, as timestamps are essential for retrieval. |
| Invalid timestamp format (e.g., missing fields, incorrect separators, non-numeric values) | Validate the timestamp format robustly and report invalid formats with a clear error. |
| Start timestamp is later than end timestamp | Return an empty list or throw an exception, since the range is invalid. |
| Unsupported granularity level (e.g., 'Millisecond') | Return an error or exception if an unsupported granularity level is provided during retrieval. |
| Large number of logs causing memory issues | Consider a database or other persistent storage mechanism for scalability, to avoid in-memory limitations. |
| Timestamps with extreme values (e.g., year 0001 or 9999) | Ensure parsing and comparison logic handles timestamps near the minimum and maximum representable dates. |
| Granularity is 'Second' and start equals end | Treat both endpoints as inclusive so logs at exactly that second are returned. |