Taro Logo

Time Based Key-Value Store

Medium
Meta logo
Meta
2 views
Topics:
ArraysBinary SearchStrings

Design a time-based key-value data structure that can store multiple values for the same key at different timestamps and retrieve the key's value at a certain timestamp.

Implement the TimeMap class:

  • TimeMap() Initializes the object of the data structure.
  • void set(String key, String value, int timestamp) Stores the key key with the value value at the given time timestamp.
  • String get(String key, int timestamp) Returns a value such that set was called previously, with timestamp_prev <= timestamp. If there are multiple such values, it returns the value associated with the largest timestamp_prev. If there are no values, it returns "".

Example:

Input
["TimeMap", "set", "get", "get", "set", "get", "get"]
[[], ["foo", "bar", 1], ["foo", 1], ["foo", 3], ["foo", "bar2", 4], ["foo", 4], ["foo", 5]]

What data structures would you use, and how would you implement the set and get methods? What is the time complexity for each method? Consider edge cases such as when the key does not exist or when the timestamp is smaller than all timestamps for a given key. How can you optimize the get method if the timestamps for each key are strictly increasing?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible ranges for the timestamps, and can they be negative or zero?
  2. If multiple values exist for a given key with timestamps less than or equal to the target timestamp, should I return the value with the *largest* timestamp, or is there another preference?
  3. What is the maximum length of the key and value strings?
  4. If I call `get` with a key that has never been `set` before, what should I return?
  5. How frequently will `set` and `get` be called relative to each other? Will there be significantly more calls to one than the other?

Brute Force Solution

Approach

Think of this problem like having a journal where you record values with timestamps. The brute force way to retrieve a value at a specific time is to look at every single entry in your journal.

Here's how the algorithm would work step-by-step:

  1. When someone asks for the value at a particular time, go through each entry in the journal, one by one.
  2. For each entry, compare its timestamp to the time the person asked about.
  3. If the entry's timestamp is earlier than or equal to the time asked about, remember the value of that entry.
  4. After checking every entry, if you remembered any values, find the one with the latest (most recent) timestamp that's still before or equal to the time asked about.
  5. If you didn't remember any values, it means there's no value available for that time, so return nothing.

Code Implementation

class TimeMap:

    def __init__(self):
        self.time_value_journal = []

    def set(self, key: str, value: str, timestamp: int) -> None:
        self.time_value_journal.append((key, value, timestamp))

    def get(self, key: str, timestamp: int) -> str:
        # This will store the closest value found so far
        closest_value = ""
        
        #Iterate through all the key, value, timestamp sets
        for journal_key, journal_value, journal_timestamp in self.time_value_journal:
            # Check if the keys match
            if journal_key == key:
                #We need to find the largest timestamp <= given timestamp
                if journal_timestamp <= timestamp:

                    closest_value = journal_value

        return closest_value

Big(O) Analysis

Time Complexity
O(n)The set operation takes constant time, O(1), as it simply appends a key-value pair to a list associated with the given key. The get operation iterates through all the timestamps associated with a key, comparing each timestamp to the requested timestamp. In the worst-case scenario, we may need to iterate through all n timestamps for a given key, leading to O(n) time complexity for the get operation. Therefore, the dominant operation is the linear scan through the timestamps in the get function.
Space Complexity
O(1)The described brute force approach iterates through the entries in the time-based key-value store, comparing timestamps. The algorithm only remembers the most recent value that satisfies the condition, which requires storing at most one value and its corresponding timestamp temporarily. The space used for these temporary variables (value and timestamp) remains constant regardless of the number of entries (N) in the key-value store. Therefore, the auxiliary space complexity is O(1).

Optimal Solution

Approach

The optimal approach uses a smart way to quickly find the right timestamp. Instead of checking every single timestamp, we can use a special technique to narrow down our search to the most relevant ones.

Here's how the algorithm would work step-by-step:

  1. When storing the values, keep all versions for a specific key together, in order of their timestamps from oldest to newest.
  2. When you need to retrieve a value, you're looking for the most recent version that's no later than the given timestamp.
  3. Think of it like searching a sorted list. Instead of checking each timestamp one by one, start in the middle.
  4. If the timestamp in the middle is too old, you know the answer must be in the newer half of the timestamps.
  5. If the timestamp in the middle is too new, you know the answer must be in the older half.
  6. Keep cutting the search area in half until you find the closest timestamp that works. That's your answer!

Code Implementation

class TimeMap:

    def __init__(self):
        self.key_value_store = {}

    def set(self, key, value, timestamp):
        if key not in self.key_value_store:
            self.key_value_store[key] = []
        self.key_value_store[key].append((timestamp, value))

    def get(self, key, timestamp):
        if key not in self.key_value_store:
            return ""

        time_value_pairs = self.key_value_store[key]

        # Initialize binary search boundaries
        left_pointer, right_pointer = 0, len(time_value_pairs) - 1
        result = ""

        while left_pointer <= right_pointer:
            middle_pointer = (left_pointer + right_pointer) // 2
            middle_timestamp, middle_value = time_value_pairs[middle_pointer]

            # If the middle timestamp is a match or earlier, update
            if middle_timestamp <= timestamp:
                # This timestamp is valid, so update the result
                result = middle_value

                # Search for potentially better (later) timestamps
                left_pointer = middle_pointer + 1

            else:
                # The timestamp is too high, search in the left half
                right_pointer = middle_pointer - 1

        return result

Big(O) Analysis

Time Complexity
O(log n)The set operation takes O(1) time as it involves a simple append to a list. The get operation uses binary search on the list of timestamps associated with a key. Binary search iteratively halves the search space until the target timestamp is found, or the closest smaller timestamp is identified. The number of iterations required is logarithmic with respect to the number of timestamps (n) for that key, resulting in O(log n) time complexity.
Space Complexity
O(N)The primary auxiliary space usage stems from storing the key-value pairs and their associated timestamps. The explanation mentions keeping all versions for a specific key together, which implies storing multiple values for each key. In the worst-case scenario, each set operation could store a new timestamped value for a key. Therefore, if we have N put operations with distinct timestamps for various keys, the space required to store these versions grows linearly with N. Consequently, the auxiliary space complexity is O(N).

Edge Cases

CaseHow to Handle
Null or empty key in set operationThrow an IllegalArgumentException or return an error code to indicate invalid input.
Null or empty key in get operationReturn an empty string as specified in the problem description when the key does not exist.
Negative timestamp in set operationThrow an IllegalArgumentException since timestamp is generally a non-negative integer representing time.
Zero timestamp in set operationHandle timestamp zero as a valid time and store accordingly if the requirements don't explicitly disallow it.
Timestamp already exists for the same key in set operationOverwrite the previous value with the new value for the given key and timestamp since the latest entry prevails.
Timestamp in get operation is smaller than all existing timestamps for the given keyReturn an empty string as specified in the problem since no valid value exists.
Large number of set operations causing memory issuesConsider using techniques like data compression or database persistence to handle large datasets exceeding memory capacity.
Maximum integer value for timestampEnsure the code handles the maximum integer value of timestamp correctly to avoid integer overflow issues during comparisons.