Taro Logo

Time Based Key-Value Store

Medium
Netflix logo
Netflix
9 views
Topics:
ArraysBinary Search

Design a time-based key-value data structure that can store multiple values for the same key at different time stamps and retrieve the key's value at a certain timestamp.

Implement the TimeMap class:

  • TimeMap() Initializes the object of the data structure.
  • void set(String key, String value, int timestamp) Stores the key key with the value value at the given time timestamp.
  • String get(String key, int timestamp) Returns a value such that set was called previously, with timestamp_prev <= timestamp. If there are multiple such values, it returns the value associated with the largest timestamp_prev. If there are no values, it returns "".

Example 1:

Input
["TimeMap", "set", "get", "get", "set", "get", "get"]
[[], ["foo", "bar", 1], ["foo", 1], ["foo", 3], ["foo", "bar2", 4], ["foo", 4], ["foo", 5]]
Output
[null, null, "bar", "bar", null, "bar2", "bar2"]

Explanation
TimeMap timeMap = new TimeMap();
timeMap.set("foo", "bar", 1);  // store the key "foo" and value "bar" along with timestamp = 1.
timeMap.get("foo", 1);         // return "bar"
timeMap.get("foo", 3);         // return "bar", since there is no value corresponding to foo at timestamp 3 and timestamp 2, then the only value is at timestamp 1 is "bar".
timeMap.set("foo", "bar2", 4); // store the key "foo" and value "bar2" along with timestamp = 4.
timeMap.get("foo", 4);         // return "bar2"
timeMap.get("foo", 5);         // return "bar2"

Constraints:

  • 1 <= key.length, value.length <= 100
  • key and value consist of lowercase English letters and digits.
  • 1 <= timestamp <= 107
  • All the timestamps timestamp of set are strictly increasing.
  • At most 2 * 105 calls will be made to set and get.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible data types and ranges for the key, value, and timestamp? Specifically, can the timestamp be negative, zero, or a float?
  2. If multiple values exist for a key with timestamps less than or equal to the target timestamp in the get operation, which value should I return (e.g., the one with the largest timestamp that is still less than or equal to the target)?
  3. What should I return if the `set` operation is called with the same key and timestamp multiple times? Should the latest value overwrite previous ones?
  4. How frequently will the `set` and `get` operations be called, and what is the expected number of unique keys? This would help me understand the scale of data I need to store.
  5. Can the key or value be null or an empty string?

Brute Force Solution

Approach

The brute force approach is the simplest way to solve this problem. It involves checking every possible timestamp to find the one that's closest to, but not after, the given timestamp.

Here's how the algorithm would work step-by-step:

  1. When you need to store a value with a specific timestamp, just save the value and its timestamp together.
  2. When you need to retrieve a value at a given timestamp, go through all the stored values.
  3. For each stored value, check if its timestamp is before or equal to the timestamp you're looking for.
  4. If it is, remember that value. If it's after, ignore it.
  5. Once you've checked all the stored values, return the value you remembered that had the latest (closest) timestamp that was still before or equal to the timestamp you were given. If no value was found, return empty string.

Code Implementation

class TimeMap:

    def __init__(self):
        self.key_value_store = {}

    def set(self, key, value, timestamp):
        if key not in self.key_value_store:
            self.key_value_store[key] = []
        self.key_value_store[key].append((value, timestamp))

    def get(self, key, timestamp):
        if key not in self.key_value_store:
            return ""

        closest_value = ""
        # Iterate through all the values to find the closest one.
        for stored_value, stored_timestamp in self.key_value_store[key]:

            #Check to see if stored timestamp is less than or equal to
            if stored_timestamp <= timestamp:
                closest_value = stored_value

        # Return the closest value or empty string.
        return closest_value

Big(O) Analysis

Time Complexity
O(n)The set operation takes O(1) time as it simply appends the key-value pair along with its timestamp to a list. The get operation iterates through all stored key-value pairs to find the closest timestamp. In the worst case, the get operation might have to iterate through all 'n' stored entries, leading to a linear time complexity.
Space Complexity
O(N)The brute force approach stores each key-value pair along with its timestamp. In the worst-case scenario, we store N such pairs, where N is the number of set operations performed. Therefore, the space complexity is directly proportional to the number of stored key-value pairs. Thus, the auxiliary space used is O(N).

Optimal Solution

Approach

The key to efficiently storing time-based data is leveraging the fact that the timestamps are ordered. We will store each key's values along with their timestamps, and when asked for a value at a specific time, we'll quickly find the closest timestamp without checking everything.

Here's how the algorithm would work step-by-step:

  1. When a key and value are given with a timestamp, store them together in a memory structure like a dictionary or map, organized by the key.
  2. For each key, store the values and their corresponding timestamps in a sorted manner, such as using a list or another sorted structure. This is because the timestamps will always come in ascending order.
  3. When asked for the value of a key at a specific timestamp, first find the correct key.
  4. Instead of going through all values, use a method like binary search to quickly find the closest timestamp that's less than or equal to the given timestamp.
  5. If you find an exact match, return the associated value immediately.
  6. If you don't find an exact match, return the value associated with the closest smaller timestamp you found with binary search. If there is no smaller timestamp, return an empty string, implying no value existed at that time.

Code Implementation

class TimeMap:

    def __init__(self):
        self.key_map = {}

    def set(self, key: str, value: str, timestamp: int) -> None:
        if key not in self.key_map:
            self.key_map[key] = []
        
        # Timestamps are always added in ascending order.
        self.key_map[key].append((timestamp, value))

    def get(self, key: str, timestamp: int) -> str:
        if key not in self.key_map:
            return ""

        time_value_pairs = self.key_map[key]

        # Binary search for the largest timestamp <= given timestamp.
        left_pointer, right_pointer = 0, len(time_value_pairs) - 1
        result = ""

        while left_pointer <= right_pointer:
            middle_pointer = (left_pointer + right_pointer) // 2
            
            if time_value_pairs[middle_pointer][0] <= timestamp:
                # This timestamp is valid so far, try to find a better one.
                result = time_value_pairs[middle_pointer][1]
                left_pointer = middle_pointer + 1
            else:
                # The timestamp is too large, move to the left half.
                right_pointer = middle_pointer - 1

        return result

Big(O) Analysis

Time Complexity
O(log n)The set operation, which inserts the value and timestamp into the store, takes O(1) time because appending to a list associated with a key in a dictionary is a constant-time operation. The get operation, which retrieves a value at a specific timestamp, performs a binary search on the list of timestamps associated with the key. The binary search operation takes O(log n) time, where n is the number of timestamps stored for that particular key. Therefore, the overall time complexity for the get operation is O(log n).
Space Complexity
O(N)The primary auxiliary space is used to store the key-value pairs along with their timestamps. The dictionary or map stores keys, and each key maps to a list of (timestamp, value) pairs. In the worst case, we might store N key-value pairs with their timestamps, where N is the total number of set operations. Therefore, the space complexity is proportional to the number of stored entries, leading to O(N) auxiliary space.

Edge Cases

CaseHow to Handle
Null or empty key input to set or getReturn an empty string for get and do nothing for set or throw an IllegalArgumentException.
Null or empty value input to setStore an empty string for the value, or throw an IllegalArgumentException based on the specific problem requirements.
Zero or negative timestamp in set or getTreat zero as a valid timestamp and negative timestamps should be invalid, throwing an IllegalArgumentException.
Very large timestamp in set or get (potential overflow)Ensure the timestamp data type (e.g., long, int) can handle the magnitude of timestamps, or throw an exception if overflow occurs.
Multiple set operations for the same key and timestampOverwrite the previous value for that key and timestamp, which is the expected behavior.
Get operation with a timestamp smaller than any timestamp for a given keyReturn an empty string since no value exists at or before that timestamp.
Large number of set operations for the same key, requiring efficient search during getUse a data structure like a sorted list or binary search tree to maintain timestamps and values for efficient searching (e.g., binary search).
Get operation requests exact match for a timestamp that doesn't exist, but other values with same key existReturn the value associated with the greatest timestamp that is less than the query timestamp.