
LRU Cache

Medium
Walmart
Topics:
Linked Lists, Arrays

Design a data structure that follows the constraints of a Least Recently Used (LRU) cache.

Implement the LRUCache class:

  • LRUCache(int capacity) Initialize the LRU cache with positive size capacity.
  • int get(int key) Return the value of the key if the key exists, otherwise return -1.
  • void put(int key, int value) Update the value of the key if the key exists. Otherwise, add the key-value pair to the cache. If the number of keys exceeds the capacity from this operation, evict the least recently used key.

The functions get and put must each run in O(1) average time complexity.

Example 1:

Input
["LRUCache", "put", "put", "get", "put", "get", "put", "get", "get", "get"]
[[2], [1, 1], [2, 2], [1], [3, 3], [2], [4, 4], [1], [3], [4]]
Output
[null, null, null, 1, null, -1, null, -1, 3, 4]

Explanation
LRUCache lRUCache = new LRUCache(2);
lRUCache.put(1, 1); // cache is {1=1}
lRUCache.put(2, 2); // cache is {1=1, 2=2}
lRUCache.get(1);    // return 1
lRUCache.put(3, 3); // LRU key was 2, evicts key 2, cache is {1=1, 3=3}
lRUCache.get(2);    // return -1 (not found)
lRUCache.put(4, 4); // LRU key was 1, evicts key 1, cache is {4=4, 3=3}
lRUCache.get(1);    // return -1 (not found)
lRUCache.get(3);    // return 3
lRUCache.get(4);    // return 4

Constraints:

  • 1 <= capacity <= 3000
  • 0 <= key <= 10^4
  • 0 <= value <= 10^5
  • At most 2 * 10^5 calls will be made to get and put.

Solution


Clarifying Questions

When you're asked this question in a real interview, the prompt will often be ambiguous (especially at FAANG companies). In that case, make sure to ask questions like these:

  1. What is the range of possible values for the keys and values that will be stored in the cache?
  2. What is the expected capacity of the LRU cache? Is it safe to assume it will always be a positive integer?
  3. What should happen if I try to `put` a key that already exists in the cache? Should I update the value and move it to the most recently used, or treat it as an error?
  4. If the key is not found in the `get` operation, should I return null instead of -1, or is -1 strictly the required return value?
  5. Is the cache expected to be thread-safe? If so, what kind of concurrency handling should be implemented?

Brute Force Solution

Approach

A Least Recently Used (LRU) cache stores a limited number of items. The brute force approach involves checking the entire history of item usage every time we need to retrieve or store something. This ensures we always have the absolute least recently used item to discard, but it requires significant effort each time.

Here's how the algorithm would work step-by-step:

  1. When we want to retrieve an item, we go through every item we have stored and manually check if any of them match the item we're looking for.
  2. If we find a match, we return the item.
  3. If we don't find the item, we return 'not found'.
  4. When we want to store an item, we first check if the cache is full by counting how many items are currently stored.
  5. If the cache is not full, we simply add the new item to the cache.
  6. If the cache is full, we need to find the least recently used item. To do this, we examine the entire history of every action we have ever taken: adding or accessing values.
  7. We find the item that was used the longest time ago from the entire history.
  8. We remove that least recently used item to make space for the new item, and then we add the new item to the cache.

Code Implementation

class LRUCacheBruteForce:

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache_items = []
        self.usage_history = []

    def get(self, key):
        for item in self.cache_items:
            if item['key'] == key:
                self.usage_history.append(key)
                return item['value']
        return -1

    def put(self, key, value):
        # If the key already exists, update its value and record the new use.
        for item in self.cache_items:
            if item['key'] == key:
                item['value'] = value
                self.usage_history.append(key)
                return

        # Check if the cache is full
        if len(self.cache_items) == self.capacity:
            # Find and remove the least recently used item
            least_recent_key = self.find_least_recently_used()
            self.remove_item(least_recent_key)

        # Add the new item to the cache
        self.cache_items.append({'key': key, 'value': value})
        self.usage_history.append(key)

    def find_least_recently_used(self):
        if not self.usage_history:
            return None

        least_recent_key = None
        least_recent_index = float('inf')

        # The LRU item is the one whose *most recent* use is oldest, so find
        # each cached key's last occurrence in the usage history.
        for cache_item in self.cache_items:
            key = cache_item['key']
            if key in self.usage_history:
                # Index of the last occurrence, found by searching from the right.
                last_occurrence = len(self.usage_history) - 1 - self.usage_history[::-1].index(key)
            else:
                last_occurrence = -1

            if last_occurrence < least_recent_index:
                least_recent_index = last_occurrence
                least_recent_key = key

        return least_recent_key

    def remove_item(self, key):
        # Find and remove from usage history and cache items
        self.cache_items = [item for item in self.cache_items if item['key'] != key]
        self.usage_history = [k for k in self.usage_history if k != key]

Big(O) Analysis

Time Complexity
O(n * m). The get operation scans up to n cached items to find a key, which is O(n). The put operation checks for an existing key in O(n), and when the cache is full it finds the least recently used item by locating each cached key's most recent use in the usage history; since the history grows by one entry per operation, this search costs O(n * m), where m is the number of operations performed so far. Eviction then rebuilds both lists in O(n + m). The LRU search dominates, so put is O(n * m) while get is O(n).
Space Complexity
O(n + m). The cache list holds up to n (capacity) items, but the usage history records every get and put ever performed and is only trimmed when a key is evicted. The history therefore grows with the number of operations m, not just the cache size, so the auxiliary space is O(n + m). This unbounded history is the main cost that the optimal solution eliminates.

Optimal Solution

Approach

The challenge is to create a memory system that quickly remembers the most recently used items while forgetting the least used. The optimal approach combines a way to quickly find an item (like a dictionary) and a way to keep track of how recently things were used (like a line of people waiting).

Here's how the algorithm would work step-by-step:

  1. Use a system that lets you instantly find the item you're looking for, like using a key to open a specific locker.
  2. Use a 'line' to represent how recently items were used. The most recently used item goes to the front of the line.
  3. When you use an item, move it to the front of the line.
  4. If the memory is full and you need to add a new item, remove the item at the back of the line (the least recently used) to make space.
  5. Keep your 'line' and your 'locker' system in sync so you always know where to find things and what's been used most recently.
  6. This ensures that accessing, adding, and removing items happens very quickly, as you're not searching through everything, just adjusting the 'line' and 'locker'.

Code Implementation

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None
        self.prev = None

class LRUCache:

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}
        self.head = Node(0, 0)
        self.tail = Node(0, 0)
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        # Remove node from linked list
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add(self, node):
        # Add node to the head of linked list
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key in self.cache:
            node = self.cache[key]
            self._remove(node)
            self._add(node)
            return node.value
        return -1

    def put(self, key, value):
        if key in self.cache:
            # If key exists, update and move to front
            node = self.cache[key]
            node.value = value
            self._remove(node)
            self._add(node)
        else:
            # Need to add a new node
            new_node = Node(key, value)
            self.cache[key] = new_node
            self._add(new_node)

            if len(self.cache) > self.capacity:
                # Remove the least recently used node
                tail_node = self.tail.prev
                self._remove(tail_node)
                del self.cache[tail_node.key]

Big(O) Analysis

Time Complexity
O(1). The LRU cache implementation uses a hash map (dictionary) for O(1) average-case time complexity for get and put operations. The doubly linked list operations for maintaining the order of recent use, such as moving a node to the head or removing the tail, also take O(1) time. Therefore, both get and put have constant time complexity, independent of the number of items stored (n).
Space Complexity
O(capacity). The implementation uses a dictionary (hash map) to store key-value pairs for quick access and a doubly linked list to maintain the order of recently used items. The dictionary stores references to the nodes in the linked list. Both the dictionary and the linked list can grow up to the cache's capacity, so the auxiliary space is proportional to the capacity of the cache, i.e., O(capacity), where capacity is the maximum number of items the cache can hold.
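In Python specifically, the hash-map-plus-recency-list combination described above is exactly what collections.OrderedDict provides out of the box, so the same O(1) behavior can be sketched much more compactly. This is an alternative sketch rather than the canonical interview answer (interviewers usually expect the explicit doubly linked list shown above):

```python
from collections import OrderedDict

class LRUCacheOrdered:
    """LRU cache built on OrderedDict, which maintains key insertion order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        # Mark the key as most recently used by moving it to the end.
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # popitem(last=False) evicts the oldest (least recently used) entry.
            self.cache.popitem(last=False)
```

Replaying the operation sequence from Example 1 against this class produces the same outputs as the doubly-linked-list version: get(1) returns 1, the subsequent put(3, 3) evicts key 2, and so on.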

Edge Cases

  • Cache capacity is zero: Reject construction (e.g., raise an exception), since such a cache cannot store any elements.
  • Inserting the same key multiple times with different values: The `put` operation should update the existing key's value and move it to the most recently used position.
  • Getting a key that doesn't exist: The `get` operation must return -1 without modifying the cache state.
  • Cache is full and we try to put a new key: The least recently used key should be evicted before adding the new key-value pair.
  • Cache is empty and we call get(): `get` should return -1.
  • Large number of get() and put() operations: Ensure the data structure scales efficiently (O(1) for both operations) even under many operations, and avoid memory leaks.
  • Integer overflow when handling key or value: Consider a long (64-bit) type if keys or values might exceed integer limits, though the stated constraints keep them well within range.
  • Evicting the head or tail of the doubly linked list: Ensure the head and tail pointers are correctly updated; the sentinel head and tail nodes in the optimal solution make these cases uniform.
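Several of these edge cases can be checked mechanically. The sketch below uses a minimal OrderedDict-based cache as a stand-in for the implementation above (the class name and the ValueError for zero capacity are illustrative choices, not part of the original solution) to exercise the empty-get, duplicate-put, and eviction behaviors:

```python
from collections import OrderedDict

class MiniLRU:
    """Minimal stand-in cache used only to exercise the edge cases."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")  # capacity-zero case
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1  # missing or empty-cache lookup: -1, state unchanged
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)  # duplicate put refreshes recency
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

# Empty cache: get returns -1.
assert MiniLRU(1).get(42) == -1

# Duplicate put updates the value and refreshes recency.
cache = MiniLRU(2)
cache.put(1, 10)
cache.put(2, 20)
cache.put(1, 11)   # key 1 becomes most recently used
cache.put(3, 30)   # cache is full, so this evicts key 2, not key 1
assert cache.get(1) == 11
assert cache.get(2) == -1
```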