
LFU Cache

Hard
Rippling
Topics:
Arrays, Linked Lists, Design

Design and implement a data structure for a Least Frequently Used (LFU) cache.

Implement the LFUCache class:

  • LFUCache(int capacity) Initializes the object with the capacity of the data structure.
  • int get(int key) Gets the value of the key if the key exists in the cache. Otherwise, returns -1.
  • void put(int key, int value) Updates the value of the key if it is present, or inserts the key-value pair if it is not. When the cache reaches its capacity, it should invalidate and remove the least frequently used key before inserting the new item. For this problem, when there is a tie (i.e., two or more keys with the same frequency), the least recently used key should be invalidated.

To determine the least frequently used key, a use counter is maintained for each key in the cache. The key with the smallest use counter is the least frequently used key.

When a key is first inserted into the cache, its use counter is set to 1 (due to the put operation). The use counter for a key in the cache is incremented whenever a get or put operation is called on it.

The functions get and put must each run in O(1) average time complexity.

Example 1:

Input
["LFUCache", "put", "put", "get", "put", "get", "get", "put", "get", "get", "get"]
[[2], [1, 1], [2, 2], [1], [3, 3], [2], [3], [4, 4], [1], [3], [4]]
Output
[null, null, null, 1, null, -1, 3, null, -1, 3, 4]

Explanation
// cnt(x) = the use counter for key x
// cache=[] will show the last used order for tiebreakers (the leftmost element is the most recent)
LFUCache lfu = new LFUCache(2);
lfu.put(1, 1);   // cache=[1,_], cnt(1)=1
lfu.put(2, 2);   // cache=[2,1], cnt(2)=1, cnt(1)=1
lfu.get(1);      // return 1
                 // cache=[1,2], cnt(2)=1, cnt(1)=2
lfu.put(3, 3);   // 2 is the LFU key because cnt(2)=1 is the smallest, invalidate 2.
                 // cache=[3,1], cnt(3)=1, cnt(1)=2
lfu.get(2);      // return -1 (not found)
lfu.get(3);      // return 3
                 // cache=[3,1], cnt(3)=2, cnt(1)=2
lfu.put(4, 4);   // Both 1 and 3 have the same cnt, but 1 is LRU, invalidate 1.
                 // cache=[4,3], cnt(4)=1, cnt(3)=2
lfu.get(1);      // return -1 (not found)
lfu.get(3);      // return 3
                 // cache=[3,4], cnt(4)=1, cnt(3)=3
lfu.get(4);      // return 4
                 // cache=[4,3], cnt(4)=2, cnt(3)=3

Constraints:

  • 1 <= capacity <= 10^4
  • 0 <= key <= 10^5
  • 0 <= value <= 10^9
  • At most 2 * 10^5 calls will be made to get and put.
 

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible ranges for the key and value, and can they be negative or zero?
  2. What is the expected behavior if the capacity is zero or negative? Should I throw an exception or treat it as an empty cache?
  3. If there are multiple keys with the same lowest frequency, how do I determine the least recently used key among them (specifically, what information do I need to store or track)?
  4. If I call `get` with a non-existent key, should any frequency or recency information in the cache be updated?
  5. Can I assume that the capacity will always be an integer, and will I need to handle any potential memory limitations or overflow issues related to the cache size or frequency counters?

Brute Force Solution

Approach

The LFU cache stores key-value pairs and needs to evict the least frequently used one when it's full. The brute force approach to this problem involves keeping track of every single access and, when needed, going through all items to find the least used one.

Here's how the algorithm would work step-by-step:

  1. When a value is requested, record that it was requested.
  2. When a new key-value pair needs to be added but the cache is full, examine the usage count of all currently stored key-value pairs.
  3. Identify the key-value pair with the lowest usage count.
  4. If multiple key-value pairs have the same lowest usage count, pick one of them to remove based on which was accessed the earliest.
  5. Remove the selected key-value pair to make space for the new one.
  6. Store the new key-value pair in the cache.

Code Implementation

class LFUCache:

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}
        self.usage_count = {}
        self.access_order = {}
        self.current_time = 0

    def get(self, key: int) -> int:
        if key in self.cache:
            self.usage_count[key] += 1
            # Advance the clock first so get and put follow the same convention
            # and two different keys never end up sharing a timestamp.
            self.current_time += 1
            self.access_order[key] = self.current_time
            return self.cache[key]
        return -1

    def put(self, key: int, value: int) -> None:
        self.current_time += 1
        if self.capacity == 0:
            return

        if key in self.cache:
            self.cache[key] = value
            self.usage_count[key] += 1
            self.access_order[key] = self.current_time
            return

        if len(self.cache) == self.capacity:
            # Need to evict the least frequently used item
            least_frequent = float('inf')
            key_to_evict = None

            for cache_key, frequency in self.usage_count.items():
                if frequency < least_frequent and cache_key in self.cache:
                    least_frequent = frequency
                    key_to_evict = cache_key
                elif frequency == least_frequent and cache_key in self.cache:
                    # Break ties based on access order
                    if self.access_order[cache_key] < self.access_order[key_to_evict]:
                        key_to_evict = cache_key

            del self.cache[key_to_evict]
            del self.usage_count[key_to_evict]
            del self.access_order[key_to_evict]

        self.cache[key] = value
        self.usage_count[key] = 1
        self.access_order[key] = self.current_time
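
To sanity-check this brute-force version, one can replay the call sequence from Example 1; the expected return values in the comments below are the ones given in the prompt.

# Replay Example 1 from the prompt against the brute-force LFUCache above.
lfu = LFUCache(2)
lfu.put(1, 1)                # cnt(1)=1
lfu.put(2, 2)                # cnt(2)=1
print(lfu.get(1))            # 1, cnt(1) becomes 2
lfu.put(3, 3)                # key 2 has the lowest count, so it is evicted
print(lfu.get(2))            # -1 (evicted)
print(lfu.get(3))            # 3, cnt(3) becomes 2
lfu.put(4, 4)                # keys 1 and 3 tie on count; key 1 is LRU, so it is evicted
print(lfu.get(1))            # -1 (evicted)
print(lfu.get(3))            # 3
print(lfu.get(4))            # 4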

Big(O) Analysis

Time Complexity
O(n) – The get operation has O(1) time complexity since we record access in constant time. When adding a new element and the cache is full, we need to iterate through all n items currently in the cache to determine the least frequently used item. This iteration to find the minimum frequency takes O(n) time. All other operations like adding or removing are O(1). Therefore, the overall complexity is dominated by the iteration to find the least frequent item.
Space Complexity
O(N) – The described solution stores key-value pairs along with their usage counts. In the worst-case scenario, the cache could hold N key-value pairs, where N is the maximum capacity of the cache. Therefore, we need to maintain auxiliary space proportional to the number of items in the cache for storing the key-value pairs and their associated usage counts. This leads to a space complexity of O(N).

Optimal Solution

Approach

We want to create a special storage system that quickly finds and removes items used least often. We'll track how frequently each item is used, and when we run out of space, we'll remove the least frequently used one.

Here's how the algorithm would work step-by-step:

  1. Keep track of two things for each item: its value and how many times it has been used.
  2. Also, keep a record of the different usage counts. For example, we need to know which items have been used once, which have been used twice, and so on.
  3. When we need to get an item, we look it up and increase its usage count. Then we move it to the correct group based on its new, higher usage count.
  4. When we need to add a new item and the storage is full, we find the group of items with the lowest usage count.
  5. From that group of items with the lowest usage count, we remove the item that was added to the storage system the earliest.
  6. Then, we add the new item to the group with a usage count of one.
  7. By keeping items grouped by usage count and tracking the order they were added, we can quickly find the least frequently used item when we need to make space.

Code Implementation

class LFUCache:

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}
        self.frequencyMap = {}
        self.frequencyListHeads = {}
        self.leastFrequency = 1

    def get(self, key):
        if key not in self.cache:
            return -1

        node = self.cache[key]
        self._update_frequency(node)
        return node.value

    def put(self, key, value):
        if self.capacity == 0:
            return

        if key in self.cache:
            node = self.cache[key]
            node.value = value
            self._update_frequency(node)
            return

        if len(self.cache) == self.capacity:
            # Remove the least frequently used item (LRU within that frequency) when the cache is full.
            frequencyList = self.frequencyListHeads[self.leastFrequency]
            removedNode = frequencyList.remove_tail()
            del self.cache[removedNode.key]
            # Drop the evicted key's counter as well so the maps stay in sync.
            del self.frequencyMap[removedNode.key]

        newNode = Node(key, value)
        self.cache[key] = newNode
        self.leastFrequency = 1

        if 1 not in self.frequencyListHeads:
            self.frequencyListHeads[1] = DoublyLinkedList()

        self.frequencyListHeads[1].add_node(newNode)
        self.frequencyMap[key] = 1

    def _update_frequency(self, node):
        key = node.key
        currentFrequency = self.frequencyMap[key]
        newFrequency = currentFrequency + 1

        # Remove the node from its current frequency list.
        self.frequencyListHeads[currentFrequency].remove_node(node)
        if self.leastFrequency == currentFrequency and self.frequencyListHeads[currentFrequency].size == 0:
            self.leastFrequency += 1

        # Add the node to the new frequency list.
        if newFrequency not in self.frequencyListHeads:
            self.frequencyListHeads[newFrequency] = DoublyLinkedList()

        self.frequencyListHeads[newFrequency].add_node(node)
        self.frequencyMap[key] = newFrequency

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.frequency = 0
        self.next = None
        self.prev = None

class DoublyLinkedList:
    def __init__(self):
        self.head = Node(0, 0)
        self.tail = Node(0, 0)
        self.head.next = self.tail
        self.tail.prev = self.head
        self.size = 0

    def add_node(self, node):
        # Add the node right after head.
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node
        self.size += 1

    def remove_node(self, node):
        # Remove the node from the list.
        node.prev.next = node.next
        node.next.prev = node.prev
        self.size -= 1

    def remove_tail(self):
        # Remove the tail node, which is the LRU node for the frequency.
        tailNode = self.tail.prev
        self.remove_node(tailNode)
        return tailNode
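
The hand-rolled doubly linked list above is the textbook structure, but the same frequency-bucket idea can be expressed more compactly with Python's built-in ordered containers. The sketch below is an equivalent formulation (the class and variable names are illustrative, not part of the problem): each bucket is an OrderedDict keyed by cache key, so the oldest entry in the minimum-frequency bucket is the eviction candidate.

from collections import OrderedDict, defaultdict

class LFUCacheCompact:
    # Same frequency-bucket idea as above, but each bucket is an OrderedDict,
    # which already tracks least-recently-used order within the bucket.

    def __init__(self, capacity):
        self.capacity = capacity
        self.key_to_value = {}
        self.key_to_frequency = {}
        self.buckets = defaultdict(OrderedDict)  # frequency -> keys in LRU order
        self.least_frequency = 0

    def _touch(self, key):
        # Move the key from its current bucket into the next-higher one.
        frequency = self.key_to_frequency[key]
        del self.buckets[frequency][key]
        if self.least_frequency == frequency and not self.buckets[frequency]:
            self.least_frequency += 1
        self.key_to_frequency[key] = frequency + 1
        self.buckets[frequency + 1][key] = None

    def get(self, key):
        if key not in self.key_to_value:
            return -1
        self._touch(key)
        return self.key_to_value[key]

    def put(self, key, value):
        if self.capacity == 0:
            return
        if key in self.key_to_value:
            self.key_to_value[key] = value
            self._touch(key)
            return
        if len(self.key_to_value) == self.capacity:
            # popitem(last=False) removes the oldest (least recently used) key
            # in the minimum-frequency bucket in O(1).
            evicted_key, _ = self.buckets[self.least_frequency].popitem(last=False)
            del self.key_to_value[evicted_key]
            del self.key_to_frequency[evicted_key]
        self.key_to_value[key] = value
        self.key_to_frequency[key] = 1
        self.buckets[1][key] = None
        self.least_frequency = 1

Both formulations do the same bookkeeping; the OrderedDict simply stands in for the per-frequency doubly linked list, so get and put remain O(1) on average.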

Big(O) Analysis

Time Complexity
O(1) – The LFU cache operations described, get and put, rely on hash maps (dictionaries) and doubly linked lists. Hash map lookups and insertions are O(1) on average. Moving elements between frequency lists in the doubly linked list is also O(1). Therefore, the time complexity of both get and put operations is O(1), assuming hash collisions are minimal. Note that 'n' in this context would represent the cache capacity, but the operations are designed to be independent of the cache size.
Space Complexity
O(N) – The solution uses a hash map to store the value and usage count for each item, which can grow up to the cache's capacity N. It also maintains a hash map to group items by their usage counts; in the worst case, each item has a unique usage count, leading to N entries. Therefore, the auxiliary space used is proportional to the cache's capacity, N, resulting in a space complexity of O(N).

Edge Cases

  • Capacity is zero: get should always return -1 and put should do nothing, since nothing can be stored.
  • Key already exists with the same value in put: the frequency should still be incremented; no new insertion happens since the key is already present.
  • Putting a new key when the cache is full and multiple keys share the minimum frequency: evict the least recently used key among those with the minimum frequency.
  • Getting a key that does not exist: return -1 and do not modify any frequencies.
  • Integer overflow when incrementing a frequency: counters should be wide enough to avoid overflow; with at most 2 * 10^5 calls, a standard integer is more than sufficient here.
  • Cache is full, eviction is required, and the minimum frequency is higher than 1 (e.g., 2 or 3): correctly identify and evict a key whose frequency equals the current minimum, not just frequency 1.
  • Repeatedly putting and getting the same key drives its frequency very high: the frequency lists must still be updated efficiently as the count grows.
  • An extremely large number of unique keys is put into the cache: the data structures must support efficient insertion and removal without excessive memory overhead.
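
A few of these cases can be exercised directly against either implementation above; the expectations below follow from the problem statement (zero capacity stores nothing, and frequency ties are broken by recency).

# Zero capacity: put is a no-op and get always misses.
empty = LFUCache(0)
empty.put(1, 1)
assert empty.get(1) == -1

# Tie-break: with equal use counters, the least recently used key is evicted.
tied = LFUCache(2)
tied.put(1, 1)            # cnt(1)=1
tied.put(2, 2)            # cnt(2)=1; key 1 is now the least recently used
tied.put(3, 3)            # cache is full, counts tie at 1, so key 1 is evicted
assert tied.get(1) == -1
assert tied.get(2) == 2

# Updating an existing key only bumps its counter; nothing is evicted.
tied.put(2, 20)
assert tied.get(2) == 20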