Taro Logo

Design HashMap

Easy
Two Sigma logo
Two Sigma
1 view
Topics:
Arrays

Design a HashMap without using any built-in hash table libraries.

Implement the MyHashMap class:

  • MyHashMap() initializes the object with an empty map.
  • void put(int key, int value) inserts a (key, value) pair into the HashMap. If the key already exists in the map, update the corresponding value.
  • int get(int key) returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.
  • void remove(key) removes the key and its corresponding value if the map contains the mapping for the key.

Example 1:

Input
["MyHashMap", "put", "put", "get", "get", "put", "get", "remove", "get"]
[[], [1, 1], [2, 2], [1], [3], [2, 1], [2], [2], [2]]
Output
[null, null, null, 1, -1, null, 1, null, -1]

Explanation
MyHashMap myHashMap = new MyHashMap();
myHashMap.put(1, 1); // The map is now [[1,1]]
myHashMap.put(2, 2); // The map is now [[1,1], [2,2]]
myHashMap.get(1);    // return 1, The map is now [[1,1], [2,2]]
myHashMap.get(3);    // return -1 (i.e., not found), The map is now [[1,1], [2,2]]
myHashMap.put(2, 1); // The map is now [[1,1], [2,1]] (i.e., update the existing value)
myHashMap.get(2);    // return 1, The map is now [[1,1], [2,1]]
myHashMap.remove(2); // remove the mapping for 2, The map is now [[1,1]]
myHashMap.get(2);    // return -1 (i.e., not found), The map is now [[1,1]]

Constraints:

  • 0 <= key, value <= 106
  • At most 104 calls will be made to put, get, and remove.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible data types and range for the keys and values that will be stored in the HashMap?
  2. How many key-value pairs do you expect the HashMap to store, and should I optimize for a specific load factor?
  3. If there's a collision (i.e., two keys hash to the same index), is it acceptable to use separate chaining with linked lists, or is there a preferred collision resolution strategy?
  4. If I try to `get` a key that doesn't exist, you've specified to return -1, but are we guaranteed that -1 will not be used as a valid value?
  5. Are we expected to handle null or empty keys, or are they explicitly disallowed?

Brute Force Solution

Approach

Imagine we're building a very basic phone book. The brute force way to manage this book is to keep a simple list where we store each name and number as it comes. When we need to find a number, we simply go through the entire list to see if the name exists.

Here's how the algorithm would work step-by-step:

  1. To add a new name and number, we just tack it on to the end of our list.
  2. To find a number, we start at the beginning of the list and check each name one by one until we find the one we are looking for.
  3. If we don't find the name after checking every entry, it means the name isn't in our phone book.
  4. To remove a name and number, we again check the entire list and once we find it, we remove that entire entry from the list.

Code Implementation

class MyHashMap:

    def __init__(self):
        self.phone_book = []

    def put(self, key: int, value: int) -> None:
        # Add a new entry to the end of the list.
        self.phone_book.append((key, value))

    def get(self, key: int) -> int:
        # Iterate through the list to find the key.
        for entry_key, entry_value in self.phone_book:
            if entry_key == key:
                return entry_value

        return -1

    def remove(self, key: int) -> None:
        # Find the key and remove the entry.
        for index, (entry_key, entry_value) in enumerate(self.phone_book):
            if entry_key == key:
                # Avoid index issues by deleting the element.
                del self.phone_book[index]
                return

Big(O) Analysis

Time Complexity
O(n)The 'put' operation simply appends to the end of the list, which takes constant time, O(1). The 'get' and 'remove' operations iterate through the list to find the key. In the worst case, we might have to iterate through all n elements in the list to find the key or determine it does not exist. Therefore, 'get' and 'remove' have a time complexity of O(n). The overall complexity is dominated by the linear search in 'get' and 'remove'.
Space Complexity
O(1)The brute force approach described involves storing a list of name-number pairs. The add operation appends to this list, the find operation iterates through it, and the remove operation searches and removes from it. However, the plain English explanation doesn't explicitly state that a *new* data structure of size N is created as auxiliary space. All operations modify the implicit list (the phone book) that contains the data, but this list is the *input* itself, not auxiliary space. The problem description describes only modifying the list, not making copies or using other data structures in a way that impacts the auxiliary space beyond constant factors. Therefore, the auxiliary space complexity is constant, O(1).

Optimal Solution

Approach

We're creating a tool that stores information using keys, like a real-world dictionary. Instead of looking through every possible storage location, we'll use a clever system to quickly find the right spot for each key. This system will allow us to store, retrieve, and update information efficiently.

Here's how the algorithm would work step-by-step:

  1. First, divide the entire storage area into smaller compartments. Think of it like creating sections in a giant filing cabinet.
  2. When we need to store something, we'll use the key to decide which compartment it belongs to. Imagine using the first letter of a word to choose a drawer in the filing cabinet.
  3. Sometimes, different keys might point to the same compartment. When this happens, we'll keep a list of all the items that belong in that specific compartment.
  4. To find something, we'll use the key to quickly find the right compartment. Then, we'll check the list of items in that compartment until we find what we're looking for.
  5. To remove something, we find its compartment and remove it from the list of items stored there.
  6. When adding, finding, or removing, if the compartment's list gets too long, we'll expand that list to make it easier and faster to search within that compartment.
  7. By using compartments and lists, we can quickly find the right storage location without searching everywhere, making the whole process much faster.

Code Implementation

class MyHashMap:

    def __init__(self):
        self.key_space = 2069
        self.hash_table = [[] for _ in range(self.key_space)]

    def put(self, key: int, value: int) -> None:
        key_index = key % self.key_space
        bucket = self.hash_table[key_index]

        for i, (existing_key, existing_value) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return

        bucket.append((key, value))

    def get(self, key: int) -> int:
        key_index = key % self.key_space
        bucket = self.hash_table[key_index]

        for existing_key, existing_value in bucket:
            if existing_key == key:
                return existing_value
        return -1

    def remove(self, key: int) -> None:
        key_index = key % self.key_space
        bucket = self.hash_table[key_index]
        # Need to iterate with index to remove the element
        for i, (existing_key, existing_value) in enumerate(bucket):
            if existing_key == key:
                # Removing the key if it exists.
                del bucket[i]
                return

# Your MyHashMap object will be instantiated and called as such:
# my_hash_map = MyHashMap()
# my_hash_map.put(key, value)
# param_2 = my_hash_map.get(key)
# my_hash_map.remove(key)

Big(O) Analysis

Time Complexity
O(n)The HashMap uses a bucketing strategy where we have a fixed number of buckets. Ideally, keys are distributed evenly across the buckets using a hash function. In the best and average cases, get, put, and remove operations take O(1) time each. In the worst case, all keys map to the same bucket, turning that bucket into a linked list of size n, where n is the number of keys. In this worst-case scenario, the operations would take O(n) time because you have to potentially traverse all n items in the single bucket. Therefore, overall the operations are O(n).
Space Complexity
O(N)The described HashMap implementation uses an array of compartments (buckets). In the worst-case scenario, all N keys map to different compartments, and each compartment stores a single key-value pair. This requires allocating space proportional to the number of keys, N. If collisions happen, linked lists will be created, potentially storing all N key-value pairs across all lists, still resulting in a space complexity of O(N), where N is the number of key-value pairs stored in the HashMap.

Edge Cases

CaseHow to Handle
Null or negative keysDefine the HashMap's behavior with null/negative keys, either by throwing an exception or treating them as valid keys by appropriate hashing.
Zero valueThe zero value should be handled the same way as any other valid integer value; no special treatment is required.
Extremely large number of keys (close to Integer.MAX_VALUE)Ensure the initial capacity and load factor of the HashMap are appropriately set to handle a large number of keys and minimize collisions to maintain reasonable performance.
Hash collisions (multiple keys map to the same index)Separate chaining should handle collisions gracefully by storing multiple key-value pairs at the same index in a linked list or other suitable data structure.
Deleting a key that doesn't existThe remove operation should not throw an error if the key doesn't exist; instead, it should simply do nothing, ensuring stability of the HashMap.
Updating value of existing key to nullThe 'put' operation should handle updating a key's value to null correctly, overwriting any previous non-null value.
Very uneven key distribution leading to excessive collisions in some bucketsConsider using a better hash function or dynamically resizing the hash table to redistribute keys and reduce the length of collision chains.
Integer overflow when calculating hashUse a modulo operator to ensure the hash index stays within the valid range of the underlying array, or choose a hash function that minimizes overflow risk.