Taro Logo

Design HashMap

Easy
Nvidia logo
Nvidia
0 views
Topics:
ArraysLinked ListsDynamic ProgrammingGreedy Algorithms

Design a HashMap without using any built-in hash table libraries.

Implement the MyHashMap class:

  • MyHashMap() initializes the object with an empty map.
  • void put(int key, int value) inserts a (key, value) pair into the HashMap. If the key already exists in the map, update the corresponding value.
  • int get(int key) returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.
  • void remove(key) removes the key and its corresponding value if the map contains the mapping for the key.

For example:

Input
["MyHashMap", "put", "put", "get", "get", "put", "get", "remove", "get"]
[[], [1, 1], [2, 2], [1], [3], [2, 1], [2], [2], [2]]
Output
[null, null, null, 1, -1, null, 1, null, -1]

Explanation
MyHashMap myHashMap = new MyHashMap();
myHashMap.put(1, 1); // The map is now [[1,1]]
myHashMap.put(2, 2); // The map is now [[1,1], [2,2]]
myHashMap.get(1);    // return 1, The map is now [[1,1], [2,2]]
myHashMap.get(3);    // return -1 (i.e., not found), The map is now [[1,1], [2,2]]
myHashMap.put(2, 1); // The map is now [[1,1], [2,1]] (i.e., update the existing value)
myHashMap.get(2);    // return 1, The map is now [[1,1], [2,1]]
myHashMap.remove(2); // remove the mapping for 2, The map is now [[1,1]]
myHashMap.get(2);    // return -1 (i.e., not found), The map is now [[1,1]]

Constraints:

  • 0 <= key, value <= 10^6
  • At most 10^4 calls will be made to put, get, and remove.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible integer values for the keys and values to be stored in the HashMap?
  2. How many operations (put, get, remove) are expected to be performed in total? Should I optimize for a large number of operations?
  3. What should the `get` method return if the key is not present in the HashMap?
  4. Are we concerned about memory usage, or is the primary focus on operation speed?
  5. Can I assume that the keys will be integers, or do I need to handle other data types?

Brute Force Solution

Approach

Imagine a simple phone book where you want to store names and phone numbers. The most basic way to do this is to keep a simple list of name-number pairs. To find a number, we just check each pair one by one until we find the right name.

Here's how the algorithm would work step-by-step:

  1. When we want to add a new name and phone number, we simply add it to the end of our list.
  2. When we want to find a phone number for a given name, we start at the beginning of the list and check each name one by one.
  3. If the name matches, we found the number and we can stop looking.
  4. If we get to the end of the list and haven't found the name, it means the name isn't in our phone book.
  5. When we want to update a phone number for a name, we again start at the beginning and look for the name.
  6. If we find the name, we replace the old phone number with the new one.
  7. If we don't find the name, it means the name isn't in our phone book, and we might decide to add it with the new phone number.

Code Implementation

class MyHashMap:

    def __init__(self):
        self.key_value_pairs = []

    def put(self, key: int, value: int) -> None:
        # Iterate through existing pairs to check for key
        for index, pair in enumerate(self.key_value_pairs):
            if pair[0] == key:
                self.key_value_pairs[index] = (key, value)
                return

        self.key_value_pairs.append((key, value))

    def get(self, key: int) -> int:
        # Iterate through pairs to find matching key.
        for pair in self.key_value_pairs:
            if pair[0] == key:
                return pair[1]

        return -1

    def remove(self, key: int) -> None:
        # Iterate to find the key to be removed
        for index, pair in enumerate(self.key_value_pairs):
            if pair[0] == key:
                del self.key_value_pairs[index]

                # Exit after removing the pair.
                return

Big(O) Analysis

Time Complexity
O(n)The get, put, and remove operations each involve iterating through a list of key-value pairs. In the worst-case scenario, the key we're looking for is either at the end of the list or not present, requiring us to examine all n elements. Therefore, each operation has a time complexity of O(n), where n is the number of key-value pairs stored in the data structure. The put operation, in some implementations, might involve first checking if the key exists (which is O(n)) and potentially then appending a new key-value pair (O(1)), still resulting in an overall O(n) complexity for the put operation.
Space Complexity
O(N)The described solution maintains a list of name-number pairs. In the worst-case scenario, we add N name-number pairs to the list, where N represents the total number of entries in our phone book. Therefore, the space required to store this list grows linearly with the number of entries, N. Thus, the auxiliary space complexity is O(N).

Optimal Solution

Approach

Imagine a set of labeled bins where we can store values. Instead of searching every bin to find a value, we use a clever trick to quickly locate the right bin based on the label. This avoids a lot of unnecessary searching and makes finding and storing things much faster.

Here's how the algorithm would work step-by-step:

  1. First, figure out how many bins you need. This is based on how many different labels (or keys) you expect to store.
  2. Next, create a way to convert each label into a bin number. This conversion process should try to spread the labels evenly across all bins to prevent overcrowding.
  3. When you want to store a value with a label, convert the label into a bin number and place the value in that bin.
  4. If a bin already has something in it, store the new value alongside the existing one in that bin. Think of it like a list of things within that bin.
  5. When you want to find a value, convert its label into a bin number and then look in that specific bin for the value.
  6. If there are multiple values in the bin, search through them until you find the value you're looking for.

Code Implementation

class MyHashMap:

    def __init__(self):
        self.capacity = 1000 #Initial size
        self.data = [[] for _ in range(self.capacity)]

    def put(self, key: int, value: int) -> None:
        index = key % self.capacity

        # Handle collisions by adding new k-v pairs to the bucket
        for i, (current_key, _) in enumerate(self.data[index]):
            if current_key == key:
                self.data[index][i] = (key, value)
                return
        self.data[index].append((key, value))

    def get(self, key: int) -> int:
        index = key % self.capacity

        # Search the bucket for the key and return the value
        for current_key, current_value in self.data[index]:
            if current_key == key:
                return current_value
        return -1

    def remove(self, key: int) -> None:
        index = key % self.capacity

        #Locate key within bucket to remove if present
        for i, (current_key, _) in enumerate(self.data[index]):
            if current_key == key:

                #This is how we remove from the list when found
                del self.data[index][i]
                return

Big(O) Analysis

Time Complexity
O(n)The described HashMap implementation involves calculating a hash of the key, which typically takes O(1) time. Inserting, deleting, or getting an element involves calculating the hash to find the bucket and then iterating through the elements within that bucket. In the best-case scenario where the hash function distributes keys evenly, each bucket will contain a small, constant number of elements, resulting in O(1) operations. However, in the worst-case scenario where all keys collide into a single bucket, these operations would require iterating through all n key-value pairs resulting in O(n) time complexity for put, get, and remove operations, where n is the number of keys. With good hashing strategy, however, the average-case performance is usually O(1).
Space Complexity
O(N)The design of this HashMap, as described, involves creating a set of bins. The number of bins is determined by the expected number of keys (N). Each bin could potentially store multiple values if collisions occur. Therefore, in the worst-case scenario, where all keys map to the same bin, we'd need to store all N values in that single bin. Thus, the auxiliary space used by the bins scales linearly with the number of keys (N), resulting in O(N) space complexity.

Edge Cases

CaseHow to Handle
Null or undefined key inputReject null keys and throw an IllegalArgumentException, or use a sentinel value if null keys are allowed.
Integer key overflows when hashingUse a bitwise AND operation with the size of the underlying array during hash calculation to prevent index out of bounds.
Extremely high collision rate in the hash function, leading to O(n) lookup time in bucketsImplement separate chaining with balanced trees or probing with rehashing to maintain O(1) average case lookup.
Deleting a non-existent keyThe delete operation should gracefully handle non-existent keys without errors, potentially doing nothing or returning a flag indicating failure.
Inserting a large number of key-value pairs causing the underlying array to reach its maximum capacityDynamically resize the underlying array (e.g., doubling) when the load factor exceeds a threshold, rehashing all existing keys.
Using mutable objects as keys that change after insertion, corrupting the hash and lookupThe hashmap should either copy the object, create an immutable wrapper, or prohibit mutable objects as keys.
Negative integer keysEnsure the hash function handles negative integers correctly, avoiding negative indices or bitwise shifting issues.
Zero keyTreat zero as a valid key, hashing it appropriately just like any other integer key.