Taro Logo

Design HashMap

Easy
3 views
a month ago

Design a HashMap without using any built-in hash table libraries. Implement the MyHashMap class:

  1. MyHashMap() initializes the object with an empty map.
  2. void put(int key, int value) inserts a (key, value) pair into the HashMap. If the key already exists in the map, update the corresponding value.
  3. int get(int key) returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.
  4. void remove(key) removes the key and its corresponding value if the map contains the mapping for the key.

For example:

Input
["MyHashMap", "put", "put", "get", "get", "put", "get", "remove", "get"]
[[], [1, 1], [2, 2], [1], [3], [2, 1], [2], [2], [2]]
Output
[null, null, null, 1, -1, null, 1, null, -1]

What are the time and space complexities of each operation, and how do you handle collisions?

Sample Answer
# MyHashMap Implementation

This response provides an implementation of a HashMap without using any built-in hash table libraries. The implementation utilizes an array of linked lists (buckets) to handle collisions. Each bucket stores key-value pairs. The hash function maps keys to indices within the array.

## Code Implementation

```python
class MyHashMap:

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.capacity = 1000  # Initial capacity of the hash map
        self.table = [None] * self.capacity  # Array of linked lists (buckets)

    def _hash(self, key):
        """Hashes the key to an index in the hash table.
        A simple hash function to convert the key into an index.
        """
        return key % self.capacity

    def put(self, key: int, value: int) -> None:
        """
        Inserts a (key, value) pair into the HashMap.
        If the key already exists in the map, update the corresponding value.
        """
        index = self._hash(key)
        if self.table[index] is None:
            self.table[index] = []  # Create a new list (bucket)
        
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)  # Update existing key
                return
        
        self.table[index].append((key, value))  # Add new key-value pair

    def get(self, key: int) -> int:
        """
        Returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.
        """
        index = self._hash(key)
        if self.table[index] is None:
            return -1  # Key not found
        
        for k, v in self.table[index]:
            if k == key:
                return v  # Key found, return value
        
        return -1  # Key not found

    def remove(self, key: int) -> None:
        """
        Removes the key and its corresponding value if the map contains the mapping for the key.
        """
        index = self._hash(key)
        if self.table[index] is None:
            return  # Key not found
        
        self.table[index] = [(k, v) for k, v in self.table[index] if k != key]

# Example Usage
myHashMap = MyHashMap()
myHashMap.put(1, 1)
myHashMap.put(2, 2)
print(myHashMap.get(1))    # Output: 1
print(myHashMap.get(3))    # Output: -1
myHashMap.put(2, 1)
print(myHashMap.get(2))    # Output: 1
myHashMap.remove(2)
print(myHashMap.get(2))    # Output: -1

Explanation

  1. Initialization:

    • The __init__ method initializes the HashMap with a fixed capacity (e.g., 1000). It creates a list (array) named table of the specified capacity, where each element is initially None. This table will hold the buckets. Each bucket is essentially a linked list (implemented via a Python list for simplicity) that stores key-value pairs. This handles collision.
  2. Hash Function:

    • The _hash method takes a key as input and returns an index within the table. This is a crucial part of the HashMap as it determines where a key-value pair will be stored. A simple modulo operation (key % self.capacity) is used to map the key to an index. Good hash function minimize collision.
  3. Put (Insert/Update):

    • The put method inserts a key-value pair into the HashMap. It first calculates the index using the _hash method. Then, it checks if a bucket exists at that index.
      • If no bucket exists, it creates a new list at that index.
      • It iterates through the existing key-value pairs in the bucket to see if the key already exists.
        • If the key exists, it updates the value.
        • If the key doesn't exist, it appends the new key-value pair to the bucket.
  4. Get (Retrieve):

    • The get method retrieves the value associated with a given key. It calculates the index using the _hash method and checks if a bucket exists at that index.
      • If no bucket exists, it returns -1 (key not found).
      • It iterates through the key-value pairs in the bucket to find the key.
        • If the key is found, it returns the corresponding value.
        • If the key is not found after iterating through the entire bucket, it returns -1.
  5. Remove (Delete):

    • The remove method removes a key-value pair from the HashMap. It calculates the index using the _hash method and checks if a bucket exists at that index.
      • If no bucket exists, there is nothing to remove, so the method returns.
      • It filters the key-value pairs in the bucket, creating a new list that excludes the key to be removed. This effectively removes the key-value pair from the bucket. The updated list is then assigned back to the corresponding index in the table.

Big(O) Runtime Analysis

  • Put:
    • Average Case: O(1). In the average case, the hash function distributes keys evenly across the buckets, so the search within a bucket takes constant time.
    • Worst Case: O(n), where n is the number of key-value pairs in the bucket. This occurs when all keys hash to the same index (collision).
  • Get:
    • Average Case: O(1). Similar to put, the average case is constant time if keys are evenly distributed.
    • Worst Case: O(n), where n is the number of key-value pairs in the bucket. Occurs with high collision.
  • Remove:
    • Average Case: O(1). On average, removing an element from a bucket takes constant time.
    • Worst Case: O(n), where n is the number of key-value pairs in the bucket. In worst case scenarios it takes O(n) to find the node.

Big(O) Space Usage Analysis

  • Space Complexity: O(N), where N is the number of keys to be inserted in the HashMap. In the worst-case scenario, all keys are mapped to the same index, and the space complexity becomes O(N) because we need to store all key-value pairs in a single bucket (linked list). However, if the keys are well-distributed across all the buckets, the space complexity will be closer to O(K), where K is the number of buckets (which is constant, based on capacity). The overall space complexity is dominated by the number of key-value pairs stored in the HashMap.

Edge Cases

  1. Hash Collisions:

    • Scenario: Multiple keys map to the same index.
    • Handling: Use a linked list (or other data structure like a tree) at each index to store multiple key-value pairs. When retrieving, search the linked list for the key.
  2. Large Number of Keys:

    • Scenario: The HashMap becomes very full, leading to increased collisions and degraded performance.
    • Handling: Implement dynamic resizing. When the number of keys exceeds a certain threshold (load factor), create a new table with a larger capacity (e.g., double the size), and rehash all existing keys into the new table.
  3. Negative Keys:

    • Scenario: The keys can be negative values.
    • Handling: Adjust the hash function to handle negative keys. For example, take the absolute value of the key before applying the modulo operation: abs(key) % capacity.
  4. Non-Integer Keys:

    • Scenario: The keys are not integers (e.g., strings, objects).
    • Handling: Use a suitable hash function that can convert the keys to integers. For strings, common hash functions include the polynomial hash function or using built-in hash functions (if allowed by problem constraints).

Additional Considerations

  • Load Factor: The load factor (number of entries / capacity) affects the performance of the HashMap. A high load factor increases the likelihood of collisions, while a low load factor wastes memory. A typical load factor is around 0.75.
  • Resizing: When the load factor exceeds a threshold, the HashMap should be resized. Resizing involves creating a new table with a larger capacity and rehashing all existing keys into the new table. This can be an expensive operation, so it should be done judiciously.
  • Choice of Hash Function: The choice of hash function is crucial for good performance. A good hash function should distribute keys evenly across the buckets and minimize collisions. For integer keys, a simple modulo operation may suffice. For other types of keys, more sophisticated hash functions may be needed.