Design HashMap Interview Question for Meta

# MyHashMap Implementation

This response provides an implementation of a HashMap without using any built-in hash table libraries. The implementation utilizes an array of linked lists (buckets) to handle collisions. Each bucket stores key-value pairs. The hash function maps keys to indices within the array.

## Code Implementation

```python
class MyHashMap:

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.capacity = 1000  # Initial capacity of the hash map
        self.table = [None] * self.capacity  # Array of linked lists (buckets)

    def _hash(self, key):
        """Hashes the key to an index in the hash table.
        A simple hash function to convert the key into an index.
        """
        return key % self.capacity

    def put(self, key: int, value: int) -> None:
        """
        Inserts a (key, value) pair into the HashMap.
        If the key already exists in the map, update the corresponding value.
        """
        index = self._hash(key)
        if self.table[index] is None:
            self.table[index] = []  # Create a new list (bucket)
        
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)  # Update existing key
                return
        
        self.table[index].append((key, value))  # Add new key-value pair

    def get(self, key: int) -> int:
        """
        Returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.
        """
        index = self._hash(key)
        if self.table[index] is None:
            return -1  # Key not found
        
        for k, v in self.table[index]:
            if k == key:
                return v  # Key found, return value
        
        return -1  # Key not found

    def remove(self, key: int) -> None:
        """
        Removes the key and its corresponding value if the map contains the mapping for the key.
        """
        index = self._hash(key)
        if self.table[index] is None:
            return  # Key not found
        
        self.table[index] = [(k, v) for k, v in self.table[index] if k != key]

# Example Usage
myHashMap = MyHashMap()
myHashMap.put(1, 1)
myHashMap.put(2, 2)
print(myHashMap.get(1))    # Output: 1
print(myHashMap.get(3))    # Output: -1
myHashMap.put(2, 1)
print(myHashMap.get(2))    # Output: 1
myHashMap.remove(2)
print(myHashMap.get(2))    # Output: -1

Explanation

Initialization:
- The __init__ method initializes the HashMap with a fixed capacity (e.g., 1000). It creates a list (array) named table of the specified capacity, where each element is initially None. This table will hold the buckets. Each bucket is essentially a linked list (implemented via a Python list for simplicity) that stores key-value pairs. This handles collision.
Hash Function:
- The _hash method takes a key as input and returns an index within the table. This is a crucial part of the HashMap as it determines where a key-value pair will be stored. A simple modulo operation (key % self.capacity) is used to map the key to an index. Good hash function minimize collision.
Put (Insert/Update):
- The put method inserts a key-value pair into the HashMap. It first calculates the index using the _hash method. Then, it checks if a bucket exists at that index.
  - If no bucket exists, it creates a new list at that index.
  - It iterates through the existing key-value pairs in the bucket to see if the key already exists.
    - If the key exists, it updates the value.
    - If the key doesn't exist, it appends the new key-value pair to the bucket.
Get (Retrieve):
- The get method retrieves the value associated with a given key. It calculates the index using the _hash method and checks if a bucket exists at that index.
  - If no bucket exists, it returns -1 (key not found).
  - It iterates through the key-value pairs in the bucket to find the key.
    - If the key is found, it returns the corresponding value.
    - If the key is not found after iterating through the entire bucket, it returns -1.
Remove (Delete):
- The remove method removes a key-value pair from the HashMap. It calculates the index using the _hash method and checks if a bucket exists at that index.
  - If no bucket exists, there is nothing to remove, so the method returns.
  - It filters the key-value pairs in the bucket, creating a new list that excludes the key to be removed. This effectively removes the key-value pair from the bucket. The updated list is then assigned back to the corresponding index in the table.

Big(O) Runtime Analysis

Put:
- Average Case: O(1). In the average case, the hash function distributes keys evenly across the buckets, so the search within a bucket takes constant time.
- Worst Case: O(n), where n is the number of key-value pairs in the bucket. This occurs when all keys hash to the same index (collision).
Get:
- Average Case: O(1). Similar to put, the average case is constant time if keys are evenly distributed.
- Worst Case: O(n), where n is the number of key-value pairs in the bucket. Occurs with high collision.
Remove:
- Average Case: O(1). On average, removing an element from a bucket takes constant time.
- Worst Case: O(n), where n is the number of key-value pairs in the bucket. In worst case scenarios it takes O(n) to find the node.

Big(O) Space Usage Analysis

Space Complexity: O(N), where N is the number of keys to be inserted in the HashMap. In the worst-case scenario, all keys are mapped to the same index, and the space complexity becomes O(N) because we need to store all key-value pairs in a single bucket (linked list). However, if the keys are well-distributed across all the buckets, the space complexity will be closer to O(K), where K is the number of buckets (which is constant, based on capacity). The overall space complexity is dominated by the number of key-value pairs stored in the HashMap.

Edge Cases

Hash Collisions:
- Scenario: Multiple keys map to the same index.
- Handling: Use a linked list (or other data structure like a tree) at each index to store multiple key-value pairs. When retrieving, search the linked list for the key.
Large Number of Keys:
- Scenario: The HashMap becomes very full, leading to increased collisions and degraded performance.
- Handling: Implement dynamic resizing. When the number of keys exceeds a certain threshold (load factor), create a new table with a larger capacity (e.g., double the size), and rehash all existing keys into the new table.
Negative Keys:
- Scenario: The keys can be negative values.
- Handling: Adjust the hash function to handle negative keys. For example, take the absolute value of the key before applying the modulo operation: abs(key) % capacity.
Non-Integer Keys:
- Scenario: The keys are not integers (e.g., strings, objects).
- Handling: Use a suitable hash function that can convert the keys to integers. For strings, common hash functions include the polynomial hash function or using built-in hash functions (if allowed by problem constraints).

Additional Considerations

Load Factor: The load factor (number of entries / capacity) affects the performance of the HashMap. A high load factor increases the likelihood of collisions, while a low load factor wastes memory. A typical load factor is around 0.75.
Resizing: When the load factor exceeds a threshold, the HashMap should be resized. Resizing involves creating a new table with a larger capacity and rehashing all existing keys into the new table. This can be an expensive operation, so it should be done judiciously.
Choice of Hash Function: The choice of hash function is crucial for good performance. A good hash function should distribute keys evenly across the buckets and minimize collisions. For integer keys, a simple modulo operation may suffice. For other types of keys, more sophisticated hash functions may be needed.