Design HashSet

Easy

Solution

MyHashSet Implementation

This problem requires designing a HashSet without using built-in hash table libraries. We need to implement add, contains, and remove operations.

1. Naive Solution (Using a Simple Array/List)

The most straightforward approach is to use a simple array (or list) of booleans where the index represents the key. If the value at that index is true, the key exists in the set; otherwise, it doesn't.

Implementation:

class MyHashSet:
    def __init__(self):
        self.size = 1000001  # Given constraint: 0 <= key <= 10^6
        self.set = [False] * self.size

    def add(self, key: int) -> None:
        self.set[key] = True

    def remove(self, key: int) -> None:
        self.set[key] = False

    def contains(self, key: int) -> bool:
        return self.set[key]

Explanation:

Initialization: We initialize a boolean array of size 1000001 (to accommodate keys from 0 to 10^6). All elements are initially False.
Add: To add a key, we set the value at the corresponding index to True.
Remove: To remove a key, we set the value at the corresponding index to False.
Contains: To check if a key exists, we simply return the value at the corresponding index.

Big O Analysis:

Time Complexity:
- add: O(1)
- remove: O(1)
- contains: O(1)
Space Complexity: O(R), where R is the size of the key range (10^6 + 1 = 1000001 slots in this case). The array is allocated in full regardless of how many keys are actually stored.

Edge Cases:

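The main limitation is the space requirement: the array must cover the whole key range, even if only a handful of keys are ever stored. This solution is suitable only when the range of keys is small and known in advance, as it is in this problem (0 <= key <= 10^6). Keys outside that range are not handled: a key above 10^6 raises an IndexError, and a negative key silently indexes from the end of the Python list.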
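If the space cost is a concern but direct indexing is still wanted, the flags can be packed more tightly. The following is a minimal sketch under stated assumptions, not part of the original solution: it stores one bit per key in a bytearray, and the class name BitHashSet and the max_key parameter are hypothetical names introduced for illustration.

```python
class BitHashSet:
    """Direct-address set storing one bit per possible key (illustrative variant)."""

    def __init__(self, max_key: int = 10**6):
        # One bit per key in [0, max_key]; eight keys share each byte.
        self.bits = bytearray((max_key + 1 + 7) // 8)

    def add(self, key: int) -> None:
        # key >> 3 selects the byte, key & 7 selects the bit inside it.
        self.bits[key >> 3] |= 1 << (key & 7)

    def remove(self, key: int) -> None:
        # Clear the bit; the mask is limited to 8 bits so it fits in a byte.
        self.bits[key >> 3] &= ~(1 << (key & 7)) & 0xFF

    def contains(self, key: int) -> bool:
        return bool(self.bits[key >> 3] & (1 << (key & 7)))


s = BitHashSet()
s.add(1)
s.add(1000000)
print(s.contains(1), s.contains(2))  # True False
print(len(s.bits))                   # 125001
```

With the default range, the bytearray holds 125001 bytes. The space is still proportional to the key range rather than to the number of stored keys, so the limitation described above remains.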

2. Optimal Solution: Using Hashing with Separate Chaining

A more memory-efficient approach is to use a hash function to map keys to indices in a smaller array. To handle collisions (when two keys map to the same index), we can use separate chaining, where each index in the array points to a linked list (or another dynamic data structure) that stores the keys that hash to that index.

Implementation:

class MyHashSet:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.table = [[] for _ in range(capacity)]

    def add(self, key: int) -> None:
        index = self._hash(key)
        if key not in self.table[index]:
            self.table[index].append(key)

    def remove(self, key: int) -> None:
        index = self._hash(key)
        if key in self.table[index]:
            self.table[index].remove(key)

    def contains(self, key: int) -> bool:
        index = self._hash(key)
        return key in self.table[index]

    def _hash(self, key: int) -> int:
        return key % self.capacity

Explanation:

Initialization: We create an array (list) table with capacity buckets (1000 by default). Each bucket is an initially empty Python list that plays the role of the chain; a linked list would work the same way.
Hash Function: _hash(key) computes the index in the table where the key should be stored. A simple modulo operation is used as the hash function. More sophisticated hash functions can be used to minimize collisions.
Add: We compute the hash index for the key. If the key is not already in the list at that index, we add it.
Remove: We compute the hash index for the key. If the key is in the list at that index, we remove it.
Contains: We compute the hash index for the key and check if the key exists in the list at that index.
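The steps above can be traced on a small example. The sketch below repeats the same separate-chaining logic in condensed form so that it runs on its own, and uses a capacity of 10 (an arbitrary choice made only to force a collision) to show two keys sharing one bucket.

```python
class MyHashSet:
    # Same separate-chaining logic as above, condensed for the walkthrough.
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.table = [[] for _ in range(capacity)]

    def _hash(self, key):
        return key % self.capacity

    def add(self, key):
        bucket = self.table[self._hash(key)]
        if key not in bucket:
            bucket.append(key)

    def remove(self, key):
        bucket = self.table[self._hash(key)]
        if key in bucket:
            bucket.remove(key)

    def contains(self, key):
        return key in self.table[self._hash(key)]


s = MyHashSet(capacity=10)  # tiny capacity chosen only to force collisions
s.add(7)
s.add(17)                   # 17 % 10 == 7, so it shares bucket 7 with key 7
s.add(7)                    # duplicate, ignored
print(s.table[7])           # [7, 17]
s.remove(7)
print(s.contains(7), s.contains(17))  # False True
```

Removing 7 leaves 17 in the same bucket untouched, which is why each bucket has to store the keys themselves rather than a single flag.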

Big O Analysis:

Time Complexity:
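- add: O(1) on average, provided the keys spread evenly and the number of keys stays proportional to the capacity; O(N) in the worst case (when all keys hash to the same index), where N is the number of keys stored.
- remove: O(1) on average, O(N) in the worst case.
- contains: O(1) on average, O(N) in the worst case.
With a fixed capacity, the average bucket holds N / capacity keys, so each operation scans about that many entries.
Space Complexity: O(N + C), where N is the number of unique keys stored in the HashSet and C is the table capacity. The table always holds C buckets, even when they are empty, and the buckets together hold the N keys.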
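The gap between the average and the worst case can be measured directly. The helper below is a hypothetical illustration (longest_chain is not part of the solution): it counts how many keys land in the fullest bucket under key % capacity, assuming the keys passed in are distinct.

```python
def longest_chain(keys, capacity):
    """Length of the fullest bucket when distinct keys are placed by key % capacity."""
    counts = [0] * capacity
    for key in keys:
        counts[key % capacity] += 1
    return max(counts)


capacity = 1000
spread = range(1000)                   # consecutive keys: one per bucket
clustered = range(0, 1_000_000, 1000)  # multiples of the capacity: all in bucket 0

print(longest_chain(spread, capacity))     # 1
print(longest_chain(clustered, capacity))  # 1000
```

Both inputs contain 1000 keys, but the second one puts every key in bucket 0, so each lookup degrades to a scan of the whole set.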

Edge Cases:

Hash Function Quality: The performance of this solution heavily depends on the quality of the hash function. A good hash function distributes keys evenly across the table, minimizing collisions. A poor hash function can lead to all keys mapping to the same index, resulting in O(N) time complexity for all operations.
Table Capacity: The choice of capacity also affects performance. A larger capacity reduces the likelihood of collisions but increases memory usage. A smaller capacity saves memory but increases the risk of collisions.
Resizing: The implementation above uses a fixed capacity, so buckets grow longer as more keys are added. To maintain good performance, the table can be resized dynamically (e.g., doubling the capacity when the load factor, the number of keys divided by the capacity, exceeds a certain threshold). Resizing rehashes every existing key into the new table, which costs O(N) for that one operation. Because the capacity doubles each time, resizes become rarer as the table grows, and the amortized cost of add stays O(1) on average.
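The resizing idea can be sketched as follows. This is one possible arrangement under stated assumptions rather than the solution's own code: the class name ResizingHashSet, the initial capacity of 8, and the 0.75 load-factor threshold are all illustrative choices.

```python
class ResizingHashSet:
    """Separate chaining with doubling; names and thresholds are illustrative."""

    def __init__(self, capacity: int = 8, max_load: float = 0.75):
        self.capacity = capacity
        self.max_load = max_load
        self.count = 0  # number of keys currently stored
        self.table = [[] for _ in range(capacity)]

    def _hash(self, key: int) -> int:
        return key % self.capacity

    def _resize(self) -> None:
        # Double the table and rehash every key, since key % capacity changes.
        old_table = self.table
        self.capacity *= 2
        self.table = [[] for _ in range(self.capacity)]
        for bucket in old_table:
            for key in bucket:
                self.table[self._hash(key)].append(key)

    def add(self, key: int) -> None:
        bucket = self.table[self._hash(key)]
        if key in bucket:
            return
        bucket.append(key)
        self.count += 1
        # Grow once the load factor (keys per bucket) passes the threshold.
        if self.count / self.capacity > self.max_load:
            self._resize()

    def remove(self, key: int) -> None:
        bucket = self.table[self._hash(key)]
        if key in bucket:
            bucket.remove(key)
            self.count -= 1

    def contains(self, key: int) -> bool:
        return key in self.table[self._hash(key)]


s = ResizingHashSet()
for k in range(100):
    s.add(k)
print(s.count, s.capacity)  # 100 256
```

The table never shrinks in this sketch; whether to halve the capacity after many removals is a separate design decision that the text above does not address.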
