Design a HashSet data structure from scratch, without using any built-in hash table libraries. Your HashSet should support the following operations:

- `add(key)`: Inserts the value `key` into the HashSet.
- `contains(key)`: Returns `true` if the value `key` exists in the HashSet, otherwise returns `false`.
- `remove(key)`: Removes the value `key` from the HashSet. If `key` is not present, the function should do nothing.

Consider the following constraints:

- `0 <= key <= 10^6`
- `add`, `remove`, and `contains` will be called at most `10^4` times.

For example:
```java
MyHashSet myHashSet = new MyHashSet();
myHashSet.add(1);
myHashSet.add(2);
myHashSet.contains(1); // returns true
myHashSet.contains(3); // returns false
myHashSet.add(2);
myHashSet.contains(2); // returns true
myHashSet.remove(2);
myHashSet.contains(2); // returns false
```
Discuss the time and space complexity of your solution. What are the trade-offs? Can you optimize for space, or for time?
This problem requires designing a HashSet without using built-in hash table libraries. We need to implement the `add`, `contains`, and `remove` operations.
The most straightforward approach is to use a simple array (or list) of booleans where the index represents the key. If the value at that index is true, the key exists in the set; otherwise, it doesn't.
Implementation:

```python
class MyHashSet:
    def __init__(self):
        self.size = 1000001  # Given constraint: 0 <= key <= 10^6
        self.set = [False] * self.size

    def add(self, key: int) -> None:
        self.set[key] = True

    def remove(self, key: int) -> None:
        self.set[key] = False

    def contains(self, key: int) -> bool:
        return self.set[key]
```
Explanation:

- The constructor allocates a boolean array of size 1,000,001, initialized to `False`.
- `add` sets the entry at index `key` to `True`.
- `remove` resets it to `False`.
- `contains` returns the stored boolean.

Big O Analysis:

- `add`: O(1)
- `remove`: O(1)
- `contains`: O(1)
- Space: O(10^6), since the full array is allocated up front regardless of how many keys are stored.

Edge Cases:

- The constraint guarantees `0 <= key <= 10^6`, so no bounds checking is needed, and removing an absent key is naturally a no-op (it just writes `False` again).
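If space is the concern, the same direct-address idea can be packed into a bit array: one bit per possible key rather than one list slot per key. Below is a minimal sketch using Python's `bytearray`; the class name `BitHashSet` is illustrative, not part of the original solution.

```python
class BitHashSet:
    """Direct-address set packed into a bit array: 1 bit per possible key."""

    def __init__(self):
        # 10^6 + 1 possible keys, 8 flags per byte -> roughly 122 KiB,
        # versus ~8 MB for a Python list of 10^6 + 1 boolean references.
        self.bits = bytearray((1000001 + 7) // 8)

    def add(self, key: int) -> None:
        self.bits[key >> 3] |= 1 << (key & 7)       # set the key's bit

    def remove(self, key: int) -> None:
        self.bits[key >> 3] &= ~(1 << (key & 7))    # clear the key's bit

    def contains(self, key: int) -> bool:
        return bool(self.bits[key >> 3] & (1 << (key & 7)))
```

The operations remain O(1); only the constant factor on memory changes.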
A more memory-efficient approach is to use a hash function to map keys to indices in a smaller array. To handle collisions (when two keys map to the same index), we can use separate chaining, where each index in the array points to a linked list (or another dynamic data structure) that stores the keys that hash to that index.
Implementation:

```python
class MyHashSet:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.table = [[] for _ in range(capacity)]

    def add(self, key: int) -> None:
        index = self._hash(key)
        if key not in self.table[index]:
            self.table[index].append(key)

    def remove(self, key: int) -> None:
        index = self._hash(key)
        if key in self.table[index]:
            self.table[index].remove(key)

    def contains(self, key: int) -> bool:
        index = self._hash(key)
        return key in self.table[index]

    def _hash(self, key: int) -> int:
        return key % self.capacity
```
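To see why chaining is needed, note that with a capacity of 1000 the keys 1, 1001, and 2001 all hash to the same index, so they must share one bucket. This standalone snippet demonstrates how the modulo hash groups keys into buckets:

```python
capacity = 1000
keys = [1, 1001, 2001, 42]

# Group each key into the bucket chosen by the modulo hash.
buckets = {}
for k in keys:
    buckets.setdefault(k % capacity, []).append(k)

print(buckets)  # {1: [1, 1001, 2001], 42: [42]}
```

Bucket 1 holds three colliding keys; a lookup for any of them scans only that short list, not the whole table.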
Explanation:

- We create a `table` of a certain `capacity`. Each element of the `table` is a list (or linked list), initially empty.
- `_hash(key)` computes the index in the `table` where the key should be stored. A simple modulo operation is used as the hash function; more sophisticated hash functions can be used to minimize collisions.

Big O Analysis:

- `add`: O(1) on average, O(N) in the worst case (when all keys hash to the same index).
- `remove`: O(1) on average, O(N) in the worst case.
- `contains`: O(1) on average, O(N) in the worst case.
- Space: proportional to the `table` plus the lists (or linked lists) used to store the keys.

Edge Cases:

- The choice of `capacity` also affects performance. A larger capacity reduces the likelihood of collisions but increases memory usage; a smaller capacity saves memory but increases the risk of collisions.
- The `table` can be resized dynamically (e.g., doubling the capacity when the load factor exceeds a certain threshold). Resizing involves rehashing all existing keys into the new `table`, which is an expensive operation, but amortized over time it maintains O(1) average time complexity.
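The resizing idea can be sketched as a self-contained variant of the chaining set. The class name `ResizableHashSet` and the 0.75 load-factor threshold are assumptions for illustration (0.75 is a common default in hash table implementations, not something the solution above specifies):

```python
class ResizableHashSet:
    """Separate chaining with capacity doubling when the load factor is exceeded."""

    LOAD_FACTOR = 0.75  # assumed threshold, a common default

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.count = 0
        self.table = [[] for _ in range(capacity)]

    def _hash(self, key: int) -> int:
        return key % self.capacity

    def _resize(self) -> None:
        # Double the capacity and rehash every existing key into the new table.
        old_table = self.table
        self.capacity *= 2
        self.table = [[] for _ in range(self.capacity)]
        for bucket in old_table:
            for key in bucket:
                self.table[self._hash(key)].append(key)

    def add(self, key: int) -> None:
        bucket = self.table[self._hash(key)]
        if key not in bucket:
            bucket.append(key)
            self.count += 1
            if self.count / self.capacity > self.LOAD_FACTOR:
                self._resize()

    def remove(self, key: int) -> None:
        bucket = self.table[self._hash(key)]
        if key in bucket:
            bucket.remove(key)
            self.count -= 1

    def contains(self, key: int) -> bool:
        return key in self.table[self._hash(key)]
```

Each individual `add` that triggers `_resize` costs O(N), but since the capacity doubles, resizes become exponentially rarer, giving O(1) amortized time per operation.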