
Design HashSet

Easy
Meta
Topics:
Arrays, Bit Manipulation

Design a HashSet data structure from scratch, without utilizing any built-in hash table libraries. Your HashSet should support the following operations:

  1. add(key): Inserts the value key into the HashSet.
  2. contains(key): Returns true if the value key exists in the HashSet, otherwise returns false.
  3. remove(key): Removes the value key from the HashSet. If key is not present, the function should do nothing.

Consider the following constraints:

  • 0 <= key <= 10^6
  • The number of calls to add, remove, and contains will be at most 10^4.

For example:

MyHashSet myHashSet = new MyHashSet();
myHashSet.add(1);
myHashSet.add(2);
myHashSet.contains(1); // returns true
myHashSet.contains(3); // returns false
myHashSet.add(2);
myHashSet.contains(2); // returns true
myHashSet.remove(2);
myHashSet.contains(2); // returns false

Discuss the time and space complexity of your solution. What are the trade-offs? Can you optimize for space, or time?

Solution


MyHashSet Implementation

This problem requires designing a HashSet without using built-in hash table libraries. We need to implement add, contains, and remove operations.

1. Naive Solution (Using a Simple Array/List)

The most straightforward approach is to use a simple array (or list) of booleans where the index represents the key. If the value at that index is true, the key exists in the set; otherwise, it doesn't.

Implementation:

class MyHashSet:
    def __init__(self):
        self.size = 1000001  # Given constraint: 0 <= key <= 10^6
        self.set = [False] * self.size

    def add(self, key: int) -> None:
        self.set[key] = True

    def remove(self, key: int) -> None:
        self.set[key] = False

    def contains(self, key: int) -> bool:
        return self.set[key]

Explanation:

  • Initialization: We initialize a boolean array of size 1000001 (to accommodate keys from 0 to 10^6). All elements are initially False.
  • Add: To add a key, we set the value at the corresponding index to True.
  • Remove: To remove a key, we set the value at the corresponding index to False.
  • Contains: To check if a key exists, we simply return the value at the corresponding index.

Big O Analysis:

  • Time Complexity:
    • add: O(1)
    • remove: O(1)
    • contains: O(1)
  • Space Complexity: O(N), where N is the size of the key range (10^6 + 1 = 1000001 slots here), regardless of how many keys are actually stored.

Edge Cases:

  • The main limitation is the large space requirement if the range of possible keys is very large, even if the actual number of stored keys is small. This solution is suitable when the range of keys is relatively constrained as specified in the problem.
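The topic list includes Bit Manipulation, and the prompt asks whether space can be optimized. One option is to keep the direct-address idea but store one bit per key instead of one list slot. The sketch below assumes the same key range (0 <= key <= 10^6). The class name BitsetHashSet and the helper _locate are hypothetical. The flags are packed into a bytearray of 125,001 bytes (about 122 KB). By comparison, a Python list of 1,000,001 references takes roughly 8 MB on a 64-bit build.

```python
class BitsetHashSet:
    """Direct-address set that stores one bit per possible key (hypothetical name)."""

    def __init__(self, max_key: int = 10**6):
        # (max_key + 1) bits, rounded up to a whole number of bytes.
        self.bits = bytearray((max_key + 1 + 7) // 8)

    def _locate(self, key: int):
        # key >> 3 picks the byte (key // 8); key & 7 picks the bit inside that byte.
        return key >> 3, 1 << (key & 7)

    def add(self, key: int) -> None:
        byte, mask = self._locate(key)
        self.bits[byte] |= mask  # set the key's bit

    def remove(self, key: int) -> None:
        byte, mask = self._locate(key)
        self.bits[byte] &= ~mask  # clear the key's bit; a missing key simply stays cleared

    def contains(self, key: int) -> bool:
        byte, mask = self._locate(key)
        return bool(self.bits[byte] & mask)
```

Each operation is still O(1), and the space is still proportional to the key range rather than to the number of stored keys. The constant factor drops by about 64x (one bit instead of an 8-byte reference per key).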

2. Space-Efficient Solution: Using Hashing with Separate Chaining

A more memory-efficient approach is to use a hash function to map keys to indices in a smaller array. To handle collisions (when two keys map to the same index), we can use separate chaining, where each index in the array points to a linked list (or another dynamic data structure) that stores the keys that hash to that index.

Implementation:

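class MyHashSet:
    def __init__(self, capacity: int = 1009):
        # A prime bucket count spreads patterned keys (e.g. multiples of 1000)
        # across buckets instead of piling them into a single chain.
        self.capacity = capacity
        # Each bucket is a Python list acting as the chain for that index.
        self.table = [[] for _ in range(capacity)]

    def add(self, key: int) -> None:
        bucket = self.table[self._hash(key)]
        if key not in bucket:  # keep keys unique within the chain
            bucket.append(key)

    def remove(self, key: int) -> None:
        bucket = self.table[self._hash(key)]
        if key in bucket:  # removing a missing key is a no-op
            bucket.remove(key)

    def contains(self, key: int) -> bool:
        return key in self.table[self._hash(key)]

    def _hash(self, key: int) -> int:
        # Modulo maps any non-negative key to an index in [0, capacity).
        return key % self.capacity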

Explanation:

  • Initialization: We create an array (list) table of a certain capacity. Each element of the table is a list (or linked list), initially empty.
  • Hash Function: _hash(key) computes the index in the table where the key should be stored. A simple modulo operation is used as the hash function. More sophisticated hash functions can be used to minimize collisions.
  • Add: We compute the hash index for the key. If the key is not already in the list at that index, we add it.
  • Remove: We compute the hash index for the key. If the key is in the list at that index, we remove it.
  • Contains: We compute the hash index for the key and check if the key exists in the list at that index.
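The modulus matters when keys follow a pattern. The short experiment below uses a hypothetical helper, max_bucket_load, which is not part of the solution. It hashes 1000 keys that are all multiples of 1000 and reports the length of the longest chain. It compares key % 1000 against key % 1009.

```python
def max_bucket_load(keys, capacity: int) -> int:
    """Return the length of the longest chain when keys are hashed with key % capacity."""
    counts = [0] * capacity
    for key in keys:
        counts[key % capacity] += 1
    return max(counts)


# 1000 patterned keys: 0, 1000, 2000, ..., 999000
patterned_keys = range(0, 1_000_000, 1000)

print(max_bucket_load(patterned_keys, 1000))  # 1000: every key lands in bucket 0
print(max_bucket_load(patterned_keys, 1009))  # 1: 1000 and 1009 are coprime, so these keys never collide
```

Keys without a shared pattern spread out under either modulus. The difference appears only for keys that share a factor with the capacity, which is why a prime capacity is a common default.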

Big O Analysis:

  • Time Complexity:
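    • add: O(1 + α) on average, where N is the number of stored keys and α = N / capacity is the load factor (the average chain length); O(N) in the worst case, when all keys hash to the same index.
    • remove: O(1 + α) on average, O(N) in the worst case.
    • contains: O(1 + α) on average, O(N) in the worst case.
    • With a fixed table of about 1000 buckets and up to 10^4 stored keys, α can reach 10, so these bounds are O(1) only while the capacity grows in proportion to N.
  • Space Complexity: O(capacity + N). The table holds capacity bucket slots even when the set is empty, and the chains hold the N stored keys.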

Edge Cases:

  • Hash Function Quality: The performance of this solution heavily depends on the quality of the hash function. A good hash function distributes keys evenly across the table, minimizing collisions. A poor hash function can lead to all keys mapping to the same index, resulting in O(N) time complexity for all operations.
  • Table Capacity: The choice of capacity also affects performance. A larger capacity reduces the likelihood of collisions but increases memory usage. A smaller capacity saves memory but increases the risk of collisions.
  • Resizing: To keep chains short as the number of keys grows, the table can be resized dynamically (e.g., doubling the capacity whenever the load factor N / capacity exceeds a threshold such as 0.75). Each resize must rehash every stored key into the new table, which costs O(N). Because the capacity doubles each time, resizes become geometrically rarer, so that cost amortizes to O(1) per insertion and operations stay O(1) on average.
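The resizing strategy can be sketched as follows. This is a minimal illustration, not the solution's own code. The class name ResizingHashSet, the starting capacity of 8, and the 0.75 load-factor threshold are assumptions chosen for the example. The set tracks how many keys it holds. Whenever size / capacity exceeds the threshold, it doubles the bucket array and rehashes every key.

```python
class ResizingHashSet:
    """Separate-chaining set that doubles its bucket array when the load factor gets too high."""

    def __init__(self, capacity: int = 8, max_load: float = 0.75):
        self.capacity = capacity  # assumed starting bucket count
        self.max_load = max_load  # assumed load-factor threshold
        self.size = 0             # number of distinct keys currently stored
        self.table = [[] for _ in range(capacity)]

    def _hash(self, key: int) -> int:
        return key % self.capacity

    def _resize(self) -> None:
        old_table = self.table
        self.capacity *= 2
        self.table = [[] for _ in range(self.capacity)]
        # Rehash every key: its bucket index depends on the new capacity.
        for bucket in old_table:
            for key in bucket:
                self.table[self._hash(key)].append(key)

    def add(self, key: int) -> None:
        bucket = self.table[self._hash(key)]
        if key in bucket:
            return  # duplicates do not change the size
        bucket.append(key)
        self.size += 1
        if self.size / self.capacity > self.max_load:
            self._resize()

    def remove(self, key: int) -> None:
        bucket = self.table[self._hash(key)]
        if key in bucket:
            bucket.remove(key)
            self.size -= 1

    def contains(self, key: int) -> bool:
        return key in self.table[self._hash(key)]
```

Each resize rehashes about as many keys as the set holds at that moment. Those moments grow geometrically, so the total rehashing work over N insertions is O(N). In this sketch, removal never shrinks the table. A fuller version might also shrink it when the load factor drops low.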