Taro Logo

Remove Duplicates From an Unsorted Linked List

Medium
Asked by:
Profile picture
1 view
Topics:
Linked Lists

Given the head of an unsorted linked list, write a function to remove all the duplicate nodes from the list.

Example 1:


Input: head = [1,2,3,2,1]
Output: [1,2,3]
Explanation: The linked list should contain only distinct values.

Example 2:


Input: head = [1,3,1,4,5,5,2,3]
Output: [1,3,4,5,2]
Explanation: The linked list should contain only distinct values.

Constraints:

  • The number of nodes in the list is in the range [0, 104].
  • -105 <= Node.val <= 105

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of values stored in the linked list nodes? Are negative values, zero, or null values possible?
  2. What should I return if the input linked list `head` is null or empty?
  3. Should I preserve the original order of the remaining, non-duplicate nodes in the list?
  4. Is it acceptable to modify the existing linked list in-place, or do I need to create a new linked list?
  5. Can you provide an example input and the expected output to illustrate the desired behavior?

Brute Force Solution

Approach

The brute force method for removing duplicates in a linked list is simple but inefficient. We go through each item in the list and, for each item, we check all the other items to see if they are the same.

Here's how the algorithm would work step-by-step:

  1. Start with the very first item in the list.
  2. Compare this first item to every other item in the list one by one.
  3. If you find an item that is the same as the first item, remove the duplicate item from the list.
  4. Move on to the next item in the original list.
  5. Repeat the process of comparing this item to all the items that come after it in the list.
  6. Again, if you find a duplicate, remove it.
  7. Continue this process until you have checked all the items in the list.

Code Implementation

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def remove_duplicates_brute_force(head):
    current_node = head

    # Iterate through each node in the list.
    while current_node:
        runner = current_node

        # Compare the current node with all subsequent nodes.
        while runner.next:
            if runner.next.data == current_node.data:
                # Remove duplicate node if found
                runner.next = runner.next.next

            else:
                runner = runner.next

        current_node = current_node.next

    return head

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through the linked list of n elements. For each element, it compares it to all subsequent elements in the list to find duplicates. In the worst case, for the first element, we potentially compare it with n-1 elements, for the second with n-2 elements, and so on. This results in approximately n * (n-1)/2 comparisons. Therefore, the time complexity simplifies to O(n²).
Space Complexity
O(1)The brute force approach iterates through the linked list using pointers. It doesn't create any auxiliary data structures like arrays, hash maps, or sets to store the elements. The algorithm only uses a constant amount of extra memory to store pointers for traversing and comparing nodes. Thus, the space complexity is independent of the number of nodes, N, in the linked list and is considered constant.

Optimal Solution

Approach

To remove duplicate items from a linked list efficiently, we'll keep track of items we've seen before. This allows us to quickly identify and skip over duplicates as we go through the list. We use extra storage to make the removal process faster.

Here's how the algorithm would work step-by-step:

  1. Start at the beginning of the linked list.
  2. Create a 'memory' to remember each unique item we encounter.
  3. Look at the current item in the list. Check if we've seen it before by consulting our 'memory'.
  4. If the item is already in our 'memory', we know it's a duplicate. Remove it from the list by bypassing it (connecting the previous item directly to the next item).
  5. If the item is new (not in our 'memory'), add it to our 'memory' and move on to the next item in the list.
  6. Repeat the process until we reach the end of the list. The 'memory' helps us avoid unnecessary comparisons, making the process quick.

Code Implementation

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def remove_duplicates_from_unsorted_linked_list(head):
    if not head:
        return head

    seen_values = set()
    current_node = head
    previous_node = None

    while current_node:
        # Check if the current node's data is a duplicate
        if current_node.data in seen_values:

            # Remove duplicate by skipping the current node
            previous_node.next = current_node.next
            current_node = current_node.next
        else:
            # Add current node's data to seen values
            seen_values.add(current_node.data)

            previous_node = current_node
            current_node = current_node.next

    return head

Big(O) Analysis

Time Complexity
O(n)We iterate through the linked list once, examining each of the n nodes. For each node, we perform a constant-time lookup in our 'memory' (hash table or set) to check for duplicates. Adding a node to the 'memory' also takes constant time. Therefore, the overall time complexity is driven by the single pass through the linked list, resulting in O(n) time.
Space Complexity
O(N)The algorithm uses a 'memory' to remember each unique item encountered in the linked list. This 'memory' is implemented as a hash set (or similar data structure) to efficiently check for duplicates. In the worst-case scenario, all N elements in the linked list are unique. Therefore, the hash set could potentially store all N elements, resulting in auxiliary space proportional to the number of nodes in the list. This leads to a space complexity of O(N).

Edge Cases

CaseHow to Handle
Null or empty linked listReturn null immediately as there are no nodes to process.
Linked list with only one nodeReturn the head directly as there are no other nodes to compare it against.
Linked list with all nodes having the same valueThe solution should remove all but the first node, leaving a single node linked list.
Linked list with a large number of nodesThe solution's time complexity (O(n) with a hash set) should be efficient enough to handle a reasonable number of nodes; consider memory constraints for extremely large lists.
Linked list with many duplicate values clustered togetherThe hash set approach efficiently removes consecutive duplicates without unnecessary traversals.
Linked list containing negative, zero, and positive valuesThe solution's hashing mechanism should handle all integer values without issues.
Linked list where the first node is a duplicate and needs to be removed.Correctly handle removing the head node if it's a duplicate, updating the head pointer.
Memory Constraints with Extremely Long ListIf the list is too large to fit the set in memory, consider using an external sort based approach which will have much higher time complexity (n log n) but will be memory efficient.