Taro Logo

Number of Connected Components in an Undirected Graph

Medium
Amazon logo
Amazon
1 view
Topics:
Graphs

Given an undirected graph represented by a number of nodes n and a list of edges edges, write a function to find the number of connected components in the graph.

Each node is labeled from 0 to n - 1. The edges list contains pairs of node labels representing the connections between them.

A connected component is defined as a subgraph in which any two vertices are connected to each other by a path, and which is connected to no additional vertices in the supergraph.

For example:

  1. If n = 5 and edges = [[0, 1], [1, 2], [3, 4]], the function should return 2. The two connected components are {0, 1, 2} and {3, 4}.
  2. If n = 5 and edges = [[0, 1], [1, 2], [2, 3], [3, 4]], the function should return 1. All nodes are connected in a single component.
  3. If n = 5 and edges = [[0, 1], [2, 3], [4, 4]], the function should return 2. The connected components are {0, 1}, {2, 3}, and {4} (a single node connected to itself).
  4. If n = 10 and edges = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]], the function should return 5. Each pair forms a connected component.

Consider edge cases such as an empty graph (n = 0), a graph with no edges, and a graph with self-loops. Explain the time and space complexity of your approach. Can you propose an algorithm that performs better on large graphs?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. How is the graph represented as input? Is it an adjacency list, an adjacency matrix, or a list of edges?
  2. What is the range of values for the node labels? Are they integers, and if so, are they positive or can they be negative?
  3. Can the graph be empty (no nodes or edges), and if so, what should the return value be?
  4. Are there any self-loops (edges from a node to itself) or parallel edges (multiple edges between the same pair of nodes)?
  5. Are the node labels guaranteed to be contiguous, or could there be gaps in the numbering (e.g., nodes labeled 1, 2, and 5)?

Brute Force Solution

Approach

The goal is to find how many separate groups of friends exist where friends are defined as people directly or indirectly connected. The brute force approach involves exploring all possible groupings by exhaustively checking connections between people. It's like trying every possible team combination to see which teams form independently.

Here's how the algorithm would work step-by-step:

  1. Start by assuming each person is in their own separate group.
  2. Pick any two people and check if they are directly connected as friends.
  3. If they are friends, merge their groups into a single group.
  4. Repeat the process of picking any two people until you have checked every possible pair of people to see if they are friends.
  5. After you have checked all possible pairs, count how many separate groups remain. This is the number of connected components.

Code Implementation

def count_connected_components_brute_force(number_of_nodes, edges):
    # Initialize each node as its own component
    node_components = list(range(number_of_nodes))

    # Iterate through all possible pairs of nodes
    for first_node in range(number_of_nodes):
        for second_node in range(first_node + 1, number_of_nodes):
            # Check if an edge exists between the two nodes.
            if (first_node, second_node) in edges or (second_node, first_node) in edges:

                # Merge components if nodes are connected
                first_node_component = node_components[first_node]
                second_node_component = node_components[second_node]

                # If they are in different components, merge them
                if first_node_component != second_node_component:
                    # Update all nodes in the first component to the second
                    for node_index in range(number_of_nodes):
                        if node_components[node_index] == first_node_component:
                            node_components[node_index] = second_node_component

    # Count the number of distinct components
    number_of_components = len(set(node_components))
    return number_of_components

Big(O) Analysis

Time Complexity
O(n^2)The provided solution involves iterating through all possible pairs of people to check for direct connections. Given n people, we need to consider each possible pair. The outer loop conceptually iterates n times, and the inner loop, in the worst case, iterates approximately n times for each outer loop iteration, to check connections. This results in roughly n * n comparisons. Therefore, the time complexity is O(n^2).
Space Complexity
O(N)The algorithm maintains a grouping for each person, initially assuming each of the N people are in their own group. The groups need to be stored in memory and could be represented with an array of size N where each index represents a person and the value represents its group. Therefore, the auxiliary space required to store these groupings is proportional to N. Consequently, the space complexity of this algorithm is O(N).

Optimal Solution

Approach

The problem asks us to count how many separate groups of friends exist in a network. The key idea is to explore each person's group completely before moving on to the next, using a smart method to keep track of who we've already visited.

Here's how the algorithm would work step-by-step:

  1. Imagine each person in the network as a node and friendships as connections. Start with any person.
  2. Explore all of that person's friends, then explore all of their friends' friends, and so on, until you've found everyone in that person's group.
  3. As you explore, mark each person as 'visited' so you don't accidentally visit them again from another starting point and double-count them.
  4. Once you've explored the entire group, you've found one connected component. Increase your count.
  5. Now, find a person who hasn't been visited yet. If there are no unvisited people, you're done!
  6. Repeat the exploration process starting from this new unvisited person. Each time you finish exploring a group, increment the count.
  7. The final count represents the total number of separate groups (connected components) in the network.

Code Implementation

def count_connected_components(number_of_nodes, adjacency_list):
    visited_nodes = [False] * number_of_nodes
    number_of_components = 0

    def depth_first_search(node):
        visited_nodes[node] = True

        for neighbor in adjacency_list[node]:
            if not visited_nodes[neighbor]:
                depth_first_search(neighbor)

    # Iterate through all nodes
    for node in range(number_of_nodes):
        # Ensure that we only explore unvisited components.
        if not visited_nodes[node]:

            depth_first_search(node)
            number_of_components += 1

            # Once the DFS is complete, increment the counter

    return number_of_components

Big(O) Analysis

Time Complexity
O(V + E)The algorithm iterates through all vertices (nodes) once to find unvisited nodes, which takes O(V) time, where V is the number of vertices. For each vertex, it explores its neighbors using either Depth-First Search (DFS) or Breadth-First Search (BFS). In the worst case, it may traverse all edges E connected to the vertices, resulting in O(E) time. Therefore, the overall time complexity is O(V + E), where V is the number of vertices and E is the number of edges in the graph. The algorithm's runtime is directly proportional to the size of the graph represented by its vertices and edges.
Space Complexity
O(N)The algorithm uses a 'visited' set to keep track of which nodes have already been explored, where N is the number of people (nodes) in the network. In the worst-case scenario, we might visit all N nodes, requiring space proportional to N to store the visited status of each node. Additionally, the algorithm implicitly uses a recursion stack (or a queue in BFS), which, in the worst case of a deeply connected graph or a skewed tree-like structure, could also reach a depth proportional to N. Therefore, the auxiliary space used is O(N).

Edge Cases

CaseHow to Handle
Null or empty graph (no nodes)Return 0 as there are no connected components in an empty graph.
Graph with only one nodeReturn 1 as a single node constitutes one connected component.
Graph with no edges (all nodes isolated)Return the number of nodes as each node is its own component.
Graph with a single connected component (fully connected)Return 1, representing the entire graph as one component.
Graph represented as adjacency list with self-loops (node connected to itself)Ensure the DFS or BFS algorithm avoids infinite loops by marking visited nodes.
Graph with disconnected nodes (multiple components)The algorithm must correctly identify and count all separate components.
Large graph (large number of nodes and edges)Use an efficient algorithm like DFS or BFS with appropriate data structures to avoid stack overflow or excessive memory usage; consider iterative BFS for very large graphs.
Graph with duplicate edges between same nodesThe algorithm should still correctly identify connected components regardless of duplicate edges as they don't affect connectivity.