Taro Logo

Graph Valid Tree

Medium
Amazon logo
Amazon
2 views
Topics:
Graphs

You are given a graph represented by n nodes labeled from 0 to n - 1, and a list of undirected edges where each edge is a pair of node labels. Determine whether the given graph forms a valid tree.

A valid tree must satisfy the following two conditions:

  1. Connectedness: The graph must be fully connected, meaning there is a path between any two nodes.
  2. Acyclicity: The graph must not contain any cycles.

For example:

  • Example 1: n = 5, edges = [[0,1], [0,2], [0,3], [1,4]]. The expected output is true because the graph is connected and contains no cycles.
  • Example 2: n = 5, edges = [[0,1], [1,2], [2,3], [1,3], [1,4]]. The expected output is false because the graph contains a cycle (1-2-3-1).
  • Example 3: n = 5, edges = [[0,1], [0,2], [0,3], [1,2]]. The expected output is false because the graph contains a cycle (0-1-2-0).
  • Example 4: n = 5, edges = [[0,1], [2,3], [3,4]]. The expected output is false because the graph is disconnected.

Explain your approach, considering time and space complexity. Also, think about edge cases such as an empty graph, a single-node graph, disconnected graphs, and graphs with cycles. Provide code implementation in Python.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of values for the nodes in the graph, and how is the graph represented (e.g., adjacency list, adjacency matrix, list of edges)?
  2. Are there any self-loops (edges from a node to itself) or duplicate edges in the input?
  3. Can the graph be disconnected? If so, what should the output be?
  4. Is 'n' guaranteed to be a non-negative integer, and can it be zero?
  5. What should be returned if the input 'edges' is empty or null? Should I consider that a valid tree if n is 0 or 1?

Brute Force Solution

Approach

We want to determine if a given set of connections represents a valid tree. A brute force approach would explore all possible paths and arrangements to verify the tree properties. We'll check for cycles and ensure all nodes are connected.

Here's how the algorithm would work step-by-step:

  1. Start with one node and treat it as the root of our potential tree.
  2. Explore all possible paths starting from that root node to see if we can reach all other nodes.
  3. While exploring, carefully watch out for cycles, meaning if you visit the same node twice along a path.
  4. If we find a cycle at any point, this arrangement cannot represent a valid tree.
  5. If we have gone through every possible path from our starting node and we still haven't visited all the nodes, the graph is disconnected, meaning it isn't a tree.
  6. If we can reach all other nodes without any cycles, this could be a valid tree. Now, repeat this process by picking a different node as the root.
  7. If any of these starting nodes yields a valid tree and all nodes are connected without cycles, then we can confidently conclude that the given graph is a valid tree. If none of the arrangements forms a valid tree, it means the graph is not a valid tree.

Code Implementation

def is_valid_tree_brute_force(number_of_nodes, edges):
    if not edges and number_of_nodes == 1:
        return True
    if not edges and number_of_nodes != 1:
        return False

    # We'll try each node as a root.
    for start_node in range(number_of_nodes):
        visited_nodes = set()
        stack = [(start_node, -1)] # Node, parent

        while stack:
            node, parent = stack.pop()

            if node in visited_nodes:
                # Cycle detected, not a valid tree.
                break

            visited_nodes.add(node)

            for neighbor_one, neighbor_two in edges:
                if neighbor_one == node:
                    neighbor = neighbor_two
                elif neighbor_two == node:
                    neighbor = neighbor_one
                else:
                    continue

                if neighbor != parent:
                    stack.append((neighbor, node))

        # Check if we visited all nodes and if there are no cycles.
        if len(visited_nodes) == number_of_nodes and node not in visited_nodes:
            return True

    return False

Big(O) Analysis

Time Complexity
O(n!)The described brute-force approach considers each node as a potential root and explores all possible paths. In the worst case, to detect cycles and connectivity, it might explore all permutations of nodes starting from each root. Exploring all possible paths from each starting node to verify connectivity and cycle presence leads to a factorial growth in operations relative to the number of nodes. Therefore, the time complexity can be approximated as O(n!).
Space Complexity
O(N)The provided algorithm uses a depth-first search (DFS) or breadth-first search (BFS) approach implicitly, to explore all possible paths. To keep track of visited nodes during the traversal to detect cycles, we would need a boolean array or set of size N, where N is the number of nodes in the graph. Additionally, the recursion stack in DFS could grow up to N in the worst-case scenario (a skewed tree or a path). Therefore, the auxiliary space complexity is O(N).

Optimal Solution

Approach

We want to determine if a given set of connections between points forms a valid tree. A valid tree must have all points connected and have no closed loops or cycles. We can verify this using a technique that explores the connections in a structured way.

Here's how the algorithm would work step-by-step:

  1. First, check if the number of connections matches the number of points. A tree must have exactly one fewer connection than points.
  2. Choose a starting point and explore its connections to other points. Mark each point we visit as 'seen'.
  3. As we explore, if we ever encounter a point we've already seen (other than the one we just came from), that means there's a loop, and it's not a tree.
  4. Keep exploring until we've visited all points that are connected to the starting point either directly or indirectly.
  5. After exploring, if there are any points that haven't been marked as 'seen', then the graph is not fully connected. It's not a valid tree if there is more than one connected component.
  6. If the number of connections is correct, no loops are found, and all points are connected, then it's a valid tree.

Code Implementation

def is_graph_a_valid_tree(number_of_nodes, edges):
    if len(edges) != number_of_nodes - 1:
        return False

    graph = {i: [] for i in range(number_of_nodes)}
    for node1, node2 in edges:
        graph[node1].append(node2)
        graph[node2].append(node1)

    visited_nodes = [False] * number_of_nodes

    # Start DFS from node 0.
    if not depth_first_search(graph, 0, -1, visited_nodes):
        return False

    # Check if all nodes are visited.
    # This ensures the graph is connected.
    for node_visited in visited_nodes:
        if not node_visited:
            return False

    return True

def depth_first_search(graph, current_node, parent_node, visited_nodes):
    visited_nodes[current_node] = True

    for neighbor_node in graph[current_node]:
        if neighbor_node == parent_node:
            continue

        # Cycle detected!
        if visited_nodes[neighbor_node]:
            return False

        if not depth_first_search(graph, neighbor_node, current_node, visited_nodes):
            return False
    return True

Big(O) Analysis

Time Complexity
O(n)The algorithm first checks if the number of edges equals n-1, which takes O(1) time. Then, a depth-first search (DFS) is performed to traverse the graph, where n is the number of nodes. In the worst-case scenario, the DFS visits each node and edge once. Since a tree has at most n-1 edges, the DFS takes O(n) time. Finally, checking if all nodes have been visited takes O(n) time. Therefore, the overall time complexity is dominated by the DFS, resulting in O(n).
Space Complexity
O(N)The algorithm uses a 'seen' set to keep track of visited nodes during the depth-first search. In the worst case, all N nodes in the graph will be visited and stored in the 'seen' set. Additionally, the adjacency list, implicitly used when exploring connections, can also take up space proportional to N in a sparse graph or N^2 in a dense graph. The recursion stack could also grow up to depth N in the worst case, therefore the dominant factor is O(N).

Edge Cases

CaseHow to Handle
Null or empty input: n = 0, edges = nullReturn true; an empty graph is considered a valid tree.
Single node: n = 1, edges = []Return true; a single node is considered a valid tree.
Disconnected graph: n > 1, but not all nodes are connected through edgesThe number of connected components should be exactly 1, otherwise return false.
Graph contains a cycle: More edges than nodes-1Return false; a tree cannot contain cycles.
Self-loop: An edge connects a node to itself (e.g., [0, 0])The union-find or DFS approach should handle this correctly by detecting the cycle.
Parallel edges: Multiple edges connect the same two nodes (e.g., [0, 1], [0, 1])The cycle detection algorithms should handle this as a cycle, returning false.
Large number of nodes and edges: n is very large (e.g., n = 10000)Union-find should provide near linear time complexity, scaling efficiently.
Input with negative node valuesSince the problem is about connectivity, re-map the node values to positive indices starting from 0.