Taro Logo

Minimum Number of Vertices to Reach All Nodes

Medium
Asked by:
Profile picture
Profile picture
Profile picture
8 views
Topics:
Graphs

Given a directed acyclic graph, with n vertices numbered from 0 to n-1, and an array edges where edges[i] = [fromi, toi] represents a directed edge from node fromi to node toi.

Find the smallest set of vertices from which all nodes in the graph are reachable. It's guaranteed that a unique solution exists.

Notice that you can return the vertices in any order.

Example 1:

Input: n = 6, edges = [[0,1],[0,2],[2,5],[3,4],[4,2]]
Output: [0,3]
Explanation: It's not possible to reach all the nodes from a single vertex. From 0 we can reach [0,1,2,5]. From 3 we can reach [3,4,2,5]. So we output [0,3].

Example 2:

Input: n = 5, edges = [[0,1],[2,1],[3,1],[1,4],[2,4]]
Output: [0,2,3]
Explanation: Notice that vertices 0, 3 and 2 are not reachable from any other node, so we must include them. Also any of these vertices can reach nodes 1 and 4.

Constraints:

  • 2 <= n <= 10^5
  • 1 <= edges.length <= min(10^5, n * (n - 1) / 2)
  • edges[i].length == 2
  • 0 <= fromi, toi < n
  • All pairs (fromi, toi) are distinct.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range for the number of nodes `n`? Can I assume it fits within a 32-bit integer?
  2. Are the edges in the `edges` list directed or undirected?
  3. Is it guaranteed that the graph is a directed acyclic graph (DAG)?
  4. If there are multiple sets of vertices that cover all nodes, is any valid set acceptable, or is there a specific criteria to choose one (e.g., the set with the fewest vertices)?
  5. Can the `edges` list be empty? If so, what should I return?

Brute Force Solution

Approach

The brute force approach for this problem is like trying out every single combination of starting points and seeing if we can reach all locations. We explore every possibility, even if it takes a long time, to guarantee we find the right set of starting points.

Here's how the algorithm would work step-by-step:

  1. First, consider each location as a potential starting point, one at a time.
  2. For each potential starting point, trace all the paths that can be reached from it.
  3. See if starting from just that single location allows you to reach every other location.
  4. If that doesn't work, try combining pairs of starting locations.
  5. Check if starting from that pair of locations allows you to reach every other location.
  6. Keep going, testing combinations of three, four, five, and so on, until you've tried all possible combinations of starting locations.
  7. As you try different combinations, keep track of the smallest set of starting points that can reach every location.
  8. Finally, report this smallest set of starting points.

Code Implementation

def minimum_vertices_to_reach_all_nodes_brute_force(number_of_nodes, edges):
    all_nodes = set(range(number_of_nodes))
    minimum_size_subset = list(range(number_of_nodes))

    for subset_size in range(1, number_of_nodes + 1):
        import itertools
        for combination in itertools.combinations(range(number_of_nodes), subset_size):
            reachable_nodes = set()
            starting_nodes = list(combination)

            # Explore all reachable nodes from the selected subset.
            for start_node in starting_nodes:
                nodes_to_visit = [start_node]
                visited_nodes = {start_node}

                while nodes_to_visit:
                    current_node = nodes_to_visit.pop(0)
                    reachable_nodes.add(current_node)

                    for start_node_edge, end_node_edge in edges:
                        if start_node_edge == current_node:
                            if end_node_edge not in visited_nodes:
                                nodes_to_visit.append(end_node_edge)
                                visited_nodes.add(end_node_edge)

            # Check if all nodes are reachable from this subset
            if reachable_nodes == all_nodes:
                if subset_size < len(minimum_size_subset):
                    minimum_size_subset = list(combination)

    return sorted(minimum_size_subset)

# This function will determine the potential starting points
# from which we can reach all other nodes.
# First, we try single element subsets, then pairs and so on.

Big(O) Analysis

Time Complexity
O(2^n * n)The brute force approach iterates through all possible subsets of vertices to find the minimum set that can reach all nodes. There are 2^n possible subsets of vertices, where n is the number of vertices. For each subset, we need to perform a traversal (e.g., BFS or DFS) to check if all nodes are reachable. This traversal takes O(n+m) time, where m is the number of edges. In the worst case, m can be O(n^2), but we can consider it O(n) here given that the traversal will stop once all nodes are visited. Therefore, for each subset, the reachability check takes O(n) time. Thus, the overall time complexity is O(2^n * n).
Space Complexity
O(2^N)The brute force approach explores all possible combinations of vertices. This implies that in the worst-case scenario, we might need to generate all subsets of the set of vertices to see which combination can reach all nodes. Representing these subsets requires storing them, even temporarily, which in the worst case leads to checking all 2^N possible subsets, where N is the number of vertices. Each subset requires storing its elements. Therefore, the space complexity is dominated by the storage of these subsets. Thus, the auxiliary space required is O(2^N).

Optimal Solution

Approach

The problem asks for the fewest starting points needed to visit every location in a network. The key insight is to identify locations that are only reachable from other locations, meaning nothing points to them. These 'entry point' locations must be our starting points.

Here's how the algorithm would work step-by-step:

  1. Think of the network as a map where some locations have routes leading into them and out of them.
  2. Identify locations that have routes leading into them. These locations are reachable from somewhere else.
  3. Now, find the locations that *do not* have any routes leading into them. These are locations you can only get to if you start there.
  4. These locations without incoming routes are your required starting points. You need to start at each of these to be able to reach every location on the map.
  5. The set of these starting point locations is the smallest set of locations needed to reach every location in the network.

Code Implementation

def find_smallest_set_of_vertices(number_of_nodes, edges):
    incoming_edges = [False] * number_of_nodes

    # Mark nodes with incoming edges
    for edge_start, edge_end in edges:
        incoming_edges[edge_end] = True

    minimum_set = []

    # Add nodes without incoming edges to result
    # These are the entry points to the graph.
    for node_index in range(number_of_nodes):
        if not incoming_edges[node_index]:
            minimum_set.append(node_index)

    return minimum_set

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the list of edges once, where each edge represents a route between two locations. Let 'n' represent the number of edges. The algorithm identifies the locations with incoming routes, effectively marking them as reachable. The second part of the algorithm, implicitly, involves identifying those locations that were *not* marked as reachable during the edge processing. Since each edge is visited only once and other operations are constant time, the time complexity is directly proportional to the number of edges, 'n'. Therefore, the time complexity is O(n).
Space Complexity
O(N)The algorithm identifies locations with incoming routes. This implicitly means storing, at most, all N vertices (locations) to check if a route leads to them, which requires auxiliary space. Therefore, the auxiliary space used is proportional to the number of vertices, N. This results in a space complexity of O(N).

Edge Cases

CaseHow to Handle
Empty graph (n=0)Return an empty list, as there are no vertices to select.
Graph with a single node and no edgesReturn a list containing only the single node (index 0).
Graph where all nodes are reachable from a single source nodeThe solution should return only the source node's index.
Graph with a cycle; all nodes are mutually reachableThe solution should return nodes with in-degree 0 (if any), as those are the entry points.
Maximum number of nodes and edges (scalability)Ensure the chosen algorithm (e.g., based on in-degree calculation) performs efficiently with large graphs, potentially using appropriate data structures to avoid O(n^2) complexity.
Disconnected graph; multiple components requiring multiple source nodesThe solution correctly identifies and returns all nodes with in-degree 0, representing the entry points of each disconnected component.
Graph represented with self-loops (an edge from a node to itself)Self-loops do not affect the in-degree of other nodes and do not need special treatment, as the algorithm only cares about incoming edges from *other* nodes.
Graph where a node has edges to all other nodesThis node won't be in the result, and the other nodes' in-degrees will be updated appropriately.