Remove Methods From Project

Medium

Asked by:

0 views

Topics:

GraphsArrays

You are maintaining a project that has n methods numbered from 0 to n - 1.

You are given two integers n and k, and a 2D integer array invocations, where invocations[i] = [a_i, b_i] indicates that method a_i invokes method b_i.

There is a known bug in method k. Method k, along with any method invoked by it, either directly or indirectly, are considered suspicious and we aim to remove them.

A group of methods can only be removed if no method outside the group invokes any methods within it.

Return an array containing all the remaining methods after removing all the suspicious methods. You may return the answer in any order. If it is not possible to remove all the suspicious methods, none should be removed.

Example 1:

Input: n = 4, k = 1, invocations = [[1,2],[0,1],[3,2]]

Output: [0,1,2,3]

Explanation:

Method 2 and method 1 are suspicious, but they are directly invoked by methods 3 and 0, which are not suspicious. We return all elements without removing anything.

Example 2:

Input: n = 5, k = 0, invocations = [[1,2],[0,2],[0,1],[3,4]]

Output: [3,4]

Explanation:

Methods 0, 1, and 2 are suspicious and they are not directly invoked by any other method. We can remove them.

Example 3:

Input: n = 3, k = 2, invocations = [[1,2],[0,1],[2,0]]

Output: []

Explanation:

All methods are suspicious. We can remove them.

Constraints:

1 <= n <= 10⁵
0 <= k <= n - 1
0 <= invocations.length <= 2 * 10⁵
invocations[i] == [a_i, b_i]
0 <= a_i, b_i <= n - 1
a_i != b_i
invocations[i] != invocations[j]

Try coding it in LeetCode

Solution

Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

To confirm the structure of the problem, if `invocations` contains `[a, b]`, does this represent a directed call from method `a` to method `b`?
The problem states that if it's not possible to remove all suspicious methods, none are removed. Does this mean I should first identify the complete set of suspicious methods, and then perform a single check on that entire set to decide if the whole group can be removed?
Can the graph of method invocations contain cycles? For instance, is it possible for method `a` to invoke `b`, which in turn invokes `a`?
How should I handle methods that are not mentioned in the `invocations` array at all? Should they be treated as non-suspicious, unless one of them is the initial buggy method `k`?
Is the set of suspicious methods determined statically at the beginning? Meaning, we find all methods reachable from `k` once, and this set doesn't change while we check the removal condition?

Brute Force Solution

Approach

The approach is to start with the methods we're initially told to remove and then repeatedly scan the entire project. In each scan, we find other methods that have become useless because all the methods that depended on them are also being removed.

Here's how the algorithm would work step-by-step:

First, make a list of the initial methods that you are required to remove from the project.
Now, we'll repeatedly check all the other methods to see if any have become unnecessary as a result of these removals.
In a single pass, you look at every method that is not yet on your removal list.
For each of these methods, you check all the other parts of the project that use it.
If every part of the project that used this method is already on the removal list, it means this method is now an 'orphan'.
Since it's an orphan, add it to your list of methods to be removed.
Repeat this entire process of checking for new orphans until you can go through a whole pass without finding any.
Once a full pass results in no new methods being added to the removal list, the process is done, and that list is your final answer.

Code Implementation

def remove_methods_brute_force(num_methods, methods_to_remove, dependencies):
    all_method_indices = range(num_methods)
    min_broken_methods = float('inf')
    best_group_to_remove = []

    # To meet the brute-force requirement, we must generate every single possible group of methods to remove.

    all_possible_groups = itertools.combinations(all_method_indices, methods_to_remove)

    for group_to_remove in all_possible_groups:
        removed_methods_set = set(group_to_remove)
        current_broken_count = 0

        # We iterate through all methods to check which ones remain and might be broken by the removal.

        for method_index in all_method_indices:
            if method_index in removed_methods_set:
                continue

            is_method_broken = False
            # A method is broken if any of its direct dependencies are in the group slated for removal.

            for dependency in dependencies[method_index]:
                if dependency in removed_methods_set:
                    is_method_broken = True
                    break
            
            if is_method_broken:
                current_broken_count += 1
        
        # If the current group results in fewer broken methods, it becomes our new best solution.

        if current_broken_count < min_broken_methods:
            min_broken_methods = current_broken_count
            best_group_to_remove = list(group_to_remove)
            
    return best_group_to_remove

Big(O) Analysis

Time Complexity

O(n³) – Let n be the total number of methods. The solution repeatedly scans the project until a pass adds no new methods to the removal list, which can take up to n passes in the worst case. During a single pass, the algorithm iterates through all n methods to see if any can be removed. To determine if a specific method is an orphan, the described approach is to check every other of the n methods to see if it's a caller that hasn't been removed. This creates a nested loop, meaning a single pass costs O(n²). Therefore, the total time complexity is the number of passes multiplied by the cost of each pass, or n * n², which simplifies to O(n³).

Space Complexity

O(N) – The primary use of auxiliary space is the list storing the methods to be removed, as mentioned in the first step. Let N be the total number of methods in the project. This list is initially populated and then grows as more 'orphan' methods are found and added to it during subsequent passes. In the worst-case scenario, every method could eventually be added to the removal list, making its size proportional to N. Therefore, the auxiliary space required is O(N).

Optimal Solution

Approach

Instead of directly figuring out which methods to remove, the trick is to flip the problem. We find the largest possible group of methods that can work together without any circular dependencies. Any method left out of this group is one that must be removed.

Here's how the algorithm would work step-by-step:

Flip the Problem: Instead of minimizing removals, focus on finding the maximum number of methods you can keep. The goal is to find the largest possible group of methods that has no circular dependencies among them.
Build Up from Small Groups: We'll systematically check every possible group of methods, starting with individual methods and then progressively larger combinations.
The 'Safe Group' Test: A group is considered 'safe' (cycle-free) if it contains at least one 'starter method'. A starter method is one that doesn't depend on any other method within that same group.
Reuse Previous Results: This is the clever part. To test a new, larger group, find a starter method and imagine it's gone. This leaves a smaller group. If we've already determined that this smaller group is safe, then the larger group must be safe too.
Track the Largest Safe Group: Throughout this process of checking groups, keep a record of the biggest safe group you've found so far.
Get the Final Answer: Once all groups have been considered, the number to remove is the total number of methods minus the size of the largest safe group you found.

Code Implementation

def remove_methods_from_project(methods, dependencies):
    memoization_cache = {}

    def _find_one_cycle(nodes_in_subgraph, graph):
        path_parents = {}
        visited_nodes = set()
        
        for start_node in nodes_in_subgraph:
            if start_node in visited_nodes:
                continue

            recursion_stack_nodes = set()
            dfs_stack = [(start_node, iter(graph.get(start_node, [])))]
            visited_nodes.add(start_node)

            while dfs_stack:
                current_node, children_iterator = dfs_stack[-1]
                recursion_stack_nodes.add(current_node)

                try:
                    child_node = next(children_iterator)
                    if child_node not in nodes_in_subgraph:
                        continue

                    if child_node in recursion_stack_nodes:
                        cycle_path = [child_node]
                        parent = current_node
                        while parent != child_node:
                            cycle_path.append(parent)
                            parent = path_parents[parent]
                        cycle_path.reverse()
                        return cycle_path

                    if child_node not in visited_nodes:
                        visited_nodes.add(child_node)
                        path_parents[child_node] = current_node
                        dfs_stack.append((child_node, iter(graph.get(child_node, []))))
                except StopIteration:
                    recursion_stack_nodes.remove(current_node)
                    dfs_stack.pop()
        return []

    def _solve_recursive(current_methods_tuple):
        if current_methods_tuple in memoization_cache:
            return memoization_cache[current_methods_tuple]
        
        current_methods_set = set(current_methods_tuple)
        if not current_methods_set:
            return 0
        
        graph = {method: [] for method in current_methods_set}
        reverse_graph = {method: [] for method in current_methods_set}
        out_degree = {method: 0 for method in current_methods_set}
        in_degree = {method: 0 for method in current_methods_set}

        for caller, callee in dependencies:
            if caller in current_methods_set and callee in current_methods_set:
                graph[caller].append(callee)
                reverse_graph[callee].append(caller)
                out_degree[caller] += 1
                in_degree[callee] += 1
        
        # Pruning non-cyclic methods drastically reduces the search space, focusing only on the core problem.
        simplification_queue = [
            method for method in current_methods_set if in_degree[method] == 0 or out_degree[method] == 0
        ]
        
        tangled_methods = set(current_methods_set)
        queue_head_index = 0
        while queue_head_index < len(simplification_queue):
            method_to_remove = simplification_queue[queue_head_index]
            queue_head_index += 1
            
            if method_to_remove not in tangled_methods:
                continue
            
            tangled_methods.remove(method_to_remove)
            
            for callee_neighbor in graph.get(method_to_remove, []):
                if callee_neighbor in tangled_methods:
                    in_degree[callee_neighbor] -= 1
                    if in_degree[callee_neighbor] == 0:
                        simplification_queue.append(callee_neighbor)
            
            for caller_neighbor in reverse_graph.get(method_to_remove, []):
                if caller_neighbor in tangled_methods:
                    out_degree[caller_neighbor] -= 1
                    if out_degree[caller_neighbor] == 0:
                        simplification_queue.append(caller_neighbor)
                        
        if not tangled_methods:
            return 0
        
        # By breaking one cycle, we guarantee progress, and exploring its nodes as removal candidates is sufficient.
        cycle_to_break = _find_one_cycle(tangled_methods, graph)
        
        min_removals = float('inf')
        
        # We explore removing each node in the found cycle, as this guarantees breaking it in an optimal way.
        for method_in_cycle in cycle_to_break:
            next_methods = list(current_methods_set)
            next_methods.remove(method_in_cycle)
            next_methods_tuple = tuple(sorted(next_methods))
            
            removals_for_branch = 1 + _solve_recursive(next_methods_tuple)
            min_removals = min(min_removals, removals_for_branch)

        # Caching results for a given set of methods prevents re-solving identical subproblems.
        memoization_cache[current_methods_tuple] = min_removals
        return min_removals

    initial_methods_tuple = tuple(sorted(list(methods)))
    return _solve_recursive(initial_methods_tuple)

Big(O) Analysis

Time Complexity

O(n * 2^n) – The solution's cost is driven by the need to evaluate every possible subset of methods. With n methods, there are 2^n distinct subsets to consider, which forms the basis of the main calculation. For each of these 2^n subsets, the algorithm checks up to n methods within it to find a 'starter method' that validates the group's safety by looking up a previously computed result. This creates a nested dependency where the outer work is 2^n and the inner work is n. The total operations are therefore proportional to n multiplied by 2^n, which simplifies to O(n * 2^n).

Space Complexity

O(2^N) – The core of this algorithm is to 'Reuse Previous Results,' which requires storing the outcome of the 'Safe Group' test for every possible subset of methods. Where N is the number of methods, there are 2^N such subsets, which are typically represented by bitmasks. The auxiliary space is therefore dominated by a memoization table, such as an array or map, of size 2^N to store the pre-computed safety status of each subset. Although the recursion depth for checking a single subset is at most N, this is overshadowed by the exponential space required for the memoization table, making the overall space complexity O(2^N).

Edge Cases

Case	How to Handle
The 'invocations' array is empty.	The only suspicious method is k, which has no incoming edges, so it is always removed and the remaining n-1 methods are returned.
The method invocation graph is disconnected.	The traversal to find suspicious methods correctly explores only k's component, and the validity check properly considers invocations from other components.
The buggy method k is part of a dependency cycle.	A graph traversal using a visited set correctly identifies all methods in the cycle and those reachable from it as suspicious without getting into an infinite loop.
All methods in the project are suspicious.	The check for external invocations becomes trivial as no non-suspicious methods exist, so all methods are removed and an empty array is returned.
A non-suspicious method invokes a suspicious one.	This is the primary failure condition, causing the entire removal operation to be aborted and all n methods to be returned.
The buggy method k has no outgoing invocations.	The set of suspicious methods will only contain k, and its removal depends solely on whether any other method invokes it.
The buggy method k is an isolated node with no invocations.	The suspicious set only contains k, and since it has no incoming edges from other methods, it is always removed.
Large inputs with maximum n and number of invocations.	An efficient graph-based solution with O(n + \|invocations\|) complexity is required to avoid a timeout.

The 'invocations' array is empty.

How to Handle:

The only suspicious method is k, which has no incoming edges, so it is always removed and the remaining n-1 methods are returned.

The method invocation graph is disconnected.

How to Handle:

The traversal to find suspicious methods correctly explores only k's component, and the validity check properly considers invocations from other components.

The buggy method k is part of a dependency cycle.

How to Handle:

A graph traversal using a visited set correctly identifies all methods in the cycle and those reachable from it as suspicious without getting into an infinite loop.

All methods in the project are suspicious.

How to Handle:

The check for external invocations becomes trivial as no non-suspicious methods exist, so all methods are removed and an empty array is returned.

A non-suspicious method invokes a suspicious one.

How to Handle:

This is the primary failure condition, causing the entire removal operation to be aborted and all n methods to be returned.

The buggy method k has no outgoing invocations.

How to Handle:

The set of suspicious methods will only contain k, and its removal depends solely on whether any other method invokes it.

The buggy method k is an isolated node with no invocations.

How to Handle:

The suspicious set only contains k, and since it has no incoming edges from other methods, it is always removed.

Large inputs with maximum n and number of invocations.

How to Handle:

An efficient graph-based solution with O(n + |invocations|) complexity is required to avoid a timeout.