Taro Logo

Minimize Malware Spread II

Hard
Dropbox logo
Dropbox
2 views
Topics:
ArraysGraphs

You are given a network of n nodes represented as an n x n adjacency matrix graph, where the ith node is directly connected to the jth node if graph[i][j] == 1.

Some nodes initial are initially infected by malware. Whenever two nodes are directly connected, and at least one of those two nodes is infected by malware, both nodes will be infected by malware. This spread of malware will continue until no more nodes can be infected in this manner.

Suppose M(initial) is the final number of nodes infected with malware in the entire network after the spread of malware stops.

We will remove exactly one node from initial, completely removing it and any connections from this node to any other node.

Return the node that, if removed, would minimize M(initial). If multiple nodes could be removed to minimize M(initial), return such a node with the smallest index.

Example 1:

Input: graph = [[1,1,0],[1,1,0],[0,0,1]], initial = [0,1]
Output: 0

Example 2:

Input: graph = [[1,1,0],[1,1,1],[0,1,1]], initial = [0,1]
Output: 1

Example 3:

Input: graph = [[1,1,0,0],[1,1,1,0],[0,1,1,1],[0,0,1,1]], initial = [0,1]
Output: 1

Constraints:

  • n == graph.length
  • n == graph[i].length
  • 2 <= n <= 300
  • graph[i][j] is 0 or 1.
  • graph[i][j] == graph[j][i]
  • graph[i][i] == 1
  • 1 <= initial.length < n
  • 0 <= initial[i] <= n - 1
  • All the integers in initial are unique.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum size of the `graph` (number of nodes) and the `initial` array?
  2. Can the nodes in the `graph` be disconnected, meaning some nodes might not be reachable from others, including those in `initial`?
  3. If after removing a single initially infected node, multiple nodes prevent the spread of malware equally, which one should I return? Is there a specific tie-breaking criteria, such as the node with the smallest index?
  4. Can the `initial` array contain duplicate nodes? If so, how should they be handled?
  5. If removing any node from `initial` does not reduce the spread of malware, what should I return?

Brute Force Solution

Approach

The brute force method for minimizing malware spread involves trying to remove each initially infected computer, one at a time, and then observing the resulting spread of malware. We want to find which removal leads to the least malware spread in the network.

Here's how the algorithm would work step-by-step:

  1. First, consider removing the first infected computer from the network.
  2. Then, simulate how the malware would spread from the remaining infected computers to the uninfected ones.
  3. Count the total number of computers that are infected after the spread.
  4. Repeat this process of removing and simulating the spread for each of the initially infected computers.
  5. After trying removing each infected computer individually, compare the total number of infected computers that resulted from each removal.
  6. Finally, select the infected computer whose removal led to the smallest total number of infected computers after the simulated spread. If multiple removals lead to the same smallest spread, pick any of them.

Code Implementation

def minimize_malware_spread_two(graph, initial):
    min_infected = float('inf')
    best_removal = -1

    for node_to_remove_index in range(len(initial)):
        node_to_remove = initial[node_to_remove_index]
        
        # Simulate removal by creating new initial list
        current_initial = initial[:node_to_remove_index] + initial[node_to_remove_index+1:]
        infected = set(current_initial)
        
        stack = list(current_initial)

        # Perform breadth-first search to simulate spread
        while stack:
            current_node = stack.pop(0)
            for neighbor, is_connected in enumerate(graph[current_node]):
                if is_connected == 1 and neighbor not in infected:
                    infected.add(neighbor)
                    stack.append(neighbor)

        # Update minimum infected count and node to remove
        if len(infected) < min_infected:
            min_infected = len(infected)
            best_removal = node_to_remove
        elif len(infected) == min_infected and node_to_remove < best_removal:
            best_removal = node_to_remove

    # Returning best removal
    if best_removal == -1:
        best_removal = min(initial)
    return best_removal

Big(O) Analysis

Time Complexity
O(n * m * n)The outer loop iterates through each initially infected node, which takes O(n) time, where n is the number of initially infected nodes. Inside the outer loop, simulating the malware spread involves traversing the graph using Breadth-First Search (BFS) or Depth-First Search (DFS). In the worst case, the graph can have m edges and n nodes where m can be O(n^2). Furthermore, for each infected node, we might have to check all other nodes to see if they are reachable and thus get infected. Thus, the BFS/DFS can take O(m + n) time. Thus the entire algorithm can be described as iterating through n elements, each of which may require O(m + n) operations. If m is less than n then this simplifies to O(n * n) and if m is O(n^2), then the algorithm is O(n * (n^2 + n)) = O(n * n^2) = O(n^3) . It is important to note that m is less than n^2, therefore the overall time complexity can be more precisely described as O(n * m * n) since BFS/DFS can visit each node once. This represents iterating through each infected node and computing the spread.
Space Complexity
O(N)The algorithm simulates malware spread by potentially using a visited set or similar data structure to keep track of infected nodes during the simulation. In the worst case, all nodes in the network could become infected, thus N nodes would need to be stored in the visited data structure, where N is the number of computers. Therefore, the auxiliary space required for the simulation is O(N).

Optimal Solution

Approach

The goal is to find the single infected computer that, if removed, would stop the spread of malware the most. The best way to do this involves figuring out, for each infected computer, which other computers it exclusively affects.

Here's how the algorithm would work step-by-step:

  1. First, identify all the connected components in the network based on which computers can directly or indirectly infect each other.
  2. For each infected computer, determine which other computers in the network are only reachable through that infected computer. These are the computers that would be saved if that specific infected computer were removed.
  3. Count the number of computers that are exclusively dependent on each infected computer. This tells us how effective removing that computer would be.
  4. Choose the infected computer that, when removed, would save the largest number of exclusively dependent computers. This computer should be removed to minimize malware spread.
  5. If there are multiple infected computers that would save the same maximum number of computers, choose the one with the smallest identification number.

Code Implementation

def minMalwareSpread(graph, initial):
    number_of_nodes = len(graph)
    infected = set(initial)

    def depthFirstSearch(node, visited, component):
        visited[node] = True
        component.add(node)
        for neighbor in range(number_of_nodes):
            if graph[node][neighbor] == 1 and not visited[neighbor]:
                depthFirstSearch(neighbor, visited, component)

    components = []
    visited = [False] * number_of_nodes

    # Identify all connected components.
    for node in range(number_of_nodes):
        if not visited[node]:
            component = set()
            depthFirstSearch(node, visited, component)
            components.append(component)

    node_to_component = {}
    for i, component in enumerate(components):
        for node in component:
            node_to_component[node] = i

    component_to_infected_nodes = {}
    for i in range(len(components)):
        component_to_infected_nodes[i] = []
    for infected_node in infected:
        component_index = node_to_component[infected_node]
        component_to_infected_nodes[component_index].append(infected_node)

    saved_nodes = [0] * number_of_nodes

    # Count computers exclusively dependent on each infected computer.
    for component_index, infected_nodes_in_component in component_to_infected_nodes.items():
        if len(infected_nodes_in_component) == 1:
            infected_node = infected_nodes_in_component[0]
            saved_nodes[infected_node] = len(components[component_index])

    max_saved = 0
    result = min(initial)

    # Choose the infected computer to remove to maximize saved computers.
    for infected_node in initial:
        if saved_nodes[infected_node] > max_saved:
            max_saved = saved_nodes[infected_node]
            result = infected_node
        elif saved_nodes[infected_node] == max_saved and infected_node < result:
            result = infected_node

    return result

Big(O) Analysis

Time Complexity
O(n²)The algorithm first identifies connected components, which can be done using Depth First Search (DFS) or Breadth First Search (BFS). In the worst case, DFS/BFS visits each node and edge once, taking O(n + E) time, where n is the number of computers and E is the number of edges. Since E can be O(n^2) in the worst case where all computers are connected, connected component identification is O(n^2). Then, for each infected computer, we check which other computers are only reachable through it. This involves traversing the graph again for each infected node in the worst case, leading to another O(n^2) operation. Thus, total operations approximate n*n, and are simplified to O(n²).
Space Complexity
O(N)The space complexity is dominated by the auxiliary data structures used to represent connected components and track visited nodes. The algorithm likely uses a Disjoint Set Union (DSU) data structure or Depth-First Search (DFS) or Breadth-First Search (BFS) for identifying connected components, each potentially requiring space proportional to the number of computers (N). Additionally, a data structure (e.g., a boolean array or a set) is needed to keep track of visited computers during the search process, which also scales with N. Therefore, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty graph (no connections)If the graph is empty, removing any initial node won't prevent any spread, so return -1 if all nodes are initially infected.
No initially infected nodesIf no nodes are initially infected, removing any node will not impact the spread; return 0 if no nodes are initially infected.
All nodes are initially infectedIf all nodes are initially infected, removing any single node will not change the final infection state; return -1.
Graph contains disconnected components, and some components have initially infected nodes while others don'tProcess each connected component separately and analyze the impact of removing a node within the initially infected components.
Single infected node connected to multiple uninfected nodesRemoving this infected node prevents all connected uninfected nodes from being infected, yielding a good result.
Large graph with many nodes and edges (scalability)Employ Union-Find or Depth-First Search (DFS) efficiently to handle large graphs without exceeding time limits by optimizing the component finding process.
Complete graph (every node connected to every other node)Removing a single infected node might only marginally decrease the malware spread if other infected nodes exist, so correctly account for overlapping infected areas.
Graph with cyclesDFS or BFS will correctly traverse cycles, and Union-Find naturally handles cycles during component merging, without stack overflow risks.