Taro Logo

Minimize Malware Spread

Hard
Salesforce logo
Salesforce
0 views
Topics:
Graphs

You are given a network of n nodes represented as an n x n adjacency matrix graph, where the ith node is directly connected to the jth node if graph[i][j] == 1.

Some nodes initial are initially infected by malware. Whenever two nodes are directly connected, and at least one of those two nodes is infected by malware, both nodes will be infected by malware. This spread of malware will continue until no more nodes can be infected in this manner.

Suppose M(initial) is the final number of nodes infected with malware in the entire network after the spread of malware stops. We will remove exactly one node from initial.

Return the node that, if removed, would minimize M(initial). If multiple nodes could be removed to minimize M(initial), return such a node with the smallest index.

Note that if a node was removed from the initial list of infected nodes, it might still be infected later due to the malware spread.

Example 1:

Input: graph = [[1,1,0],[1,1,0],[0,0,1]], initial = [0,1]
Output: 0

Example 2:

Input: graph = [[1,0,0],[0,1,0],[0,0,1]], initial = [0,2]
Output: 0

Example 3:

Input: graph = [[1,1,1],[1,1,1],[1,1,1]], initial = [1,2]
Output: 1

Constraints:

  • n == graph.length
  • n == graph[i].length
  • 2 <= n <= 300
  • graph[i][j] is 0 or 1.
  • graph[i][j] == graph[j][i]
  • graph[i][i] == 1
  • 1 <= initial.length <= n
  • 0 <= initial[i] <= n - 1
  • All the integers in initial are unique.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum size of the `graph` and `initial` arrays? Are we expecting this to scale to very large graphs?
  2. Can the `initial` array contain duplicate nodes? If so, how should I handle them?
  3. Are the node labels in `graph` and `initial` guaranteed to be within the range [0, n-1], where n is the number of nodes?
  4. If no matter which initial node we choose to start from, the malware spread is the same, is there a tie-breaker to determine which node to return? For instance, should I return the node with the smallest index?
  5. If the `initial` array is empty, what should I return?

Brute Force Solution

Approach

The brute force method for minimizing malware spread involves checking every possible way to isolate one of the initially infected computers. We essentially try removing each infected computer one by one and then simulate how the malware spreads from the remaining infected computers to see how bad the outbreak becomes.

Here's how the algorithm would work step-by-step:

  1. First, consider each of the initially infected computers as if it were the one we are going to isolate.
  2. For each of these computers, temporarily remove it from the network and imagine the malware spreading from all the other initially infected computers.
  3. Simulate the spread by seeing which uninfected computers are reached by the remaining infected ones. Think of it like a chain reaction.
  4. Count how many computers end up being infected during this simulated spread.
  5. Once we've gone through all the initially infected computers and calculated the number of infections for each removal scenario, we compare the results.
  6. The computer whose removal results in the fewest overall infections after the simulated spread is the one we should isolate. If there are multiple options that result in the same (minimum) amount of infections, return the one with the smallest identification number.

Code Implementation

def minimize_malware_spread_brute_force(graph, initial):
    number_of_nodes = len(graph)
    min_infected = number_of_nodes + 1
    best_node = -1

    for node_to_remove in initial:
        # Try removing each initially infected node

        infected_count = 0
        visited = [False] * number_of_nodes
        remaining_infected = [node for node in initial if node != node_to_remove]

        queue = remaining_infected[:]
        for infected_node in remaining_infected:
            visited[infected_node] = True

        while queue:
            current_node = queue.pop(0)
            infected_count += 1

            # Standard BFS
            for neighbor in range(number_of_nodes):
                if graph[current_node][neighbor] == 1 and not visited[neighbor]:
                    visited[neighbor] = True
                    queue.append(neighbor)

        # Compare current infection count and choose the best
        if infected_count < min_infected:
            min_infected = infected_count
            best_node = node_to_remove
        elif infected_count == min_infected and node_to_remove < best_node:
            best_node = node_to_remove

    # Handle the case where the initial list is empty
    if best_node == -1:
        return min(initial) if initial else 0

    return best_node

Big(O) Analysis

Time Complexity
O(n * m)The outer loop iterates through each of the initially infected computers, which has a size of n. For each infected computer, we perform a breadth-first search (BFS) or depth-first search (DFS) to simulate the malware spread. The time complexity of BFS/DFS is O(V + E), where V is the number of vertices (computers) and E is the number of edges (connections) in the graph. In the worst case, we visit all computers, and the number of connections can be proportional to the number of computers, giving us O(m), where m is the size of the entire graph (V + E). Therefore, the overall time complexity is O(n * m).
Space Complexity
O(N)The algorithm simulates the spread of malware using a data structure to keep track of visited computers during the spread simulation. This typically involves a boolean array or set to mark visited nodes, which will have a size proportional to N, the number of computers in the network. Furthermore, recursion or a queue might be used for the spread simulation, potentially adding up to a maximum depth or size of N in the worst-case scenario. Therefore, the auxiliary space complexity is O(N).

Optimal Solution

Approach

The goal is to minimize the spread of malware by strategically vaccinating computers. The core idea is to identify groups of computers that are only connected to each other and then vaccinate the most crucial computer in each group to prevent the spread of malware within that group.

Here's how the algorithm would work step-by-step:

  1. First, determine which computers are connected. Treat the network as groups of computers.
  2. For each group, find out which computer, if initially infected, would spread the malware to other infected computers outside of the group the most.
  3. If there is a tie for which computer is the most influential, do not include those computer(s) from consideration.
  4. Now, count the number of infected computers that are not included and that also belong to their own isolated group.
  5. Finally, return the computer with the smallest number representing the initial infected computer.

Code Implementation

def minimize_malware_spread(graph, initial_infected):    number_of_nodes = len(graph)
    def depth_first_search(start_node, visited_nodes):
        visited_nodes.add(start_node)
        for neighbor in range(number_of_nodes):
            if graph[start_node][neighbor] == 1 and neighbor not in visited_nodes:
                depth_first_search(neighbor, visited_nodes)

    components = []
    visited = set()
    for node in range(number_of_nodes):
        if node not in visited:
            component = set()
            depth_first_search(node, component)
            components.append(component)
            visited |= component

    influence = {}
    for initial in initial_infected:
        for component in components:
            if initial in component:
                for node in initial_infected:
                    if node not in component:
                        influence[initial] = influence.get(initial, 0) + 1

    max_influence = 0
    candidates = []
    for initial in influence:
        if influence[initial] > max_influence:
            max_influence = influence[initial]
            candidates = [initial]
        elif influence[initial] == max_influence:
            candidates.append(initial)

    # Remove infected nodes that have a tie in influence
    if len(candidates) == 1:
        return candidates[0]

    # Count nodes that are initially infected
    single_node_components_count = 0
    for initial in initial_infected:
        is_single_node_component = True
        for component in components:
            if initial in component and len(component) > 1:
                is_single_node_component = False
                break
        if is_single_node_component:
            single_node_components_count += 1

    # Ensure we return the smallest initial node
    return min(initial_infected)

Big(O) Analysis

Time Complexity
O(n^2)The dominant operations in this algorithm stem from determining connected components and identifying the most influential infected computer within each component. Finding connected components typically involves iterating through each node (n) and potentially exploring its neighbors (up to n in the worst case if all nodes are connected). Additionally, finding the 'most influential' infected computer within each component involves iterating through the infected nodes within that component (at most n) and simulating the spread of malware to other groups, which could again involve exploring up to n nodes. Thus, the overall time complexity is governed by these nested iterations, resulting in O(n^2) in the worst-case scenario where connections are dense and each infected node needs to be evaluated against all other nodes.
Space Complexity
O(N)The space complexity is primarily determined by the data structures used to represent the connected components and track visited nodes during the group identification and analysis. The algorithm uses a data structure to represent the connected graph, which could be an adjacency list or a similar structure. In the worst case, this structure can take O(N) space where N is the number of computers. Therefore, the auxiliary space used is O(N).

Edge Cases

CaseHow to Handle
Empty graph (no nodes or edges)Return the initial infected node if graph is empty; if the initial infected is also empty return -1
No initial infected nodesIf no initial nodes are infected, return the node that would minimize spread by calculating its connected component size.
All nodes are initially infectedIf all nodes are initially infected, no further spread can occur, so return the smallest index initial infected node.
Single connected component in the graphIf all nodes are reachable from each other, infecting any node minimizes spread to zero beyond the initially infected set.
Graph with multiple disjoint connected componentsThe algorithm should correctly identify and analyze each connected component separately to minimize the overall spread.
Large graph with many nodes and edges exceeding memory limitsConsider alternative data structures or algorithms (e.g., iterative deepening DFS) to reduce memory footprint or optimize space complexity.
Integer overflow when calculating the size of connected componentsUse a data type with sufficient capacity (e.g., long) to store the size of the components, or validate sizes to avoid overflow.
Cycles present in the graphThe DFS or BFS implementation should correctly handle cycles to prevent infinite loops or incorrect connected component size calculation using a 'visited' set.