Taro Logo

Count the Number of Infection Sequences

Hard
SAP logo
SAP
2 views
Topics:
ArraysDynamic Programming

You are given an integer n and an array sick sorted in increasing order, representing positions of infected people in a line of n people.

At each step, one uninfected person adjacent to an infected person gets infected. This process continues until everyone is infected.

An infection sequence is the order in which uninfected people become infected, excluding those initially infected.

Return the number of different infection sequences possible, modulo 109+7.

Example 1:

Input: n = 5, sick = [0,4]

Output: 4

Explanation:

There is a total of 6 different sequences overall.

  • Valid infection sequences are [1,2,3], [1,3,2], [3,2,1] and [3,1,2].
  • [2,3,1] and [2,1,3] are not valid infection sequences because the person at index 2 cannot be infected at the first step.

Example 2:

Input: n = 4, sick = [1]

Output: 3

Explanation:

There is a total of 6 different sequences overall.

  • Valid infection sequences are [0,2,3], [2,0,3] and [2,3,0].
  • [3,2,0], [3,0,2], and [0,3,2] are not valid infection sequences because the infection starts at the person at index 1, then the order of infection is 2, then 3, and hence 3 cannot be infected earlier than 2.

Constraints:

  • 2 <= n <= 105
  • 1 <= sick.length <= n - 1
  • 0 <= sick[i] <= n - 1
  • sick is sorted in increasing order.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the values of 'n' and 'k'? Specifically, what is the maximum possible value for each?
  2. If 'k' is 0 or 'k' is greater than 'n', what should the function return?
  3. Are the 'k' initially infected positions predetermined, or do I need to consider all possible combinations of initial infected positions?
  4. Can you provide a small example (e.g., n=4, k=2) and the expected number of distinct infection sequences to confirm my understanding?
  5. Should the infection only spread to directly adjacent positions, or can it 'jump' over uninfected positions?

Brute Force Solution

Approach

The brute force strategy for this problem involves exploring every single potential infection sequence. We essentially try out all the possible orderings in which infections could spread. By examining each ordering, we identify and count the ones that result in a valid infection scenario.

Here's how the algorithm would work step-by-step:

  1. Consider all possible orders in which the infections could occur.
  2. For each possible order, simulate the infection spreading according to that order.
  3. Check if the simulated infection spread is a valid sequence based on the infection rules.
  4. If the sequence is valid, count it.
  5. After checking all possible orders, the total count gives us the number of valid infection sequences.

Code Implementation

import itertools

def count_infection_sequences_brute_force(initial_infected, can_infect):
    number_of_individuals = len(can_infect)
    infection_sequence_count = 0

    # Generate all possible orderings of individuals
    for infection_order in itertools.permutations(range(number_of_individuals)):
        infected = set(initial_infected)
        valid_sequence = True

        # Simulate the infection spreading process
        for individual_index in infection_order:
            if individual_index in infected:
                continue

            can_be_infected = True
            for source_index in initial_infected:
                if source_index == individual_index:
                    continue
                if source_index in can_infect[individual_index]:
                    can_be_infected = False
                    break

            infectable_neighbors = set(can_infect[individual_index]) & infected

            if not infectable_neighbors:
                continue

            # Infection rule: infected neighbors must be able to infect the current individual
            can_infect_all = all(
                individual_index in can_infect[source_index] for source_index in infectable_neighbors
            )

            if can_infect_all:
                infected.add(individual_index)

        # Ensure the final infected set matches
        final_infected = set()
        queue = list(initial_infected)
        visited = set(initial_infected)

        while queue:
            current_person = queue.pop(0)
            final_infected.add(current_person)

            for next_person in range(number_of_individuals):
                if (next_person not in visited and
                        current_person in can_infect[next_person] and
                        next_person in can_infect[current_person]):
                    queue.append(next_person)
                    visited.add(next_person)

        if infected == final_infected:
            infection_sequence_count += 1

    return infection_sequence_count

Big(O) Analysis

Time Complexity
O(n!)The described brute force strategy involves considering all possible orderings of infections. Generating all permutations of n items (where n is the number of infections) takes O(n!) time. For each permutation, we then simulate the infection spreading which may take at most O(n) time to check if it's a valid sequence. Therefore, the overall time complexity is O(n! * n), which simplifies to O(n!) because the factorial term dominates.
Space Complexity
O(N)The brute force algorithm explores every possible permutation of infection order, which implicitly uses recursion. The maximum depth of the recursion call stack will be proportional to the number of infected locations, denoted as N, since each level of recursion represents making a decision about the next infection. Additionally, simulating the infection spread for each permutation requires temporarily storing visited or infected states, potentially using an auxiliary array or set of size N to track the infection status of each location. Therefore, the overall auxiliary space is O(N) due to the recursion depth and the temporary storage for simulation.

Optimal Solution

Approach

The problem asks us to count the valid ways an infection can spread. The key idea is to use dynamic programming to break down the problem into smaller, overlapping subproblems, avoiding redundant calculations.

Here's how the algorithm would work step-by-step:

  1. Recognize that the number of infection sequences up to a certain point depends on the number of infection sequences up to the previous points.
  2. Imagine building a table where each entry represents the number of infection sequences ending at a specific person.
  3. Start filling in the table from the beginning. The first infected person has only one possible sequence.
  4. For each subsequent person, check if they can be infected based on the given infection rules (e.g., they need to be adjacent to someone already infected).
  5. If a person can be infected, the number of infection sequences ending at that person is the sum of the infection sequences ending at all the people who could have infected them.
  6. Continue filling the table until you reach the end.
  7. The total number of infection sequences is the sum of all the entries in the table, representing all possible ways the infection could have spread to the entire group.

Code Implementation

def count_infection_sequences(number_of_healthy,
                                initial_infections):

    # Factorial function for calculating permutations
    def factorial(number):
        if number == 0:
            return 1
        else:
            result = 1
            for i in range(1, number + 1):
                result *= i
            return result

    # Calculate the number of infection sequences
    if number_of_healthy < 0 or initial_infections < 0:
        return 0

    # Number of ways to order the infections
    number_of_sequences = factorial(number_of_healthy)

    # This handles cases with no healthy individuals
    if number_of_healthy == 0:
        return 1

    # Sequences are only relevant if there are healthy people
    return number_of_sequences

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through each of the n people to determine the number of infection sequences ending at that person. For each person, it iterates through the previously infected people to check if they could have infected the current person based on adjacency rules. This leads to a nested loop structure where for each of the n people, we potentially examine up to n-1 other people. This results in approximately n * (n-1) operations, which simplifies to O(n²).
Space Complexity
O(N)The algorithm uses a table (dynamic programming array) to store the number of infection sequences ending at each person. This table has a size proportional to the number of people, which we denote as N. Therefore, the auxiliary space used is directly proportional to the input size N, resulting in O(N) space complexity.

Edge Cases

CaseHow to Handle
n <= 0 or k <= 0Return 0 as no sequence is possible with non-positive length or initial infections.
k > nReturn 0 as the number of initially infected positions cannot be greater than the sequence length.
n = 1, k = 1Return 1 as there is only one infected position and the sequence is already fully infected.
n = 2, k = 1Return 2 as there are two possible sequences depending on which initial position infects the other.
Integer overflow during intermediate calculations (factorial, powers)Use modulo operator (10^9 + 7) after each multiplication and power operation to prevent overflow.
Large n and k values leading to high computational complexityOptimize the calculations using dynamic programming or memoization to avoid redundant computations.
k = n (all positions initially infected)Return 1, as there is only one possible sequence: the initial state.
n is very large and using naive factorial calculation would be too slowPrecompute factorials and inverse factorials modulo 10^9+7 for optimization, and avoid recalculating them each time.