A gene string can be represented by an 8-character long string, with choices from 'A'
, 'C'
, 'G'
, and 'T'
.
Suppose we need to investigate a mutation from a gene string startGene
to a gene string endGene
where one mutation is defined as one single character changed in the gene string.
"AACCGGTT" --> "AACCGGTA"
is one mutation.There is also a gene bank bank
that records all the valid gene mutations. A gene must be in bank
to make it a valid gene string.
Given the two gene strings startGene
and endGene
and the gene bank bank
, return the minimum number of mutations needed to mutate from startGene
to endGene
. If there is no such a mutation, return -1
.
Note that the starting point is assumed to be valid, so it might not be included in the bank.
Example 1:
Input: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"] Output: 1
Example 2:
Input: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"] Output: 2
Constraints:
0 <= bank.length <= 10
startGene.length == endGene.length == bank[i].length == 8
startGene
, endGene
, and bank[i]
consist of only the characters ['A', 'C', 'G', 'T']
.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force approach for finding the minimum genetic mutation involves exploring every possible sequence of single-character mutations from the start gene to the end gene. We generate all possible mutations at each step and check if they lead to the target, ultimately finding the shortest path. This is like trying every possible path through a maze to find the exit, regardless of how long or winding it is.
Here's how the algorithm would work step-by-step:
def minimum_genetic_mutation_brute_force(start_gene, end_gene, gene_bank):
min_mutations = float('inf')
def find_mutation_paths(current_gene, mutations, visited):
nonlocal min_mutations
if current_gene == end_gene:
min_mutations = min(min_mutations, mutations)
return
# Need to avoid cycles in our search
if current_gene in visited:
return
visited.add(current_gene)
for gene in gene_bank:
difference = 0
for i in range(len(current_gene)):
if current_gene[i] != gene[i]:
difference += 1
# Explore possible single mutations from current_gene
if difference == 1:
find_mutation_paths(gene, mutations + 1, visited.copy())
# Begin recursive search from the starting gene
find_mutation_paths(start_gene, 0, set())
# If no path was found, return -1
if min_mutations == float('inf'):
return -1
else:
return min_mutations
The problem asks for the fewest changes to a starting word to reach a target word, using only words from a provided list. We find the shortest path by exploring possible mutations in a breadth-first manner, checking words one change away until we find the target.
Here's how the algorithm would work step-by-step:
def minMutation(start_gene, end_gene, gene_bank):
queue = [(start_gene, 0)]
visited_genes = {start_gene}
while queue:
current_gene, mutations_count = queue.pop(0)
if current_gene == end_gene:
return mutations_count
# Iterate through all possible genes in the bank
for next_gene in gene_bank:
difference_count = 0
for i in range(len(current_gene)):
if current_gene[i] != next_gene[i]:
difference_count += 1
# Check if the gene is one mutation away
if difference_count == 1:
# Avoid cycles by checking visited genes
if next_gene not in visited_genes:
queue.append((next_gene, mutations_count + 1))
visited_genes.add(next_gene)
# Return -1 if no mutation path is found
return -1
Case | How to Handle |
---|---|
Null start or end gene | Throw IllegalArgumentException or return -1 immediately, as the input is invalid. |
Null bank array | Throw IllegalArgumentException or return -1, as the bank is necessary for the search. |
Empty bank array | Return -1 if the end gene is not equal to the start gene, otherwise return 0. |
start gene is equal to end gene | Return 0 immediately as no mutation is required. |
End gene is not reachable from start gene | The BFS will terminate without finding the end gene, so return -1. |
Gene strings in bank are not all the same length | Throw IllegalArgumentException because this violates input constraints and comparison is invalid. |
Gene strings contain characters other than A, C, G, T | Throw IllegalArgumentException as this violates the input constraints and invalidates the gene comparisons. |
Large bank array leading to memory constraints | Ensure your BFS queue does not unnecessarily store duplicates or hold onto references longer than needed to prevent excessive memory usage. |