A gene string can be represented by an 8-character long string, with choices from 'A'
, 'C'
, 'G'
, and 'T'
. Suppose we need to investigate a mutation from a gene string startGene
to a gene string endGene
where one mutation is defined as one single character changed in the gene string.
"AACCGGTT" --> "AACCGGTA"
is one mutation.There is also a gene bank bank
that records all the valid gene mutations. A gene must be in bank
to make it a valid gene string.
Given the two gene strings startGene
and endGene
and the gene bank bank
, return the minimum number of mutations needed to mutate from startGene
to endGene
. If there is no such a mutation, return -1
.
Note that the starting point is assumed to be valid, so it might not be included in the bank.
Example 1:
Input: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]
Output: 1
Example 2:
Input: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
Output: 2
Here's how to solve this problem:
Problem Description
We are given two gene strings, startGene
and endGene
, both of length 8, consisting of characters 'A', 'C', 'G', and 'T'. We are also given a gene bank bank
containing valid gene strings. A mutation is defined as changing a single character in a gene string. The goal is to find the minimum number of mutations needed to transform startGene
to endGene
, using only gene strings present in the bank
. The starting gene might not be in the bank itself.
Example:
startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]
Output: 1
startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
Output: 2
Approach
We can use Breadth-First Search (BFS) to find the shortest path (minimum number of mutations) from startGene
to endGene
. Each gene string is a node in the graph, and an edge exists between two gene strings if they differ by only one character and are in the bank (or one of them is the start gene).
startGene
to it.size
times:
endGene
, return the current mutation count.bank
and has not been visited:
endGene
, it means there is no valid mutation path. Return -1.Code
from collections import deque
class Solution:
def minMutation(self, startGene: str, endGene: str, bank: list[str]) -> int:
bank_set = set(bank)
if endGene not in bank_set:
return -1
queue = deque([(startGene, 0)])
visited = {startGene}
while queue:
gene, mutations = queue.popleft()
if gene == endGene:
return mutations
for i in range(8):
for char in 'ACGT':
new_gene = gene[:i] + char + gene[i+1:]
if new_gene in bank_set and new_gene not in visited:
queue.append((new_gene, mutations + 1))
visited.add(new_gene)
return -1
Explanation:
bank_set = set(bank)
: Converts the bank
list to a set for faster lookups.queue = deque([(startGene, 0)])
: Initializes a queue with the starting gene and mutation count 0.visited = {startGene}
: Keeps track of visited genes to avoid cycles.while queue
loop performs BFS.new_gene in bank_set and new_gene not in visited
: Checks if the new gene is valid (in the bank) and hasn't been visited before.endGene
is never reached, the function returns -1.Example Usage:
startGene = "AACCGGTT"
endGene = "AACCGGTA"
bank = ["AACCGGTA"]
solution = Solution()
result = solution.minMutation(startGene, endGene, bank)
print(result) # Output: 1
startGene = "AACCGGTT"
endGene = "AAACGGTA"
bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
result = solution.minMutation(startGene, endGene, bank)
print(result) # Output: 2
Big O Runtime Analysis
n
be the number of genes in the bank
. In the worst case, we might visit all the genes in the bank.Big O Space Analysis
queue
can hold, in the worst case, all the genes in the bank. So, the space used by the queue is O(n).visited
set can also hold all the genes in the bank. So, the space used by the set is O(n).