Given two strings word1
and word2
, return the minimum number of steps required to make word1
and word2
the same.
In one step, you can delete exactly one character in either string.
Example 1:
Input: word1 = "sea", word2 = "eat" Output: 2 Explanation: You need one step to make "sea" to "ea" and another step to make "eat" to "ea".
Example 2:
Input: word1 = "leetcode", word2 = "etco" Output: 4
Constraints:
1 <= word1.length, word2.length <= 500
word1
and word2
consist of only lowercase English letters.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The problem asks for the fewest deletions to make two strings identical. The brute force method explores all possible ways to delete characters from both strings to find the shortest sequence of deletions that results in the same remaining string.
Here's how the algorithm would work step-by-step:
def min_distance(string1, string2):
minimum_deletions = float('inf')
def find_minimum_deletions(index1, index2, current_deletions, sub_string1, sub_string2):
nonlocal minimum_deletions
if index1 == len(string1) and index2 == len(string2):
if sub_string1 == sub_string2:
minimum_deletions = min(minimum_deletions, current_deletions)
return
# Explore the option of deleting a character from string1.
if index1 < len(string1):
find_minimum_deletions(index1 + 1, index2, current_deletions + 1, sub_string1, sub_string2)
# Include current character from string1 in the sub string.
if index1 < len(string1):
find_minimum_deletions(index1 + 1, index2, current_deletions, sub_string1 + string1[index1], sub_string2)
# Explore deleting a character from string2.
if index2 < len(string2):
find_minimum_deletions(index1, index2 + 1, current_deletions + 1, sub_string1, sub_string2)
# Include the current character from string2 in the substring.
if index2 < len(string2):
find_minimum_deletions(index1, index2 + 1, current_deletions, sub_string1, sub_string2 + string2[index2])
find_minimum_deletions(0, 0, 0, "", "")
return minimum_deletions
The most efficient way to find the minimum number of deletions is to find the longest common sequence between the two strings. By finding this sequence, we determine the characters that don't need to be deleted, and therefore what needs to be removed.
Here's how the algorithm would work step-by-step:
def min_distance(word1, word2):
length_of_word1 = len(word1)
length_of_word2 = len(word2)
# Initialize a table to store lengths of common subsequences
longest_common_subsequence_table = [[0] * (length_of_word2 + 1) for _ in range(length_of_word1 + 1)]
for i in range(1, length_of_word1 + 1):
for j in range(1, length_of_word2 + 1):
# If characters match, increment the length of the common subsequence
if word1[i - 1] == word2[j - 1]:
longest_common_subsequence_table[i][j] = longest_common_subsequence_table[i - 1][j - 1] + 1
# If characters don't match, take the max length from previous results
else:
longest_common_subsequence_table[i][j] = max(longest_common_subsequence_table[i - 1][j], longest_common_subsequence_table[i][j - 1])
# Extract the length of the longest common subsequence.
longest_common_subsequence_length = longest_common_subsequence_table[length_of_word1][length_of_word2]
# Calculate the number of deletions needed.
return length_of_word1 - longest_common_subsequence_length + length_of_word2 - longest_common_subsequence_length
Case | How to Handle |
---|---|
Both strings are null or empty | Return 0 since no operations are needed. |
One string is null or empty, the other is not | Return the length of the non-null/non-empty string, as that many deletions are needed. |
Strings are identical | Return 0 since no deletions are needed. |
Strings have one or two characters | The standard dynamic programming algorithm handles small string lengths correctly. |
Maximum string lengths allowed by constraints (e.g., 500) | Dynamic programming approach scales to the square of the input string lengths, so ensure sufficient memory and reasonable runtime complexity O(m*n). |
Strings have completely different characters | The longest common subsequence (LCS) will be 0, thus deletions will be the sum of string lengths. |
Strings have very long common prefix or suffix | Dynamic programming correctly accounts for long common subsequences, minimizing unnecessary deletions. |
String length close to integer overflow limits | The lengths are integers, so potential overflow during addition of string lengths should be considered, but usually within acceptable bounds considering typical string length constraints. |