Taro Logo

Edit Distance

Medium
Amazon logo
Amazon
5 views
Topics:
StringsDynamic Programming

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

Example 1:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation: 
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

Example 2:

Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation: 
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

Constraints:

  • 0 <= word1.length, word2.length <= 500
  • word1 and word2 consist of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input strings, `word1` and `word2`, be empty or null? If so, what should I return?
  2. Are the input strings case-sensitive? Should I treat 'apple' and 'Apple' as different?
  3. What is the maximum length of the input strings `word1` and `word2`? I'm thinking about the space complexity.
  4. Are there any character restrictions on the input strings? For example, are they limited to ASCII characters, or can they contain Unicode characters?
  5. If `word1` and `word2` are identical, should I return 0?

Brute Force Solution

Approach

The edit distance problem asks how many changes (insertions, deletions, substitutions) are needed to make two words identical. The brute force approach explores every single possible combination of edits to transform one word into the other, ensuring no possibility is missed.

Here's how the algorithm would work step-by-step:

  1. Start by considering the first characters of both words.
  2. Explore the possibility that these characters are already the same, so no edit is needed.
  3. Also explore the possibility that the first character of the first word needs to be changed to match the first character of the second word.
  4. Explore the possibility that the first character of the first word needs to be deleted.
  5. Explore the possibility that a new character needs to be inserted at the beginning of the first word to match the first character of the second word.
  6. For each of these possibilities, move to the next character (or characters) in the words and repeat the process.
  7. Continue this process, exploring all possible combinations of edits, until you've considered every character in both words.
  8. Count the number of edits needed for each possible transformation.
  9. Compare all the different possibilities and select the one with the smallest number of edits. This is the minimum edit distance.

Code Implementation

def edit_distance_brute_force(first_word, second_word):
    def calculate_distance(first_word_index, second_word_index):
        # If we've reached the end of the first word, insert remaining chars of second word
        if first_word_index == len(first_word):
            return len(second_word) - second_word_index

        # If we've reached the end of the second word, delete remaining chars of first word
        if second_word_index == len(second_word):
            return len(first_word) - first_word_index

        # If the characters match, no operation needed, move to next characters
        if first_word[first_word_index] == second_word[second_word_index]:
            return calculate_distance(first_word_index + 1, second_word_index + 1)

        # Explore all possible operations and take the minimum edit distance
        insert_operation = 1 + calculate_distance(first_word_index, second_word_index + 1)
        delete_operation = 1 + calculate_distance(first_word_index + 1, second_word_index)

        # Consider substitution operation to match characters
        substitution_operation = 1 + calculate_distance(first_word_index + 1, second_word_index + 1)

        return min(insert_operation, delete_operation, substitution_operation)

    return calculate_distance(0, 0)

Big(O) Analysis

Time Complexity
O(3^(m+n))The brute force approach explores all possible combinations of insertions, deletions, and substitutions to transform one string of length m into another of length n. At each character position in both strings, the algorithm branches into three possibilities: insertion, deletion, or substitution/match. This branching happens recursively. Therefore, the number of possible paths grows exponentially with the combined length of the two strings (m+n), resulting in a time complexity of O(3^(m+n)).
Space Complexity
O(N*M)The brute force approach explores all possible combinations of edits using recursion. Each recursive call creates a new stack frame, which stores the current state of the words being compared, and the number of edits made so far. In the worst case, the depth of the recursion can be proportional to the lengths of the two input words, call them N and M respectively (where N is the length of the first word and M is the length of the second word). Thus, the maximum number of stack frames required can be N*M, leading to a space complexity of O(N*M) due to the call stack.

Optimal Solution

Approach

The edit distance problem asks us to find the minimum number of changes needed to make two words identical. The clever approach avoids trying all possible combinations of edits by building up solutions for smaller parts of the problem and combining them to solve the bigger problem.

Here's how the algorithm would work step-by-step:

  1. Imagine creating a table where each row and column represent a letter from the two words we are comparing.
  2. Start by filling in the first row and column. These represent the number of edits needed to transform an empty word into the beginning of the other word (simply adding letters).
  3. Now, for each remaining cell in the table, consider the letters in the two words corresponding to that row and column.
  4. If the letters are the same, no edit is needed, so the cell's value is the same as the cell diagonally above and to the left.
  5. If the letters are different, we consider three possibilities: inserting, deleting, or replacing a letter. We take the minimum number of edits required from these three options.
  6. To calculate these options look at the adjacent cells, the one above represents deleting a letter, the one to the left represents inserting a letter, and the one diagonally above and to the left represents replacing the letter. We add one to the value of these cells as each edit contributes one operation.
  7. Choose the smallest value from those three possibilities and put that number in the cell in the table.
  8. Finally, the value in the bottom-right cell of the table represents the minimum number of edits required to transform the first word into the second word.

Code Implementation

def edit_distance(word1, word2):
    word1_length = len(word1)
    word2_length = len(word2)

    # Initialize a matrix to store edit distances between prefixes of the words.
    distance_matrix = [[0] * (word2_length + 1) for _ in range(word1_length + 1)]

    # Initialize the first row and column of the matrix.
    for i in range(word1_length + 1):
        distance_matrix[i][0] = i

    for j in range(word2_length + 1):
        distance_matrix[0][j] = j

    for i in range(1, word1_length + 1):
        for j in range(1, word2_length + 1):
            # If the characters are the same, no edit is needed.
            if word1[i - 1] == word2[j - 1]:
                distance_matrix[i][j] = distance_matrix[i - 1][j - 1]
            else:
                # Consider insertion, deletion, and substitution.
                distance_matrix[i][j] = min(
                    distance_matrix[i - 1][j] + 1,  # Deletion
                    distance_matrix[i][j - 1] + 1,  # Insertion
                    distance_matrix[i - 1][j - 1] + 1,  # Substitution
                )

    # The bottom-right cell contains the edit distance.
    return distance_matrix[word1_length][word2_length]

Big(O) Analysis

Time Complexity
O(mn)The algorithm constructs an (m+1) x (n+1) table, where 'm' is the length of the first word and 'n' is the length of the second word. Filling each cell in the table involves a constant number of operations: comparing characters and calculating the minimum of three values. Since we iterate through all m*n cells in the table once, the overall time complexity is proportional to the product of the lengths of the two input strings, or O(mn).
Space Complexity
O(mn)The algorithm constructs a table to store intermediate results, where m is the length of the first word and n is the length of the second word. The table has m rows and n columns, requiring memory proportional to the product of their lengths. Therefore, the auxiliary space used by the algorithm scales linearly with both the length of word 1 and word 2. This results in a space complexity of O(mn).

Edge Cases

CaseHow to Handle
Both strings are emptyThe edit distance is 0, return 0.
One string is empty, the other is notThe edit distance is the length of the non-empty string, due to insertions or deletions.
Strings are identicalThe edit distance is 0, return 0.
Strings of length 1Handle the base cases correctly in the recursion or dynamic programming initialization.
Strings with large lengths (close to maximum allowed string length)Ensure the solution uses dynamic programming to avoid exponential time complexity and possible stack overflow from recursion.
Strings contain identical charactersDynamic programming approach accounts for the matching characters by using previous subproblem results.
Strings that are very different (require many edits)Dynamic programming will efficiently calculate large edit distances.
Null input stringsThrow an IllegalArgumentException or return -1 to signify an invalid input.