Taro Logo

Delete Operation for Two Strings

Medium
Amazon logo
Amazon
2 views
Topics:
StringsDynamic Programming

Given two strings word1 and word2, return the minimum number of steps required to make word1 and word2 the same.

In one step, you can delete exactly one character in either string.

Example 1:

Input: word1 = "sea", word2 = "eat"
Output: 2
Explanation: You need one step to make "sea" to "ea" and another step to make "eat" to "ea".

Example 2:

Input: word1 = "leetcode", word2 = "etco"
Output: 4

Constraints:

  • 1 <= word1.length, word2.length <= 500
  • word1 and word2 consist of only lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Are the input strings case-sensitive?
  2. Can the input strings be empty or null?
  3. What is the maximum length of the input strings?
  4. If multiple minimal deletion sequences exist, is any one acceptable?
  5. Should I return the number of deletion operations, or a string representing the operations themselves?

Brute Force Solution

Approach

The problem asks for the fewest deletions to make two strings identical. The brute force method explores all possible ways to delete characters from both strings to find the shortest sequence of deletions that results in the same remaining string.

Here's how the algorithm would work step-by-step:

  1. Imagine deleting nothing from either string and see if they are already equal. If they are, you're done; no deletions are needed.
  2. Then, consider deleting a single character from the first string and see if that makes it equal to the second string (or if it becomes equal after deleting characters from the second string).
  3. Next, try deleting a single character from the second string and perform a similar comparison.
  4. Expand this process to deleting two characters, then three, and so on, from each of the strings, trying every combination of deletions possible. So, you could delete two characters from the first string, or one from each, or two from the second string. Explore all such options.
  5. For each combination of deletions, compare the resulting strings. If they match, calculate the total number of characters deleted (from both original strings).
  6. Keep track of the fewest number of deletions needed across all matching strings that you found in the previous steps.
  7. Eventually, after trying all possible deletion combinations, the smallest count you've kept is the answer.

Code Implementation

def min_distance(string1, string2):
    minimum_deletions = float('inf')

    def find_minimum_deletions(index1, index2, current_deletions, sub_string1, sub_string2):
        nonlocal minimum_deletions

        if index1 == len(string1) and index2 == len(string2):
            if sub_string1 == sub_string2:
                minimum_deletions = min(minimum_deletions, current_deletions)
            return

        # Explore the option of deleting a character from string1.
        if index1 < len(string1):
            find_minimum_deletions(index1 + 1, index2, current_deletions + 1, sub_string1, sub_string2)

        # Include current character from string1 in the sub string.
        if index1 < len(string1):
            find_minimum_deletions(index1 + 1, index2, current_deletions, sub_string1 + string1[index1], sub_string2)

        # Explore deleting a character from string2.
        if index2 < len(string2):
            find_minimum_deletions(index1, index2 + 1, current_deletions + 1, sub_string1, sub_string2)

        # Include the current character from string2 in the substring.
        if index2 < len(string2):
            find_minimum_deletions(index1, index2 + 1, current_deletions, sub_string1, sub_string2 + string2[index2])

    find_minimum_deletions(0, 0, 0, "", "")
    return minimum_deletions

Big(O) Analysis

Time Complexity
O(2^(n+m))The brute force approach explores all possible subsets of characters that can be deleted from both strings. Given strings of length n and m, there are 2^n possible subsets of the first string and 2^m possible subsets of the second string. The algorithm essentially iterates through all possible pairs of subsets, comparing the resulting strings after deletion. Therefore, the time complexity is proportional to 2^n * 2^m, which simplifies to O(2^(n+m)).
Space Complexity
O(1)The brute force approach, as described, doesn't explicitly use auxiliary data structures like arrays, lists, or hash maps to store intermediate results. While the described process conceptually 'keeps track' of the fewest number of deletions, this is likely done via a single integer variable. The space used remains constant regardless of the length of the input strings, denoted as N, since we are not creating data structures dependent on input size. Thus, the auxiliary space complexity is O(1).

Optimal Solution

Approach

The most efficient way to find the minimum number of deletions is to find the longest common sequence between the two strings. By finding this sequence, we determine the characters that don't need to be deleted, and therefore what needs to be removed.

Here's how the algorithm would work step-by-step:

  1. Imagine we're building a table where each cell represents the longest common sequence we can make up to a certain point in each word.
  2. We start by filling the table with zeros, because if either word is empty, the longest common sequence is empty.
  3. Then, we go through each letter of both words.
  4. If the letters are the same, we add one to the length of the longest common sequence we have so far (the one diagonally above and to the left).
  5. If the letters are different, we take the bigger of the two longest common sequences we have (either from the cell above or the cell to the left).
  6. Once we've filled the whole table, the number in the bottom-right corner is the length of the longest common sequence.
  7. Finally, we subtract the length of the longest common sequence from the length of each of the original words and add these differences. This result is the minimum number of deletions needed.

Code Implementation

def min_distance(word1, word2):
    length_of_word1 = len(word1)
    length_of_word2 = len(word2)

    # Initialize a table to store lengths of common subsequences
    longest_common_subsequence_table = [[0] * (length_of_word2 + 1) for _ in range(length_of_word1 + 1)]

    for i in range(1, length_of_word1 + 1):
        for j in range(1, length_of_word2 + 1):
            # If characters match, increment the length of the common subsequence
            if word1[i - 1] == word2[j - 1]:

                longest_common_subsequence_table[i][j] = longest_common_subsequence_table[i - 1][j - 1] + 1

            # If characters don't match, take the max length from previous results
            else:

                longest_common_subsequence_table[i][j] = max(longest_common_subsequence_table[i - 1][j], longest_common_subsequence_table[i][j - 1])

    # Extract the length of the longest common subsequence.
    longest_common_subsequence_length = longest_common_subsequence_table[length_of_word1][length_of_word2]

    # Calculate the number of deletions needed.
    return length_of_word1 - longest_common_subsequence_length + length_of_word2 - longest_common_subsequence_length

Big(O) Analysis

Time Complexity
O(m*n)The primary cost is building the table to determine the longest common subsequence. We iterate through each character of word1 (length m) and word2 (length n) using nested loops. For each pair of characters, we perform a constant-time comparison and update the table cell, taking either the maximum of two adjacent cells or incrementing a diagonal cell. Therefore, the time complexity is proportional to m*n, where m and n are the lengths of the two input strings, which simplifies to O(m*n).
Space Complexity
O(m*n)The algorithm uses a table (represented as a 2D array or matrix) to store the lengths of the longest common sequences for different prefixes of the two input strings. The dimensions of this table are determined by the lengths of the two strings, let them be m and n, respectively. Therefore, the space required to store this table is proportional to the product of these lengths (m*n). Thus, the space complexity is O(m*n) where m and n are the lengths of the input strings.

Edge Cases

CaseHow to Handle
Both strings are null or emptyReturn 0 since no operations are needed.
One string is null or empty, the other is notReturn the length of the non-null/non-empty string, as that many deletions are needed.
Strings are identicalReturn 0 since no deletions are needed.
Strings have one or two charactersThe standard dynamic programming algorithm handles small string lengths correctly.
Maximum string lengths allowed by constraints (e.g., 500)Dynamic programming approach scales to the square of the input string lengths, so ensure sufficient memory and reasonable runtime complexity O(m*n).
Strings have completely different charactersThe longest common subsequence (LCS) will be 0, thus deletions will be the sum of string lengths.
Strings have very long common prefix or suffixDynamic programming correctly accounts for long common subsequences, minimizing unnecessary deletions.
String length close to integer overflow limitsThe lengths are integers, so potential overflow during addition of string lengths should be considered, but usually within acceptable bounds considering typical string length constraints.