Taro Logo

Shortest Common Supersequence

Hard
Google logo
Google
3 views
Topics:
StringsDynamic Programming

Given two strings str1 and str2, return the shortest string that has both str1 and str2 as subsequences. If there are multiple valid strings, return any of them.

A string s is a subsequence of string t if deleting some number of characters from t (possibly 0) results in the string s.

For example:

  • If str1 is "abac" and str2 is "cab", a valid output would be "cabac". "abac" is a subsequence of "cabac" (delete the first 'c'), and "cab" is a subsequence of "cabac" (delete 'ac'). The task is to find the shortest string satisfying these conditions.
  • If str1 is "aaaaaaaa" and str2 is "aaaaaaaa", the output should be "aaaaaaaa".

Write a function that efficiently computes the shortest common supersequence given two input strings. Consider edge cases, time and space complexity.

Solution


Shortest Common Supersequence

Problem Description

Given two strings str1 and str2, the goal is to find the shortest string that has both str1 and str2 as subsequences. If multiple such strings exist, any one of them can be returned.

A string s is a subsequence of string t if s can be formed by deleting some characters from t without changing the order of the remaining characters.

Example 1:

Input: str1 = "abac", str2 = "cab"
Output: "cabac"
Explanation:
str1 = "abac" is a subsequence of "cabac" (delete 'c').
str2 = "cab" is a subsequence of "cabac" (delete 'ac').

Example 2:

Input: str1 = "aaaaaaaa", str2 = "aaaaaaaa"
Output: "aaaaaaaa"

Naive Approach

A naive approach would involve generating all possible supersequences of str1 and str2 and then finding the shortest among them. However, this is highly inefficient due to the exponential number of possible supersequences. The time complexity would be exponential, making it impractical for even moderately sized strings.

Optimal Approach (Dynamic Programming)

The optimal approach uses dynamic programming to find the length of the longest common subsequence (LCS) of the two strings. Once the LCS is found, the shortest common supersequence (SCS) can be constructed.

  1. Find the Length of LCS:

    • Create a 2D DP table dp where dp[i][j] stores the length of the LCS of str1[0...i-1] and str2[0...j-1].
    • If str1[i-1] == str2[j-1], then dp[i][j] = dp[i-1][j-1] + 1.
    • Else, dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
  2. Construct the SCS:

    • Start from the bottom-right cell of the dp table (i.e., dp[n][m], where n and m are the lengths of str1 and str2 respectively).
    • If str1[i-1] == str2[j-1], then this character is part of the LCS, so include it in the SCS and move diagonally up-left (i.e., i--, j--).
    • Else, if dp[i-1][j] > dp[i][j-1], then the character str1[i-1] is not part of the LCS, so include it in the SCS and move up (i.e., i--).
    • Otherwise, the character str2[j-1] is not part of the LCS, so include it in the SCS and move left (i.e., j--).
    • Append any remaining characters from str1 or str2 to the SCS.
    • Reverse the SCS to get the final result.

Code Example (Java)

class Solution {
    public String shortestCommonSupersequence(String str1, String str2) {
        int n = str1.length();
        int m = str2.length();

        // Compute the length of the LCS
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        // Construct the SCS
        StringBuilder scs = new StringBuilder();
        int i = n, j = m;
        while (i > 0 && j > 0) {
            if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                scs.append(str1.charAt(i - 1));
                i--;
                j--;
            } else if (dp[i - 1][j] > dp[i][j - 1]) {
                scs.append(str1.charAt(i - 1));
                i--;
            } else {
                scs.append(str2.charAt(j - 1));
                j--;
            }
        }

        // Append remaining characters
        while (i > 0) {
            scs.append(str1.charAt(i - 1));
            i--;
        }
        while (j > 0) {
            scs.append(str2.charAt(j - 1));
            j--;
        }

        return scs.reverse().toString();
    }
}

Time and Space Complexity

  • Time Complexity: O(m * n), where n is the length of str1 and m is the length of str2. This is due to the dynamic programming approach which requires filling a 2D table of size (n+1) x (m+1).
  • Space Complexity: O(m * n) to store the DP table. The string builder used to construct the SCS contributes O(m+n) to the space complexity, but since this is dominated by O(m*n), the overall space complexity remains O(m * n).

Edge Cases

  • If either str1 or str2 is empty, the SCS is simply the other string.
  • If str1 and str2 are identical, the SCS is the string itself.
  • Consider overlapping or completely different strings to ensure the algorithm handles all scenarios correctly.