Given two strings str1
and str2
, return the shortest string that has both str1
and str2
as subsequences. If there are multiple valid strings, return any of them.
A string s
is a subsequence of string t
if deleting some number of characters from t
(possibly 0) results in the string s
.
For example:
str1
is "abac" and str2
is "cab", a valid output would be "cabac". "abac" is a subsequence of "cabac" (delete the first 'c'), and "cab" is a subsequence of "cabac" (delete 'ac'). The task is to find the shortest string satisfying these conditions.str1
is "aaaaaaaa" and str2
is "aaaaaaaa", the output should be "aaaaaaaa".Write a function that efficiently computes the shortest common supersequence given two input strings. Consider edge cases, time and space complexity.
Given two strings str1
and str2
, the goal is to find the shortest string that has both str1
and str2
as subsequences. If multiple such strings exist, any one of them can be returned.
A string s
is a subsequence of string t
if s
can be formed by deleting some characters from t
without changing the order of the remaining characters.
Example 1:
Input: str1 = "abac", str2 = "cab"
Output: "cabac"
Explanation:
str1 = "abac" is a subsequence of "cabac" (delete 'c').
str2 = "cab" is a subsequence of "cabac" (delete 'ac').
Example 2:
Input: str1 = "aaaaaaaa", str2 = "aaaaaaaa"
Output: "aaaaaaaa"
A naive approach would involve generating all possible supersequences of str1
and str2
and then finding the shortest among them. However, this is highly inefficient due to the exponential number of possible supersequences. The time complexity would be exponential, making it impractical for even moderately sized strings.
The optimal approach uses dynamic programming to find the length of the longest common subsequence (LCS) of the two strings. Once the LCS is found, the shortest common supersequence (SCS) can be constructed.
Find the Length of LCS:
dp
where dp[i][j]
stores the length of the LCS of str1[0...i-1]
and str2[0...j-1]
.str1[i-1] == str2[j-1]
, then dp[i][j] = dp[i-1][j-1] + 1
.dp[i][j] = max(dp[i-1][j], dp[i][j-1])
.Construct the SCS:
dp
table (i.e., dp[n][m]
, where n
and m
are the lengths of str1
and str2
respectively).str1[i-1] == str2[j-1]
, then this character is part of the LCS, so include it in the SCS and move diagonally up-left (i.e., i--, j--
).dp[i-1][j] > dp[i][j-1]
, then the character str1[i-1]
is not part of the LCS, so include it in the SCS and move up (i.e., i--
).str2[j-1]
is not part of the LCS, so include it in the SCS and move left (i.e., j--
).str1
or str2
to the SCS.class Solution {
public String shortestCommonSupersequence(String str1, String str2) {
int n = str1.length();
int m = str2.length();
// Compute the length of the LCS
int[][] dp = new int[n + 1][m + 1];
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= m; j++) {
if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
// Construct the SCS
StringBuilder scs = new StringBuilder();
int i = n, j = m;
while (i > 0 && j > 0) {
if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
scs.append(str1.charAt(i - 1));
i--;
j--;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
scs.append(str1.charAt(i - 1));
i--;
} else {
scs.append(str2.charAt(j - 1));
j--;
}
}
// Append remaining characters
while (i > 0) {
scs.append(str1.charAt(i - 1));
i--;
}
while (j > 0) {
scs.append(str2.charAt(j - 1));
j--;
}
return scs.reverse().toString();
}
}
str1
and m is the length of str2
. This is due to the dynamic programming approach which requires filling a 2D table of size (n+1) x (m+1).str1
or str2
is empty, the SCS is simply the other string.str1
and str2
are identical, the SCS is the string itself.