Taro Logo

Shortest Common Supersequence

Hard
Amazon logo
Amazon
1 view
Topics:
StringsDynamic Programming

Given two strings str1 and str2, return the shortest string that has both str1 and str2 as subsequences. If there are multiple valid strings, return any of them.

A string s is a subsequence of string t if deleting some number of characters from t (possibly 0) results in the string s.

For example:

  • str1 = "abac", str2 = "cab" should return "cabac"
  • str1 = "aaaaaaaa", str2 = "aaaaaaaa" should return "aaaaaaaa"

Your solution should be efficient in terms of time and space complexity. Consider edge cases such as:

  • Empty strings
  • Identical strings
  • Strings with no common characters

Explain your approach, its time complexity, and its space complexity.

Solution


Shortest Common Supersequence

Problem

Given two strings, str1 and str2, the goal is to find the shortest string that has both str1 and str2 as subsequences. If multiple such strings exist, any one of them can be returned.

Naive Approach

A straightforward, though inefficient, approach involves generating all possible supersequences of str1 and str2 and then identifying the shortest among them that contains both input strings as subsequences. This can be achieved recursively by considering all possible interleavings of the characters of the two strings. The time complexity of this approach is exponential, rendering it impractical for larger inputs.

Optimal Solution

The problem can be efficiently solved using dynamic programming. The core idea is to find the Longest Common Subsequence (LCS) of str1 and str2. Once the LCS is known, the shortest common supersequence can be constructed by including all characters of str1 and str2, but including the characters of LCS only once.

Algorithm

  1. Compute LCS: Use dynamic programming to find the LCS of str1 and str2.
  2. Construct SCS: Build the shortest common supersequence using str1, str2, and their LCS.

Dynamic Programming for LCS

A 2D array dp is created where dp[i][j] stores the length of the LCS of str1[0...i-1] and str2[0...j-1]. The DP table is filled as follows:

  • If str1[i-1] == str2[j-1], then dp[i][j] = dp[i-1][j-1] + 1
  • Else, dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Constructing the SCS

Starting from the bottom-right cell of the dp table (i.e., dp[n][m], where n and m are the lengths of str1 and str2 respectively), the SCS is constructed by tracing back through the table:

  • If str1[i-1] == str2[j-1], then this character is part of the LCS, so it is included in the SCS, and i and j are decremented.
  • Else, if dp[i-1][j] > dp[i][j-1], then str1[i-1] is included in the SCS, and i is decremented.
  • Else, str2[j-1] is included in the SCS, and j is decremented.
  • Any remaining characters in str1 or str2 are appended to the SCS.

Example

str1 = "abac", str2 = "cab"

  1. LCS Calculation: The LCS is "ab".
  2. SCS Construction: The SCS is "cabac".

Code Implementation

def shortestCommonSupersequence(str1: str, str2: str) -> str:
    n, m = len(str1), len(str2)

    # Compute LCS
    dp = [[''] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + str1[i - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], key=len)

    lcs = dp[n][m]

    # Construct SCS
    i, j = 0, 0
    k = 0
    result = ""
    while i < n or j < m:
        if k < len(lcs) and i < n and j < m and str1[i] == lcs[k] and str2[j] == lcs[k]:
            result += str1[i]
            i += 1
            j += 1
            k += 1
        elif i < n and (k >= len(lcs) or j == m or str1[i] != lcs[k]):
            result += str1[i]
            i += 1
        elif j < m and (k >= len(lcs) or i == n or str2[j] != lcs[k]):
            result += str2[j]
            j += 1

    return result

Complexity Analysis

  • Time Complexity: O(m * n), where m and n are the lengths of str1 and str2, respectively, due to the dynamic programming approach for computing the LCS.
  • Space Complexity: O(m * n) to store the DP table.

Edge Cases

  • If either of the strings is empty, the shortest common supersequence is the other string.
  • If the strings are equal, the shortest common supersequence is either of the strings.