Given two strings str1
and str2
, return the shortest string that has both str1
and str2
as subsequences. If there are multiple valid strings, return any of them.
A string s
is a subsequence of string t
if deleting some number of characters from t
(possibly 0
) results in the string s
.
For example:
str1 = "abac", str2 = "cab"
Output: "cabac"
Explanation:
str1 = "abac"
is a subsequence of "cabac"
because we can delete the first "c"
.
str2 = "cab"
is a subsequence of "cabac"
because we can delete the last "ac"
.
The answer provided is the shortest such string that satisfies these properties.
str1 = "aaaaaaaa", str2 = "aaaaaaaa"
Output: "aaaaaaaa"
Write a function to solve the shortest common supersequence problem. Your function should take two strings as input and return the shortest common supersequence.
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The goal is to find the shortest string that contains both of the original strings as subsequences. The brute force approach is to explore every possible combination of characters from the two strings, checking if that combination is a valid supersequence. We keep looking at every possible combination until we find the absolute shortest.
Here's how the algorithm would work step-by-step:
def shortest_common_supersequence_brute_force(string1, string2):
shortest_supersequence = None
def generate_merged_strings(index1, index2, current_string):
nonlocal shortest_supersequence
# If we've reached the end of both strings, check the supersequence
if index1 == len(string1) and index2 == len(string2):
if is_subsequence(string1, current_string) and is_subsequence(string2, current_string):
if shortest_supersequence is None or len(current_string) < len(shortest_supersequence):
shortest_supersequence = current_string
return
# If we still have characters in string1, explore including them
if index1 < len(string1):
generate_merged_strings(index1 + 1, index2, current_string + string1[index1])
# If we still have characters in string2, explore including them
if index2 < len(string2):
generate_merged_strings(index1, index2 + 1, current_string + string2[index2])
def is_subsequence(subsequence, supersequence):
subsequence_index = 0
supersequence_index = 0
while subsequence_index < len(subsequence) and supersequence_index < len(supersequence):
if subsequence[subsequence_index] == supersequence[supersequence_index]:
subsequence_index += 1
supersequence_index += 1
# Ensure all characters in subsequence are found
return subsequence_index == len(subsequence)
generate_merged_strings(0, 0, "")
# Return the shortest supersequence found
return shortest_supersequence
The best way to find the shortest combined string containing two given strings is by recognizing the overlap between them. We can find the longest shared part, and cleverly merge the strings around that.
Here's how the algorithm would work step-by-step:
def shortest_common_supersequence(string1, string2):
length_string1 = len(string1)
length_string2 = len(string2)
# Initialize a table to store lengths of common subsequences.
longest_common_subsequence_lengths = [([0] * (length_string2 + 1)) for _ in range(length_string1 + 1)]
for i in range(1, length_string1 + 1):
for j in range(1, length_string2 + 1):
if string1[i - 1] == string2[j - 1]:
# If characters match, increment length of common subsequence.
longest_common_subsequence_lengths[i][j] = longest_common_subsequence_lengths[i - 1][j - 1] + 1
else:
# Pick the longest amongst the possible options.
longest_common_subsequence_lengths[i][j] = max(longest_common_subsequence_lengths[i - 1][j], longest_common_subsequence_lengths[i][j - 1])
# Backtrack to construct the shortest common supersequence.
i = length_string1
j = length_string2
shortest_common_supersequence_string = ""
while i > 0 and j > 0:
if string1[i - 1] == string2[j - 1]:
# If characters match, include the char and move diagonally.
shortest_common_supersequence_string = string1[i - 1] + shortest_common_supersequence_string
i -= 1
j -= 1
else:
# Move to the cell with the longer common subsequence.
if longest_common_subsequence_lengths[i - 1][j] > longest_common_subsequence_lengths[i][j - 1]:
# Add the character from string1.
shortest_common_supersequence_string = string1[i - 1] + shortest_common_supersequence_string
i -= 1
else:
# Add the character from string2.
shortest_common_supersequence_string = string2[j - 1] + shortest_common_supersequence_string
j -= 1
# Add any remaining characters from string1.
while i > 0:
shortest_common_supersequence_string = string1[i - 1] + shortest_common_supersequence_string
i -= 1
# Add any remaining characters from string2.
while j > 0:
shortest_common_supersequence_string = string2[j - 1] + shortest_common_supersequence_string
j -= 1
return shortest_common_supersequence_string
Case | How to Handle |
---|---|
Both input strings are empty | Return an empty string since the shortest common supersequence of two empty strings is an empty string. |
One input string is empty, the other is not | Return the non-empty string as it is the shortest common supersequence. |
Input strings are identical | Return either of the input strings as the shortest common supersequence. |
Very long input strings (potential memory issues with DP table) | Consider using a more space-efficient algorithm or technique such as rolling arrays or divide and conquer if the strings are extremely long. |
Input strings contain non-ASCII characters | Ensure the character comparison and string concatenation handles Unicode/UTF-8 characters correctly. |
Strings with overlapping characters but no common subsequence | The LCS-based approach correctly identifies the overlapping characters to construct the SCS. |
Multiple valid shortest common supersequences exist | Return any one of the valid shortest common supersequences since the problem statement doesn't require a specific one. |
Null input strings | Throw an IllegalArgumentException or return null/empty string depending on the API contract, after checking for null inputs. |