You are given two strings, text1
and text2
. Your task is to find the length of their longest common subsequence (LCS). A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.
A common subsequence of two strings is a subsequence that is common to both strings. If there is no common subsequence, return 0.
Example 1:
text1 = "abcde", text2 = "ace"
Output: 3
Explanation: The longest common subsequence is "ace", and its length is 3.
Example 2:
text1 = "abc", text2 = "abc"
Output: 3
Explanation: The longest common subsequence is "abc", and its length is 3.
Example 3:
text1 = "abc", text2 = "def"
Output: 0
Explanation: There is no common subsequence, so the result is 0.
Could you describe your approach to solving this problem and provide an implementation?
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The longest common subsequence problem asks us to find the longest sequence of elements that appear in the same order within two different sequences. A brute force approach means we'll explore absolutely every possible subsequence of one sequence and check if it is a subsequence of the other sequence.
Here's how the algorithm would work step-by-step:
def longest_common_subsequence_brute_force(first_sequence, second_sequence):
all_subsequences = generate_all_subsequences(first_sequence)
common_subsequences = []
for subsequence in all_subsequences:
if is_subsequence(subsequence, second_sequence):
common_subsequences.append(subsequence)
if not common_subsequences:
return ""
longest_subsequence = ""
for subsequence in common_subsequences:
if len(subsequence) > len(longest_subsequence):
longest_subsequence = subsequence
return longest_subsequence
def generate_all_subsequences(sequence):
subsequences = []
number_of_elements = len(sequence)
for i in range(1 << number_of_elements):
subsequence = ""
for j in range(number_of_elements):
# If the j-th bit is set in i, include the element
if (i >> j) & 1:
subsequence += sequence[j]
subsequences.append(subsequence)
return subsequences
def is_subsequence(subsequence, sequence):
subsequence_index = 0
sequence_index = 0
# Iterate through both sequences to check
while subsequence_index < len(subsequence) and sequence_index < len(sequence):
if subsequence[subsequence_index] == sequence[sequence_index]:
subsequence_index += 1
sequence_index += 1
# If we've reached the end of the subsequence, it is a subsequence
return subsequence_index == len(subsequence)
if __name__ == '__main__':
sequence_one = "ABCDGH"
sequence_two = "AEDFHR"
# Finding the LCS will validate impl
longest_common = longest_common_subsequence_brute_force(sequence_one, sequence_two)
print(f"Longest Common Subsequence: {longest_common}")
sequence_one = "AGGTAB"
sequence_two = "GXTXAYB"
longest_common = longest_common_subsequence_brute_force(sequence_one, sequence_two)
# Finding the LCS will validate impl
print(f"Longest Common Subsequence: {longest_common}")
sequence_one = "ABCDAF"
sequence_two = "ACBCF"
longest_common = longest_common_subsequence_brute_force(sequence_one, sequence_two)
# Finding the LCS will validate impl
print(f"Longest Common Subsequence: {longest_common}")
The best way to find the longest common subsequence is to build up a table that shows the lengths of common subsequences for different parts of the input sequences. This avoids recomputing the same information over and over again, leading to a much faster solution.
Here's how the algorithm would work step-by-step:
def longest_common_subsequence(string1, string2):
length_of_string1 = len(string1)
length_of_string2 = len(string2)
# Initialize a table to store lengths of common subsequences
dp_table = [([0] * (length_of_string2 + 1)) for _ in range(length_of_string1 + 1)]
for i in range(1, length_of_string1 + 1):
for j in range(1, length_of_string2 + 1):
# If characters match, increment length from diagonal
if string1[i - 1] == string2[j - 1]:
dp_table[i][j] = dp_table[i - 1][j - 1] + 1
# If no match, take the max from top or left
else:
dp_table[i][j] = max(dp_table[i - 1][j], dp_table[i][j - 1])
# Start building the subsequence backwards
index = dp_table[length_of_string1][length_of_string2]
longest_subsequence = [''] * index
i = length_of_string1
j = length_of_string2
# Traceback to construct the LCS
while i > 0 and j > 0:
if string1[i - 1] == string2[j - 1]:
# If matched, add character to LCS
longest_subsequence[index - 1] = string1[i - 1]
i -= 1
j -= 1
index -= 1
# If not matched, move to the larger value
elif dp_table[i - 1][j] > dp_table[i][j - 1]:
i -= 1
else:
j -= 1
return "".join(longest_subsequence)
Case | How to Handle |
---|---|
One or both input strings are null or empty | Return an empty string or null depending on requirements when either input is invalid. |
One or both input strings are very long (e.g., exceeding reasonable memory limits) | The dynamic programming table can consume significant memory; consider iterative approaches or check for potential out-of-memory errors. |
Input strings are identical | The longest common subsequence should be the entire string in this case. |
Input strings have no common characters | The longest common subsequence should be an empty string. |
Input strings contain non-ASCII characters | Ensure the character comparison and storage are Unicode-aware to handle diverse character sets correctly. |
The same character appears multiple times in both strings, but not in the same order. | The dynamic programming approach correctly finds the longest subsequence, even if characters are interleaved. |
Integer overflow in the DP table (if lengths are close to the maximum integer value) | Use appropriate data types (e.g., long) or check for potential overflows during the DP table updates if very long strings are expected. |
One string is a substring of the other. | The solution will correctly find the shorter string as the LCS. |