Taro Logo

Longest Common Subsequence

Medium
Meta logo
Meta
Topics:
StringsDynamic Programming

Given two strings text1 and text2, return the length of their longest common subsequence. If there is no common subsequence, return 0.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

  • For example, "ace" is a subsequence of "abcde".

A common subsequence of two strings is a subsequence that is common to both strings.

Example 1:

Input: text1 = "abcde", text2 = "ace"
Output: 3
Explanation: The longest common subsequence is "ace" and its length is 3.

Example 2:

Input: text1 = "abc", text2 = "abc"
Output: 3
Explanation: The longest common subsequence is "abc" and its length is 3.

Example 3:

Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: There is no such common subsequence, so the result is 0.

Constraints:

  • 1 <= text1.length, text2.length <= 1000
  • text1 and text2 consist of only lowercase English characters.

Solution


Longest Common Subsequence

Naive Approach (Recursion)

A straightforward approach is to use recursion. We can compare the last characters of the two strings. If they match, we increment the length of the LCS and recursively find the LCS of the remaining prefixes. If they don't match, we take the maximum of the LCS obtained by excluding the last character from either text1 or text2.

Code (Python):

def longestCommonSubsequence_recursive(text1, text2):
    def lcs(i, j):
        if i == 0 or j == 0:
            return 0
        if text1[i-1] == text2[j-1]:
            return 1 + lcs(i-1, j-1)
        else:
            return max(lcs(i-1, j), lcs(i, j-1))

    return lcs(len(text1), len(text2))

Time Complexity: O(2^(m+n)), where m and n are the lengths of text1 and text2 respectively. This is because, in the worst case, each call branches into two recursive calls.

Space Complexity: O(m+n) due to the recursion depth.

Optimal Approach (Dynamic Programming)

The recursive approach has overlapping subproblems, making it inefficient. Dynamic programming can significantly improve performance by storing the results of subproblems and reusing them.

We can create a 2D table dp where dp[i][j] stores the length of the LCS of text1[0...i-1] and text2[0...j-1]. The table is filled as follows:

  • If text1[i-1] == text2[j-1], then dp[i][j] = dp[i-1][j-1] + 1
  • Else, dp[i][j] = max(dp[i-1][j], dp[i][j-1])

The base case is dp[i][0] = 0 and dp[0][j] = 0 for all i and j.

Code (Python):

def longestCommonSubsequence_dp(text1, text2):
    m = len(text1)
    n = len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i-1] == text2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])

    return dp[m][n]

Time Complexity: O(m*n), where m and n are the lengths of text1 and text2 respectively. We iterate through the dp table once.

Space Complexity: O(m*n) to store the dp table.

Edge Cases:

  1. Empty strings: If either text1 or text2 is empty, the LCS is 0. The DP solution handles this case correctly because the dp table is initialized with 0s.
  2. No common subsequence: If text1 and text2 have no characters in common, the LCS is 0. The DP solution correctly computes this because dp[i][j] will remain 0 for all i and j.
  3. Identical strings: If text1 and text2 are identical, the LCS is equal to the length of either string. The DP solution will correctly compute this.

Further Optimization (Space)

The space complexity can be further optimized to O(min(m, n)) by using only two rows of the DP table at a time, as dp[i][j] only depends on the current row and the previous row. However, this optimization makes the code slightly more complex.