Given two strings s and t, return the number of distinct subsequences of s which equals t.
For example:
Input: s = "rabbbit", t = "rabbit" Output: 3 Explanation: As shown below, there are 3 ways you can generate "rabbit" from s. rabbbit rabbbit rabbbit
Input: s = "babgbag", t = "bag" Output: 5 Explanation: As shown below, there are 5 ways you can generate "bag" from s. babgbag babgba babgb babgb babgbag
Write a function that takes two strings as input and returns an integer representing the number of distinct subsequences. Consider edge cases such as empty strings and different lengths of input strings. Explain the time and space complexity of your solution.
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The goal is to find how many times one string appears as a subsequence within another. The brute force approach involves checking every possible way to select characters from the longer string to see if they match the shorter string.
Here's how the algorithm would work step-by-step:
def distinct_subsequences_brute_force(longer_string, shorter_string):
number_of_subsequences = 0
def generate_subsequence_and_check(index, current_subsequence):
nonlocal number_of_subsequences
# Base case: If the subsequence matches target.
if len(current_subsequence) == len(shorter_string):
if current_subsequence == shorter_string:
number_of_subsequences += 1
return
# Base case: If we've exhausted the longer string.
if index == len(longer_string):
return
# Explore including the current character.
generate_subsequence_and_check(
index + 1,
current_subsequence + longer_string[index],
)
# Explore excluding the current character.
# We explore both with and without the current char.
generate_subsequence_and_check(index + 1, current_subsequence)
generate_subsequence_and_check(0, "")
return number_of_subsequences
The most efficient way to solve this problem involves figuring out how many times the smaller string appears as a subsequence within the larger string. Instead of brute-force searching every possible subsequence, we use a clever method to reuse already computed results. The key is to break the problem down into smaller, overlapping subproblems and store their solutions to avoid redundant calculations.
Here's how the algorithm would work step-by-step:
def distinct_subsequences(larger_string, smaller_string):
larger_string_length = len(larger_string)
smaller_string_length = len(smaller_string)
# dp_table[i][j] is the number of distinct subsequences
# of larger_string[:i] that equals smaller_string[:j]
dp_table = [[0] * (smaller_string_length + 1) for _ in range(larger_string_length + 1)]
# Initialization: Empty smaller_string is a subsequence of any larger_string
for i in range(larger_string_length + 1):
dp_table[i][0] = 1
for i in range(1, larger_string_length + 1):
for j in range(1, smaller_string_length + 1):
# If chars match, we add two possibilities.
if larger_string[i - 1] == smaller_string[j - 1]:
dp_table[i][j] = dp_table[i - 1][j - 1] + dp_table[i - 1][j]
# No match, inherit from the previous larger_string state.
else:
dp_table[i][j] = dp_table[i - 1][j]
return dp_table[larger_string_length][smaller_string_length]
Case | How to Handle |
---|---|
Empty string S or T | If T is empty, there's one subsequence (empty string); if S is empty but T isn't, there are zero subsequences. |
S is shorter than T | Return 0 immediately, as T cannot be a subsequence of S. |
S and T are identical | Return 1, as S is trivially a subsequence of itself. |
T contains characters not in S | Return 0, as T cannot be formed as a subsequence of S. |
S and T are very long strings (close to the maximum allowed length) | Ensure that the dynamic programming table's size does not exceed available memory and that integer overflow is handled correctly in the calculation of subsequence counts. |
S and T contain only a single repeating character (e.g., S = 'aaaa', T = 'aa') | Handle the combinatorial calculation correctly, where the number of subsequences will involve binomial coefficients. |
Integer overflow when calculating the number of subsequences | Use a larger integer data type (e.g., long long in C++, long in Java) or apply modulo operation to prevent overflow. |
Large number of duplicate characters in S and T | Dynamic programming approach naturally handles duplicates, counting each distinct subsequence instance. |