Given a string s
, return the number of distinct non-empty subsequences of s
. Since the answer may be very large, return it modulo 10^9 + 7
.
A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., "ace" is a subsequence of "abcde" while "aec" is not).
For example:
s = "abc"
, the output should be 7. The distinct subsequences are "a", "b", "c", "ab", "ac", "bc", and "abc".s = "aba"
, the output should be 6. The distinct subsequences are "a", "b", "ab", "aa", "ba", and "aba".s = "aaa"
, the output should be 3. The distinct subsequences are "a", "aa" and "aaa".Constraints:
1 <= s.length <= 2000
s
consists of lowercase English letters.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force approach to finding distinct subsequences involves checking absolutely every possible combination of characters from the input. We essentially create and examine every conceivable subsequence, regardless of its length or content. This ensures that we do not miss any potential solution, even if it's computationally expensive.
Here's how the algorithm would work step-by-step:
def distinct_subsequences_brute_force(input_string):
distinct_subsequence_set = set()
distinct_subsequence_set.add('')
string_length = len(input_string)
for subsequence_length in range(1, string_length + 1):
for i in range(1 << string_length):
if bin(i).count('1') == subsequence_length:
subsequence = ''
index = 0
# Construct the subsequence based on the bitmask
for j in range(string_length):
if (i >> j) & 1:
subsequence += input_string[j]
index += 1
# Add the subsequence to the set if it's distinct
distinct_subsequence_set.add(subsequence)
# Remove the empty string as requested
distinct_subsequence_set.remove('')
return len(distinct_subsequence_set)
To find the number of unique subsequences in a string, we'll build the subsequences character by character. The key idea is to use a mathematical relationship to keep track of the number of distinct subsequences ending with each character we've seen so far.
Here's how the algorithm would work step-by-step:
def distinct_subsequences_two(input_string):
modulo = 10**9 + 7
end_with = {}
total_subsequences = 1
for char in input_string:
new_subsequences = total_subsequences
total_subsequences = (total_subsequences + new_subsequences) % modulo
# Avoid double counting subsequences
if char in end_with:
total_subsequences = (total_subsequences - end_with[char] + modulo) % modulo
end_with[char] = new_subsequences
# Remove the empty string from the final count
return (total_subsequences - 1 + modulo) % modulo
Case | How to Handle |
---|---|
Empty string input | Return 0 since there are no non-empty subsequences. |
String with a single character | Return 1 since the only subsequence is the character itself. |
String with all identical characters (e.g., 'aaaa') | The number of distinct subsequences is equal to the character itself, thus return 1. |
Maximum length string with distinct characters | Ensure the chosen algorithm can handle the maximum input size efficiently, potentially using dynamic programming to avoid exponential time complexity. |
String with mixed case characters (if case sensitivity is an issue) | Clearly define the case sensitivity of the problem (treat 'a' and 'A' as distinct or not) and handle the characters accordingly. |
Potential integer overflow when calculating the number of subsequences. | Use the modulo operator (10^9 + 7) during intermediate calculations to prevent integer overflow. |
String containing special characters or unicode. | Verify the algorithm correctly handles any valid UTF-8 character, especially if relying on character code offsets for indexing. |
Very long string with repeating subsequences (e.g., 'ababababab') | The solution should correctly handle overlapping subsequences and avoid overcounting distinct subsequences using dynamic programming. |