Given a string s
and an array of strings words
, return the number of words[i]
that is a subsequence of s
. A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters. For example, "ace"
is a subsequence of "abcde"
.
Example 1: Input: s = "abcde", words = ["a","bb","acd","ace"] Output: 3 Explanation: There are three strings in words that are a subsequence of s: "a", "acd", "ace".
Example 2: Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"] Output: 2
Constraints:
1 <= s.length <= 5 * 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 50
s
and words[i]
consist of only lowercase English letters.The naive solution is to iterate through each word in the words
array and check if it is a subsequence of the string s
. For each word, we can use two pointers, one for the string s
and one for the current word. We iterate through s
, and if we find a character that matches the current character in the word, we advance the word pointer. If the word pointer reaches the end of the word, it means the word is a subsequence of s
.
def is_subsequence(s, word):
i = 0
j = 0
while i < len(s) and j < len(word):
if s[i] == word[j]:
j += 1
i += 1
return j == len(word)
def num_matching_subsequence_naive(s, words):
count = 0
for word in words:
if is_subsequence(s, word):
count += 1
return count
# Example usage
s = "abcde"
words = ["a", "bb", "acd", "ace"]
result = num_matching_subsequence_naive(s, words)
print(f"Number of matching subsequences: {result}") # Output: 3
An optimized solution is to use a hash map to store the indices of each character in the string s
. Then, for each word in the words
array, we can iterate through the characters of the word and use binary search to find the next index in s
that is greater than the previous index we found. This avoids iterating through the entire string s
for each character in the word.
from collections import defaultdict
def num_matching_subsequence_optimal(s, words):
char_indices = defaultdict(list)
for i, char in enumerate(s):
char_indices[char].append(i)
count = 0
for word in words:
prev_index = -1
is_subseq = True
for char in word:
indices = char_indices[char]
# Binary search for the next valid index
l, r = 0, len(indices) - 1
next_index = -1
while l <= r:
mid = (l + r) // 2
if indices[mid] > prev_index:
next_index = indices[mid]
r = mid - 1
else:
l = mid + 1
if next_index == -1:
is_subseq = False
break
prev_index = next_index
if is_subseq:
count += 1
return count
# Example usage
s = "abcde"
words = ["a", "bb", "acd", "ace"]
result = num_matching_subsequence_optimal(s, words)
print(f"Number of matching subsequences: {result}") # Output: 3
s = "dsahjpjauf"
words = ["ahjpjau", "ja", "ahbwzgqnuk", "tnmlanowax"]
result = num_matching_subsequence_optimal(s, words)
print(f"Number of matching subsequences: {result}") # Output: 2
is_subsequence
function takes O(n) time, where n is the length of the string s
.num_matching_subsequence_naive
function iterates through each word in the words
array, and for each word, it calls the is_subsequence
function. Therefore, the overall time complexity is O(m * n * k), where m is the number of words in the words
array, n is the average length of the strings in words
, and k is the length of string s
.char_indices
hash map takes O(k) time, where k is the length of the string s
.words
array, we iterate through the characters of the word and use binary search to find the next index in s
. The binary search takes O(log p) time, where p is the number of occurrences of the character in s
. In the worst case, p can be k.words
array, n is the average length of the strings in words
, and k is the length of the string s
.is_subsequence
function takes O(1) space.num_matching_subsequence_naive
function takes O(1) additional space. Thus, the overall space complexity is O(1).char_indices
hash map takes O(k) space, where k is the length of the string s
(in the worst case, all characters in s
are distinct, and we store an index for each character).num_matching_subsequence_optimal
function takes O(1) additional space. Thus, the overall space complexity is O(k).s
: If the string s
is empty, then no word can be a subsequence of s
, unless the word is also empty. Handle this explicitly in the beginning.words
: An empty word is always a subsequence of any string s
. Count it towards result.s
which amortizes the cost of searching within s
.s
string in a nested manner for each word.s
: If a word contains a character not present in s
, it's automatically not a subsequence. The optimal solution efficiently handles this due to the char_indices
hashmap; a missing key immediately flags the word as not being a subsequence.