You are given a string s
and an array of strings words
. All the strings of words
are of the same length. A concatenated string is a string that exactly contains all the strings of any permutation of words
concatenated.
words = ["ab","cd","ef"]
, then "abcdef"
, "abefcd"
, "cdabef"
, "cdefab"
, "efabcd"
, and "efcdab"
are all concatenated strings. "acdbef"
is not a concatenated string because it is not the concatenation of any permutation of words
.Return an array of the starting indices of all the concatenated substrings in s
. You can return the answer in any order.
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation:
The substring starting at 0 is "barfoo"
. It is the concatenation of ["bar","foo"]
which is a permutation of words
.
The substring starting at 9 is "foobar"
. It is the concatenation of ["foo","bar"]
which is a permutation of words
.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation:
There is no concatenated substring.
Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation:
The substring starting at 6 is "foobarthe"
. It is the concatenation of ["foo","bar","the"]
.
The substring starting at 9 is "barthefoo"
. It is the concatenation of ["bar","the","foo"]
.
The substring starting at 12 is "thefoobar"
. It is the concatenation of ["the","foo","bar"]
.
What is the most efficient way to implement this function?
from collections import Counter
def find_substring(s: str, words: list[str]) -> list[int]:
"""Finds the starting indices of concatenated substrings in s.
Args:
s: The string to search in.
words: A list of words to search for.
Returns:
A list of the starting indices of the concatenated substrings in s.
"""
if not s or not words:
return []
word_length = len(words[0])
num_words = len(words)
substring_length = word_length * num_words
result = []
word_count = Counter(words)
for i in range(len(s) - substring_length + 1):
seen_words = Counter()
for j in range(num_words):
start_index = i + j * word_length
word = s[start_index:start_index + word_length]
if word in word_count:
seen_words[word] += 1
if seen_words[word] > word_count[word]:
break
else:
break
else:
if seen_words == word_count:
result.append(i)
return result
# Example Usage:
# s = "barfoothefoobarman"
# words = ["foo", "bar"]
# print(find_substring(s, words)) # Output: [0, 9]
# s = "wordgoodgoodgoodbestword"
# words = ["word", "good", "best", "word"]
# print(find_substring(s, words)) # Output: []
# s = "barfoofoobarthefoobarman"
# words = ["bar", "foo", "the"]
# print(find_substring(s, words)) # Output: [6, 9, 12]
The most straightforward approach is to generate all possible permutations of the words
array and then search for each permutation as a substring within the given string s
. This method is inefficient but serves as a baseline for understanding the problem.
import itertools
def find_substring_brute_force(s: str, words: list[str]) -> list[int]:
"""Finds the starting indices of concatenated substrings in s using brute force.
Args:
s: The string to search in.
words: A list of words to search for.
Returns:
A list of the starting indices of the concatenated substrings in s.
"""
if not s or not words:
return []
word_length = len(words[0])
num_words = len(words)
substring_length = word_length * num_words
result = []
for permutation in itertools.permutations(words):
concatenated_string = ''.join(permutation)
for i in range(len(s) - substring_length + 1):
if s[i:i + substring_length] == concatenated_string:
result.append(i)
return sorted(list(set(result)))
# Example Usage:
# s = "barfoothefoobarman"
# words = ["foo", "bar"]
# print(find_substring_brute_force(s, words)) # Output: [0, 9]
The optimal solution uses a sliding window approach combined with a hash map (Counter in Python) to efficiently check for concatenated substrings. This approach avoids generating permutations and reduces redundant comparisons.
from collections import Counter
def find_substring(s: str, words: list[str]) -> list[int]:
"""Finds the starting indices of concatenated substrings in s.
Args:
s: The string to search in.
words: A list of words to search for.
Returns:
A list of the starting indices of the concatenated substrings in s.
"""
if not s or not words:
return []
word_length = len(words[0])
num_words = len(words)
substring_length = word_length * num_words
result = []
word_count = Counter(words)
for i in range(len(s) - substring_length + 1):
seen_words = Counter()
for j in range(num_words):
start_index = i + j * word_length
word = s[start_index:start_index + word_length]
if word in word_count:
seen_words[word] += 1
if seen_words[word] > word_count[word]:
break
else:
break
else:
if seen_words == word_count:
result.append(i)
return result
# Example Usage:
# s = "barfoothefoobarman"
# words = ["foo", "bar"]
# print(find_substring(s, words)) # Output: [0, 9]
# s = "wordgoodgoodgoodbestword"
# words = ["word", "good", "best", "word"]
# print(find_substring(s, words)) # Output: []
# s = "barfoofoobarthefoobarman"
# words = ["bar", "foo", "the"]
# print(find_substring(s, words)) # Output: [6, 9, 12]
words
array, which takes O(n!) time, where n is the number of words. Then, for each permutation, it searches for the concatenated string in the input string s
, which takes O(m * k) time, where m is the length of s
and k is the length of the concatenated string. Thus, the overall runtime complexity is O(n! * m * k), which is extremely inefficient.s
once with a sliding window of size substring_length
. Inside the loop, it iterates through the words in the words
array. The outer loop runs m - k + 1
times, where m
is the length of s
and k
is the total length of all words combined (word_length * num_words
). The inner loop runs num_words
times. Thus, the overall runtime complexity is O(m * n * l), where m
is the length of the string s
, n
is the number of words in the words
array and l
is the length of each word. Because n
and l
are generally much smaller than m
, this can be seen as approximately O(m) in many practical cases, where m is the length of the string s
.words
array, which takes O(n * k) space, where n is the number of words and k is the average length of the words. Additionally, it stores the result list, which can take up to O(m) space in the worst case. Thus, the overall space complexity is O(n * k + m).Counter
) to store the counts of words in the words
array, which takes O(n * l) space, where n
is the number of words and l
is the length of each word. It also uses another hash map to store the seen words, which also takes O(n * l) space. The result list can take up to O(m) space in the worst case. Thus, the overall space complexity is O(n * l + m).s
or the words
array is empty, the function should return an empty list.words
Array: If the words
array contains empty strings, the function may behave unpredictably. The provided code assumes that all words have a non-zero length and the same length.words
array have the same length. If this condition is violated, the function may produce incorrect results. An additional check can be added at the beginning of the function to ensure this condition is met.words
array contains repeated words by using Counter
to keep track of the frequency of each word.s
does not contain any concatenated substrings of the words in the words
array, the function should return an empty list.