You are given a string s
and an array of strings words
. All the strings of words
are of the same length.
A concatenated string is a string that exactly contains all the strings of any permutation of words
concatenated.
words = ["ab","cd","ef"]
, then "abcdef"
, "abefcd"
, "cdabef"
, "cdefab"
, "efabcd"
, and "efcdab"
are all concatenated strings. "acdbef"
is not a concatenated string because it is not the concatenation of any permutation of words
.Return an array of the starting indices of all the concatenated substrings in s
. You can return the answer in any order.
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"] Output: [0,9]
Explanation:
The substring starting at 0 is "barfoo"
. It is the concatenation of ["bar","foo"]
which is a permutation of words
.
The substring starting at 9 is "foobar"
. It is the concatenation of ["foo","bar"]
which is a permutation of words
.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"] Output: []
Explanation: There is no concatenated substring.
Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"] Output: [6,9,12]
Explanation:
The substring starting at 6 is "foobarthe"
. It is the concatenation of ["foo","bar","the"]
.
The substring starting at 9 is "barthefoo"
. It is the concatenation of ["bar","the","foo"]
.
The substring starting at 12 is "thefoobar"
. It is the concatenation of ["the","foo","bar"]
.
Constraints:
1 <= s.length <= 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 30
s
and words[i]
consist of lowercase English letters.from collections import Counter
def find_substring(s: str, words: list[str]) -> list[int]:
"""Finds the starting indices of concatenated substrings in s."""
if not s or not words:
return []
word_length = len(words[0])
num_words = len(words)
substring_length = word_length * num_words
result = []
# Count the frequency of each word in the words array.
word_count = Counter(words)
# Iterate through the string s.
for i in range(len(s) - substring_length + 1):
# Create a substring of length substring_length.
substring = s[i : i + substring_length]
# Create a counter for the words in the substring.
substring_word_count = Counter()
# Iterate through the substring and count the words.
for j in range(0, substring_length, word_length):
word = substring[j : j + word_length]
substring_word_count[word] += 1
# If the word counts are the same, add the index to the result.
if substring_word_count == word_count:
result.append(i)
return result
# Example Usage
s = "barfoothefoobarman"
words = ["foo", "bar"]
print(find_substring(s, words)) # Output: [0, 9]
s = "wordgoodgoodgoodbestword"
words = ["word", "good", "best", "word"]
print(find_substring(s, words)) # Output: []
s = "barfoofoobarthefoobarman"
words = ["bar", "foo", "the"]
print(find_substring(s, words)) # Output: [6, 9, 12]
s
or the words
array is empty. It calculates the length of each word, the total number of words, and the total length of the concatenated substring we're looking for.Counter
object, word_count
, stores the frequency of each word in the words
array. This will be used to compare against the frequency of words in the substrings of s
.s
using a sliding window approach. The window size is equal to the total length of the concatenated substring.Counter
object, substring_word_count
, is created to store the frequency of each word in the current substring. The substring is divided into chunks of size word_length
, and each chunk is treated as a word.substring_word_count
is compared with the word_count
. If they are identical, it means the current substring is a valid concatenation of the words in words
, and the starting index i
is added to the result
list.result
list containing the starting indices of all valid concatenated substrings.A brute-force solution would involve generating all possible permutations of the words
array and then searching for each permutation within the string s
. This approach is highly inefficient because generating permutations has a time complexity of O(n!), where n is the number of words. Moreover, searching for each permutation in s
would further increase the time complexity.
The provided solution is an optimized approach that avoids generating permutations. Instead, it uses a sliding window and frequency counting to efficiently identify concatenated substrings. By using Counter
objects, the solution achieves a time complexity of O(N * M), where N is the length of the string s
and M is the number of words.
Outer Loop: The outer loop iterates through the string s
up to len(s) - substring_length + 1
times. This loop runs O(N - M * L) times, where N is the length of s
, M is the number of words, and L is the length of each word.
Inner Loop: The inner loop iterates through the substring in steps of word_length
. This loop runs M times, where M is the number of words.
Counter Comparison: Comparing the substring_word_count
with word_count
takes O(M) time in the worst case.
Thus, the overall time complexity is O((N - M * L) * M), which simplifies to O(N * M) when M * L is relatively small compared to N. If we consider L a constant, then the runtime is O(N*M).
word_count: The word_count
Counter stores the frequency of each word in words
. The space required is O(M), where M is the number of words.
substring_word_count: The substring_word_count
Counter stores the frequency of each word in the current substring. The space required is O(M).
result: The result
list stores the starting indices of the concatenated substrings. In the worst case, it could store up to N - M * L indices, where N is the length of s
, M is the number of words, and L is the length of each word. However, in many practical cases, the number of concatenated substrings is much smaller than N - M * L.
Therefore, the overall space complexity is O(M).
s
or the words
array is empty, the function should return an empty list.words
array have different lengths, the function should return an empty list because the problem statement specifies that all words have the same length.words
array, the substring_word_count
will not match the word_count
, and the substring will be skipped.Counter
objects handle duplicate words correctly by keeping track of their frequencies.result
list.