You are given a string s
and an array of strings words
. All the strings of words
are of the same length.
A concatenated string is a string that exactly contains all the strings of any permutation of words
concatenated.
words = ["ab","cd","ef"]
, then "abcdef"
, "abefcd"
, "cdabef"
, "cdefab"
, "efabcd"
, and "efcdab"
are all concatenated strings. "acdbef"
is not a concatenated string because it is not the concatenation of any permutation of words
.Return an array of the starting indices of all the concatenated substrings in s
. You can return the answer in any order.
For example:
s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation:
The substring starting at 0 is "barfoo"
. It is the concatenation of ["bar","foo"]
which is a permutation of words
.
The substring starting at 9 is "foobar"
. It is the concatenation of ["foo","bar"]
which is a permutation of words
.s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: There is no concatenated substring.s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation:
The substring starting at 6 is "foobarthe"
. It is the concatenation of ["foo","bar","the"]
.
The substring starting at 9 is "barthefoo"
. It is the concatenation of ["bar","the","foo"]
.
The substring starting at 12 is "thefoobar"
. It is the concatenation of ["the","foo","bar"]
.Solve this problem with the optimal time and space complexity, taking into account edge cases. Explain your solution.
A naive approach would be to generate all possible permutations of the words
array and then check if each permutation exists as a substring in the given string s
. This method is highly inefficient, especially when the words
array is large, due to the factorial time complexity of generating permutations.
words
array.s
.words
, k is the length of each word, and M is the length of the string s
. Generating all permutations takes O(N!), concatenating the words takes O(N * k), and searching for the substring takes O(M).This approach is not practical for any reasonably sized input.
A more efficient approach involves using a sliding window and a hash map to track the frequency of words. This avoids the costly generation of permutations.
words
array.word_length
, num_words
, total_length
.s
using a sliding window of size total_length
.word_length
at a time, checking if each word in the window exists in the original hash map and updating the temporary hash map.from collections import defaultdict
def find_substring(s, words):
if not s or not words:
return []
word_length = len(words[0])
num_words = len(words)
total_length = word_length * num_words
result = []
word_count = defaultdict(int)
for word in words:
word_count[word] += 1
for i in range(len(s) - total_length + 1):
seen_words = defaultdict(int)
words_found = 0
for j in range(num_words):
word_start = i + j * word_length
word = s[word_start:word_start + word_length]
if word in word_count:
seen_words[word] += 1
if seen_words[word] > word_count[word]:
break
words_found += 1
else:
break
if words_found == num_words:
result.append(i)
return result
s
, N is the number of words in words
, and k is the length of each word. The outer loop iterates through s
in O(M) time, and the inner loop iterates through words
in O(N) time, with each word comparison taking O(k) time. Because k is bounded and typically much smaller than M or N, this is often considered O(M * N).words
and k is the length of each word, to store the hash maps word_count
and seen_words
.s
or empty words
array.words
array containing words of different lengths.s
is shorter than the concatenated length of all words in words
.words
containing duplicate words.