You are given a string s
and an array of strings words
. All the strings of words
are of the same length.
A concatenated string is a string that exactly contains all the strings of any permutation of words
concatenated.
words = ["ab","cd","ef"]
, then "abcdef"
, "abefcd"
, "cdabef"
, "cdefab"
, "efabcd"
, and "efcdab"
are all concatenated strings. "acdbef"
is not a concatenated string because it is not the concatenation of any permutation of words
.Return an array of the starting indices of all the concatenated substrings in s
. You can return the answer in any order.
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation:
The substring starting at 0 is "barfoo"
. It is the concatenation of ["bar","foo"]
which is a permutation of words
.
The substring starting at 9 is "foobar"
. It is the concatenation of ["foo","bar"]
which is a permutation of words
.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation:
There is no concatenated substring.
Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation:
The substring starting at 6 is "foobarthe"
. It is the concatenation of ["foo","bar","the"]
.
The substring starting at 9 is "barthefoo"
. It is the concatenation of ["bar","the","foo"]
.
The substring starting at 12 is "thefoobar"
. It is the concatenation of ["the","foo","bar"]
.
Constraints:
1 <= s.length <= 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 30
s
and words[i]
consist of lowercase English letters.from collections import Counter
def find_substring(s: str, words: list[str]) -> list[int]:
"""Finds the starting indices of concatenated substrings in s."""
if not s or not words:
return []
word_len = len(words[0])
num_words = len(words)
total_len = word_len * num_words
result = []
# Count the frequency of each word in the 'words' array
word_count = Counter(words)
# Iterate through the string 's'
for i in range(len(s) - total_len + 1):
# Create a counter for the words found in the current substring of 's'
seen_words = Counter()
# Check if the current substring is a valid concatenation of words
for j in range(num_words):
# Extract a word from the current substring
word = s[i + j * word_len:i + (j + 1) * word_len]
# If the word is not in the original 'words' array, break the loop
if word not in word_count:
break
# Update the count of the seen word
seen_words[word] += 1
# If the count of the seen word exceeds the count in the original 'words' array, break the loop
if seen_words[word] > word_count[word]:
break
else:
# If the inner loop completes without breaking, it means a valid concatenation is found
result.append(i)
return result
# Example Usage
s = "barfoothefoobarman"
words = ["foo", "bar"]
print(find_substring(s, words))
s = "wordgoodgoodgoodbestword"
words = ["word", "good", "best", "word"]
print(find_substring(s, words))
s = "barfoofoobarthefoobarman"
words = ["bar", "foo", "the"]
print(find_substring(s, words))
Initialization:
word_len
: Length of each word in the words
array.num_words
: Number of words in the words
array.total_len
: Total length of the concatenated string (i.e., word_len * num_words
).result
: List to store the starting indices of the concatenated substrings.word_count
: A Counter
object that stores the frequency of each word in the words
array.Outer Loop:
i = 0
to len(s) - total_len + 1
. This ensures that we consider all possible starting positions for a concatenated substring of length total_len
within the string s
.Inner Loop and Validation:
i
, the inner loop iterates num_words
times.word
of length word_len
from the string s
starting at position i + j * word_len
.word
is present in the word_count
.
word_count
, it means this substring cannot be a valid concatenation, so it breaks the inner loop.seen_words
counter to keep track of the frequency of each word seen in the current substring.word
in seen_words
exceeds its count in word_count
, it means this substring cannot be a valid concatenation, so it breaks the inner loop.Result:
else
block of the for
loop is executed), it means the substring starting at index i
is a valid concatenation of the words in words
.i
is added to the result
list.Return Value:
result
list containing the starting indices of all concatenated substrings found in s
.s
, M is the number of words in the words
array, and K is the length of each word. The outer loop iterates N - M*K times, and the inner loop iterates M times, extracting a substring of length K in each iteration.words
array. This is due to the space used by the word_count
and seen_words
counters, which store at most M words.s
is empty, the function should return an empty list because there can be no concatenated substrings.words
is empty, the function should return an empty list for the same reason.words
list have the same length. If this condition is violated, the algorithm may not function correctly. You might want to add a check to ensure all words have the same length before proceeding.words
list correctly by using Counter
to keep track of word frequencies. If the frequency of a word in the substring exceeds its frequency in the word list, the substring is considered invalid.