You are given a string s
and an array of strings words
. All the strings of words
are of the same length. A concatenated string is a string that exactly contains all the strings of any permutation of words
concatenated.
words = ["ab","cd","ef"]
, then "abcdef"
, "abefcd"
, "cdabef"
, "cdefab"
, "efabcd"
, and "efcdab"
are all concatenated strings. "acdbef"
is not a concatenated string because it is not the concatenation of any permutation of words
.Return an array of the starting indices of all the concatenated substrings in s
. You can return the answer in any order.
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation:
The substring starting at 0 is "barfoo"
. It is the concatenation of ["bar","foo"]
which is a permutation of words
.
The substring starting at 9 is "foobar"
. It is the concatenation of ["foo","bar"]
which is a permutation of words
.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: There is no concatenated substring.
Constraints:
1 <= s.length <= 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 30
s
and words[i]
consist of lowercase English letters.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
We want to find where a specific combination of words appears inside a larger piece of text. The brute force method checks every possible starting point in the larger text to see if our word combination is there. It's like trying to fit a puzzle piece starting at every possible location to see if it fits.
Here's how the algorithm would work step-by-step:
def find_substring_brute_force(text, words):
if not text or not words:
return []
word_length = len(words[0])
number_of_words = len(words)
substring_length = word_length * number_of_words
result_indices = []
# Iterate through all possible start indices
for index in range(len(text) - substring_length + 1):
words_seen = []
words_found = 0
# Check if the concatenation of words exists starting at current index
for word_index in range(number_of_words):
current_word_start = index + word_index * word_length
current_word = text[current_word_start:current_word_start + word_length]
# Ensure the words are in order and present
if current_word in words:
words_seen.append(current_word)
words_found += 1
# Verify all the words are present
if words_found == number_of_words:
# Check if the counts of each word matches
temp_words = words[:] # make a copy
is_concat = True
for seen_word in words_seen:
if seen_word in temp_words:
temp_words.remove(seen_word)
else:
is_concat = False
break
if is_concat:
result_indices.append(index)
return result_indices
The key is to avoid checking every single possible starting position. Instead, we use a sliding window approach, focusing on potential starting points that align with the length of the words we are looking for. This significantly reduces redundant checks and computation.
Here's how the algorithm would work step-by-step:
def find_substring(string, words):
if not string or not words:
return []
word_length = len(words[0])
number_of_words = len(words)
substring_length = word_length * number_of_words
result_indices = []
# Store word counts for quick reference
word_count = {}
for word in words:
word_count[word] = word_count.get(word, 0) + 1
# Only check starting positions that are multiples of word_length
for i in range(len(string) - substring_length + 1):
words_seen = {}
words_found = 0
# Iterate through the substring to check each word
for j in range(number_of_words):
word_start = i + j * word_length
current_word = string[word_start:word_start + word_length]
# If the current word is not in the list, break
if current_word not in word_count:
break
words_seen[current_word] = words_seen.get(current_word, 0) + 1
# Break if we see a word more times than in target list
if words_seen[current_word] > word_count[current_word]:
break
words_found += 1
# If all words are found, add the starting index to results
if words_found == number_of_words:
result_indices.append(i)
return result_indices
Case | How to Handle |
---|---|
s or words is null or empty | Return an empty list immediately as there can be no concatenation. |
words array is empty | If words is empty, return an empty list, indicating no valid concatenations can be found. |
s is shorter than the total length of the concatenated words | Return an empty list since s cannot contain the concatenation. |
words contains an empty string | If a word is empty the length calculation must account for zero length strings and the main algorithm should handle comparisons against empty strings. |
words contains duplicate words | Use a frequency map to count the occurrences of each word and compare against that during the search. |
No valid starting indices exist | Return an empty list when no valid starting indices are found after searching the entire string s. |
All words in 'words' are the same. | The frequency map ensures correct counting and the sliding window logic accurately checks for consecutive repetitions. |
Very long string s and large word array, potential for timeout | Optimize the algorithm using a sliding window approach to avoid redundant calculations and improve time complexity. |