Taro Logo

Substring with Concatenation of All Words

Hard
Google logo
Google
0 views
Topics:
ArraysStringsSliding Windows

You are given a string s and an array of strings words. All the strings of words are of the same length.

A concatenated string is a string that exactly contains all the strings of any permutation of words concatenated.

  • For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated string because it is not the concatenation of any permutation of words.

Return an array of the starting indices of all the concatenated substrings in s. You can return the answer in any order.

For example:

  1. s = "barfoothefoobarman", words = ["foo","bar"] Output: [0,9] Explanation: The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words. The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.
  2. s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"] Output: [] Explanation: There is no concatenated substring.
  3. s = "barfoofoobarthefoobarman", words = ["bar","foo","the"] Output: [6,9,12] Explanation: The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"]. The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"]. The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"].

Solve this problem with the optimal time and space complexity, taking into account edge cases. Explain your solution.

Solution


Brute Force Solution

A naive approach would be to generate all possible permutations of the words array and then check if each permutation exists as a substring in the given string s. This method is highly inefficient, especially when the words array is large, due to the factorial time complexity of generating permutations.

Algorithm

  1. Generate all permutations of the words array.
  2. For each permutation, concatenate the words into a single string.
  3. Check if the concatenated string is a substring of s.
  4. If it is, record the starting index.

Complexity Analysis

  • Time Complexity: O(N! * k * M), where N is the number of words in words, k is the length of each word, and M is the length of the string s. Generating all permutations takes O(N!), concatenating the words takes O(N * k), and searching for the substring takes O(M).
  • Space Complexity: O(N! * N * k), to store all permutations.

This approach is not practical for any reasonably sized input.

Optimal Solution: Sliding Window with Hash Map

A more efficient approach involves using a sliding window and a hash map to track the frequency of words. This avoids the costly generation of permutations.

Algorithm

  1. Create a hash map to store the frequency of each word in the words array.
  2. Initialize variables: word_length, num_words, total_length.
  3. Iterate through the string s using a sliding window of size total_length.
  4. For each window, create a temporary hash map to track the frequency of words within the window.
  5. Slide the window by word_length at a time, checking if each word in the window exists in the original hash map and updating the temporary hash map.
  6. If the temporary hash map matches the original hash map, record the starting index of the window.

Code Implementation (Python)

from collections import defaultdict

def find_substring(s, words):
    if not s or not words:
        return []

    word_length = len(words[0])
    num_words = len(words)
    total_length = word_length * num_words
    result = []

    word_count = defaultdict(int)
    for word in words:
        word_count[word] += 1

    for i in range(len(s) - total_length + 1):
        seen_words = defaultdict(int)
        words_found = 0

        for j in range(num_words):
            word_start = i + j * word_length
            word = s[word_start:word_start + word_length]

            if word in word_count:
                seen_words[word] += 1
                if seen_words[word] > word_count[word]:
                    break
                words_found += 1
            else:
                break

        if words_found == num_words:
            result.append(i)

    return result

Complexity Analysis

  • Time Complexity: O(M * N * k), where M is the length of the string s, N is the number of words in words, and k is the length of each word. The outer loop iterates through s in O(M) time, and the inner loop iterates through words in O(N) time, with each word comparison taking O(k) time. Because k is bounded and typically much smaller than M or N, this is often considered O(M * N).
  • Space Complexity: O(N * k), where N is the number of words in words and k is the length of each word, to store the hash maps word_count and seen_words.

Edge Cases

  • Empty input string s or empty words array.
  • words array containing words of different lengths.
  • s is shorter than the concatenated length of all words in words.
  • words containing duplicate words.