Taro Logo

Substring with Concatenation of All Words

Medium
2 months ago

You are given a string s and an array of strings words. All the strings of words are of the same length. A concatenated string is a string that exactly contains all the strings of any permutation of words concatenated.

  • For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated string because it is not the concatenation of any permutation of words.

Return an array of the starting indices of all the concatenated substrings in s. You can return the answer in any order.

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]

Output: [0,9]

Explanation: The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words. The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]

Output: []

Explanation: There is no concatenated substring.

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]

Output: [6,9,12]

Explanation: The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"]. The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"]. The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"].

Sample Answer
from collections import defaultdict

def find_substring(s, words):
    if not s or not words:
        return []

    word_len = len(words[0])
    num_words = len(words)
    total_len = word_len * num_words
    result = []
    
    # Create a dictionary to store the frequency of each word in the 'words' array
    word_count = defaultdict(int)
    for word in words:
        word_count[word] += 1

    # Iterate through the string 's'
    for i in range(len(s) - total_len + 1):
        seen_words = defaultdict(int)
        words_found = 0

        # Check if the substring is a concatenation of words in the 'words' array
        for j in range(num_words):
            start = i + j * word_len
            word = s[start:start + word_len]

            if word in word_count:
                seen_words[word] += 1
                if seen_words[word] <= word_count[word]:
                    words_found += 1
                else:
                    break  # Current word exceeds its count in 'words'
            else:
                break  # Current word is not in 'words'

        # If all words were found in the correct frequency, add the starting index to the result
        if words_found == num_words:
            result.append(i)

    return result

# Example Usage:
s = "barfoothefoobarman"
words = ["foo","bar"]
print(find_substring(s, words))  # Output: [0, 9]

s = "wordgoodgoodgoodbestword"
words = ["word","good","best","word"]
print(find_substring(s, words))  # Output: []

s = "barfoofoobarthefoobarman"
words = ["bar","foo","the"]
print(find_substring(s, words))  # Output: [6, 9, 12]

Explanation:

  1. Initialization:
    • word_len: Length of each word in the words array.
    • num_words: Number of words in the words array.
    • total_len: Total length of the concatenated string.
    • result: List to store the starting indices of the concatenated substrings.
    • word_count: Dictionary to store the frequency of each word in the words array.
  2. Word Frequency Count:
    • Iterate through the words array and store the frequency of each word in the word_count dictionary.
  3. String Traversal:
    • Iterate through the string s from index 0 to len(s) - total_len + 1.
    • For each index i, check if the substring starting at i is a concatenation of words in the words array.
  4. Substring Check:
    • seen_words: Dictionary to store the frequency of each word encountered in the current substring.
    • words_found: Counter to track the number of words found in the correct frequency.
    • Iterate through the num_words words in the current substring.
    • Extract each word of length word_len from the substring.
    • If the word is in the word_count dictionary, increment its count in the seen_words dictionary.
    • If the count of the word in the seen_words dictionary is less than or equal to its count in the word_count dictionary, increment the words_found counter.
    • If the count of the word in the seen_words dictionary exceeds its count in the word_count dictionary, break the inner loop.
    • If the word is not in the word_count dictionary, break the inner loop.
  5. Result:
    • If all words were found in the correct frequency (words_found == num_words), add the starting index i to the result list.
  6. Return:
    • Return the result list.

Big O Time Complexity:

The time complexity of the find_substring function can be analyzed as follows:

  • Outer Loop: The outer loop iterates through the string s from index 0 to len(s) - total_len + 1. In the worst case, this loop runs O(n) times, where n is the length of the string s.
  • Inner Loop: The inner loop iterates through the num_words words in the current substring. This loop runs O(m) times, where m is the number of words in the words array.
  • Word Frequency Count: Creating the word_count dictionary takes O(m) time, where m is the number of words in the words array.
  • Dictionary Operations: The dictionary operations (e.g., word in word_count, seen_words[word] += 1) take O(1) time on average.

Therefore, the overall time complexity of the find_substring function is O(n * m), where n is the length of the string s and m is the number of words in the words array.

Big O Space Complexity:

The space complexity of the find_substring function can be analyzed as follows:

  • word_count Dictionary: The word_count dictionary stores the frequency of each word in the words array. In the worst case, the dictionary contains all the words in the words array. Therefore, the space complexity of the word_count dictionary is O(m), where m is the number of words in the words array.
  • seen_words Dictionary: The seen_words dictionary stores the frequency of each word encountered in the current substring. In the worst case, the dictionary contains all the words in the words array. Therefore, the space complexity of the seen_words dictionary is O(m), where m is the number of words in the words array.
  • result List: The result list stores the starting indices of the concatenated substrings. In the worst case, all the substrings in s are concatenated strings. Therefore, the space complexity of the result list is O(n), where n is the length of the string s.

Therefore, the overall space complexity of the find_substring function is O(m + n), where m is the number of words in the words array and n is the length of the string s.