Taro Logo

Substring with Concatenation of All Words

Medium
1 views
a month ago

You are given a string s and an array of strings words. All the strings of words are of the same length. A concatenated string is a string that exactly contains all the strings of any permutation of words concatenated.

  • For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated string because it is not the concatenation of any permutation of words.

Return an array of the starting indices of all the concatenated substrings in s. You can return the answer in any order.

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]

Output: [0,9]

Explanation:

The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words. The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]

Output: []

Explanation:

There is no concatenated substring.

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]

Output: [6,9,12]

Explanation:

The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"]. The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"]. The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"].

What is the most efficient way to implement this function?

Sample Answer
from collections import Counter

def find_substring(s: str, words: list[str]) -> list[int]:
    """Finds the starting indices of concatenated substrings in s.

    Args:
        s: The string to search in.
        words: A list of words to search for.

    Returns:
        A list of the starting indices of the concatenated substrings in s.
    """
    if not s or not words:
        return []

    word_length = len(words[0])
    num_words = len(words)
    substring_length = word_length * num_words
    result = []

    word_count = Counter(words)

    for i in range(len(s) - substring_length + 1):
        seen_words = Counter()
        for j in range(num_words):
            start_index = i + j * word_length
            word = s[start_index:start_index + word_length]
            if word in word_count:
                seen_words[word] += 1
                if seen_words[word] > word_count[word]:
                    break
            else:
                break
        else:
            if seen_words == word_count:
                result.append(i)

    return result

# Example Usage:
# s = "barfoothefoobarman"
# words = ["foo", "bar"]
# print(find_substring(s, words))  # Output: [0, 9]

# s = "wordgoodgoodgoodbestword"
# words = ["word", "good", "best", "word"]
# print(find_substring(s, words))  # Output: []

# s = "barfoofoobarthefoobarman"
# words = ["bar", "foo", "the"]
# print(find_substring(s, words))  # Output: [6, 9, 12]

Brute Force Solution

The most straightforward approach is to generate all possible permutations of the words array and then search for each permutation as a substring within the given string s. This method is inefficient but serves as a baseline for understanding the problem.

import itertools

def find_substring_brute_force(s: str, words: list[str]) -> list[int]:
    """Finds the starting indices of concatenated substrings in s using brute force.

    Args:
        s: The string to search in.
        words: A list of words to search for.

    Returns:
        A list of the starting indices of the concatenated substrings in s.
    """
    if not s or not words:
        return []

    word_length = len(words[0])
    num_words = len(words)
    substring_length = word_length * num_words
    result = []

    for permutation in itertools.permutations(words):
        concatenated_string = ''.join(permutation)
        for i in range(len(s) - substring_length + 1):
            if s[i:i + substring_length] == concatenated_string:
                result.append(i)

    return sorted(list(set(result)))

# Example Usage:
# s = "barfoothefoobarman"
# words = ["foo", "bar"]
# print(find_substring_brute_force(s, words))  # Output: [0, 9]

Optimal Solution

The optimal solution uses a sliding window approach combined with a hash map (Counter in Python) to efficiently check for concatenated substrings. This approach avoids generating permutations and reduces redundant comparisons.

from collections import Counter

def find_substring(s: str, words: list[str]) -> list[int]:
    """Finds the starting indices of concatenated substrings in s.

    Args:
        s: The string to search in.
        words: A list of words to search for.

    Returns:
        A list of the starting indices of the concatenated substrings in s.
    """
    if not s or not words:
        return []

    word_length = len(words[0])
    num_words = len(words)
    substring_length = word_length * num_words
    result = []

    word_count = Counter(words)

    for i in range(len(s) - substring_length + 1):
        seen_words = Counter()
        for j in range(num_words):
            start_index = i + j * word_length
            word = s[start_index:start_index + word_length]
            if word in word_count:
                seen_words[word] += 1
                if seen_words[word] > word_count[word]:
                    break
            else:
                break
        else:
            if seen_words == word_count:
                result.append(i)

    return result

# Example Usage:
# s = "barfoothefoobarman"
# words = ["foo", "bar"]
# print(find_substring(s, words))  # Output: [0, 9]

# s = "wordgoodgoodgoodbestword"
# words = ["word", "good", "best", "word"]
# print(find_substring(s, words))  # Output: []

# s = "barfoofoobarthefoobarman"
# words = ["bar", "foo", "the"]
# print(find_substring(s, words))  # Output: [6, 9, 12]

Big(O) Runtime Analysis

  • Brute Force: The brute force solution involves generating all permutations of the words array, which takes O(n!) time, where n is the number of words. Then, for each permutation, it searches for the concatenated string in the input string s, which takes O(m * k) time, where m is the length of s and k is the length of the concatenated string. Thus, the overall runtime complexity is O(n! * m * k), which is extremely inefficient.
  • Optimal Solution: The optimal solution iterates through the string s once with a sliding window of size substring_length. Inside the loop, it iterates through the words in the words array. The outer loop runs m - k + 1 times, where m is the length of s and k is the total length of all words combined (word_length * num_words). The inner loop runs num_words times. Thus, the overall runtime complexity is O(m * n * l), where m is the length of the string s, n is the number of words in the words array and l is the length of each word. Because n and l are generally much smaller than m, this can be seen as approximately O(m) in many practical cases, where m is the length of the string s.

Big(O) Space Usage Analysis

  • Brute Force: The brute force solution stores the permutations of the words array, which takes O(n * k) space, where n is the number of words and k is the average length of the words. Additionally, it stores the result list, which can take up to O(m) space in the worst case. Thus, the overall space complexity is O(n * k + m).
  • Optimal Solution: The optimal solution uses a hash map (Counter) to store the counts of words in the words array, which takes O(n * l) space, where n is the number of words and l is the length of each word. It also uses another hash map to store the seen words, which also takes O(n * l) space. The result list can take up to O(m) space in the worst case. Thus, the overall space complexity is O(n * l + m).

Edge Cases

  1. Empty Input: If either the input string s or the words array is empty, the function should return an empty list.
  2. Empty Words in words Array: If the words array contains empty strings, the function may behave unpredictably. The provided code assumes that all words have a non-zero length and the same length.
  3. Words with Different Lengths: The problem statement specifies that all words in the words array have the same length. If this condition is violated, the function may produce incorrect results. An additional check can be added at the beginning of the function to ensure this condition is met.
  4. Repeated Words: The code correctly handles cases where the words array contains repeated words by using Counter to keep track of the frequency of each word.
  5. No Concatenated Substring: If the input string s does not contain any concatenated substrings of the words in the words array, the function should return an empty list.
  6. Overlapping Concatenated Substrings: If there are overlapping concatenated substrings, the function should return the starting indices of all such substrings.