Taro Logo

Concatenated Words

Hard
Asked by:
Profile picture
Profile picture
Profile picture
Profile picture
+2
More companies
Profile picture
Profile picture
24 views
Topics:
ArraysStringsDynamic Programming

Given an array of strings words (without duplicates), return all the concatenated words in the given list of words.

A concatenated word is defined as a string that is comprised entirely of at least two shorter words (not necessarily distinct) in the given array.

Example 1:

Input: words = ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]
Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]
Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats"; 
"dogcatsdog" can be concatenated by "dog", "cats" and "dog"; 
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

Example 2:

Input: words = ["cat","dog","catdog"]
Output: ["catdog"]

Constraints:

  • 1 <= words.length <= 104
  • 1 <= words[i].length <= 30
  • words[i] consists of only lowercase English letters.
  • All the strings of words are unique.
  • 1 <= sum(words[i].length) <= 105

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can a word be concatenated with itself to form a concatenated word? For example, if "cat" is in the dictionary, is "catcat" considered a concatenated word?
  2. Are the words in the dictionary guaranteed to be non-empty strings?
  3. If a word can be formed by multiple concatenations of other words in the dictionary, do I need to find all possible concatenations, or is it sufficient to find any one valid concatenation?
  4. What is the expected maximum length of a single word in the dictionary, and what is the expected maximum number of words in the dictionary?
  5. If a word in the input array is itself present in the dictionary, should it be considered a concatenated word (since it is 'concatenated' from a single word)? Alternatively, does a concatenated word need to be composed of at least two other words from the dictionary?

Brute Force Solution

Approach

The brute force approach to concatenated words is to try every single way to build a longer word by combining smaller words from the given list. We check each possibility to see if it matches the target word.

Here's how the algorithm would work step-by-step:

  1. Take each word from the list of words.
  2. See if this word can be formed by joining other words from the list together.
  3. To do this, try all possible combinations of words from the list.
  4. For each combination, join the words together and check if the result is equal to the original word we picked.
  5. If we find a combination that matches, we know the original word is a concatenated word.
  6. If we try all possible combinations and none of them match, the word is not a concatenated word.
  7. Repeat this process for every word in the list to find all concatenated words.

Code Implementation

def find_concatenated_words_brute_force(words):
    concatenated_words = []

    for target_word in words:
        if is_concatenated(target_word, words):
            concatenated_words.append(target_word)

    return concatenated_words

def is_concatenated(target_word, word_list):
    if not target_word:
        return False

    for i in range(1, len(target_word) + 1):
        prefix = target_word[:i]

        # Need to avoid infinite recursion with the same word
        if prefix in word_list and prefix != target_word:

            suffix = target_word[i:]

            # Check if the suffix is either a word or can be concatenated
            if not suffix or suffix in word_list or is_concatenated(suffix, word_list):
                return True

    return False

Big(O) Analysis

Time Complexity
O(n^2 * m^k)Let n be the number of words in the input list and m be the average length of a word. The brute-force approach iterates through each of the n words. For each word, it explores all possible combinations of other words to see if the word can be formed by concatenating them. In the worst case, to check if a word is concatenated, we might have to try all possible subsets of the other words. The number of subsets can grow exponentially with the number of words, potentially leading to a cost proportional to 2^n but let us approximate this as trying k other words. Furthermore, each combination has its concatenation cost m*k which gives O(n * m^k * some subset factor of other words) but worst case is O(n^2*m^k). We simplify, noting the potential subset factor is encompassed in n^2.
Space Complexity
O(N^2)The brute force approach, as described, involves trying all possible combinations of words from the input list to construct each target word. In the worst-case scenario, forming these combinations might require storing intermediate concatenated strings, potentially leading to a string of length up to the length of the target word. Since we try all possible combinations of the N words, and a combination can have at most N words joined together, the space complexity can be visualized as storing all the prefixes which are at most N^2 in size, where N is the number of words in the input list. Therefore, the auxiliary space complexity is O(N^2).

Optimal Solution

Approach

The goal is to find all the words in a list that can be formed by concatenating other words from the same list. We'll use a dynamic programming and a set lookup to efficiently check if a word can be constructed from smaller words in the list.

Here's how the algorithm would work step-by-step:

  1. First, put all the words from the input list into a collection where we can quickly check if a word exists (like a set).
  2. Then, for each word in the list, check if it can be made by combining other smaller words from the same list.
  3. To do this, go through the letters of the word one by one. At each point, see if the part of the word we've looked at so far is a valid word from our collection.
  4. If it is a valid word, check if the remaining part of the word can also be made from other words in the list. Keep breaking down the remaining part until you have checked all the possible combinations of words.
  5. If you can completely break down the original word into smaller words from the collection, then the original word is a concatenated word. Add it to our list of concatenated words.
  6. Repeat this process for every word in the list. By checking each word this way, we find all the concatenated words efficiently.

Code Implementation

def find_concatenated_words(words):
    word_set = set(words)
    concatenated_words = []

    def can_form(word):
        if not word:
            return True

        # Use DP to avoid redundant calculations
        dp = [False] * (len(word) + 1)
        dp[0] = True

        for i in range(1, len(word) + 1):
            for j in range(i):
                # Check if prefix exists and remaining can be formed
                if dp[j] and word[j:i] in word_set:
                    dp[i] = True
                    break
        return dp[len(word)]

    # Iterate through words, check if concatenated
    for word in words:
        word_set.remove(word)
        if can_form(word):
            concatenated_words.append(word)
        # Restore the word set for subsequent checks
        word_set.add(word)

    return concatenated_words

Big(O) Analysis

Time Complexity
O(n * l^2)Let n be the number of words in the input list and l be the average length of the words. Building the word set takes O(n*l) time in the worst case since we are copying the string into the set. For each of the n words, we iterate through all possible prefixes (up to length l). For each prefix, we perform a set lookup which takes O(1) time. After the lookup, we recursively check the remaining suffix which again may take O(l^2) time due to considering all prefixes of the suffix with memoization implicit in the canForm check via dp. Thus, the total complexity for checking one word is O(l^2). Since we do this for each of the n words, the time complexity is O(n * l^2).
Space Complexity
O(N)The algorithm uses a set to store all the words from the input list, which takes O(N) space where N is the number of words. In the worst-case scenario, the recursion or iterative implementation to check if a word is concatenated may also require space proportional to the length of the word being checked. However, the set dominates the space complexity, therefore the auxiliary space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty input word listReturn an empty list immediately as there are no words to concatenate.
Word list containing an empty stringHandle empty strings gracefully by either skipping them during concatenation or defining their contribution as null depending on context; this prevents errors caused by empty string length operations or incorrect substring calculations.
Word list with a single very long wordThis might cause the DP or Trie-based search space to be very large so consider limiting the maximum word length to a reasonable value to prevent excessive memory allocation and runtime issues.
A word is a concatenation of itself (e.g., 'abab' from ['ab', 'ab'])Ensure the solution handles self-concatenation correctly, potentially requiring a check that the word isn't solely formed by repeating the same element or is equal to an existing word.
Very large input word list size causing memory limitsEmploy space optimization techniques or consider the use of external memory data structures if the word list is too large to fit into memory.
A word is a prefix of another word (e.g., 'bar' and 'barfoo')Prefix/suffix relationships can lead to incorrect concatenation detection, so explicitly verify that the full string has been formed during the concatenation process to avoid partial word matches being considered complete concatenations.
Deep recursion with very long words, potentially leading to stack overflowConvert recursive solutions to iterative ones to avoid stack overflow errors, or implement tail-call optimization if the language supports it.
Word list contains extremely long words near the allowed memory limit.Check for integer overflows in calculations related to string lengths or memory allocations, and use appropriate data types or guard conditions.