Taro Logo

Longest String Chain

Medium
Two Sigma logo
Two Sigma
0 views
Topics:
ArraysStringsDynamic Programming

You are given an array of words where each word consists of lowercase English letters.

wordA is a predecessor of wordB if and only if we can insert exactly one letter anywhere in wordA without changing the order of the other characters to make it equal to wordB.

  • For example, "abc" is a predecessor of "abac", while "cba" is not a predecessor of "bcad".

A word chain is a sequence of words [word1, word2, ..., wordk] with k >= 1, where word1 is a predecessor of word2, word2 is a predecessor of word3, and so on. A single word is trivially a word chain with k == 1.

Return the length of the longest possible word chain with words chosen from the given list of words.

Example 1:

Input: words = ["a","b","ba","bca","bda","bdca"]
Output: 4
Explanation: One of the longest word chains is ["a","ba","bda","bdca"].

Example 2:

Input: words = ["xbc","pcxbcf","xb","cxbc","pcxbc"]
Output: 5
Explanation: All the words can be put in a word chain ["xb", "xbc", "cxbc", "pcxbc", "pcxbcf"].

Example 3:

Input: words = ["abcd","dbqca"]
Output: 1
Explanation: The trivial word chain ["abcd"] is one of the longest word chains.
["abcd","dbqca"] is not a valid word chain because the ordering of the letters is changed.

Constraints:

  • 1 <= words.length <= 1000
  • 1 <= words[i].length <= 16
  • words[i] only consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a string in the `words` array?
  2. Can the input array `words` be empty or null?
  3. Are duplicate words allowed in the input array `words`? If so, how should they be handled in finding the longest string chain?
  4. If no string chain exists (e.g., the input is empty or no words form a chain), what value should I return?
  5. Are all words in the input array guaranteed to be lowercase English letters?

Brute Force Solution

Approach

The brute force approach to finding the longest string chain involves checking every possible combination of words. We explore all possible sequences to find the longest one where each word can be formed by adding one letter to the previous word. It's like trying every path through the words until we find the longest valid chain.

Here's how the algorithm would work step-by-step:

  1. Start by considering each word as a possible beginning of a chain.
  2. For each word, check if any other word can be formed by adding exactly one letter to it.
  3. If a word can be extended, try adding it to the chain and see if we can keep extending from there.
  4. Continue exploring all possible extensions of the chain until no more words can be added.
  5. Keep track of the length of each chain we find.
  6. After exploring all possible starting words and chain extensions, choose the longest chain we found.

Code Implementation

def longest_string_chain_brute_force(words):
    max_chain_length = 0

    def is_predecessor(word1, word2):
        if len(word1) + 1 != len(word2):
            return False
        
        index_word1 = 0
        index_word2 = 0
        difference_count = 0
        
        while index_word1 < len(word1) and index_word2 < len(word2):
            if word1[index_word1] == word2[index_word2]:
                index_word1 += 1
                index_word2 += 1
            else:
                difference_count += 1
                index_word2 += 1
        
        difference_count += (len(word2) - index_word2)
        return difference_count == 1

    def find_chain_length(current_word, remaining_words):
        current_chain_length = 1
        max_next_chain_length = 0

        # Iterate through each remaining word.
        for next_word in remaining_words:

            # Check if the next word is a predecessor of the current word.
            if is_predecessor(current_word, next_word):

                # Create a new set of remaining words without the current word.
                new_remaining_words = remaining_words.copy()
                new_remaining_words.remove(next_word)
                
                next_chain_length = find_chain_length(next_word, new_remaining_words)
                max_next_chain_length = max(max_next_chain_length, next_chain_length)

        return current_chain_length + max_next_chain_length

    # Iterate through each word to start a chain
    for start_word in words:
        remaining_words = words.copy()
        remaining_words.remove(start_word)

        # Calculate the length of the chain starting from start_word.
        chain_length = find_chain_length(start_word, remaining_words)

        # Update the maximum chain length.
        max_chain_length = max(max_chain_length, chain_length)
    
    return max_chain_length

Big(O) Analysis

Time Complexity
O(n*n*L)The brute force approach considers each of the n words as a potential starting point for a chain. For each starting word, we iterate through the remaining n-1 words to see if any can extend the chain. The isPredecessor check between two words involves comparing strings of average length L, taking O(L) time. Therefore, for each of the n words, we are potentially doing O(n*L) work to see if any other word could be added to extend the chain, leading to a total time complexity of O(n*n*L).
Space Complexity
O(1)The described brute force approach primarily involves iterating through the input words and tracking the length of the longest chain found so far. It doesn't appear to use any auxiliary data structures that scale with the number of words (N). The algorithm seems to only store a few constant-sized variables, such as the current chain length and the maximum chain length found, regardless of how many words are provided as input. Therefore, the space complexity remains constant. This means the extra memory used does not increase with the size of the input.

Optimal Solution

Approach

The goal is to find the longest possible sequence of words where each word is formed by adding exactly one letter to the previous word. The efficient solution avoids checking all possible sequences by building the chain step-by-step, always trying to extend the longest chain found so far. Think of it like building the tallest tower by always adding the largest possible brick that fits.

Here's how the algorithm would work step-by-step:

  1. First, organize all the words by their length, grouping words of the same length together.
  2. Next, consider each word as a potential starting point for a chain. The key is to start with the shortest words first.
  3. For each word, check if any longer words can be formed by adding a single letter to it. If you find one, consider it as the next word in the chain.
  4. Keep extending the chain, always looking for the next possible word that is exactly one letter longer and can be formed by adding a single character to the current word.
  5. Remember the length of the longest chain you've found so far. As you explore different chains starting from different words, update the maximum length whenever you find a longer chain.
  6. By starting with shorter words and building chains, you automatically prioritize longer chains and avoid redundant checks of shorter, suboptimal chains.

Code Implementation

def longest_string_chain(words):
    word_set = set(words)
    word_lengths = {}
    for word in words:
        length = len(word)
        if length not in word_lengths:
            word_lengths[length] = []
        word_lengths[length].append(word)

    # Store the length of the longest chain ending at each word.
    chain_lengths = {}
    longest_chain_so_far = 1

    # Process words in order of length to build chains efficiently.
    for word_length in sorted(word_lengths.keys()):
        for current_word in word_lengths[word_length]:
            chain_lengths[current_word] = 1

            # Check all possible predecessors by removing one character.
            for i in range(word_length):
                predecessor = current_word[:i] + current_word[i+1:]

                # Update the length of the chain if a valid predecessor exists
                if predecessor in word_set:
                    chain_lengths[current_word] = max(
                        chain_lengths[current_word],
                        chain_lengths.get(predecessor, 0) + 1
                    )

            # Update the overall longest chain found.
            longest_chain_so_far = max(
                longest_chain_so_far, chain_lengths[current_word]
            )

    return longest_chain_so_far

Big(O) Analysis

Time Complexity
O(n * m^2)Let n be the number of words and m be the maximum length of a word. Grouping words by length takes O(n) time. Iterating through each word (n) as a potential starting point drives the primary cost. For each word, we generate all possible predecessor words by removing one character (at most m possibilities) and checking if that predecessor exists in the words with length one less. This substring generation and search takes O(m^2) time in the worst case (m to generate the substring + m to search for its existence). Therefore the overall time complexity becomes O(n * m^2).
Space Complexity
O(N)The algorithm organizes words by length, implying a need to store words grouped by their length. This could be implemented with a hash map or a list of lists, where the keys/indices represent lengths, and the values are lists of words of that length. In the worst case, all words could have different lengths, meaning we effectively store all N words in this length-organized structure. Furthermore, a separate data structure to keep track of the longest chain ending at each word is needed, which could also have a size proportional to the number of words N, thereby the space complexity is O(N).

Edge Cases

CaseHow to Handle
words is null or emptyReturn 0, as no chain can be formed.
words contains only one wordReturn 1, as a single word forms a chain of length 1.
words contains duplicate wordsThe algorithm should correctly process duplicates, as only the longest chain is important, duplicates do not affect the longest sequence.
words contains words of the same lengthThe algorithm should correctly compare and determine predecessor relationships between words of the same length.
words contains no possible chain (e.g., only very long words)The algorithm should return 1, as the longest possible chain would be a single word.
words contains words that are substrings of each otherThe algorithm should correctly identify predecessor relationships even when one word is a substring of another with gaps in between.
words contains very long strings (approaching memory limits)Ensure the algorithm has reasonable space complexity, possibly avoiding storing large intermediate data structures.
words contains many words, approaching time limitOptimize the algorithm to have efficient time complexity, such as O(N * L^2) where N is number of words and L is the maximum length of a word.