Taro Logo

Longest String Chain

Medium
Google logo
Google
3 views
Topics:
ArraysStringsDynamic Programming

You are given an array of words where each word consists of lowercase English letters.

wordA is a predecessor of wordB if and only if we can insert exactly one letter anywhere in wordA without changing the order of the other characters to make it equal to wordB.

  • For example, "abc" is a predecessor of "abac", while "cba" is not a predecessor of "bcad".

A word chain is a sequence of words [word1, word2, ..., wordk] with k >= 1, where word1 is a predecessor of word2, word2 is a predecessor of word3, and so on. A single word is trivially a word chain with k == 1.

Return the length of the longest possible word chain with words chosen from the given list of words.

Example 1:

Input: words = ["a","b","ba","bca","bda","bdca"]
Output: 4
Explanation: One of the longest word chains is ["a","ba","bda","bdca"].

Example 2:

Input: words = ["xbc","pcxbcf","xb","cxbc","pcxbc"]
Output: 5
Explanation: All the words can be put in a word chain ["xb", "xbc", "cxbc", "pcxbc", "pcxbcf"].

Example 3:

Input: words = ["abcd","dbqca"]
Output: 1
Explanation: The trivial word chain ["abcd"] is one of the longest word chains.
["abcd","dbqca"] is not a valid word chain because the ordering of the letters is changed.

Constraints:

  • 1 <= words.length <= 1000
  • 1 <= words[i].length <= 16
  • words[i] only consists of lowercase English letters.

Solution


Naive Solution

A brute-force approach would involve generating all possible subsequences of the given words and then checking if each subsequence forms a valid word chain. The longest valid word chain would then be the answer. This involves a lot of redundant computations.

Complexity Analysis

  • Time Complexity: O(2N * M), where N is the number of words and M is the average length of the words (for checking predecessor relationship).
  • Space Complexity: O(N) for storing the subsequences.

Optimal Solution

We can use dynamic programming to solve this problem efficiently. The key idea is to build a DP table where dp[word] stores the length of the longest word chain ending at word. We can iterate through the words, sorted by length, and for each word, check if it's a predecessor of any other word. If it is, we update the dp table accordingly.

Algorithm

  1. Sort the words by length in ascending order.
  2. Initialize a dp table (e.g., a hash map) to store the length of the longest word chain ending at each word. Initially, the length is 1 for each word.
  3. Iterate through the sorted words.
  4. For each word, generate all possible predecessors by removing one character at each possible position.
  5. If the predecessor exists in the dp table, update dp[word] as max(dp[word], dp[predecessor] + 1).
  6. Keep track of the maximum value in the dp table, which will be the length of the longest word chain.

Code

def longestStrChain(words):
    word_set = set(words)
    dp = {}
    max_length = 0

    for word in sorted(words, key=len):
        dp[word] = 1
        for i in range(len(word)):
            predecessor = word[:i] + word[i+1:]
            if predecessor in word_set:
                if predecessor in dp:
                    dp[word] = max(dp[word], dp[predecessor] + 1)
        max_length = max(max_length, dp[word])

    return max_length

Complexity Analysis

  • Time Complexity: O(N * M2), where N is the number of words and M is the maximum length of a word. We iterate through each word (O(N)), and for each word, we generate all possible predecessors by removing one character. This takes O(M) time. Checking if the predecessor exists in the word set and updating the DP table takes O(M) in the worst case, where M is the average length of words.
  • Space Complexity: O(N), where N is the number of words, to store the dp table and the word_set.

Edge Cases

  • Empty input list: Return 0.
  • Input list with a single word: Return 1.
  • No possible word chain: Return 1 (each word is a chain of length 1).

Example

Consider the input words = ["a","b","ba","bca","bda","bdca"].

  1. Words are sorted by length: ["a", "b", "ba", "bca", "bda", "bdca"]
  2. dp table is initialized.
  3. Iterating through words:
    • a: dp["a"] = 1. No predecessors.
    • b: dp["b"] = 1. No predecessors.
    • ba: dp["ba"] = 1. Predecessors: a, b. `dp["ba"] = max(1, 1 + 1) = 2.
    • bca: dp["bca"] = 1. Predecessors: bc, ba, ca. `dp["bca"] = max(1, 2 + 1) = 3 (from "ba") if "ba" is predecessor of "bca".
    • bda: dp["bda"] = 1. Predecessors: bd, ba, da. `dp["bda"] = max(1, 2 + 1) = 3 (from "ba") if "ba" is predecessor of "bda".
    • bdca: dp["bdca"] = 1. Predecessors: bdc, bda, bca, bd. dp["bdca"] = max(1, 3+1) from bda, dp["bdca"] = max(dp["bdca"], dp['bca'] + 1) = max(1, 3+1) = 4
  4. Maximum length = 4.