Taro Logo

Find the Shortest Superstring

Hard
Meta logo
Meta
1 view
Topics:
StringsDynamic ProgrammingGreedy Algorithms

Given an array of strings words, return the smallest string that contains each string in words as a substring. If there are multiple valid strings of the smallest length, return any of them.

You may assume that no string in words is a substring of another string in words.

Example 1:

Input: words = ["alex","loves","leetcode"]
Output: "alexlovesleetcode"
Explanation: All permutations of "alex","loves","leetcode" would also be accepted.

Example 2:

Input: words = ["catg","ctaagt","gcta","ttca","atgcatc"]
Output: "gctaagttcatgcatc"

Clarifications:

  1. What should I return if the input array words is empty?
  2. What should I return if there is only one word in the input array?
  3. Are the strings in the input array unique, or could there be duplicates?
  4. Can you provide constraints on the length of the words and the number of words?
  5. Should the returned superstring be the absolute smallest or is near-optimal acceptable?
  6. What is the character set? Only lowercase English letters?

Solution


Let's explore approaches to solve the problem of finding the shortest superstring containing all strings in a given array.

1. Naive Approach: Brute Force

The most straightforward approach is to generate all possible permutations of the input strings and check for each permutation, how the strings can be merged to form a superstring. The shortest of all such superstrings will be our result.

  • Algorithm:
    1. Generate all permutations of the input words array.
    2. For each permutation, merge the strings in the order they appear in the permutation. The merging involves finding the maximum overlap between consecutive strings and concatenating them accordingly.
    3. Keep track of the shortest superstring found so far.
  • Code (Python):
from itertools import permutations

def overlap(s1, s2):
    n1, n2 = len(s1), len(s2)
    for i in range(min(n1, n2), 0, -1):
        if s1[n1-i:] == s2[:i]:
            return i
    return 0

def solve_brute_force(words):
    shortest = None
    for p in permutations(words):
        current = list(p)
        merged = current[0]
        for i in range(1, len(current)):
            o = overlap(merged, current[i])
            merged += current[i][o:]
        if shortest is None or len(merged) < len(shortest):
            shortest = merged
    return shortest
  • Time Complexity: O(n! * m^2), where n is the number of words and m is the maximum length of a word. Generating all permutations takes O(n!) time, and merging strings takes O(m^2) time for each permutation.
  • Space Complexity: O(m), where m is the length of the shortest superstring.

2. Optimal Approach: Greedy with Overlap Calculation

A more efficient approach utilizes a greedy strategy combined with dynamic programming principles to compute overlaps.

  • Algorithm:

    1. Compute the overlap between each pair of strings.
    2. Greedily select the pair of strings with the maximum overlap and merge them.
    3. Repeat until all strings are merged into a single superstring.
  • Code (Python):

def solve_optimal(words):
    n = len(words)
    overlaps = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                overlaps[i][j] = overlap(words[i], words[j])

    dp = [[('') for _ in range(n)] for _ in range(1 << n)]

    for mask in range(1, 1 << n):
        for i in range(n):
            if (mask >> i) & 1:
                if mask == (1 << i):
                    dp[mask][i] = words[i]
                else:
                    for j in range(n):
                        if i != j and ((mask >> j) & 1):
                            prev_mask = mask ^ (1 << i)
                            candidate = dp[prev_mask][j] + words[i][overlaps[j][i]:]
                            if dp[mask][i] == '' or len(candidate) < len(dp[mask][i]):
                                dp[mask][i] = candidate

    result = min(dp[(1 << n) - 1], key=len)
    return result
  • Time Complexity: O(n^2 * 2^n + n^2 * m), where n is the number of words and m is the maximum length of a word. The complexity arises from computing overlaps (O(n^2 * m)) and the dynamic programming part (O(n^2 * 2^n)).
  • Space Complexity: O(n * 2^n * m), where n is the number of words and m is the maximum length of a word. The space is dominated by the dp table.

Edge Cases:

  • Empty input array: Return an empty string.
  • Single word: Return the word itself.
  • Words that are substrings of others (as the problem states, we can assume that no word is a substring of another, but it is good to consider).

Summary

The optimal solution uses a greedy approach combined with dynamic programming to efficiently compute and combine strings based on their overlaps. This method significantly improves performance compared to the brute-force approach, especially for larger input sets.