Taro Logo

Find the Shortest Superstring

Hard
Asked by:
Profile picture
Profile picture
Profile picture
24 views
Topics:
ArraysDynamic Programming

Given an array of strings words, return the smallest string that contains each string in words as a substring. If there are multiple valid strings of the smallest length, return any of them.

You may assume that no string in words is a substring of another string in words.

Example 1:

Input: words = ["alex","loves","leetcode"]
Output: "alexlovesleetcode"
Explanation: All permutations of "alex","loves","leetcode" would also be accepted.

Example 2:

Input: words = ["catg","ctaagt","gcta","ttca","atgcatc"]
Output: "gctaagttcatgcatc"

Constraints:

  • 1 <= words.length <= 12
  • 1 <= words[i].length <= 20
  • words[i] consists of lowercase English letters.
  • All the strings of words are unique.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the size limits of the input array of strings, and the maximum length of each individual string?
  2. Can the input strings contain characters other than lowercase English letters?
  3. If multiple shortest superstrings exist, is any one acceptable, or is there a specific criterion for choosing one?
  4. If the input array is empty, what should the function return?
  5. Is there a guaranteed solution, or should the function handle cases where no superstring can be formed from the input strings, and if so, how should that be indicated?

Brute Force Solution

Approach

The brute force approach to finding the shortest superstring is all about trying every single possible way to combine the given strings. We systematically create all possible orderings of the input strings and then, for each ordering, we find the shortest way to merge them into a single string. Finally, we pick the shortest superstring from all the combinations we explored.

Here's how the algorithm would work step-by-step:

  1. First, list out every single possible order the words could be in. This means trying every combination of the words.
  2. For each of those orderings, try overlapping the words in different ways. For example, see if the end of one word matches the beginning of the next word, and if it does, combine them so they overlap.
  3. When combining the words, always aim to make the resulting string as short as possible by finding the biggest possible overlap between consecutive words.
  4. After combining all the words in a specific order, you'll have one possible 'superstring'.
  5. Repeat this process for every single ordering of the original words.
  6. Finally, compare all the 'superstrings' you created and choose the shortest one. That's your shortest superstring.

Code Implementation

def find_shortest_superstring(strings):
    import itertools

    shortest_superstring = None

    # Generate all possible permutations of the input strings.
    for permutation in itertools.permutations(strings):
        superstring_for_permutation = permutation[0]
        for i in range(1, len(permutation)):
            string1 = superstring_for_permutation
            string2 = permutation[i]
            overlap_length = 0

            # Find the maximum overlap between the current strings.
            for k in range(1, min(len(string1), len(string2)) + 1):
                if string1[-k:] == string2[:k]:
                    overlap_length = k

            superstring_for_permutation += string2[overlap_length:]

        # Update shortest superstring if a shorter one is found.
        if shortest_superstring is None or len(superstring_for_permutation) < len(shortest_superstring):
            shortest_superstring = superstring_for_permutation

    return shortest_superstring

Big(O) Analysis

Time Complexity
O(n! * m)The algorithm explores all permutations of the input strings, which results in n! (n factorial) possible orderings, where n is the number of strings. For each permutation, the algorithm finds the maximum overlap between consecutive strings in the order. Finding the maximum overlap between two strings involves comparing prefixes and suffixes, which takes O(m) time, where m is the maximum length of a string. Consequently, for each permutation, there are n-1 overlap computations, and each computation takes O(m) time. Therefore, each permutation takes O(n * m) time. Multiplying by the total number of permutations (n!), the overall time complexity is O(n! * n * m). Since we are only interested in the dominating term and n is smaller compared to n!, it simplifies to O(n! * m).
Space Complexity
O(N * N!)The algorithm generates all possible permutations of the input strings, which requires storing N! orderings, where N is the number of input strings. For each ordering, intermediate superstrings are created and stored. In the worst case, we might store a list of N strings, each with length potentially summing to N, for each permutation. Therefore, the space complexity is dominated by storing all permutations and intermediate strings, resulting in O(N * N!).

Optimal Solution

Approach

The goal is to find the shortest string that contains all the given smaller strings as parts of it. The key idea is to cleverly combine the given strings by identifying overlaps between them, minimizing the overall length of the resulting superstring.

Here's how the algorithm would work step-by-step:

  1. First, figure out how much each pair of strings overlaps. For example, 'abcdef' and 'defghi' overlap by 'def'.
  2. Create a picture that shows how much each string overlaps with every other string.
  3. Think of the problem as finding the best path through these overlaps to include all the strings while getting the maximum overlap possible.
  4. Try building the superstring by starting with one string and then adding another string that has the most overlap with it. Repeat this until all strings are used.
  5. Because the first string you pick might change the final superstring, try this process starting with each string in the list.
  6. Pick the shortest superstring from all the ones you created. This is your answer.

Code Implementation

def find_shortest_superstring(strings):

    def calculate_overlap(string1, string2):
        for overlap_length in range(min(len(string1), len(string2)), -1, -1):
            if string1.endswith(string2[:overlap_length]):
                return overlap_length
        return 0

    def build_adjacency_matrix(strings):
        adjacency_matrix = {}
        for first_string in strings:
            adjacency_matrix[first_string] = {}
            for second_string in strings:
                if first_string != second_string:
                    overlap = calculate_overlap(first_string, second_string)
                    adjacency_matrix[first_string][second_string] = overlap
        return adjacency_matrix

    def solve_tsp(start_node, nodes, adjacency_matrix, current_superstring, visited):
        # If we have visited all nodes, return the current superstring.
        if len(visited) == len(nodes):
            return current_superstring

        shortest_superstring = None

        for next_node in nodes:
            if next_node not in visited:
                overlap = adjacency_matrix[start_node][next_node]

                # Build new superstring by adding the non-overlapping part
                new_superstring = current_superstring + next_node[overlap:]
                new_visited = set(visited)
                new_visited.add(next_node)

                result = solve_tsp(next_node, nodes, adjacency_matrix, new_superstring, new_visited)

                if result:
                    if shortest_superstring is None or len(result) < len(shortest_superstring):
                        shortest_superstring = result

        return shortest_superstring

    adjacency_matrix = build_adjacency_matrix(strings)
    shortest_superstring = None

    # Try starting with each string.
    for start_node in strings:
        visited = {start_node}
        result = solve_tsp(start_node, strings, adjacency_matrix, start_node, visited)

        # Update the shortest string if a shorter one is found
        if result:
            if shortest_superstring is None or len(result) < len(shortest_superstring):
                shortest_superstring = result

    return shortest_superstring

Big(O) Analysis

Time Complexity
O(n² * m)Finding the overlap between each pair of strings takes O(m) time, where m is the maximum length of any string. Since we need to compute the overlap between every pair of strings, and there are n strings resulting in n * (n-1) / 2 pairs, this step alone is O(n² * m). Building the superstring by considering each starting string as a potential start will have O(n) runs, but this does not dominate the initial pairing runtime. Thus the overall time complexity is dominated by calculating all the overlaps between string pairs, which becomes O(n² * m).
Space Complexity
O(N^2)The algorithm constructs a picture (adjacency matrix) showing overlaps between each pair of the N input strings, requiring O(N^2) space to store these overlap lengths. The construction of superstrings in step 4 might require temporary string storage, but this is dominated by the overlap matrix. Recursion is not explicitly mentioned, so recursion stack space is not a significant factor. Therefore, the auxiliary space is primarily determined by the N x N matrix of overlap lengths, resulting in O(N^2) space complexity.

Edge Cases

Null or empty input array
How to Handle:
Return an empty string or throw an IllegalArgumentException, depending on the problem's specifications.
Array with only one string
How to Handle:
Return the single string as the shortest superstring.
Input strings are identical
How to Handle:
Return any of the strings since overlapping the same strings returns that string.
Strings with very large lengths
How to Handle:
Ensure the algorithm scales efficiently with string length to avoid exceeding time or memory limits.
Cyclic overlaps (e.g., 'abc', 'bca', 'cab')
How to Handle:
The algorithm should be able to handle cycles without infinite loops or incorrect results.
No overlap between any pair of strings
How to Handle:
Concatenate all strings in any order to produce a valid, though potentially not optimal, superstring.
Exponential number of possible superstrings
How to Handle:
Heuristic or approximation algorithms are required since the optimal solution requires traversing an exponential number of permutations.
Overlaps are not transitive (A overlaps B, B overlaps C, but A does not overlap C)
How to Handle:
The algorithm must correctly account for non-transitive overlaps when computing the shortest superstring.