Taro Logo

Shortest Way to Form String

Medium
Pinterest logo
Pinterest
4 views
Topics:
StringsGreedy Algorithms

Given two strings source and target, return the minimum number of subsequences of source such that their concatenation equals target. If the task cannot be completed, return -1.

A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., "ace" is a subsequence of "abcde" while "aec" is not).

Example 1:

Input: source = "abc", target = "abcbc"
Output: 2
Explanation: The target "abcbc" can be formed by "abc" and "bc", which are subsequences of source "abc".

Example 2:

Input: source = "abc", target = "acdbc"
Output: -1
Explanation: The target string cannot be constructed from the subsequences of source string due to the character 'd'.

Constraints:

  • Both the source and target strings consist of only lowercase English letters.
  • 1 <= source.length, target.length <= 1000

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible characters in the source and target strings? Is the case of characters significant?
  2. Can the source or target strings be empty or null? What should I return in such cases?
  3. If it's not possible to form the target string from the source string, what should I return?
  4. Are there any constraints on the lengths of the source and target strings?
  5. If multiple shortest ways exist to form the target string, is any one acceptable, or is there a specific shortest way the interviewer would like as the output?

Brute Force Solution

Approach

We're trying to build one string (the target) from smaller strings (the source). The brute force method involves checking every conceivable way to assemble the source strings to see if any of them match the target. It's like trying every possible combination until we find one that works.

Here's how the algorithm would work step-by-step:

  1. Start by seeing if the entire target string can be made from just one single substring from the source.
  2. If that doesn't work, check if the target string can be made using two substrings from the source.
  3. We'll try every possible combination of two source substrings, and see if they add up to form the target string.
  4. If using two substrings doesn't work, move onto three substrings and try all possible combinations of three source substrings to build the target.
  5. Continue this process, increasing the number of substrings used each time.
  6. We stop when we either find a combination of source substrings that exactly forms the target string, or when we have exhausted all the possibilities (meaning the target string can't be formed).
  7. If we found several ways to form the target string, we pick the shortest one (using the fewest number of source substrings).

Code Implementation

def shortest_way_to_form_string_brute_force(source, target):
    target_length = len(target)

    for number_of_substrings in range(1, target_length + 1):
        for i in range(number_of_substrings):
            
            # Create all possible start indices for each substring combination
            possible_indices = get_possible_start_indices(target_length, number_of_substrings)
            for indices in possible_indices:
                formed_string = ""
                substring_count = 0
                for j in range(number_of_substrings):
                    start_index = indices[j]
                    end_index = indices[j+1] if j+1 < number_of_substrings else target_length
                    substring = target[start_index:end_index]

                    # Check if substring exists in source
                    if substring in source:
                        formed_string += substring
                        substring_count += 1
                    else:
                        substring_count = 0
                        break

                # This check validates that all target characters are formed
                if formed_string == target and substring_count == number_of_substrings:
                    return number_of_substrings

    return -1

def get_possible_start_indices(target_length, number_of_substrings):
    indices = []
    get_combinations(target_length, number_of_substrings, [], indices, 0)
    return indices

def get_combinations(target_length, number_of_substrings, current_combination, all_combinations, start):
    if len(current_combination) == number_of_substrings:
        all_combinations.append(current_combination[:])
        return
    
    for i in range(start, target_length):
        current_combination.append(i)
        get_combinations(target_length, number_of_substrings, current_combination, all_combinations, i + 1)
        current_combination.pop()

def shortest_way_to_form_string_brute_force_optimized(source, target):
    number_of_substrings = 0
    target_index = 0

    # Keep iterating as long as we haven't fully covered the target
    while target_index < len(target):
        substring_index = -1

        # Greedily find longest matching prefix
        for i in range(len(source)):

            if target[target_index] == source[i]:
                j = 0
                while (target_index + j < len(target) and
                       i + j < len(source) and
                       target[target_index + j] == source[i + j]):
                    j += 1

                if j > 0 and (substring_index == -1 or j > substring_index):
                    substring_index = j

        # If no matching substring from source, the target can't be formed
        if substring_index == -1:
            return -1

        target_index += substring_index
        number_of_substrings += 1

    return number_of_substrings

Big(O) Analysis

Time Complexity
O(m^n)The algorithm iteratively checks if the target string can be formed by 1, 2, 3, ..., up to n substrings from the source, where n is the length of the target string. In the worst case, for each k (from 1 to n), we consider all possible combinations of k substrings from the source. Let m be the length of the source string. Finding all possible substrings within the source takes O(m^2). Checking if 'k' substrings from the source equal the target involves iterating through combinations of substring indices and string comparisons, which is O(m^k) for k substrings, summed across k = 1 to n. Thus the overall time complexity can approach O(m^n) in the worst case because we have to test many, many different ways to make the target string.
Space Complexity
O(1)The provided algorithm iterates through different numbers of substrings from the source to form the target, but the plain English explanation doesn't indicate the use of auxiliary data structures to store intermediate substring combinations or track visited states. The algorithm primarily relies on string comparisons and doesn't create new data structures whose size depends on the input string lengths. Therefore, the auxiliary space used is constant, independent of the length of the source or target strings. This constant space is primarily for loop counters or temporary variables used within the comparisons.

Optimal Solution

Approach

The most efficient way to solve this problem is to greedily construct the target string. We repeatedly find the longest prefix of the target string that can be formed from a subsequence of the source string. This avoids unnecessary checks and focuses on maximizing progress at each step.

Here's how the algorithm would work step-by-step:

  1. Start with the target string.
  2. Find the longest part at the beginning of the target that can be made by picking characters from the source string in the right order. You can only use each character from the source once.
  3. If you can't find any part of the target string in the source, then it's impossible to make the target string.
  4. If you can find a part, remove that part from the target string.
  5. Count how many times you needed to use the source string to make that part.
  6. Repeat the process with the remaining target string until you've made the entire string, or you realize you can't.
  7. The number of times you had to use the source string to make all the parts is your answer.

Code Implementation

def shortest_way_to_form_string(source_string, target_string):
    number_of_subsequences = 0
    target_string_index = 0

    while target_string_index < len(target_string):
        source_string_index = 0
        temp_target_string_index = target_string_index

        # Greedily match the longest prefix of target
        while source_string_index < len(source_string) and \
                temp_target_string_index < len(target_string):
            if source_string[source_string_index] == \
                    target_string[temp_target_string_index]:
                temp_target_string_index += 1
            source_string_index += 1

        # If no character of target is in source
        if temp_target_string_index == target_string_index:
            return -1

        # Count subsequence, advance target index
        number_of_subsequences += 1
        target_string_index = temp_target_string_index

    return number_of_subsequences

Big(O) Analysis

Time Complexity
O(m*n)The outer loop iterates up to m times, where m is the length of the target string. Inside the outer loop, we search for the longest prefix of the remaining target string that is a subsequence of the source string. Finding this longest prefix potentially requires scanning the entire source string of length n in the worst case, comparing each character of the remaining target string against each character of the source. Therefore, the overall time complexity is O(m*n).
Space Complexity
O(1)The provided algorithm operates primarily on the input strings (source and target) using index manipulation. It does not create any auxiliary data structures like arrays, lists, or hash maps that scale with the input size. The space used is limited to a few variables (e.g., counters, indices) which require a constant amount of memory regardless of the lengths of the source and target strings. Therefore, the space complexity is constant.

Edge Cases

CaseHow to Handle
Source and target strings are both emptyReturn 0 because an empty target can be formed with 0 appends from the source.
Source string is empty, target string is not emptyReturn -1, indicating that the target cannot be formed.
Target string is empty, source string is not emptyReturn 0, as an empty target can always be formed.
Source string is longer than the target stringThe standard algorithm will still work, potentially requiring fewer appends if the target is a substring.
Source string contains repeated charactersThe subsequence search handles repetitions correctly, choosing the earliest occurrence.
Target string contains a character not present in the source stringThe subsequence search will fail and the algorithm should return -1, indicating impossibility of forming the target string.
Target string is a very long stringThe solution should still work but it could be time-consuming; consider optimizing the subsequence search if performance becomes a bottleneck.
Source and target strings are identicalReturn 1 as the target string can be formed with a single append of the entire source string.