Taro Logo

Apply Substitutions

Medium
Google logo
Google
7 views
Topics:
StringsTrees

You are given a string s and a list of substitutions. Each substitution is a tuple (old, new) where old is the string to be replaced and new is the string to replace it with. Apply the substitutions to the string s in order. If multiple substitutions overlap, apply the one that starts earliest. If two overlapping substitutions start at the same index, prefer the longer old string.

For example:

  • s = "banana", substitutions = [("ana", "xyz"), ("ban", "abc")] should return "abcxyzna".
  • s = "ababa", substitutions = [("aba", "c"), ("ab", "d")] should return "cba".
  • s = "leetcode", substitutions = [("leet", "code"), ("code", "leet")] should return "codecode".
  • s = "aaa", substitutions = [("a", "b"), ("aa", "c")] should return "cba" because "aa" is considered first.

Describe an algorithm to efficiently apply these substitutions and return the modified string. What is the time and space complexity of your solution? Consider the case where the number of substitutions is very large. How would you handle edge cases like overlapping substitutions or empty input strings?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the data types of the substitutions and the input string? Can I assume they are all strings?
  2. What happens if a substitution key is a prefix of another substitution key? Which one should be applied first?
  3. If multiple substitutions can be applied at the same position in the string, which one should take precedence?
  4. What should I return if no substitutions can be made, or if the input string is empty or null?
  5. Are the substitution keys guaranteed to be non-empty?

Brute Force Solution

Approach

We're given a main string and a list of substitution rules. The brute force method is all about trying every possible combination of applying those substitutions in the correct places in the main string.

Here's how the algorithm would work step-by-step:

  1. Start by looking at the main string.
  2. Check if any of the substitution rules apply to the very beginning of the string.
  3. If a rule applies, make the substitution and see what the new string looks like.
  4. Now, go back to the original string, and this time, see if any rule applies starting at the second position, then the third, and so on, all the way to the end of the string.
  5. Each time a rule applies, make the substitution and see what the new string looks like.
  6. Keep track of every possible string you can create by applying the rules in different orders and positions.
  7. Once you've explored all the possibilities of applying rules to the main string, pick the 'best' string according to the problem's rules.

Code Implementation

def apply_substitutions_brute_force(main_string, substitution_rules):

    possible_strings = {main_string}

    # We continue until no new strings are generated
    while True:
        new_strings = set()
        for current_string in possible_strings:
            for starting_position in range(len(current_string) + 1):
                for old_substring, new_substring in substitution_rules:
                    # Check if the rule applies at the current position
                    if current_string[starting_position:].startswith(old_substring):

                        # Apply the substitution
                        substituted_string = current_string[:starting_position] + \
                            new_substring + \
                            current_string[starting_position + len(old_substring):]
                        new_strings.add(substituted_string)

        # If no new strings were generated, we're done
        if not new_strings:
            break

        # Update the set of possible strings
        possible_strings.update(new_strings)

    # Return the lexicographically smallest string
    return min(possible_strings) if possible_strings else ""

Big(O) Analysis

Time Complexity
O(R^N * N^2)Let N be the length of the main string and R be the number of substitution rules. In the worst-case scenario, each position in the main string could potentially match one or more substitution rules. For each match, a new string is created. Since substitutions can potentially overlap or create new matches, the number of possible strings could grow exponentially with the length of the main string and the number of rules, leading to potentially R^N different strings. Generating each string after a substitution requires copying the relevant parts of the original string, which takes O(N) time. Furthermore, for each generated string, we iterate through it (O(N)) to find all potential substitution application points, and for each point we iterate through the rules to check for a match (O(R)). Thus each substitution application has a time complexity of O(N*R). Combining the exponential growth of generated strings with the time complexity of string creation and substitution detection results in a very high time complexity, approximated by O(R^N * N^2).
Space Complexity
O(R^N)The algorithm explores all possible combinations of applying substitution rules. It keeps track of every possible string created by applying these rules, effectively building a tree of possible states. In the worst case, if each position in the main string of length N can potentially trigger a substitution, and we have R substitution rules, the number of possible strings we need to store grows exponentially with N, specifically up to R^N. This exponential growth represents the space required to store all these intermediate strings. Therefore, the space complexity is O(R^N).

Optimal Solution

Approach

We want to efficiently replace parts of a string with given substitutions. The key idea is to apply the substitutions in a single pass through the string, ensuring no substitution interferes with another. To achieve this, we sort the substitutions by their starting positions in the string.

Here's how the algorithm would work step-by-step:

  1. First, organize the substitutions by where they start in the original string.
  2. Sort these substitutions from the earliest starting point to the latest.
  3. Go through the sorted list of substitutions. For each one, carefully check if it overlaps with any substitutions that have already been applied. If it does, skip this substitution to avoid conflicts.
  4. If a substitution doesn't overlap, apply it to create a new, modified string. Keep track of how the original string has been changed by this substitution.
  5. Continue this process until all substitutions have been considered. The final modified string will have all the non-overlapping substitutions applied in the correct order.

Code Implementation

def apply_substitutions(original_string, substitutions):
    sorted_substitutions = sorted(substitutions, key=lambda x: x[0])
    
    modified_string = list(original_string)
    applied_substitutions = []
    
    for start_index, source, target in sorted_substitutions:
        overlap = False
        
        # Check for overlaps with previously applied substitutions
        for applied_start, applied_end in applied_substitutions:
            if max(start_index, applied_start) < min(start_index + len(source), applied_end):
                overlap = True
                break

        # Skip substitutions that overlap with previously applied ones
        if overlap:
            continue

        original_substring = "".join(modified_string[start_index:start_index + len(source)])
        if original_substring == source:

            # Avoid string slicing by using lists
            modified_string[start_index:start_index + len(source)] = list(target)
            applied_substitutions.append((start_index, start_index + len(target)))

    return "".join(modified_string)

Big(O) Analysis

Time Complexity
O(m log m + m*n)Let m be the number of substitutions and n be the length of the original string. Sorting the substitutions by their starting positions takes O(m log m) time. After sorting, we iterate through the sorted list of substitutions (m). For each substitution, we check for overlaps with previously applied substitutions. In the worst case, we might need to compare against all previously applied substitutions, which can be at most m. The overlap check between two substitutions takes O(n) time in worst case (comparing substring lengths). Therefore, the overlap check within the outer loop results in O(m*n). Hence the total time complexity will be O(m log m + m*n).
Space Complexity
O(N)The dominant space usage comes from constructing the new, modified string. In the worst case, all substitutions might apply sequentially, leading to a new string whose length could be significantly larger than the original, but still proportional to the length of the original string (N) plus the sum of the lengths of the replacement strings. Also, storing the sorted substitutions, a list of length K (where K is the number of substitutions), in the worst case is O(K), which can be at most N. The algorithm also needs to store the indices of applied substitutions which is O(K) in the worst case, hence O(N). Therefore, the auxiliary space used is O(N).

Edge Cases

CaseHow to Handle
Null or empty input stringReturn the original string immediately if the input is null or empty.
Null or empty substitutions listReturn the original string immediately if the substitutions list is null or empty.
Substitutions list contains null keys or valuesIgnore substitutions with null keys or values to prevent NullPointerExceptions.
Substitutions overlap each other in the stringIterate through the string and apply substitutions in order of appearance, handling potential conflicts appropriately.
Substitution value is an empty stringTreat the substitution as a deletion of the key from the input string.
Extremely long input string or large number of substitutions leading to potential performance issuesUse StringBuilder for efficient string concatenation and consider optimizing the search algorithm for large inputs.
Substitution key is a substring of another substitution keySort the substitutions by the length of the key in descending order to prioritize longer matches.
Substitution results in an infinite loop (e.g., substituting 'a' with 'b' and 'b' with 'a')Introduce a limit on the number of substitutions to prevent infinite loops and log a warning if the limit is reached.