Taro Logo

Minimum Cost to Convert String I

Medium
Atlassian logo
Atlassian
3 views
Topics:
StringsGraphsDynamic Programming

You are given two 0-indexed strings source and target, both of length n and consisting of lowercase English letters. You are also given two 0-indexed character arrays original and changed, and an integer array cost, where cost[i] represents the cost of changing the character original[i] to the character changed[i].

You start with the string source. In one operation, you can pick a character x from the string and change it to the character y at a cost of z if there exists any index j such that cost[j] == z, original[j] == x, and changed[j] == y.

Return the minimum cost to convert the string source to the string target using any number of operations. If it is impossible to convert source to target, return -1.

Note that there may exist indices i, j such that original[j] == original[i] and changed[j] == changed[i].

Example 1:

Input: source = "abcd", target = "acbe", original = ["a","b","c","c","e","d"], changed = ["b","c","b","e","b","e"], cost = [2,5,5,1,2,20]
Output: 28
Explanation: To convert the string "abcd" to string "acbe":
- Change value at index 1 from 'b' to 'c' at a cost of 5.
- Change value at index 2 from 'c' to 'e' at a cost of 1.
- Change value at index 2 from 'e' to 'b' at a cost of 2.
- Change value at index 3 from 'd' to 'e' at a cost of 20.
The total cost incurred is 5 + 1 + 2 + 20 = 28.
It can be shown that this is the minimum possible cost.

Example 2:

Input: source = "aaaa", target = "bbbb", original = ["a","c"], changed = ["c","b"], cost = [1,2]
Output: 12
Explanation: To change the character 'a' to 'b' change the character 'a' to 'c' at a cost of 1, followed by changing the character 'c' to 'b' at a cost of 2, for a total cost of 1 + 2 = 3. To change all occurrences of 'a' to 'b', a total cost of 3 * 4 = 12 is incurred.

Example 3:

Input: source = "abcd", target = "abce", original = ["a"], changed = ["e"], cost = [10000]
Output: -1
Explanation: It is impossible to convert source to target because the value at index 3 cannot be changed from 'd' to 'e'.

Constraints:

  • 1 <= source.length == target.length <= 105
  • source, target consist of lowercase English letters.
  • 1 <= cost.length == original.length == changed.length <= 2000
  • original[i], changed[i] are lowercase English letters.
  • 1 <= cost[i] <= 106
  • original[i] != changed[i]

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible characters in the input strings, and what are the ranges for the conversion costs?
  2. Can the input strings be empty or null?
  3. If it's impossible to convert string `s` to string `t`, what should the return value be? Should I return -1, throw an exception, or something else?
  4. Are the conversion costs symmetric? In other words, if the cost to convert 'a' to 'b' is X, is the cost to convert 'b' to 'a' also guaranteed to be X?
  5. Could you please define more precisely what is meant by 'minimum cost'? Is there any possibility of overflow given the input constraints?

Brute Force Solution

Approach

The brute force approach to finding the minimum cost to convert a string involves exploring every possible sequence of character conversions. We systematically try all combinations of converting characters until we find the lowest cost solution. This means we will try every possible path until we find the optimal path.

Here's how the algorithm would work step-by-step:

  1. Imagine we start with the original string.
  2. Now, consider changing the first character to every other possible character, one at a time. For each of these changes, calculate the cost based on the given cost rules.
  3. For each changed string, repeat the process for the second character, and then the third, and so on, until we've considered changing every character in the string to every other possibility.
  4. Keep track of the overall cost for each sequence of changes we make to the string.
  5. After trying every possible sequence of changes, compare all the total costs we've calculated.
  6. The sequence of changes that resulted in the lowest overall cost is the solution, and that cost is the minimum cost.

Code Implementation

def minimum_cost_convert_string_brute_force(source, target, cost):
    minimum_total_cost = float('inf')

    def solve(current_string, current_cost):
        nonlocal minimum_total_cost

        # If the current string matches the target, update min cost
        if current_string == target:
            minimum_total_cost = min(minimum_total_cost, current_cost)
            return

        for index in range(len(source)):
            if current_string[index] != target[index]:

                # Try changing character at index to every other possible character
                for char_code in range(ord('a'), ord('z') + 1):
                    new_char = chr(char_code)

                    # Creates new string by replacing one character at time
                    modified_string_list = list(current_string)
                    modified_string_list[index] = new_char
                    modified_string = "".join(modified_string_list)

                    additional_cost = cost[ord(current_string[index]) - ord('a')][ord(new_char) - ord('a')]

                    # Explore this new string by recursively solving
                    solve(modified_string, current_cost + additional_cost)

    solve(source, 0)

    if minimum_total_cost == float('inf'):
        return -1
    else:
        return minimum_total_cost

Big(O) Analysis

Time Complexity
O(26^n) – The algorithm explores every possible string that can be formed by changing characters in the original string. For each of the n characters in the string, we can change it to any of the 26 possible characters (assuming lowercase English alphabet). This leads to a branching factor of 26 at each of the n positions. Therefore, the total number of possible strings explored is 26 * 26 * ... * 26 (n times), which is 26^n. Consequently, the time complexity is O(26^n).
Space Complexity
O(N^N) – The brute force approach explores every possible sequence of character conversions. For a string of length N, each character can potentially be changed to any other character, leading to a branching factor proportional to N at each of the N positions in the string. The recursive calls will generate strings where the maximum depth of the recursion is N, and at each level, we're storing the modified string. Therefore, the space complexity is driven by the exponential growth of the number of strings generated and stored, which is approximately N^N. The algorithm keeps track of these possibilities which causes exponential space complexity with the size of the input string.

Optimal Solution

Approach

The core idea is to figure out the cheapest way to change one character to another by precomputing the cost to change between every pair of characters. We can then apply this knowledge to efficiently calculate the minimum cost to convert one string to another by dynamically choosing the optimal conversions.

Here's how the algorithm would work step-by-step:

  1. First, determine the cheapest way to change any character into any other character. Think of this as building a lookup table of conversion costs.
  2. Imagine characters are locations and conversion costs are travel distances. Find the shortest path between every pair of locations.
  3. Now, for each character in the first string, figure out the lowest cost to transform it into the corresponding character in the second string, using our pre-calculated table.
  4. Sum all of these lowest costs together. This final sum represents the minimum cost to transform the first string into the second.

Code Implementation

def minimum_cost_to_convert_string(source_string, target_string, conversion_costs):
    character_set_size = 26
    lowest_conversion_costs = [[float('inf')] * character_set_size for _ in range(character_set_size)]

    # Initialize direct conversion costs.
    for source_character_index in range(character_set_size):
        lowest_conversion_costs[source_character_index][source_character_index] = 0

    for source_character, target_character, cost in conversion_costs:
        source_index = ord(source_character) - ord('a')
        target_index = ord(target_character) - ord('a')
        lowest_conversion_costs[source_index][target_index] = min(lowest_conversion_costs[source_index][target_index], cost)

    # Use Floyd-Warshall to find shortest paths between all characters.
    for intermediate_character_index in range(character_set_size):
        for source_character_index in range(character_set_size):
            for target_character_index in range(character_set_size):
                lowest_conversion_costs[source_character_index][target_character_index] = min(
                    lowest_conversion_costs[source_character_index][target_character_index],
                    lowest_conversion_costs[source_character_index][intermediate_character_index] + \
                    lowest_conversion_costs[intermediate_character_index][target_character_index]
                )

    total_conversion_cost = 0

    # Iterate through the strings and sum the lowest conversion costs.
    for i in range(len(source_string)):
        if source_string[i] == target_string[i]:
            continue

        source_index = ord(source_string[i]) - ord('a')
        target_index = ord(target_string[i]) - ord('a')
        cost = lowest_conversion_costs[source_index][target_index]

        # If no path exists, it's impossible to convert.
        if cost == float('inf'):
            return -1

        total_conversion_cost += cost

    return total_conversion_cost

Big(O) Analysis

Time Complexity
O(n*m + a*a + len1 + len2) – First, the algorithm iterates through `original`, `changed`, and `cost` to compute the initial cost matrix which takes O(n*m) where n is the size of the alphabet and m is the length of these three arrays. Next, it performs the Floyd-Warshall algorithm to find the shortest path between characters, taking O(a*a*a) time, where a is the size of the alphabet (26 in this case). Finally, it iterates through the two strings to calculate the total conversion cost, taking O(len1 + len2) time, where len1 and len2 are the lengths of the two input strings. Since a is a constant (alphabet size), the Floyd-Warshall part becomes O(a*a), and the overall complexity is O(n*m + a*a + len1 + len2).
Space Complexity
O(1) – The algorithm primarily relies on pre-computing a cost table and accumulating conversion costs. However, the plain English explanation does not imply the use of dynamic data structures that scale with the input string size. The space used is therefore dominated by a fixed-size table which is independent of the input string length, implying constant auxiliary space, or O(1).

Edge Cases

CaseHow to Handle
Null or empty source stringReturn 0 since no conversion is needed.
Null or empty target stringReturn infinity (or a large value) as it's impossible to convert.
Cost matrix is null or empty.Throw an exception or return a designated error value (e.g., infinity) to indicate invalid input.
Source and target strings are identical.Return 0 as no conversion is required.
Character set size in cost matrix is smaller than distinct characters in source/target.Handle this case by defining a default cost for characters not present in the cost matrix, or throwing an exception if such characters are disallowed.
Cost matrix contains negative costs.Either throw an exception or handle negative cost cycles appropriately (e.g. using Bellman-Ford if shortest paths are needed).
No possible conversion exists (no path from source[i] to target[i] in cost graph)Return infinity (or a large value) to indicate that the conversion is impossible.
Integer overflow in cost calculation (especially with large strings and/or high costs).Use long or a similar data type with larger capacity for cost calculations to prevent overflow.