Taro Logo

String Compression II

Hard
Asked by:
Profile picture
Profile picture
Profile picture
17 views
Topics:
StringsDynamic Programming

Run-length encoding is a string compression method that works by replacing consecutive identical characters (repeated 2 or more times) with the concatenation of the character and the number marking the count of the characters (length of the run). For example, to compress the string "aabccc" we replace "aa" by "a2" and replace "ccc" by "c3". Thus the compressed string becomes "a2bc3".

Notice that in this problem, we are not adding '1' after single characters.

Given a string s and an integer k. You need to delete at most k characters from s such that the run-length encoded version of s has minimum length.

Find the minimum length of the run-length encoded version of s after deleting at most k characters.

Example 1:

Input: s = "aaabcccd", k = 2
Output: 4
Explanation: Compressing s without deleting anything will give us "a3bc3d" of length 6. Deleting any of the characters 'a' or 'c' would at most decrease the length of the compressed string to 5, for instance delete 2 'a' then we will have s = "abcccd" which compressed is abc3d. Therefore, the optimal way is to delete 'b' and 'd', then the compressed version of s will be "a3c3" of length 4.

Example 2:

Input: s = "aabbaa", k = 2
Output: 2
Explanation: If we delete both 'b' characters, the resulting compressed string would be "a4" of length 2.

Example 3:

Input: s = "aaaaaaaaaaa", k = 0
Output: 3
Explanation: Since k is zero, we cannot delete anything. The compressed string is "a11" of length 3.

Constraints:

  • 1 <= s.length <= 100
  • 0 <= k <= s.length
  • s contains only lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of the input string `s` and the maximum value of `k` (the number of characters we can remove)?
  2. Can the input string `s` be empty or null? What should I return in those cases?
  3. Can the string `s` contain only one distinct character, and if so, how should the compression work?
  4. If there are multiple optimal compressed string lengths after removing `k` characters, is any of them acceptable, or is there a specific criterion for selecting one?
  5. Does the string `s` contain only lowercase English letters, or can it contain other characters (uppercase, digits, special characters)?

Brute Force Solution

Approach

The core idea is to try every possible combination of removals to see which one results in the shortest compressed string. We explore all possibilities, without trying to be smart or efficient about it. It's like checking every single grain of sand on a beach to find a specific one.

Here's how the algorithm would work step-by-step:

  1. Consider no removals, then one removal, then two removals, and so on, up to the allowed number of removals.
  2. For each number of removals, try all possible combinations of characters to remove from the string.
  3. After each combination of removals, compress the resulting string.
  4. Calculate the length of the compressed string.
  5. Compare the length of this compressed string with the current shortest compressed string length found so far.
  6. If the new compressed string is shorter, store its length.
  7. After considering all combinations of removals, the shortest compressed string length found is the answer.

Code Implementation

def string_compression_two_brute_force(input_string, allowed_removals):
    shortest_compressed_length = len(input_string) + 1

    for number_of_removals in range(allowed_removals + 1):
        # Try all combinations for current number of removals
        string_length = len(input_string)
        possible_indices_to_remove = [
            index for index in range(string_length)
        ]
        
        def generate_combinations(start_index, current_combination):
            nonlocal shortest_compressed_length

            if len(current_combination) == number_of_removals:
                modified_string = ""
                for index in range(string_length):
                    if index not in current_combination:
                        modified_string += input_string[index]

                compressed_string = compress(modified_string)
                compressed_length = len(compressed_string)
                shortest_compressed_length = min(
                    shortest_compressed_length, compressed_length
                )
                return

            for index in range(start_index, string_length):
                generate_combinations(
                    index + 1, current_combination + [index]
                )

        generate_combinations(0, [])

    return shortest_compressed_length

def compress(uncompressed_string):
    if not uncompressed_string:
        return ""

    compressed = ""
    current_char = uncompressed_string[0]
    current_count = 1

    for index in range(1, len(uncompressed_string)):
        if uncompressed_string[index] == current_char:
            current_count += 1
        else:
            compressed += current_char
            # Append count only if it's greater than 1
            if current_count > 1:
                compressed += str(current_count)
            current_char = uncompressed_string[index]
            current_count = 1

    compressed += current_char
    if current_count > 1:
        compressed += str(current_count)
    return compressed

# The main function, string_compression_two_brute_force, will return
# the shortest possible length that can be achieved after removals.

# Auxiliary function: compress(), creates compressed string
# from the uncompressed input string.

Big(O) Analysis

Time Complexity
O(nCk * n)The solution explores all possible combinations of removals. Given a string of length n and allowing up to k removals, the number of combinations to try is n choose k (nCk), which is n! / (k! * (n-k)!). For each of these combinations, the string is compressed, and the length of the compressed string is calculated. The compression step takes O(n) time in the worst case, where n is the length of the string after removals. Therefore, the overall time complexity is O(nCk * n), dominated by the combination generation and string compression for each generated string.
Space Complexity
O(N^2)The plain English explanation implies the algorithm explores all possible combinations of removals, suggesting a recursive approach or the use of dynamic programming. The number of removals can range from zero to some maximum K. The combinations of removals, in the worst case, could lead to exploring substrings of the input string multiple times, potentially caching results in a memoization table or a DP table. This table would need to store intermediate results for each possible substring (defined by its start and end indices), leading to O(N^2) space where N is the length of the input string. Additionally, recursive call stack could contribute O(N) in the worst case, therefore the total space complexity is O(N^2) + O(N) which simplifies to O(N^2).

Optimal Solution

Approach

The best way to compress the string is to use a method called dynamic programming. We consider substrings of the original string and compute the minimum length compressed version for each substring, building up to the full string.

Here's how the algorithm would work step-by-step:

  1. Start by considering the first character and figure out the cost of keeping it (possibly part of a longer repeating sequence) or deleting it.
  2. For each character, we consider two choices: either include it in a repeating sequence or delete it and try to form shorter strings with the remaining characters.
  3. Calculate the compressed length based on these choices. If we include it, check how many repeating characters there are and then use the correct number to express the length, if needed.
  4. Because there is a deletion limit, for each character, we may delete it and increase the number of deletions and calculate the length for the rest of the string.
  5. Repeat the process for the remaining characters, using the previously calculated compressed lengths to build up the optimal solution, trying to decide for each character what is the optimal way to compress it.
  6. The optimal solution is obtained by picking the best length that results after considering all characters.

Code Implementation

def string_compression_two(input_string, allowed_deletions):
    string_length = len(input_string)

    # Initialize a DP table to store minimum compressed lengths
    dp_table = [[float('inf')] * (allowed_deletions + 1) for _ in range(string_length + 1)]
    dp_table[0][0] = 0

    def get_length_of_run(run_length):
        if run_length == 1:
            return 1
        elif run_length < 10:
            return 2
        elif run_length < 100:
            return 3
        else:
            return 4

    for index in range(1, string_length + 1):
        for deletions_used in range(allowed_deletions + 1):
            # Option 1: Delete the current character
            if deletions_used > 0:
                dp_table[index][deletions_used] = dp_table[index - 1][deletions_used - 1]

            # Option 2: Keep the current character and extend a run
            current_char = input_string[index - 1]
            run_length = 0
            deletions_needed = 0
            for previous_index in range(index, 0, -1):

                if input_string[previous_index - 1] == current_char:
                    run_length += 1
                else:
                    deletions_needed += 1

                if deletions_used >= deletions_needed:
                    
                    # Key decision point: Calculate cost of adding to run
                    length_increase = get_length_of_run(run_length) if previous_index == 1 or input_string[previous_index-2] != current_char else 0
                    dp_table[index][deletions_used] = min(dp_table[index][deletions_used],
                                                           dp_table[previous_index - 1][deletions_used - deletions_needed] + 1 if length_increase == 0 else dp_table[previous_index - 1][deletions_used - deletions_needed] + length_increase)

    # Find the minimum compressed length among all possible deletions
    minimum_compressed_length = min(dp_table[string_length])

    return minimum_compressed_length

Big(O) Analysis

Time Complexity
O(n^2 * k)The dynamic programming approach iterates through substrings of the input string of length n. For each substring, it explores two options: including a character in a repeating sequence or deleting it. Finding the repeating sequence can take up to O(n) in the worst case. Additionally, at each state in the dynamic programming table, we iterate through all possible remaining deletions k, where k is the maximum number of deletions allowed. Therefore, the overall time complexity is O(n * n * k), which simplifies to O(n^2 * k).
Space Complexity
O(N*K)The dynamic programming approach described stores intermediate results for substrings in a table or memoization structure. The size of this table is determined by the length of the string, N, and the number of allowed deletions, K. Therefore, the auxiliary space is proportional to the product of these two values, resulting in O(N*K) space complexity.

Edge Cases

Empty input string
How to Handle:
Return 0, as no compression is possible with an empty string.
Null input string
How to Handle:
Throw an IllegalArgumentException or return a predefined error code (e.g., -1) based on the function's contract.
k equals or exceeds the length of the string
How to Handle:
Return 0, as we can delete all characters and the compressed string will be empty.
String contains only one distinct character
How to Handle:
The compressed string will be very short and the deletion of characters may lead to further optimization.
String length approaching the limit of the integer data type used for the length
How to Handle:
Ensure intermediate calculations do not overflow the integer data type by considering using long if necessary.
k is zero (no deletions allowed)
How to Handle:
The compression algorithm should still produce the compressed string without any deletions.
The string is already optimally compressed (e.g., 'abc')
How to Handle:
Deleting any character will result in an uncompressed string, so the result depends if k > 0. If k=0, apply compression function, else return 0
Large k values combined with long repeated segments
How to Handle:
The dynamic programming table needs to be large enough to handle all possible substrings after character deletions; consider memory limits and possible out of memory errors if very long strings are allowed.