Taro Logo

String Compression

#1 Most AskedMedium
34 views
Topics:
ArraysTwo PointersStrings

Given an array of characters chars, compress it using the following algorithm:

Begin with an empty string s. For each group of consecutive repeating characters in chars:

  • If the group's length is 1, append the character to s.
  • Otherwise, append the character followed by the group's length.

The compressed string s should not be returned separately, but instead, be stored in the input character array chars. Note that group lengths that are 10 or longer will be split into multiple characters in chars.

After you are done modifying the input array, return the new length of the array.

You must write an algorithm that uses only constant extra space.

Example 1:

Input: chars = ["a","a","b","b","c","c","c"]
Output: Return 6, and the first 6 characters of the input array should be: ["a","2","b","2","c","3"]
Explanation: The groups are "aa", "bb", and "ccc". This compresses to "a2b2c3".

Example 2:

Input: chars = ["a"]
Output: Return 1, and the first character of the input array should be: ["a"]
Explanation: The only group is "a", which remains uncompressed since it's a single character.

Example 3:

Input: chars = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]
Output: Return 4, and the first 4 characters of the input array should be: ["a","b","1","2"].
Explanation: The groups are "a" and "bbbbbbbbbbbb". This compresses to "ab12".

Constraints:

  • 1 <= chars.length <= 2000
  • chars[i] is a lowercase English letter, uppercase English letter, digit, or symbol.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input list of characters contain uppercase and lowercase letters, or only lowercase English letters as seen in typical examples?
  2. If a character repeats more than 9 times, how should the count be represented? For example, should it be 'a10' or 'a1', '0'?
  3. What should happen if the input list is empty? What is the expected new length in that case?
  4. Does the input list always contain at least one character, or can it be null or an empty list?
  5. Are there any constraints on the maximum length of the input list or the maximum number of repetitions for a character?

Brute Force Solution

Approach

The brute force approach to string compression is like trying to find the most efficient way to write a message by grouping identical consecutive characters. It involves going through the message and creating a new, potentially shorter, version by representing groups of repeating characters with the character itself followed by how many times it repeats.

Here's how the algorithm would work step-by-step:

  1. Start at the beginning of the message.
  2. Look at the current character and see how many times it repeats consecutively.
  3. If the character repeats more than once, write down the character and then write down the count of how many times it repeated.
  4. If the character only appears once, just write down the character itself.
  5. Move to the next new character in the message and repeat the process.
  6. Keep doing this until you've gone through the entire original message.
  7. The result is a new, possibly shorter, message.

Code Implementation

def compress_string_brute_force(original_string):
    compressed_parts = []
    string_index = 0

    while string_index < len(original_string):
        current_character = original_string[string_index]
        count = 0

        # Determine the consecutive count of the current character
        while string_index < len(original_string) and original_string[string_index] == current_character:
            count += 1
            string_index += 1

        # Append the character to the result list
        compressed_parts.append(current_character)

        # Only append the count if it's greater than 1
        if count > 1:
            compressed_parts.append(str(count))

    return "".join(compressed_parts)

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input string once. For each character, it counts consecutive occurrences. In the worst case, each character is distinct, and we still only process each character a constant number of times (checking its next occurrence). Therefore, the total number of operations is directly proportional to the length of the input string, 'n', resulting in a time complexity of O(n).
Space Complexity

Optimal Solution

Approach

The best way to compress a string is to look for consecutive identical characters. We can then replace these sequences with the character itself followed by the count of how many times it repeats, but only if the count is greater than one. This cleverly shortens the string by representing repeated characters more compactly.

Here's how the algorithm would work step-by-step:

  1. Go through the string from the beginning.
  2. As you encounter characters, keep track of how many times the current character appears consecutively.
  3. When the character changes, or you reach the end of the string, check the count of the previous character.
  4. If the count of the previous character was more than one, write down the character and then write down its count.
  5. If the count was just one, simply write down the character once.
  6. Then, start tracking the new character and its count from this point forward.
  7. Continue this process until you have gone through the entire original string.

Code Implementation

def compress_string(input_string):
    if not input_string:
        return ""

    compressed_parts = []
    current_character_index = 0
    string_length = len(input_string)

    while current_character_index < string_length:
        current_character = input_string[current_character_index]
        consecutive_character_count = 0
        start_of_sequence_index = current_character_index

        # Count consecutive occurrences of the current character.
        while current_character_index < string_length and input_string[current_character_index] == current_character:
            consecutive_character_count += 1
            current_character_index += 1

        compressed_parts.append(current_character)

        # Only append the count if it's greater than one to save space.
        if consecutive_character_count > 1:
            compressed_parts.append(str(consecutive_character_count))

    # Join all parts to form the final compressed string.
    return "".join(compressed_parts)

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input string of length n exactly once. For each character, it performs a constant number of operations to count consecutive occurrences and append to the result. Even if the count is a multi-digit number, converting and appending it takes time proportional to the number of digits, which is logarithmic to the count itself, and the sum of all such logarithmic times across the string is still bounded by O(n). Thus, the total number of operations is directly proportional to the length of the string, resulting in O(n) time complexity.
Space Complexity
O(1)The algorithm iterates through the input string using a few variables to keep track of the current character and its consecutive count. No auxiliary data structures that grow with the input size N are created. The space used by these variables is constant, regardless of the string's length. Therefore, the auxiliary space complexity is O(1).

Edge Cases

Input list is empty
How to Handle:
An empty input list should result in a compressed length of 0 and the list remains empty.
Input list has only one character
How to Handle:
A single character should remain as is, and the compressed length will be 1.
All characters in the list are the same
How to Handle:
The algorithm should correctly append the character followed by its total count, potentially resulting in a longer string if the count is greater than 9.
No repeating characters in the list
How to Handle:
Each character will be appended to the compressed list, followed by '1', which will be removed, leaving only the original characters, and the length remains unchanged.
Repeat counts greater than 9
How to Handle:
The algorithm must handle multi-digit counts by appending each digit as a separate character to the compressed list.
Compressed list is longer than original list
How to Handle:
The problem statement implies modification in-place and returning the new length, so we must ensure the original list is large enough or handle potential out-of-bounds issues if not.
Input list contains characters that are digits
How to Handle:
The algorithm should treat digit characters like any other character and not interpret them as counts.
Very large input list
How to Handle:
The solution should be efficient in terms of time complexity (ideally O(n)) and space complexity (ideally O(1) for in-place modification).
0/6 completed