Taro Logo

String Compression

Medium
Pinterest logo
Pinterest
2 views
Topics:
ArraysStringsTwo Pointers

Given an array of characters chars, compress it using the following algorithm:

Begin with an empty string s. For each group of consecutive repeating characters in chars:

  • If the group's length is 1, append the character to s.
  • Otherwise, append the character followed by the group's length.

The compressed string s should not be returned separately, but instead, be stored in the input character array chars. Note that group lengths that are 10 or longer will be split into multiple characters in chars.

After you are done modifying the input array, return the new length of the array.

You must write an algorithm that uses only constant extra space.

Example 1:

Input: chars = ["a","a","b","b","c","c","c"]
Output: Return 6, and the first 6 characters of the input array should be: ["a","2","b","2","c","3"]
Explanation: The groups are "aa", "bb", and "ccc". This compresses to "a2b2c3".

Example 2:

Input: chars = ["a"]
Output: Return 1, and the first character of the input array should be: ["a"]
Explanation: The only group is "a", which remains uncompressed since it's a single character.

Example 3:

Input: chars = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]
Output: Return 4, and the first 4 characters of the input array should be: ["a","b","1","2"].
Explanation: The groups are "a" and "bbbbbbbbbbbb". This compresses to "ab12".

Constraints:

  • 1 <= chars.length <= 2000
  • chars[i] is a lowercase English letter, uppercase English letter, digit, or symbol.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of characters within the `chars` array? (e.g., ASCII, Unicode)?
  2. Can the input array `chars` be empty or null?
  3. If the compressed length is the same as the original length, do I still modify the `chars` array, or should I leave it as is?
  4. Are the characters in the input array guaranteed to be in sorted order of appearance?
  5. If a count is 10 or greater, is there a maximum possible value for the count, and how would very large counts be handled, considering each digit becomes a character in `chars`?

Brute Force Solution

Approach

The basic idea is to go through the input string character by character and count consecutive repeating characters. We then build a new string by replacing each repeating sequence with the character and its count.

Here's how the algorithm would work step-by-step:

  1. Start at the beginning of the string.
  2. Look at the first character and count how many times it repeats in a row.
  3. Write down the character and the number of times it repeats.
  4. Move past all the repeating characters you just counted.
  5. Repeat the process of counting and writing for the next character (and its repeats) until you reach the end of the original string.
  6. Now you have a new, potentially shorter, string. Check if this new string is actually shorter than the original. If not, the compression didn't work and you return the original string. If the new string is shorter, return it.

Code Implementation

def string_compression(input_string):
    compressed_string = ""
    string_length = len(input_string)

    index = 0
    while index < string_length:
        current_character = input_string[index]
        count = 1

        # Count repeating characters
        while (index + 1 < string_length and
               input_string[index] == input_string[index + 1]):

            count += 1
            index += 1

        # Append the character and its count to the new string
        compressed_string += current_character + str(count)

        index += 1

    # Return compressed string if shorter
    if len(compressed_string) < string_length:

        return compressed_string
    else:
        # Otherwise, return the original string
        return input_string

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input string of length n once to count consecutive repeating characters. Within the loop, it only increments a counter while the current character matches the next one. Constructing the compressed string also takes at most O(n) time. Finally, comparing the compressed string's length to the original string takes constant time. Therefore, the time complexity is dominated by the initial iteration, resulting in O(n).
Space Complexity
O(N)The algorithm constructs a new string to store the compressed version. In the worst-case scenario, where no characters repeat (e.g., "abcde"), the compressed string will be twice the size of the original (e.g., "a1b1c1d1e1"). Therefore, the space required for the new string is proportional to the length of the input string, denoted as N. Hence, the auxiliary space complexity is O(N).

Optimal Solution

Approach

The goal is to shorten a string by replacing consecutive repeating characters with the character and its count. We can do this efficiently by keeping track of the current character and its frequency as we go through the string. If the frequency is more than one we will append it to the string.

Here's how the algorithm would work step-by-step:

  1. Begin at the start of the string.
  2. Keep track of the current character we are examining.
  3. Count how many times the current character appears consecutively.
  4. Once the character changes, or we reach the end of the string, record the character.
  5. If the count of the character is greater than one, also record the count.
  6. Move to the next distinct character and repeat the counting process until we reach the end of the string.
  7. Replace the original string with the newly created string.

Code Implementation

def string_compression(input_string):
    if not input_string:
        return input_string

    compressed_string = ''
    current_char = input_string[0]
    char_count = 1

    for i in range(1, len(input_string)):
        if input_string[i] == current_char:
            char_count += 1
        else:
            # Append the previous character and its count.
            compressed_string += current_char

            if char_count > 1:
                compressed_string += str(char_count)

            current_char = input_string[i]
            char_count = 1

    # Append the last character and its count.
    compressed_string += current_char

    # We append count if it's greater than 1
    if char_count > 1:
        compressed_string += str(char_count)

    # Compare lengths and return shorter string
    if len(compressed_string) < len(input_string):
        return compressed_string
    else:
        return input_string

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input string of length n once. Inside the loop, it counts consecutive repeating characters. While counting, it only advances the index linearly through the string. The operations inside the loop (comparison, incrementing the count, and appending to the result) take constant time O(1). Therefore, the time complexity is dominated by the single pass through the string, resulting in O(n).
Space Complexity
O(N)The algorithm constructs a new string to store the compressed version of the input string. In the worst-case scenario, where no characters repeat consecutively, the compressed string could have a length equal to 2N (e.g., 'ababab...'), where N is the length of the original string because each character is followed by its count, which can be a single digit. Therefore, the auxiliary space required to store this new string grows linearly with the input size. Thus, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty input arrayReturn 0, as there are no characters to compress.
Input array with a single characterReturn 1, as the character is simply copied to the beginning of the array without compression.
Input array with two different charactersReturn 2, because no compression is possible and both are placed in the beginning of the array.
Input array with all identical characters (e.g., ['a', 'a', 'a', 'a', 'a'])Correctly compresses the array to ['a', '5'] and returns 2.
Input array with repeating characters followed by non-repeating characters (e.g., ['a', 'a', 'b', 'c'])The algorithm should handle the first run of repeating characters and correctly place the next characters after the compressed portion.
Input array with counts exceeding single digits (e.g., 10 or more)The count needs to be split into individual characters, placed after the repeated character in the array.
Input contains mixed case alphabet characters onlyTreat each character independently, compressing consecutive repeats.
Large input array with many repeated characters, potential for performance issuesEnsure the solution uses an efficient in-place algorithm to avoid unnecessary memory allocation or copying, optimizing for time complexity.