Taro Logo

String Compression

Medium
Meta logo
Meta
3 views
Topics:
ArraysStringsTwo Pointers

Given an array of characters chars, compress it using the following algorithm:

Begin with an empty string s. For each group of consecutive repeating characters in chars:

  • If the group's length is 1, append the character to s.
  • Otherwise, append the character followed by the group's length.

The compressed string s should not be returned separately, but instead, be stored in the input character array chars. Note that group lengths that are 10 or longer will be split into multiple characters in chars.

After you are done modifying the input array, return the new length of the array.

You must write an algorithm that uses only constant extra space.

For example:

chars = ["a","a","b","b","c","c","c"] should modify the array to ["a","2","b","2","c","3"] and return 6.

chars = ["a"] should return 1 and the array remains ["a"].

chars = ["a","b","b","b","b","b","b","b","b","b","b","b","b"] should modify the array to ["a","b","1","2"] and return 4.

Solution


Naive Solution

A naive solution would involve creating a new string to store the compressed characters. We iterate through the input chars array, count consecutive repeating characters, and then append the character and its count (if the count is greater than 1) to the new string. Finally, copy the compressed string back to the chars array.

Code (Python):

def compress_naive(chars):
    if not chars:
        return 0

    compressed = ""
    count = 1
    for i in range(len(chars)):
        if i + 1 < len(chars) and chars[i] == chars[i + 1]:
            count += 1
        else:
            compressed += chars[i]
            if count > 1:
                compressed += str(count)
            count = 1

    chars[:] = list(compressed)
    return len(chars)

Big(O) Analysis:

  • Time Complexity: O(n), where n is the length of the input array, as we iterate through the array once.
  • Space Complexity: O(n), in the worst case, the compressed string could have the same length as the input array.

Optimal Solution

The optimal solution uses a two-pointer approach with constant extra space. We use one pointer (write_ptr) to track the position where we write the compressed characters and another pointer (read_ptr) to iterate through the input array.

Algorithm:

  1. Initialize write_ptr and read_ptr to 0.
  2. Iterate through the chars array using read_ptr.
  3. For each group of consecutive repeating characters, count the occurrences.
  4. Write the character to chars[write_ptr] and increment write_ptr.
  5. If the count is greater than 1, convert the count to a string and write each digit to chars starting from write_ptr.
  6. Return write_ptr, which represents the new length of the compressed array.

Code (Python):

def compress(chars):
    write_ptr = 0
    read_ptr = 0

    while read_ptr < len(chars):
        char = chars[read_ptr]
        count = 0
        while read_ptr < len(chars) and chars[read_ptr] == char:
            read_ptr += 1
            count += 1

        chars[write_ptr] = char
        write_ptr += 1

        if count > 1:
            for digit in str(count):
                chars[write_ptr] = digit
                write_ptr += 1

    return write_ptr

Big(O) Analysis:

  • Time Complexity: O(n), where n is the length of the input array, as we iterate through the array once.
  • Space Complexity: O(1), as we use constant extra space.

Edge Cases:

  • Empty input array: The code should handle an empty array gracefully by returning 0.
  • Single character input: The code should correctly handle a single character input by returning 1 and leaving the array unchanged.
  • All same characters: The code should compress the array to the character followed by the count.
  • Characters with counts >= 10: Ensure that multi-digit counts are correctly handled by converting them to strings and writing individual digits to the array.

Summary

The optimal solution uses a two-pointer approach to compress the character array in-place with constant extra space. This approach is more efficient than the naive solution, which uses O(n) extra space.