Taro Logo

String Compression

Medium
Nvidia logo
Nvidia
1 view
Topics:
ArraysStringsTwo Pointers

Given an array of characters chars, compress it in-place using the following algorithm:

  1. For each group of consecutive repeating characters, if the group's length is 1, append the character to the compressed string. Otherwise, append the character followed by the group's length. Note that group lengths that are 10 or longer will be split into multiple characters.
  2. Modify the input array chars in-place to store the compressed string. Do not return a separate string.
  3. Return the new length of the array.

You must write an algorithm that uses only constant extra space.

Example 1:

Input: chars = ["a","a","b","b","c","c","c"]

Output: 6, and chars becomes ["a","2","b","2","c","3"]

Example 2:

Input: chars = ["a"]

Output: 1, and chars remains ["a"]

Example 3:

Input: chars = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]

Output: 4, and chars becomes ["a","b","1","2"]

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible characters in the input array? Are we limited to ASCII characters, or can we expect Unicode?
  2. If a character appears only once, should it be represented as char + '1' or just the character itself?
  3. Can the input array be empty or null?
  4. Is the provided array guaranteed to be modifiable (mutable)?
  5. Could the count of occurrences of a character ever exceed the maximum integer value representable (requiring a different storage approach)?
  6. After compression, if the compressed string is shorter than the original, should I pad the original array with arbitrary characters at the end?

Brute Force Solution

Approach

The brute force method for string compression involves checking every possible combination. We explore all potential compressed versions and pick the shortest one. It's like trying every possibility, no matter how inefficient.

Here's how the algorithm would work step-by-step:

  1. Start by assuming the compressed string is exactly the same as the original string (no compression at all).
  2. Now, look for repeating sequences of characters.
  3. For each repeating sequence you find, try compressing it using a count of how many times it repeats.
  4. This will generate a slightly different string, which might be shorter.
  5. Keep track of the length of this newly generated compressed string.
  6. Repeat the process of finding repeating sequences and compressing them in different ways, creating many different compressed strings.
  7. Compare the lengths of all the compressed strings you created, including the original uncompressed string.
  8. Choose the absolute shortest compressed string that you found. This is your brute force compressed string.

Code Implementation

def string_compression_brute_force(input_string):
    shortest_compressed_string = input_string
    string_length = len(input_string)

    for sub_string_length in range(1, string_length // 2 + 1):
        for start_index in range(string_length - sub_string_length + 1):
            sub_string = input_string[start_index:start_index + sub_string_length]
            repeat_count = 1
            current_index = start_index + sub_string_length

            # Count how many times the substring repeats
            while current_index + sub_string_length <= string_length and \
                  input_string[current_index:current_index + sub_string_length] == sub_string:
                repeat_count += 1
                current_index += sub_string_length

            if repeat_count > 1:
                # Attempt to compress with the repeated sequence.
                compressed_string = input_string[:start_index] + str(repeat_count) + sub_string + input_string[current_index:]

                #Need to compare against existing shortest string
                if len(compressed_string) < len(shortest_compressed_string):
                    shortest_compressed_string = compressed_string

    return shortest_compressed_string

Big(O) Analysis

Time Complexity
O(n^3)The brute force string compression algorithm explores all possible repeating sequences. It iterates through the string of length n to identify potential starting positions for repeating substrings. For each starting position, it checks all possible lengths of repeating substrings up to n/2. For each of these substrings, it then iterates through the rest of the string to find repeating occurrences. This nested iteration results in a time complexity of O(n * n/2 * n), which simplifies to O(n^3).
Space Complexity
O(N^2)The brute force approach explores all possible compressed versions of the string. This involves generating and storing multiple compressed strings, potentially up to N^2 different strings, where N is the length of the input string. The space required to store these generated strings contributes to the auxiliary space. Therefore, the space complexity is proportional to the number of possible compressed versions, which can grow up to O(N^2) in the worst-case scenario where many repeating sequences are evaluated.

Optimal Solution

Approach

The goal is to shorten a string by replacing repeating characters with the character and the number of times it repeats. We can accomplish this efficiently by walking through the string once, keeping track of the current character and how many times we've seen it.

Here's how the algorithm would work step-by-step:

  1. Start at the beginning of the string.
  2. Keep track of the character you're currently looking at.
  3. Count how many times that character repeats in a row.
  4. If the character repeats more than once, write down the character followed by the number of times it repeated.
  5. If the character only appears once, just write down the character.
  6. Move to the next character and repeat the process until you reach the end of the string.
  7. The final result will be the compressed version of the original string.

Code Implementation

def string_compression(input_string):
    compressed_string = ""
    string_length = len(input_string)

    if string_length == 0:
        return ""

    current_character = input_string[0]
    character_count = 1

    for i in range(1, string_length):
        if input_string[i] == current_character:
            character_count += 1

        else:
            # Append character/count to result
            compressed_string += current_character

            if character_count > 1:
                compressed_string += str(character_count)

            current_character = input_string[i]
            character_count = 1

    # Append the last character sequence
    compressed_string += current_character

    if character_count > 1:
        compressed_string += str(character_count)

    return compressed_string

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input string of length n exactly once. Inside the loop, it performs a constant amount of work to count consecutive repeating characters. Therefore, the time complexity is directly proportional to the length of the string. Consequently, the algorithm's time complexity is O(n).
Space Complexity
O(N)The described algorithm constructs a new string to store the compressed version. In the worst-case scenario, where no characters repeat (e.g., "abcde"), the compressed string's length will be the same as the input string. Therefore, the algorithm uses extra space proportional to the input string's length, N, to store the compressed result. No other significant auxiliary space is used beyond this compressed string. Thus, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty input arrayReturn 0 as the new length for an empty array.
Input array with a single characterReturn 1 as the new length since no compression is needed.
Input array with all identical charactersCompress to character followed by the total count; ensure correct digit-by-digit insertion for large counts.
Input array with no repeating charactersThe array remains unchanged and the length remains the same.
Input array with repeating characters at the beginning and end, but not in the middleHandle each group of repeating characters independently and update the index correctly.
Count exceeding a single digit (e.g., count of 10 or more)Convert the count to individual digits and insert them into the char array one by one.
Maximum possible length of the input arrayThe solution should scale linearly with the input size, ensuring no excessive memory usage or stack overflow.
Array with repeating characters where the compressed version takes more space than the originalEnsure that the compressed character count is being handled by overwriting what is already present in the original array.