Taro Logo

Decode String

Medium
Asked by:
Profile picture
Profile picture
Profile picture
Profile picture
+22
60 views
Topics:
StringsStacksRecursion

Given an encoded string, return its decoded string.

The encoding rule is: k[encoded_string], where the encoded_string inside the square brackets is being repeated exactly k times. Note that k is guaranteed to be a positive integer.

You may assume that the input string is always valid; there are no extra white spaces, square brackets are well-formed, etc. Furthermore, you may assume that the original data does not contain any digits and that digits are only for those repeat numbers, k. For example, there will not be input like 3a or 2[4].

The test cases are generated so that the length of the output will never exceed 105.

Example 1:

Input: s = "3[a]2[bc]"
Output: "aaabcbc"

Example 2:

Input: s = "3[a2[c]]"
Output: "accaccacc"

Example 3:

Input: s = "2[abc]3[cd]ef"
Output: "abcabccdcdcdef"

Constraints:

  • 1 <= s.length <= 30
  • s consists of lowercase English letters, digits, and square brackets '[]'.
  • s is guaranteed to be a valid input.
  • All the integers in s are in the range [1, 300].

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What characters can appear in the encoded string, besides digits and square brackets? Are lowercase letters the only other allowed characters?
  2. What is the maximum nesting depth of the encoded strings? Is there a limit to the size of the decoded string?
  3. Can the integer preceding a bracketed string be zero? Are negative numbers allowed as repeat counts?
  4. If the input string is invalid (e.g., mismatched brackets, non-integer repeat counts), what should the function return? Should I throw an error, or return an empty string?
  5. Is the input string guaranteed to conform to the encoding pattern, or should I expect malformed inputs that need to be handled?

Brute Force Solution

Approach

The decode string problem can be solved by trying every possible repetition count and substring. We will exhaustively explore all combinations of repeating and concatenating decoded segments until we find the fully decoded string. This approach is not efficient, but it's a straightforward way to understand the problem.

Here's how the algorithm would work step-by-step:

  1. Start by examining the string from left to right.
  2. Whenever you encounter a number, consider it as a potential repetition count.
  3. Then, identify the substring that needs to be repeated based on the number. This might involve some pattern matching to determine where the repeated section begins and ends.
  4. Repeat that substring the specified number of times and replace the original number and encoded substring with the expanded string.
  5. Continue this process from left to right, always expanding any encoded segments you find.
  6. Keep doing this over and over, until there are no more numbers left in the string. This means you've expanded everything you possibly can.
  7. The final result is the fully decoded string after expanding all possible encoded sections.

Code Implementation

def decode_string_brute_force(encoded_string):
    while True:
        start_index = -1
        for index, char in enumerate(encoded_string):
            if char.isdigit():
                start_index = index
                break
        
        if start_index == -1:
            break

        number_end_index = start_index
        while number_end_index < len(encoded_string) and encoded_string[number_end_index].isdigit():
            number_end_index += 1
        
        repetition_count = int(encoded_string[start_index:number_end_index])

        # Find the corresponding opening bracket
        bracket_start = number_end_index

        bracket_level = 1
        bracket_end = bracket_start + 1

        # We need to find the end of the encoded segment
        while bracket_end < len(encoded_string) and bracket_level > 0:
            if encoded_string[bracket_end] == '[':
                bracket_level += 1
            elif encoded_string[bracket_end] == ']':
                bracket_level -= 1
            bracket_end += 1

        bracket_end -= 1

        repeated_string = encoded_string[bracket_start + 1:bracket_end] * repetition_count
        encoded_string = encoded_string[:start_index] + repeated_string + encoded_string[bracket_end+1:]

    return encoded_string

Big(O) Analysis

Time Complexity
O(exponential)The provided solution attempts to decode the string by exhaustively exploring all possible repetition counts and substrings. In the worst-case scenario, each number encountered could potentially lead to multiple branching possibilities for decoding, as we try different substring lengths and repetition factors. Because of this exhaustive searching of all possibilities, the number of operations grows exponentially with the input string's length (n), especially when heavily nested encoded segments are present. Thus the time complexity is O(exponential).
Space Complexity
O(N*M)The provided plain English solution expands encoded substrings in place, potentially leading to significant string growth. In the worst case, a substring could be repeated many times (M), and this expansion process could happen multiple times across the original string (N). This repeated expansion creates a string of potentially N*M in size, where M is the maximum length of a repeated decoded substring and N is length of the input string. The space is used to store the expanded string as a result of these operations. Thus, the space complexity is O(N*M).

Optimal Solution

Approach

The best way to decode this string is to use the concept of a stack. We'll process the string character by character, using the stack to remember what we've seen and properly handle nested encoded sequences.

Here's how the algorithm would work step-by-step:

  1. Go through the string from beginning to end, one character at a time.
  2. If you see a digit, get the whole number. This number tells you how many times to repeat a string later.
  3. If you see an opening bracket, like '[', save the current string you're building and the repetition count onto a stack. This is like pausing your current work to start a nested decoding.
  4. If you see a letter, just add it to the current string you are building.
  5. If you see a closing bracket, like ']', it means you're done decoding a nested part. Take the last repetition count and the string that was saved earlier from the stack.
  6. Repeat the decoded substring the correct number of times and add it to the string you were working on before you hit the opening bracket.
  7. At the end, the string you have built will be the fully decoded string.

Code Implementation

def decode_string(encoded_string):
    string_stack = []
    current_string = ''
    current_number = 0

    for char in encoded_string:
        if char.isdigit():
            current_number = current_number * 10 + int(char)
        elif char == '[':
            # Save current state before diving into substring
            string_stack.append((current_string, current_number))
            current_string = ''
            current_number = 0

        elif char == ']':
            # Retrieve previous state to build decoded string
            previous_string, repeat_count = string_stack.pop()
            current_string = previous_string + current_string * repeat_count

        else:
            # Append chars to current substring
            current_string += char

    return current_string

Big(O) Analysis

Time Complexity
O(n*m)The algorithm iterates through the input string of length n once. Inside the loop, when a closing bracket ']' is encountered, the string preceding the bracket needs to be constructed by popping elements from the stack, which, in the worst case, may involve concatenating a substring with a length proportional to the repetition count m times, where m is the count to repeat a string. Therefore, the complexity depends on both n (the length of the input string) and m (the number of repetitions which can be variable dependent on the input). The most computationally expensive part is constructing the repeated substring, leading to O(n*m) complexity in the worst-case scenario where the decoded string is much larger than the input due to high repetition counts. Hence the overall time complexity is O(n*m).
Space Complexity
O(N)The space complexity is dominated by the stack, which stores strings and repetition counts. In the worst-case scenario, the input string could consist of nested encoded sequences like 'k[k[k[...k[string]...]]]', where 'k' is a number. The depth of the nesting, and hence the number of elements pushed onto the stack, can be proportional to the length of the input string, N. Therefore, the auxiliary space used by the stack can grow linearly with the input size N. Hence, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty string inputReturn an empty string as there's nothing to decode.
Null string inputThrow an IllegalArgumentException or return null to signal an invalid input.
Input string with no encoding (no numbers or brackets)Return the original string unchanged as it represents a simple string.
Nested encodings with high repetition counts (e.g., 1000[abc]) to test performanceThe solution should scale reasonably, potentially using StringBuilder for efficient string concatenation to avoid quadratic time complexity.
Invalid format with mismatched brackets (e.g., 3[ab or 3ab])Throw an IllegalArgumentException or return an error string to indicate invalid input.
Very deep nesting of encodings (e.g., many layers of [ and ]) to test stack overflowAn iterative approach avoids recursion depth limits, or optimize recursive solution to be tail recursive if language supports it.
Repetition factor of 0 (e.g., 0[abc])Treat it as an empty string, so '0[abc]' should decode to ''.
Large repetition factors that could lead to integer overflow when calculating final string lengthImplement overflow checks, or switch to using long type for intermediate calculations to avoid integer overflow related issues.