Taro Logo

Encode and Decode Strings

Medium
Google logo
Google
2 views
Topics:
Strings

Design an algorithm to encode a list of strings to a single string. The encoded string is then sent over the network and is decoded back to the original list of strings.

Implement encode and decode.

def encode(strs: List[str]) -> str:
    """Encodes a list of strings to a single string.
    """
    pass

def decode(s: str) -> List[str]:
    """Decodes a single string to a list of strings.
    """
    pass

For example, given the input ["lint", "code", "love", "you"], your encode function should return a single string. The decode function should then take this encoded string and return the original list: ["lint", "code", "love", "you"].

Clarifications:

  1. What characters can be in the string? Can I assume any encoding?
  2. Can the input strings be empty? If so, how should that be handled?
  3. How should I handle edge cases such as an empty input list?
  4. What is the expected performance in terms of time and space complexity?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the strings in the input list contain any special characters or Unicode characters?
  2. What is the expected behavior if the input list is empty?
  3. What is the maximum length of an individual string in the input list?
  4. What character(s) should I use as a delimiter for encoding, and should I be concerned about those characters appearing in the original strings?
  5. What is the expected output if the input list contains very long strings that may approach memory limits?

Brute Force Solution

Approach

The goal is to combine multiple strings into one, and then be able to separate them back out. The straightforward way is to add something special between each string, so we know where one string ends and the next one begins.

Here's how the algorithm would work step-by-step:

  1. To combine strings, go through each string one at a time.
  2. For each string, first find out how long it is, and write down that length.
  3. Then, write down the string itself.
  4. Repeat this process for every string, so we have a list of lengths followed by the actual strings.
  5. To separate the strings later, first read the length of the first string.
  6. Then, read that many characters from the combined string to get the first string.
  7. Repeat this process by reading the next length and then reading the next string until you have separated all of them.

Code Implementation

def encode(strings_to_encode):
    encoded_string = ""

    for current_string in strings_to_encode:
        # Prepend the length of the string to the string itself.
        string_length = str(len(current_string))
        encoded_string += string_length + "," + current_string

    return encoded_string

def decode(encoded_string):
    decoded_strings = []
    index = 0

    while index < len(encoded_string):
        # Extract the length of the next string.
        comma_index = encoded_string.find(",", index)
        string_length = int(encoded_string[index:comma_index])

        # Extract the string based on the parsed length.
        start_index = comma_index + 1
        end_index = comma_index + 1 + string_length
        decoded_string = encoded_string[start_index:end_index]
        decoded_strings.append(decoded_string)

        # Move the index to the beginning of the next length.
        index = end_index

    return decoded_strings

Big(O) Analysis

Time Complexity
O(n)Encoding iterates through each of the 'k' strings in the input list, where the total length of all strings combined is 'n'. For each string, its length is calculated (O(1)) and then written, along with the string itself. Decoding iterates through the encoded string. For each string, it reads the length (O(1)) and then extracts the string, which takes time proportional to the string's length. Since encoding and decoding touch each character in all input strings only once in total, the time complexity is O(n), where n is the total length of all strings.
Space Complexity
O(N)The encoding process builds a new string by concatenating lengths and original strings. In the worst case, the encoded string has a size proportional to the sum of the lengths of the input strings (N), plus some overhead for the length representations. The decoding process creates a new list to store the decoded strings, and in the worst case, that list will also contain N total characters across all the decoded strings, where N is the sum of the lengths of all the original input strings. Therefore, the auxiliary space used is O(N).

Optimal Solution

Approach

The problem is to convert a list of words into a single encoded string and then convert that encoded string back into the original list of words. We need to invent a way to mark where each word begins and ends within the single string so we can separate them later.

Here's how the algorithm would work step-by-step:

  1. For encoding, go through each word in the list.
  2. Before adding a word to the encoded string, add the word's length followed by a special character like a slash or period.
  3. Then, add the actual word to the encoded string.
  4. Repeat this process for every word, combining everything into one large string.
  5. For decoding, read the encoded string from the beginning.
  6. First, read the length of the next word by reading until you reach the special character.
  7. Then, read the word itself using the length you just found.
  8. Add the decoded word to the list of decoded words.
  9. Repeat until you have processed the entire encoded string.

Code Implementation

def encode(list_of_strings):
    encoded_string = ""
    for current_string in list_of_strings:
        # Prepend length to allow decoding
        encoded_string += str(len(current_string)) + "." + current_string

    return encoded_string

def decode(encoded_string):
    decoded_list = []
    string_index = 0

    while string_index < len(encoded_string):
        # Find the delimiter to extract length
        delimiter_index = encoded_string.find(".", string_index)
        string_length = int(encoded_string[string_index:delimiter_index])

        # Increment index past length and delimiter
        string_index = delimiter_index + 1

        # Extract the string based on the extracted length
        word = encoded_string[string_index:string_index + string_length]
        decoded_list.append(word)

        # Increment index to the next encoded word
        string_index += string_length

    return decoded_list

Big(O) Analysis

Time Complexity
O(n)The encoding process iterates through each word in the input list once. For each word, it calculates the length and appends the length, a delimiter, and the word itself to the encoded string. Similarly, the decoding process iterates through the encoded string once, reading the length of each word, extracting the word, and adding it to the result list. Both encoding and decoding perform a constant amount of work per word/character resulting in a linear time complexity relative to the total number of characters across all the input strings. Thus, the time complexity is O(n), where n is the total number of characters in all the input strings.
Space Complexity
O(N)During encoding, a new string is created to store the encoded representation of the input list of words. In the worst case, the encoded string could be proportional to the combined lengths of all the words in the input list. During decoding, a new list is created to store the decoded words, and in the worst case, this list will have the same number of words as the input list, with each word taking up space proportional to its length. Therefore, the auxiliary space is O(N), where N is the total length of all the words in the original list.

Edge Cases

CaseHow to Handle
Empty input listReturn an empty string if the input list is empty to avoid processing errors.
List contains an empty stringHandle empty strings correctly by encoding them with a length of 0.
Strings with special characters (e.g., Unicode, control characters)Ensure encoding and decoding preserve special characters without corruption by using a robust encoding scheme like UTF-8.
Very long strings leading to integer overflow when encoding the lengthUse a data type with sufficient range (e.g., long) to store the length and prevent overflow.
List with a very large number of strings exceeding available memoryConsider a streaming approach if memory is a constraint, processing strings in chunks.
Encoded string with invalid length prefixDuring decoding, check if the declared length is within the bounds of the remaining encoded string and handle invalid lengths gracefully, potentially returning an error.
Null input for string in the input ListTreat null strings as empty strings or throw an IllegalArgumentException.
Strings containing the delimiter used for encoding lengthsEscape or avoid the delimiter in the strings to prevent misinterpretation during decoding.