Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.
Example:
Input: ["lint","code","love","you"]
Output: ["lint","code","love","you"]
Explain how your encoding and decoding algorithms work. What are the time and space complexities of your approach? Consider edge cases such as empty strings, empty lists, very long strings, and the presence of delimiter characters within the strings themselves. Can you write a function encode(strs)
which takes a list of strings as input and returns an encoded single string? And a function decode(s)
which takes the encoded string as input and returns the original list of strings?
Let's explore how to encode a list of strings into a single string and then decode it back to the original list. This is a common problem when you need to transmit a list of strings over a network where you can only send a single string.
A simple approach is to concatenate all the strings together. However, this presents a problem: how do we know where one string ends and the next one begins? If the strings contain special characters, these could be used as delimiters.
class Codec:
def encode(self, strs):
return "".join(strs)
def decode(self, s):
# This naive decode wouldn't work without string lengths
return [s]
The naive approach doesn't provide a practical decoding mechanism without delimiters or string length information. It's mainly useful as a starting point to understand the core problem.
A much better approach is to prefix each string with its length followed by a delimiter (e.g., #
). This provides a clear way to know where each string begins and ends during decoding.
class Codec:
def encode(self, strs):
encoded_string = ""
for s in strs:
encoded_string += str(len(s)) + "#" + s
return encoded_string
def decode(self, s):
result = []
i = 0
while i < len(s):
j = i
while s[j] != '#':
j += 1
length = int(s[i:j])
i = j + 1
result.append(s[i:i + length])
i += length
return result
Let's say strs = ["lint", "code", "love", "you"]
.
Encoding:
Encoded string: "4#lint4#code4#love3#you"
Decoding:
#
can appear in the input strings, we'd need to use an escape character, or choose a delimiter character that is disallowed in the input. Another approach is to replace all occurrances of #
with a substitute sequence during encoding, and revert during decoding.This length-based encoding provides a robust and efficient way to serialize and deserialize a list of strings into a single string.