Taro Logo

Serialize and Deserialize N-ary Tree

Hard
Amazon logo
Amazon
1 view
Topics:
TreesRecursion

You are given the root of an N-ary tree. An N-ary tree is a tree in which each node has no more than N children. Design an algorithm to serialize and deserialize an N-ary tree. There is no restriction on how your serialization/deserialization algorithm should work. You just need to ensure that an N-ary tree can be serialized to a string and this string can be deserialized to the original tree structure.

Clarifications:

  1. Node Structure: An N-ary tree node has a value and a list of children nodes. The value is an integer.
  2. Serialization Format: You can choose any valid and unambiguous string representation for the serialized tree.
  3. Correctness: The deserialized tree should be structurally identical to the original tree, with the same values at corresponding nodes.

Examples:

Example 1:

Input:

     1
   / | \
  2  3  4
 /      \
5       6

Output: Serialized string representing the tree, and a deserialized tree that mirrors the original.

Example 2:

Input: null (empty tree)

Output: "" (empty string) and null.

Constraints:

  • The number of nodes in the tree is in the range [0, 10^4].
  • 0 <= Node.val <= 10^4
  • Your serialization and deserialization algorithms should be efficient.

Solution


Serialize and Deserialize N-ary Tree

Naive Approach: Breadth-First Search (BFS) with Delimiters

A naive approach is to perform a breadth-first traversal of the N-ary tree. We can use special delimiters to indicate the end of a node's children. This is similar to how binary trees are serialized, but we must deal with a variable number of children.

Algorithm

  1. Serialization:

    • Use a queue to perform BFS.
    • For each node, append its value to the serialized string.
    • Append the number of children the current node has to the serialized string.
    • Append a delimiter to separate the node's value from its children count.
    • Append a delimiter after each node to differentiate it from its siblings.
    • Append null to the serialized string when we encounter a null node.
  2. Deserialization:

    • Split the serialized string using the delimiters.
    • Reconstruct the tree using a queue and the number of children indicated in the serialized string.

Example:

Consider an N-ary tree where node A has children B and C, and B has child D. A simple serialization could be:

A,2,B,1,C,0,D,0

This denotes that A has value A and 2 children, B has value B and 1 child, C has value C and 0 children, and D has value D and 0 children. However, this lacks clarity and can be difficult to parse robustly, especially with more complex data or node value constraints.

Code (Python)

class Node:
    def __init__(self, val=None, children=None):
        self.val = val
        self.children = children

class Codec:
    def serialize(self, root: 'Node') -> str:
        if not root:
            return ""

        result = []
        queue = [root]

        while queue:
            node = queue.pop(0)
            if not node:
                result.append("null")
            else:
                result.append(str(node.val))
                result.append(str(len(node.children) if node.children else 0))
                if node.children:
                    queue.extend(node.children)

        return ",".join(result)

    def deserialize(self, data: str) -> 'Node':
        if not data:
            return None

        nodes = data.split(",")
        if not nodes:
          return None

        val = nodes.pop(0)
        num_children = int(nodes.pop(0))

        root = Node(val, [])

        queue = [root]

        while queue:
          node = queue.pop(0)
          for _ in range(num_children):
            child_val = nodes.pop(0)
            child_num_children = int(nodes.pop(0))

            child = Node(child_val, [])
            node.children.append(child)
            queue.append(child)

          if not nodes:
            break

          val = nodes.pop(0)
          num_children = int(nodes.pop(0))


        return root

Big O Analysis

  • Time Complexity: O(N), where N is the number of nodes in the tree, as we visit each node once during both serialization and deserialization.
  • Space Complexity: O(N) in the worst-case scenario (e.g., a wide tree) where the queue used in BFS might hold all the nodes at one level.

Optimal Approach: Preorder Traversal with Child Count

A more robust and efficient approach involves using preorder traversal combined with encoding the number of children each node has. This allows for easier and unambiguous reconstruction of the tree structure.

Algorithm

  1. Serialization:

    • Perform a preorder traversal of the tree.
    • For each node, store its value and the number of its children in the serialized string.
    • Use a delimiter (e.g., comma) to separate the value and child count and a different delimiter (e.g., space) to separate nodes.
    • Encode None (null) nodes with a specific marker (e.g., '#').
  2. Deserialization:

    • Split the serialized string into tokens.
    • Use the tokens to reconstruct the tree in a preorder fashion.
    • The child count informs how many children to create for each node.

Example:

Consider the same N-ary tree as before. A serialized string could be:

A,2 B,1 D,0 C,0

This clearly indicates that A has value A and 2 children, B has value B and 1 child, D has value D and 0 children, and C has value C and 0 children. This representation is easier to parse and less ambiguous than the naive approach.

Code (Python)

class Node:
    def __init__(self, val=None, children=None):
        self.val = val
        self.children = children

class Codec:
    def serialize(self, root: 'Node') -> str:
        def encode(node):
            if not node:
                return "# "
            s = str(node.val) + "," + str(len(node.children)) + " "
            for child in node.children:
                s += encode(child)
            return s

        return encode(root) if root else ""

    def deserialize(self, data: str) -> 'Node':
        def decode():
            if not data:
                return None
            val, num_children = nodes.pop(0).split(",")
            node = Node(int(val), [])
            for _ in range(int(num_children)):
                node.children.append(decode())
            return node

        if not data:
            return None
        nodes = data.split() # Splitting by spaces
        if nodes[0] == "#":
          return None
        return decode()

Big O Analysis

  • Time Complexity: O(N), where N is the number of nodes in the tree, as we visit each node once during both serialization and deserialization.
  • Space Complexity: O(N) in the worst-case scenario, primarily due to the recursion stack during preorder traversal and the storage of the serialized string and its tokens.

Edge Cases

  • Empty Tree: The code should handle the case where the input tree is empty (root is None). In this case, the serialized string should be empty, and deserialization should return None.
  • Null Nodes: The code should gracefully handle null nodes during deserialization. This is usually done by using a special character (e.g., '#') to represent null nodes in the serialized string.
  • Trees with Varying Branching Factors: The code should correctly handle trees where different nodes have different numbers of children.
  • Large Tree: While O(N), it should still be performant if the size of the tree grows.