Taro Logo

Vertical Order Traversal of a Binary Tree

Medium
3 views
2 months ago

Given the root of a binary tree, calculate the vertical order traversal of the binary tree. For each node at position (row, col), its left and right children will be at positions (row + 1, col - 1) and (row + 1, col + 1) respectively. The root of the tree is at (0, 0). The vertical order traversal of a binary tree is a list of top-to-bottom orderings for each column index starting from the leftmost column and ending on the rightmost column. There may be multiple nodes in the same row and same column. In such a case, sort these nodes by their values. Return the vertical order traversal of the binary tree. For example, given the tree [3,9,20,null,null,15,7], return [[9],[3,15],[20],[7]]. As another example, given the tree [1,2,3,4,5,6,7], return [[4],[2],[1,5,6],[3],[7]]. The number of nodes in the tree is in the range [1, 1000] and 0 <= Node.val <= 1000.

Sample Answer
## Vertical Order Traversal of a Binary Tree

This problem asks us to perform a vertical order traversal of a binary tree.  For each node, we assign coordinates (row, col) where the root is at (0, 0). Left children are at (row + 1, col - 1), and right children are at (row + 1, col + 1). The goal is to return a list of top-to-bottom orderings for each column index, from leftmost to rightmost. When nodes share the same row and column, they are sorted by their values.

### 1. Naive Solution (Using Hashmap and Sorting)

A naive approach involves traversing the tree and storing nodes along with their column indices in a hashmap.  The keys of the hashmap represent the column indices, and the values are lists of nodes in that column.  After traversing the tree, we sort the column indices and then sort the nodes within each column based on their row and value.

```python
from collections import defaultdict, deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def vertical_order_traversal_naive(root):
    if not root:
        return []

    column_map = defaultdict(list)
    queue = deque([(root, 0, 0)])  # (node, row, col)

    while queue:
        node, row, col = queue.popleft()
        column_map[col].append((row, node.val))

        if node.left:
            queue.append((node.left, row + 1, col - 1))
        if node.right:
            queue.append((node.right, row + 1, col + 1))

    sorted_columns = sorted(column_map.keys())
    result = []
    for col in sorted_columns:
        column_nodes = sorted(column_map[col], key=lambda x: (x[0], x[1]))  # Sort by row, then value
        result.append([node[1] for node in column_nodes])

    return result

2. Optimal Solution (Using Hashmap and Priority Queue)

We can optimize the sorting step by using a priority queue (min-heap) to store nodes within each column. The priority queue will automatically maintain the nodes in sorted order based on their row and value. This eliminates the need for explicit sorting after the traversal.

from collections import defaultdict, deque
import heapq

def vertical_order_traversal_optimal(root):
    if not root:
        return []

    column_map = defaultdict(list)
    queue = deque([(root, 0, 0)])  # (node, row, col)

    while queue:
        node, row, col = queue.popleft()
        heapq.heappush(column_map[col], (row, node.val))

        if node.left:
            queue.append((node.left, row + 1, col - 1))
        if node.right:
            queue.append((node.right, row + 1, col + 1))

    sorted_columns = sorted(column_map.keys())
    result = []
    for col in sorted_columns:
        column_nodes = []
        while column_map[col]:
            column_nodes.append(heapq.heappop(column_map[col])[1]) # Extract value from the heap
        result.append(column_nodes)

    return result

3. Big(O) Run-time Analysis

  • Naive Solution: O(N log N), where N is the number of nodes in the tree. The traversal takes O(N) time. Sorting the column indices takes O(W log W) where W is the width of the tree and W <= N. Sorting the nodes within each column takes O(K log K) in the worst case where K is the number of nodes in a single column. Overall the complexity of the sorting is O(N log K), where K is the maximum number of nodes in a single column (which is bounded by N). Since sorting column indices is less significant compared to sorting the nodes inside the columns, we take O(N log N) as the time complexity.
  • Optimal Solution: O(N log K), where N is the number of nodes in the tree and K is the maximum number of nodes in a single column. The traversal takes O(N) time. Pushing and popping from the priority queue takes O(log K) time for each node. Since each node is pushed and popped once, the overall time complexity is O(N log K). Typically, K is much smaller than N.

4. Big(O) Space Usage Analysis

  • Naive Solution: O(N), where N is the number of nodes in the tree. The hashmap stores all the nodes in the tree. In the worst-case scenario (a skewed tree), the queue can hold O(N) nodes during the traversal.
  • Optimal Solution: O(N), where N is the number of nodes in the tree. Similar to the naive solution, the hashmap stores all the nodes, and the queue can hold O(N) nodes in the worst case.

5. Edge Cases

  • Empty Tree: If the root is None, return an empty list. This is handled at the beginning of both solutions.
  • Skewed Tree: In a skewed tree, all nodes may fall into a few columns, potentially increasing the time complexity of sorting in the naive solution. The optimal solution handles this gracefully using priority queues.
  • Duplicate Node Values: The solutions correctly handle duplicate node values by sorting nodes with the same row and column based on their values.
  • Large Number of Columns: If the tree is very wide, the number of columns can be significant. The defaultdict automatically handles the creation of new columns as needed.
  • Negative Column Indices: The solutions correctly handle negative column indices by using a defaultdict, which allows for negative keys.

By using a defaultdict and a priority queue (in the optimal solution), we can efficiently perform the vertical order traversal while handling edge cases and duplicate values appropriately. The priority queue ensures that nodes within each column are correctly ordered without the need for explicit sorting after the traversal.