Taro Logo

Number of Nodes in the Sub-Tree With the Same Label

Medium
Asked by:
Profile picture
Profile picture
12 views
Topics:
GraphsTreesRecursionArrays

You are given a tree (i.e. a connected, undirected graph that has no cycles) consisting of n nodes numbered from 0 to n - 1 and exactly n - 1 edges. The root of the tree is the node 0, and each node of the tree has a label which is a lower-case character given in the string labels (i.e. The node with the number i has the label labels[i]).

The edges array is given on the form edges[i] = [ai, bi], which means there is an edge between nodes ai and bi in the tree.

Return an array of size n where ans[i] is the number of nodes in the subtree of the ith node which have the same label as node i.

A subtree of a tree T is the tree consisting of a node in T and all of its descendant nodes.

Example 1:

Input: n = 7, edges = [[0,1],[0,2],[1,4],[1,5],[2,3],[2,6]], labels = "abaedcd"
Output: [2,1,1,1,1,1,1]
Explanation: Node 0 has label 'a' and its sub-tree has node 2 with label 'a' as well, thus the answer is 2. Notice that any node is part of its sub-tree.
Node 1 has a label 'b'. The sub-tree of node 1 contains nodes 1,4 and 5, as nodes 4 and 5 have different labels than node 1, the answer is just 1 (the node itself).

Example 2:

Input: n = 4, edges = [[0,1],[1,2],[0,3]], labels = "bbbb"
Output: [4,2,1,1]
Explanation: The sub-tree of node 2 contains only node 2, so the answer is 1.
The sub-tree of node 3 contains only node 3, so the answer is 1.
The sub-tree of node 1 contains nodes 1 and 2, both have label 'b', thus the answer is 2.
The sub-tree of node 0 contains nodes 0, 1, 2 and 3, all with label 'b', thus the answer is 4.

Example 3:

Input: n = 5, edges = [[0,1],[0,2],[1,3],[0,4]], labels = "aabab"
Output: [3,2,1,1,1]

Constraints:

  • 1 <= n <= 105
  • edges.length == n - 1
  • edges[i].length == 2
  • 0 <= ai, bi < n
  • ai != bi
  • labels.length == n
  • labels is consisting of only of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible values for the number of nodes 'n' in the tree? Can 'n' be zero?
  2. Are the node labels guaranteed to be lowercase English letters, or can they be other characters or uppercase? What is the maximum length of the 'labels' string?
  3. Is the 'edges' list guaranteed to represent a valid tree (connected, no cycles, single root)?
  4. If a node's label doesn't match the label of any node in its subtree (including itself), should the count for that node be zero?
  5. Can the tree be very deep, and should I be mindful of potential stack overflow issues if using recursion?

Brute Force Solution

Approach

Imagine you're exploring a family tree where each person has a favorite color. We want to know, for each person, how many of their descendants also share their favorite color. The brute-force method involves checking every single person in each subtree and comparing their favorite color with the root of the subtree.

Here's how the algorithm would work step-by-step:

  1. For each person in the family tree, consider them as the 'root' of a smaller family tree (subtree).
  2. Starting with that 'root' person, go through every single one of their descendants.
  3. For each descendant, check if their favorite color matches the favorite color of the 'root' person.
  4. Count how many descendants have the same favorite color as the 'root' person.
  5. Record this count as the number of people with the same favorite color in that person's subtree.
  6. Repeat this process for every single person in the entire family tree.

Code Implementation

def subtree_nodes_with_same_label(number_of_nodes, edges, labels):
    adjacency_list = [[] for _ in range(number_of_nodes)]
    for edge_start, edge_end in edges:
        adjacency_list[edge_start].append(edge_end)
        adjacency_list[edge_end].append(edge_start)

    result = [0] * number_of_nodes

    for current_node in range(number_of_nodes):
        # We begin by considering each node as the root.
        count = 0
        visited = [False] * number_of_nodes

        def depth_first_search(node):
            nonlocal count
            visited[node] = True
            if labels[node] == labels[current_node]:
                count += 1

            # Iterating thru all the descendants of the current node.
            for neighbor in adjacency_list[node]:
                if not visited[neighbor]:
                    depth_first_search(neighbor)

        depth_first_search(current_node)
        result[current_node] = count

    # Returns the list with the final count.
    return result

Big(O) Analysis

Time Complexity
O(n²)The given brute-force approach iterates through each of the n nodes in the tree, considering each as the root of a subtree. For each of these n nodes, it traverses the entire subtree to count nodes with matching labels. In the worst-case scenario, where the tree is a single branch or has many nodes in each subtree, the subtree traversal can take O(n) time. Therefore, the overall time complexity is O(n * n), which simplifies to O(n²).
Space Complexity
O(N)The brute-force approach, as described, involves recursion (or a stack-based equivalent) to traverse each subtree. In the worst-case scenario, such as a skewed tree, the depth of the recursion can be equal to the number of nodes, N. Therefore, the recursion stack can grow up to a depth of N, consuming O(N) space. Additionally, intermediate results, such as the count of nodes with the same label within each subtree, need to be stored, potentially requiring space proportional to N as well. Hence, the overall auxiliary space complexity is O(N).

Optimal Solution

Approach

The best way to solve this problem is to use a technique where each part of the tree figures out its answer by asking its children for their results. We then combine these results to get the answer for the current part, and share that with its parent. This avoids recomputing the same information over and over again.

Here's how the algorithm would work step-by-step:

  1. Start at the leaves (the very bottom) of the tree and move upwards.
  2. For each part of the tree, ask each of its direct children how many nodes within their subtree have the same label as the child's node.
  3. Add up those numbers, including the number of nodes with the same label as the current node.
  4. Store this total as the answer for that part of the tree.
  5. Pass this answer up to the parent of this part of the tree.
  6. Continue working up the tree, repeating the same steps until you reach the top (the root node).
  7. The answer at each part of the tree will give you the number of nodes in that subtree which match the part's label.

Code Implementation

def subtree_nodes_with_same_label(number_of_nodes, edge_list, labels):    graph = [[] for _ in range(number_of_nodes)]    for start_node, end_node in edge_list:      graph[start_node].append(end_node)
      graph[end_node].append(start_node)
    node_count_with_same_labels = [0] * number_of_nodes
    visited = [False] * number_of_nodes
    def depth_first_search(current_node):
      visited[current_node] = True
      # Initialize count for the current node's label
      node_counts = {}
      node_counts[labels[current_node]] = 1
      # Iterate through each neighbor of the current node
      for neighbor in graph[current_node]:
        if not visited[neighbor]:

          neighbor_node_counts = depth_first_search(neighbor)
          # Aggregate child counts into the current node counts.
          for label, count in neighbor_node_counts.items():
            node_counts[label] = node_counts.get(label, 0) + count
      
      node_count_with_same_labels[current_node] = node_counts.get(labels[current_node], 0)
      return node_counts

    # Start the depth-first search from node 0.
    depth_first_search(0)
    return node_count_with_same_labels

Big(O) Analysis

Time Complexity
O(n)The algorithm performs a Depth-First Search (DFS) traversal of the tree, visiting each node exactly once. At each node, a constant amount of work is performed: combining the results from its children (adding counts from the children’s hashmaps and incrementing its label count if needed). Since we visit each of the n nodes once and perform constant work at each node, the overall time complexity is O(n).
Space Complexity
O(N)The auxiliary space complexity is dominated by the recursion stack and the storage of results for each node. In the worst-case scenario, where the tree resembles a linked list, the recursion depth could reach N, where N is the number of nodes. Each recursive call requires storing a stack frame. Additionally, we store the computed answer (number of nodes with the same label) for each of the N nodes in the tree. Therefore, the overall auxiliary space complexity is O(N).

Edge Cases

Null or empty tree (no nodes)
How to Handle:
Return an empty result list immediately as there are no subtrees to process.
Single node tree
How to Handle:
Return a list of size 1 with value 1 if the node's label matches itself, otherwise 0.
Tree with all nodes having the same label
How to Handle:
Ensure the subtree count correctly propagates up, resulting in all nodes having counts equal to the size of their respective subtrees.
Tree where no node's label matches its subtree
How to Handle:
The algorithm should correctly return 0 for all nodes in the result list.
Maximum allowed number of nodes causing stack overflow with recursion
How to Handle:
Consider iterative DFS or BFS to avoid exceeding the maximum recursion depth if the number of nodes can be very large.
Disconnected graph represented as tree edges
How to Handle:
The provided edges should form a valid tree, and disconnected components are considered invalid input.
Integer overflow when counting nodes in large subtrees
How to Handle:
Use a data type (e.g., long) with sufficient capacity to store large subtree counts.
Cycles present in the provided edges causing infinite recursion
How to Handle:
The problem assumes the input forms a tree and cycles should not exist, but a check for cycles during graph construction can prevent infinite loops and provide error handling.