Find Weighted Median Node in Tree

Hard

Asked by:

10 views

Topics:

TreesGraphs

You are given an integer n and an undirected, weighted tree rooted at node 0 with n nodes numbered from 0 to n - 1. This is represented by a 2D array edges of length n - 1, where edges[i] = [u_i, v_i, w_i] indicates an edge from node u_i to v_i with weight w_i.

The weighted median node is defined as the first node x on the path from u_i to v_i such that the sum of edge weights from u_i to x is greater than or equal to half of the total path weight.

You are given a 2D integer array queries. For each queries[j] = [u_j, v_j], determine the weighted median node along the path from u_j to v_j.

Return an array ans, where ans[j] is the node index of the weighted median for queries[j].

Example 1:

Input: n = 2, edges = [[0,1,7]], queries = [[1,0],[0,1]]

Output: [0,1]

Explanation:

Query	Path	Edge Weights	Total Path Weight	Half	Explanation	Answer
`[1, 0]`	`1 → 0`	`[7]`	7	3.5	Sum from `1 → 0 = 7 >= 3.5`, median is node 0.	0
`[0, 1]`	`0 → 1`	`[7]`	7	3.5	Sum from `0 → 1 = 7 >= 3.5`, median is node 1.	1

Example 2:

Input: n = 3, edges = [[0,1,2],[2,0,4]], queries = [[0,1],[2,0],[1,2]]

Output: [1,0,2]

Explanation:

Query	Path	Edge Weights	Total Path Weight	Half	Explanation	Answer
`[0, 1]`	`0 → 1`	`[2]`	2	1	Sum from `0 → 1 = 2 >= 1`, median is node 1.	1
`[2, 0]`	`2 → 0`	`[4]`	4	2	Sum from `2 → 0 = 4 >= 2`, median is node 0.	0
`[1, 2]`	`1 → 0 → 2`	`[2, 4]`	6	3	Sum from `1 → 0 = 2 < 3`. Sum from `1 → 2 = 2 + 4 = 6 >= 3`, median is node 2.	2

Example 3:

Input: n = 5, edges = [[0,1,2],[0,2,5],[1,3,1],[2,4,3]], queries = [[3,4],[1,2]]

Output: [2,2]

Explanation:

Query	Path	Edge Weights	Total Path Weight	Half	Explanation	Answer
`[3, 4]`	`3 → 1 → 0 → 2 → 4`	`[1, 2, 5, 3]`	11	5.5	Sum from `3 → 1 = 1 < 5.5`. Sum from `3 → 0 = 1 + 2 = 3 < 5.5`. Sum from `3 → 2 = 1 + 2 + 5 = 8 >= 5.5`, median is node 2.	2
`[1, 2]`	`1 → 0 → 2`	`[2, 5]`	7	3.5	Sum from `1 → 0 = 2 < 3.5`. Sum from `1 → 2 = 2 + 5 = 7 >= 3.5`, median is node 2.	2

Constraints:

2 <= n <= 10⁵
edges.length == n - 1
edges[i] == [u_i, v_i, w_i]
0 <= u_i, v_i < n
1 <= w_i <= 10⁹
1 <= queries.length <= 10⁵
queries[j] == [u_j, v_j]
0 <= u_j, v_j < n
The input is generated such that edges represents a valid tree.

Try coding it in LeetCode

Solution

Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

What are the possible values for the node weights and the edge weights? Can they be negative, zero, or floating-point numbers?
Is the tree guaranteed to be connected? What should I return if the tree is empty or consists of a single node?
What does it mean if multiple nodes satisfy the weighted median condition? Should I return any one of them, the first one I encounter in a specific traversal, or is there a tie-breaking rule?
Can you define more precisely what constitutes the 'weighted median node'? Is it the node where the sum of weights of nodes reachable with less than or equal to the edge weights to that node is >= half of the total weight, AND the sum of weights of nodes reachable with less than or equal to the edge weights via *other* paths is also >= half of the total weight?
Are the node IDs distinct integers, or can they be any data type, and how are the nodes represented in the input (e.g., an adjacency list/matrix, a list of edges, etc.)?

Brute Force Solution

Approach

We're looking for a special node in a tree that's the 'middle' when considering weights. The brute force way means checking every single node in the tree to see if it fits the criteria of being the weighted median by exhaustively exploring all possibilities.

Here's how the algorithm would work step-by-step:

Start by picking a node in the tree. Let's pretend it's the weighted median.
Now, for that node, we need to look at all the other nodes in the tree.
For each of those other nodes, determine if it is closer to the 'pretend' weighted median node or further away.
Calculate the total weight of all the nodes that are closer to the 'pretend' weighted median node.
Also, calculate the total weight of all the nodes that are further away from the 'pretend' weighted median node.
If the total weight of nodes closer is less than or equal to half the total weight of all nodes AND the total weight of nodes further away is also less than or equal to half the total weight of all nodes, then this 'pretend' weighted median node is actually the weighted median. If this is true, we found it!
If it's NOT the weighted median, move on to the next node in the tree and repeat all the steps above.
Keep going until we've checked every single node in the tree.

Code Implementation

def find_weighted_median_node_brute_force(tree, weights):
    number_of_nodes = len(tree)
    total_weight = sum(weights)

    for potential_median_node in range(number_of_nodes):
        weight_closer = 0
        weight_further = 0

        for other_node in range(number_of_nodes):
            if other_node == potential_median_node:
                continue

            distance_to_median_node = calculate_distance(tree, potential_median_node, other_node)

            # Determine if the other node is closer or further.
            weight_closer_temp = 0
            weight_further_temp = 0
            other_distance = float('inf')

            for intermediate_node in range(number_of_nodes):
                if intermediate_node == other_node:
                    continue

                intermediate_distance = calculate_distance(tree, other_node, intermediate_node)

                if intermediate_distance < other_distance:
                    other_distance = intermediate_distance

            node_to_median_distance = calculate_distance(tree, other_node, potential_median_node)
            if node_to_median_distance <= other_distance:

                weight_closer_temp += weights[other_node]
                
            else:
                weight_further_temp += weights[other_node]
                
            weight_closer += weight_closer_temp
            weight_further += weight_further_temp

        # Check if the potential median is indeed the weighted median.
        if weight_closer <= total_weight / 2 and weight_further <= total_weight / 2:
            return potential_median_node

    return -1

def calculate_distance(tree, node1, node2):
    if node1 == node2:
        return 0

    queue = [(node1, 0)]
    visited = {node1}

    while queue:
        current_node, distance = queue.pop(0)

        if current_node == node2:
            return distance

        for neighbor in tree[current_node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))

    return float('inf')

Big(O) Analysis

Time Complexity

O(n²) – The algorithm iterates through each of the n nodes in the tree, considering each node as a potential weighted median. For each potential median node, the algorithm iterates through all other n nodes in the tree to determine their distance relative to the potential median. Therefore, for each of the n nodes, we perform approximately n distance calculations. Thus, the total number of operations approximates n * n, which simplifies to O(n²).

Space Complexity

O(N) – The algorithm iterates through each node in the tree, using each node as a potential weighted median. For each candidate node, the algorithm iterates through all other nodes to determine their distance relative to the candidate. While not explicitly stated, calculating distances in a tree often involves performing a depth-first search (DFS) or breadth-first search (BFS). Both DFS and BFS might involve a queue or stack and potentially a 'visited' set to avoid cycles. The space used by this queue/stack and the visited set is proportional to the number of nodes in the tree, N, at most. Therefore, the auxiliary space complexity is O(N).

Optimal Solution

Approach

To find the weighted median node in a tree efficiently, we'll use a divide-and-conquer approach based on the concept of tree centroids. We'll repeatedly find centroids, which are nodes that split the tree into smaller, roughly equal parts considering node weights, and then strategically search within the relevant subtree.

Here's how the algorithm would work step-by-step:

First, calculate the total weight of the entire tree by summing up the weights of all the nodes.
Find a special node called a centroid. A centroid is a node that, when removed, splits the tree into smaller parts where each part has at most half the total weight of the tree. We can efficiently find a centroid by traversing the tree.
Once you find the centroid, check if it is the weighted median. A node is the weighted median if the sum of weights of nodes smaller than it is less than or equal to half of the total weight and the sum of weights of nodes larger than it is less than or equal to half of the total weight. If it is, you're done!
If the centroid is not the weighted median, determine which side of the centroid the actual weighted median should be located on. This can be done by comparing the weight on each side of the centroid with half the total weight.
Focus on the subtree containing the weighted median. We can disregard all other parts of the tree. This is the key to the efficiency of the method.
Repeat the process: calculate the total weight of the new subtree, find a centroid within that subtree, and check if it's the weighted median. Continue until you find the weighted median node.

Code Implementation

def find_weighted_median_node(graph, weights):
    number_of_nodes = len(graph)
    
    def calculate_total_weight(subtree_root, parent):
        total_weight = weights[subtree_root]
        for neighbor in graph[subtree_root]:
            if neighbor != parent:
                total_weight += calculate_total_weight(neighbor, subtree_root)
        return total_weight

    total_weight = sum(weights)

    def find_centroid(subtree_root, parent, subtree_weight):
        is_centroid = True
        subtree_size = weights[subtree_root]

        for neighbor in graph[subtree_root]:
            if neighbor != parent:
                neighbor_subtree_weight = calculate_subtree_weight(neighbor, subtree_root)
                if neighbor_subtree_weight > subtree_weight / 2:
                    is_centroid = False
                    return find_centroid(neighbor, subtree_root, subtree_weight)
                subtree_size += neighbor_subtree_weight

        if is_centroid:
            return subtree_root

    def calculate_subtree_weight(subtree_root, parent):
        weight = weights[subtree_root]
        for neighbor in graph[subtree_root]:
            if neighbor != parent:
                weight += calculate_subtree_weight(neighbor, subtree_root)
        return weight

    current_subtree_root = 0
    nodes_in_current_subtree = list(range(number_of_nodes))

    while True:
        subtree_weight = sum([weights[node] for node in nodes_in_current_subtree])
        centroid = find_centroid(current_subtree_root, -1, subtree_weight)

        # Check if the centroid is the weighted median
        less_than_centroid_weight = 0
        greater_than_centroid_weight = 0
        for i in range(number_of_nodes):
            if i != centroid:
                if weights[i] <= weights[centroid]:
                    less_than_centroid_weight += weights[i]
                else:
                    greater_than_centroid_weight += weights[i]

        if less_than_centroid_weight <= total_weight / 2 and greater_than_centroid_weight <= total_weight / 2:
            return centroid

        # Determine which subtree to focus on
        heavy_subtree = -1
        heavy_subtree_weight = 0
        for neighbor in graph[centroid]:
            neighbor_subtree_weight = calculate_subtree_weight(neighbor, centroid)
            if neighbor_subtree_weight > heavy_subtree_weight:
                heavy_subtree_weight = neighbor_subtree_weight
                heavy_subtree = neighbor

        if heavy_subtree == -1:
            return centroid

        #Narrow down the search space to the heavy subtree
        new_nodes_in_subtree = []
        def dfs(node, parent):
            new_nodes_in_subtree.append(node)
            for neighbor in graph[node]:
                if neighbor != parent:
                    dfs(neighbor,node)

        new_nodes_in_subtree = []
        dfs(heavy_subtree, centroid)
        
        nodes_in_current_subtree = new_nodes_in_subtree
        current_subtree_root = nodes_in_current_subtree[0]

Big(O) Analysis

Time Complexity

O(n log n) – The algorithm repeatedly finds centroids, which split the tree into subtrees of roughly half the size. Finding a centroid involves traversing the current subtree, taking O(n) time in the worst case for a tree of size n. Since the tree is split approximately in half at each step, the number of iterations (depth of recursion) is O(log n). Therefore, the overall time complexity is O(n log n) because each level of the recursion potentially processes all n nodes to find the centroid, and there are log n levels.

Space Complexity

O(N) – The algorithm implicitly uses recursion, with each recursive call potentially adding a new frame to the call stack. In the worst-case scenario, the tree could resemble a linked list, leading to a maximum recursion depth proportional to the number of nodes N. Furthermore, the centroid finding process may require auxiliary storage to keep track of subtree sizes during traversal, also potentially proportional to N in the worst case. Therefore, the space complexity is dominated by the recursion depth and subtree size storage, resulting in O(N) auxiliary space.

Edge Cases

Case	How to Handle
Null or empty tree	Return null if the tree is null or empty as there is no median node.
Single node tree	Return the single node itself as it is trivially the weighted median.
Tree with all weights equal to zero	Return any node; typically the root is a good default in this edge case, as all nodes are equally valid.
Tree with negative weights (if weights can be negative)	Handle negative weights based on problem constraints, potentially normalizing them to positive values if necessary.
Very large weights causing integer overflow	Use appropriate data types (e.g., long) or consider normalization if overflow is a concern.
Skewed tree with the majority of the weight concentrated in one subtree.	Ensure that the algorithm efficiently traverses such skewed trees without excessive recursion depth or performance degradation.
Disconnection of nodes from root due to weight adjustments.	The chosen solution should correctly handle potentially orphaned nodes (disconnected from the root after weight adjustment) during the search for the weighted median.
Nodes with duplicate weight values	The solution must correctly account for and distinguish between nodes with same weight values to identify the true weighted median.

Null or empty tree

How to Handle:

Return null if the tree is null or empty as there is no median node.

Single node tree

How to Handle:

Return the single node itself as it is trivially the weighted median.

Tree with all weights equal to zero

How to Handle:

Return any node; typically the root is a good default in this edge case, as all nodes are equally valid.

Tree with negative weights (if weights can be negative)

How to Handle:

Handle negative weights based on problem constraints, potentially normalizing them to positive values if necessary.

Very large weights causing integer overflow

How to Handle:

Use appropriate data types (e.g., long) or consider normalization if overflow is a concern.

Skewed tree with the majority of the weight concentrated in one subtree.

How to Handle:

Ensure that the algorithm efficiently traverses such skewed trees without excessive recursion depth or performance degradation.

Disconnection of nodes from root due to weight adjustments.

How to Handle:

The chosen solution should correctly handle potentially orphaned nodes (disconnected from the root after weight adjustment) during the search for the weighted median.

Nodes with duplicate weight values

How to Handle:

The solution must correctly account for and distinguish between nodes with same weight values to identify the true weighted median.