Taro Logo

Most Frequent Subtree Sum

Medium
Amazon logo
Amazon
2 views
Topics:
TreesRecursion

Given the root of a binary tree, return the most frequent subtree sum. If there is a tie, return all the values with the highest frequency in any order.

The subtree sum of a node is defined as the sum of all the node values formed by the subtree rooted at that node (including the node itself).

For example:

Consider the following binary tree:

      5
     /  \
    2   -3

The subtree sums are:

  • Node with value 2: 2
  • Node with value -3: -3
  • Node with value 5: 5 + 2 + (-3) = 4

The frequencies are:

  • 2: 1
  • -3: 1
  • 4: 1

Since 2, -3, and 4 all have a frequency of 1, the output should be [2, -3, 4].

As another example:

      5
     /  \
    2   -5

The subtree sums are:

  • Node with value 2: 2
  • Node with value -5: -5
  • Node with value 5: 5 + 2 + (-5) = 2

The frequencies are:

  • 2: 2
  • -5: 1

Since 2 has the highest frequency (2), the output should be [2].

Write a function that takes the root of a binary tree as input and returns an array containing the most frequent subtree sums. If there is a tie, return all values with the highest frequency in any order.

Constraints:

  • The number of nodes in the tree is in the range [1, 10^4].
  • -10^5 <= Node.val <= 10^5

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of node values in the binary tree? Can node values be negative, zero, or non-integer?
  2. If multiple subtree sums have the same highest frequency, how should I return them? Should they be sorted in a specific order?
  3. What should be returned if the input tree is empty (null)? Should I return an empty list, or null, or is the input guaranteed to be a non-empty tree?
  4. Are we dealing with a binary tree, or a more general tree structure (where a node can have more than two children)?
  5. By 'subtree', do you mean a connected subgraph containing the node and all its descendants, or just any set of nodes reachable from the root?

Brute Force Solution

Approach

To find the most frequent subtree sum, we'll calculate the sum of every possible subtree. Then, we'll count how many times each sum appears, and find the sum that appears most often.

Here's how the algorithm would work step-by-step:

  1. Consider each node in the tree as the root of a potential subtree.
  2. For each of these 'root' nodes, calculate the sum of all the nodes within its subtree. To do this, add up the value of the 'root' node itself plus the sums of its left and right subtrees. If a subtree is empty, its sum is zero.
  3. After calculating the sum for each possible subtree, go through the list of sums you've found.
  4. Count how many times each unique sum appears in your list. For example, if the sum '5' appears three times, then the frequency of '5' is three.
  5. Find the sum (or sums) that has the highest frequency. That sum is the most frequent subtree sum.

Code Implementation

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def find_most_frequent_subtree_sum_brute_force(root):
    subtree_sums = []

    def calculate_subtree_sum(node):
        if not node:
            return 0

        # Recursively calculate sums of left and right subtrees
        left_subtree_sum = calculate_subtree_sum(node.left)

        right_subtree_sum = calculate_subtree_sum(node.right)

        subtree_sum = node.val + left_subtree_sum + right_subtree_sum
        subtree_sums.append(subtree_sum)
        return subtree_sum

    calculate_subtree_sum(root)

    sum_frequency = {}

    # Count the frequency of each subtree sum
    for subtree_sum in subtree_sums:
        if subtree_sum in sum_frequency:
            sum_frequency[subtree_sum] += 1
        else:
            sum_frequency[subtree_sum] = 1

    max_frequency = 0
    most_frequent_sums = []

    # Find the sums with the highest frequency
    for subtree_sum, frequency in sum_frequency.items():
        if frequency > max_frequency:
            max_frequency = frequency

            most_frequent_sums = [subtree_sum]

        elif frequency == max_frequency:
            most_frequent_sums.append(subtree_sum)

    return most_frequent_sums

Big(O) Analysis

Time Complexity
O(n²)Calculating the subtree sum for each of the n nodes in the tree requires traversing the entire subtree rooted at that node. In the worst-case scenario (e.g., a skewed tree), calculating the subtree sum for each node might involve visiting a significant portion of the other nodes. Furthermore, counting the frequency of each sum after calculating all subtree sums potentially requires iterating through the list of n sums and comparing each sum to every other sum. This leads to roughly n * n comparisons, which simplifies to O(n²).
Space Complexity
O(N)The space complexity is dominated by two factors: the storage of subtree sums and the frequency map. In the process of calculating subtree sums, we effectively visit each node once, leading to, in the worst case, storing N subtree sums. Additionally, the frequency map stores each unique subtree sum and its frequency. In the worst case, all subtree sums are unique, leading to a frequency map of size N. Therefore, the overall auxiliary space complexity is O(N), where N is the number of nodes in the tree.

Optimal Solution

Approach

To find the most frequent subtree sum efficiently, we calculate the sum of each subtree only once using a clever trick that builds up from the bottom. We keep track of how often each sum appears and then find the sum that occurred most often.

Here's how the algorithm would work step-by-step:

  1. Start at the very bottom of the tree, at the leaves (the nodes with no children).
  2. For each leaf, its sum is simply its own value. Store this sum and increase its count by one.
  3. Move up to the nodes above the leaves. For each of these nodes, calculate its subtree sum by adding its own value to the subtree sums of its children.
  4. Store this new subtree sum and increase its count by one. If a sum already exists in our records, just increase its count; otherwise, add it to the records with a count of one.
  5. Continue moving upwards, repeating the process of calculating the subtree sum and updating the counts until you reach the root of the entire tree.
  6. Once the entire tree has been processed, find the sum (or sums) that have the highest count in your records. These are the most frequent subtree sums.

Code Implementation

def find_most_frequent_subtree_sum(root):
    subtree_sums = {}
    most_frequent_sums = []
    max_frequency = 0

    def calculate_subtree_sum(node):
        if not node:
            return 0

        left_subtree_sum = calculate_subtree_sum(node.left)
        right_subtree_sum = calculate_subtree_sum(node.right)

        current_subtree_sum = node.val + left_subtree_sum + right_subtree_sum

        # Store and count each subtree sum encountered
        if current_subtree_sum not in subtree_sums:
            subtree_sums[current_subtree_sum] = 0
        subtree_sums[current_subtree_sum] += 1

        return current_subtree_sum

    calculate_subtree_sum(root)

    # Iterate through the sums to find the most frequent ones.
    for subtree_sum, frequency in subtree_sums.items():
        if frequency > max_frequency:

            # New max, reset list
            most_frequent_sums = [subtree_sum]
            max_frequency = frequency
        elif frequency == max_frequency:

            # Another sum with the same max frequency
            most_frequent_sums.append(subtree_sum)

    return most_frequent_sums

Big(O) Analysis

Time Complexity
O(n)The algorithm visits each node in the tree exactly once to calculate its subtree sum, where n is the number of nodes in the tree. Calculating the sum at each node involves a constant number of operations: adding the node's value to the (at most) two subtree sums from its children and updating the count of the subtree sum. Therefore, the time complexity is directly proportional to the number of nodes, resulting in O(n).
Space Complexity
O(N)The algorithm uses a hash map (or dictionary) to store the frequency of each subtree sum. In the worst case, each node in the tree could have a unique subtree sum. Therefore, the hash map could potentially store N different subtree sums, where N is the number of nodes in the tree. Additionally, the recursive calls could add up to a maximum depth of N in a skewed tree. Thus, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Null root nodeReturn an empty list if the root is null as there are no subtrees.
Single node treeReturn a list containing the value of the single node as the only subtree sum.
All nodes have the same value (e.g., all 0s or all 1s)The frequency map will have a single entry and return a list with that value.
Large tree causing deep recursion leading to stack overflowEnsure the recursion depth is within limits or convert to an iterative approach using a stack.
Integer overflow in subtree sum calculationUse a larger data type like long to store the subtree sums to prevent overflow.
Tree with only positive node valuesThe sums will be positive, and the frequency map will handle the counting correctly.
Tree with only negative node valuesThe sums will be negative, and the frequency map will handle the counting correctly.
Multiple subtree sums with the same highest frequencyReturn all the subtree sums that have the maximum frequency in any order.