Taro Logo

Check If Two Expression Trees are Equivalent

Medium
Asked by:
Profile picture
8 views
Topics:
TreesRecursion

A binary expression tree is a kind of binary tree used to represent arithmetic expressions. Each node of a binary expression tree has either 0 or 2 children. Leaf nodes (0 children) correspond to variables (e.g., x) or constants (e.g., 3), and internal nodes (2 children) correspond to the operators (e.g., +, -, *, and /).

For this problem, assume that the binary expression trees will be univariate, meaning that every variable will be the same.

You are given two binary expression trees, represented by their root nodes: root1 and root2. Determine whether the two binary expression trees are equivalent. That is, return true if both binary expression trees represent the same expression. Otherwise, return false.

Example 1:

Input: root1 = [x], root2 = [x]
Output: true

Example 2:

Input: root1 = [+,x,3], root2 = [x,+,3]
Output: true

Example 3:

Input: root1 = [+,*,x,7,3], root2 = [*,+,x,3,7]
Output: false

Constraints:

  • The number of nodes in each tree is in the range [1, 5000].
  • Each node has either 0 or 2 children.
  • Leaf nodes have a value that is either x or a non-negative integer.
  • Non-leaf nodes have a value that is one of the four arithmetic operators: "+", "-", "*", and "/".
  • All subtrees representing constants evaluate to an integer.
  • Nodes that represent a variable x do not have their value in the set {"+", "-", "*", "/"}.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What data types will the nodes of the expression trees hold (e.g., integers, characters representing variables, operators)?
  2. Can the expression trees be empty or contain only a single node? If so, what should be returned?
  3. Are the expression trees guaranteed to be valid (e.g., properly formed with operators having the correct number of operands), or should I handle potentially invalid trees?
  4. When comparing expressions, should I consider algebraic simplifications (e.g., is `a + b` equivalent to `b + a`, or `2 * a` equivalent to `a + a`)?
  5. What operators will be present in the expression trees (e.g., `+`, `-`, `*`, `/`, exponents, boolean operators)? Are there any precedence rules I should be aware of?

Brute Force Solution

Approach

We want to see if two math formulas, represented as trees, are the same. The brute force way is like figuring out what the formula actually calculates for many different numbers, and seeing if they always give the same result.

Here's how the algorithm would work step-by-step:

  1. Pick a number for each variable (like 'x' or 'y') that appears in the formulas.
  2. Plug those numbers into the first formula tree, and calculate the final result.
  3. Now, plug the exact same numbers into the second formula tree, and calculate its result.
  4. Compare the two results. If they are different, the trees are definitely NOT equivalent.
  5. Repeat this process many, many times with different sets of numbers.
  6. If the results are ALWAYS the same after trying many different numbers, we can guess that the trees are probably equivalent. Otherwise they are not.

Code Implementation

import random

class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def evaluate_tree(node, variable_mapping):
    if isinstance(node.value, (int, float)): 
        return float(node.value)
    if node.value in variable_mapping:
        return float(variable_mapping[node.value])
    if node.value == '+':
        return evaluate_tree(node.left, variable_mapping) + evaluate_tree(node.right, variable_mapping)
    if node.value == '-':
        return evaluate_tree(node.left, variable_mapping) - evaluate_tree(node.right, variable_mapping)
    if node.value == '*':
        return evaluate_tree(node.left, variable_mapping) * evaluate_tree(node.right, variable_mapping)
    if node.value == '/':
        left_value = evaluate_tree(node.left, variable_mapping)
        right_value = evaluate_tree(node.right, variable_mapping)
        return left_value / right_value
    return 0.0

def check_if_two_expression_trees_are_equivalent(tree_one, tree_two, number_of_trials=100):
    variables = set()
    def collect_variables(node):
        if not isinstance(node.value, (int, float)) and not node.value in ['+', '-', '*', '/']:
            variables.add(node.value)
        if node.left:
            collect_variables(node.left)
        if node.right:
            collect_variables(node.right)
    collect_variables(tree_one)
    collect_variables(tree_two)

    for _ in range(number_of_trials):
        variable_mapping = {}
        for variable in variables:
            variable_mapping[variable] = random.uniform(-10, 10)

        #Evaluate both trees with the same random values.
        tree_one_result = evaluate_tree(tree_one, variable_mapping)

        tree_two_result = evaluate_tree(tree_two, variable_mapping)
        
        #If the results differ, trees are not equivalent.
        if abs(tree_one_result - tree_two_result) > 1e-6:
            return False

    #If all trials produce the same result, trees are likely equivalent.
    return True

Big(O) Analysis

Time Complexity
O(k)The provided solution involves evaluating two expression trees 'k' times with randomly generated inputs. Each evaluation traverses the tree (which we're not given specifics about the size of), but the dominant factor is the number of iterations 'k'. The complexity depends primarily on the number of iterations we perform, and each iteration can be considered to take a constant or near-constant time relative to k. Therefore, the overall time complexity is O(k), where k is the number of test iterations performed.
Space Complexity
O(1)The provided approach involves repeatedly plugging in numbers and evaluating the expression trees. The only auxiliary space used is to store the results of evaluating the two trees for each set of input values. Since we only need to store two numerical values (the result from the first tree and the result from the second tree), the space required is constant, regardless of the size or structure of the input expression trees (denoted as N). Thus, the space complexity is O(1).

Optimal Solution

Approach

The best way to check if two expression trees are equivalent is to simplify them as much as possible and then directly compare the simplified results. This avoids the need to explore all possible evaluation orders or tree structures. By focusing on simplification, we reduce the problem to a basic comparison.

Here's how the algorithm would work step-by-step:

  1. First, we need a way to make each expression tree simpler. We do this by applying simplification rules to the tree structure.
  2. A key simplification rule is to evaluate any parts of the tree that are just numbers connected by simple math like addition or multiplication. So, if a node says 'add 2 and 3', we change it to just '5'.
  3. Another simplification rule involves recognizing mathematical identities. For example, 'x + 0' is always 'x', and 'x * 1' is always 'x'. We can rewrite the tree to remove those operations.
  4. Keep simplifying each expression tree until you can't simplify it any further.
  5. After simplification, compare the two resulting expressions. If the expressions are now identical, the original trees were equivalent.

Code Implementation

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def are_trees_equivalent(tree1, tree2):
    simplified_tree1 = simplify_tree(tree1)
    simplified_tree2 = simplify_tree(tree2)
    
    return are_trees_identical(simplified_tree1, simplified_tree2)

def simplify_tree(node):
    if node is None:
        return None

    node.left = simplify_tree(node.left)
    node.right = simplify_tree(node.right)

    # Evaluate constant expressions
    if node.left and node.right:
        if isinstance(node.left.value, (int, float)) and \
           isinstance(node.right.value, (int, float)):

            if node.value == '+':
                return Node(node.left.value + node.right.value)
            elif node.value == '-':
                return Node(node.left.value - node.right.value)
            elif node.value == '*':
                return Node(node.left.value * node.right.value)
            elif node.value == '/':
                if node.right.value != 0:
                    return Node(node.left.value / node.right.value)
                else:
                    return node # Avoid division by zero

    # Apply identity rules like x + 0 = x and x * 1 = x
    if node.value == '+' and node.right and node.right.value == 0:
        return node.left
    if node.value == '+' and node.left and node.left.value == 0:
        return node.right
    if node.value == '*' and node.right and node.right.value == 1:
        return node.left
    if node.value == '*' and node.left and node.left.value == 1:
        return node.right

    return node

def are_trees_identical(tree1, tree2):
    if tree1 is None and tree2 is None:
        return True
    if tree1 is None or tree2 is None:
        return False

    if tree1.value != tree2.value:
        return False

    # Recursively check left and right subtrees.
    return are_trees_identical(tree1.left, tree2.left) and \
           are_trees_identical(tree1.right, tree2.right)

Big(O) Analysis

Time Complexity
O(n)Simplification traverses each tree in potentially constant time per node, but since we have 'n' nodes across both trees (assuming 'n' represents the total number of nodes in both trees), this traversal takes O(n) time. The simplification rules are assumed to take constant time per node. The final comparison of the simplified expressions also takes O(n) time in the worst case if the simplified expressions are large or complex. Therefore, the overall time complexity is O(n).
Space Complexity
O(H)The dominant space complexity comes from the recursion depth during the simplification process. The simplification rules described in the plain English explanation involve traversing and potentially modifying the expression trees. In the worst-case scenario, where the tree is highly unbalanced, the recursion depth could reach the height (H) of the tree. Thus, the auxiliary space used by the recursion stack is proportional to the tree's height. Simplifying to Big O notation, this results in O(H) space complexity.

Edge Cases

Both trees are null
How to Handle:
Return true, as two empty trees are considered equivalent.
One tree is null, the other is not
How to Handle:
Return false, as one tree has a structure and the other does not.
Trees have different structures (different arrangement of operators and operands)
How to Handle:
The recursive comparison should identify structural differences and return false.
Trees have identical structures but different operand values
How to Handle:
The recursive comparison should identify value differences and return false.
Trees have very large depths, potentially causing stack overflow with naive recursion
How to Handle:
Consider iterative deepening or tail recursion optimization to prevent stack overflow.
Operands are floating point numbers with potential precision issues.
How to Handle:
Compare floating point numbers with a tolerance (epsilon) instead of direct equality.
Tree contains only a single node (a constant)
How to Handle:
The base case of the recursion must correctly handle single-node trees.
Integer overflow during evaluation of subtrees if evaluation is part of comparison
How to Handle:
Use a larger data type or modulo arithmetic during evaluation to avoid overflow issues, or perform purely structural comparisons.