Taro Logo

Build Binary Expression Tree From Infix Expression

Hard
Amazon logo
Amazon
3 views
Topics:
TreesStacks

Let's explore how to build a binary expression tree from an infix expression.

Question:

Design an algorithm and write code to construct a binary expression tree from a given infix expression string. The expression can contain numbers, operators (+, -, ", /), and parentheses. The tree should represent the structure and order of operations defined by the infix expression.

For example, given the infix expression (3 + 5) * 2, your algorithm should produce a binary tree where the root node contains "*", the left child is a subtree representing (3 + 5), and the right child is a node containing "2". The subtree representing (3 + 5) should have "+" as the root, "3" as the left child, and "5" as the right child.

Your solution should handle:

  1. Operator Precedence: Correctly handle operator precedence (e.g., multiplication and division before addition and subtraction).
  2. Parentheses: Properly interpret parentheses to enforce the correct order of operations.
  3. Valid Expressions: Assume the input is a valid infix expression.

Describe the time and space complexity of your solution.

Could you provide a Python code to implement this algorithm? Also, describe any edge cases you might encounter.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What operators will be present in the infix expression, and what is their precedence? Should I assume only +, -, *, and /?
  2. Will the input infix expression always be valid, or do I need to handle cases with mismatched parentheses or invalid operator sequences?
  3. What data type will the operands in the infix expression be (e.g., integers, floats)? What is the range of possible values for these operands?
  4. What should the node structure of the binary expression tree look like? Should it contain the value (operand or operator) and pointers to the left and right children?
  5. If the expression is empty or null, what should I return? Should I return null, or throw an exception?

Brute Force Solution

Approach

The goal is to transform a mathematical expression written in the usual way (infix notation) into a tree structure. A brute-force method involves exploring every possible way to group parts of the expression to figure out how to build the tree. We essentially try different tree structures until we find one that correctly represents the expression.

Here's how the algorithm would work step-by-step:

  1. Consider every possible place where you could put parentheses in the expression, changing the order of operations.
  2. For each arrangement of parentheses, try to build a binary tree. The tree's nodes will be numbers or operators.
  3. Start by picking an operator that could be at the root of the tree, considering all possible operators and their positions in the expression.
  4. Once you've picked a potential root operator, try to build the left and right subtrees. This means figuring out which parts of the expression belong on each side of the operator.
  5. If you can successfully build both left and right subtrees, you have a candidate binary expression tree.
  6. Check if the tree is valid. Make sure that when you evaluate the tree, it gives the same result as the original expression.
  7. Repeat this process for all possible parenthesis arrangements and root operators until you've explored all possible tree configurations.
  8. If you find a valid tree, that's your answer. If you explore all possibilities and don't find a valid tree, the expression might be invalid or you may have an error.

Code Implementation

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def build_expression_tree_brute_force(expression):
    # Helper function to evaluate an expression tree
    def evaluate_tree(node):
        if node is None:
            return 0
        if isinstance(node.value, int):
            return node.value
        left_result = evaluate_tree(node.left)
        right_result = evaluate_tree(node.right)
        if node.value == '+':
            return left_result + right_result
        elif node.value == '-':
            return left_result - right_result
        elif node.value == '*':
            return left_result * right_result
        elif node.value == '/':
            if right_result == 0:
                return float('inf')
            return left_result / right_result
        return 0

    # Helper function to tokenize the expression
    def tokenize(expression):
        tokens = []
        number = ''
        for char in expression:
            if char.isdigit():
                number += char
            elif char in ['+', '-', '*', '/']:
                if number:
                    tokens.append(int(number))
                    number = ''
                tokens.append(char)
            elif char == '(': 
              pass
            elif char == ')':
              pass
            elif char == ' ':
              pass
            else:
                return None
        if number:
            tokens.append(int(number))
        return tokens

    # Helper function to build the tree recursively
    def build_tree(tokens):
        if not tokens:
            return None

        # Iterate through possible root operators and positions
        for i in range(len(tokens)):
            if tokens[i] in ['+', '-', '*', '/']:
                root = Node(tokens[i])

                # Recursively build the left and right subtrees
                left_tokens = tokens[:i]
                right_tokens = tokens[i+1:]
                root.left = build_tree(left_tokens)
                root.right = build_tree(right_tokens)

                # Check if the tree is valid by evaluating it
                if root.left is not None and root.right is not None:
                    return root
        if len(tokens) == 1 and isinstance(tokens[0], int):
          return Node(tokens[0])
        else:
          return None

    tokens = tokenize(expression)
    if tokens is None:
      return None
    return build_tree(tokens)

Big(O) Analysis

Time Complexity
O(Catalan(n))The algorithm considers every possible placement of parentheses in the infix expression, which can be analyzed using Catalan numbers. For an expression of length n, the number of possible binary trees (corresponding to different parenthesis arrangements and operator precedences) grows according to the Catalan number sequence, approximately (4^n)/ (n^(3/2) * sqrt(pi)). For each of these trees, validation might take up to O(n) time (e.g., to evaluate the tree or compare the result with the original expression), the dominant factor for the overall complexity remains the number of possible tree configurations, which is Catalan(n). Thus, the algorithm explores a number of possibilities defined by the Catalan number of n, and validation is dwarfed.
Space Complexity
O(N^2 * M)The algorithm explores all possible parenthesis arrangements which can lead to O(N) arrangements, where N is the length of the infix expression. For each arrangement, building a binary tree and evaluating it involves storing nodes and operators, potentially using O(N) space in the worst-case scenario. The number of potential root operators to consider is related to the length of the expression as well, represented by M. Additionally, there are recursive calls to build the left and right subtrees, which can have a depth proportional to N in the worst case, consuming stack space. The space complexity is therefore roughly O(N * M * N), which simplifies to O(N^2 * M).

Optimal Solution

Approach

To build the expression tree efficiently, we will convert the infix expression to postfix first. The postfix expression makes building the tree simpler since we know operators appear after their operands. Then we use a stack to create the tree from the postfix form.

Here's how the algorithm would work step-by-step:

  1. Use a stack to convert the infix expression to postfix expression, respecting operator precedence (like multiplication before addition). Parentheses guide this conversion.
  2. Now, use a stack to construct the binary expression tree using the postfix expression.
  3. Go through the postfix expression from left to right. If you see a number, push it onto the stack. If you see an operator, create a new node with that operator as its value.
  4. Take the top two nodes from the stack, make them the left and right children of your new operator node, and push the new operator node back onto the stack.
  5. Repeat the previous two steps until you've processed the entire postfix expression.
  6. At the end, the single node remaining on the stack is the root of your binary expression tree.

Code Implementation

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def build_expression_tree(infix_expression):
    def precedence(operator):
        if operator == '+' or operator == '-':
            return 1
        elif operator == '*' or operator == '/':
            return 2
        return 0

    def infix_to_postfix(infix_expression):
        postfix_result = []
        operator_stack = []

        for char in infix_expression:
            if char.isalnum():
                postfix_result.append(char)
            elif char == '(': 
                operator_stack.append(char)
            elif char == ')':
                while operator_stack and operator_stack[-1] != '(': 
                    postfix_result.append(operator_stack.pop())
                operator_stack.pop()
            else:
                # Ensure correct order of operations
                while operator_stack and precedence(char) <= precedence(operator_stack[-1]):
                    postfix_result.append(operator_stack.pop())
                operator_stack.append(char)

        while operator_stack:
            postfix_result.append(operator_stack.pop())

        return ''.join(postfix_result)

    postfix_expression = infix_to_postfix(infix_expression)
    node_stack = []

    for char in postfix_expression:
        if char.isalnum():
            node = TreeNode(char)
            node_stack.append(node)
        else:
            # Operators need two operands
            node = TreeNode(char)
            right_node = node_stack.pop()
            left_node = node_stack.pop()
            node.right = right_node
            node.left = left_node
            node_stack.append(node)

    #The remaining node is the root
    return node_stack[0]

Big(O) Analysis

Time Complexity
O(n)The algorithm involves two main stages. First, converting the infix expression to postfix, which iterates through the n elements of the infix expression once. Second, building the expression tree from the postfix expression, which also involves iterating through the n elements of the postfix expression once, performing constant time operations (stack pushes/pops, node creation) for each element. Therefore, the overall time complexity is dominated by two linear passes over the input, resulting in O(n).
Space Complexity
O(N)The algorithm uses a stack to convert the infix expression to postfix, and in the worst case, this stack can hold all the operators and operands from the input infix expression of size N. Similarly, a stack is used to construct the binary expression tree from the postfix expression, and in the worst case (e.g., a chain of additions or multiplications), the stack can also hold all N nodes before the final tree is constructed. Therefore, the auxiliary space used by the two stacks is proportional to the size of the input expression, N, resulting in a space complexity of O(N).

Edge Cases

CaseHow to Handle
Null or empty input stringReturn null or throw an IllegalArgumentException since an empty expression cannot form a tree.
Input string with only a single operandCreate a single-node tree with the operand as the root.
Input string with only operators and no operandsReturn null or throw an exception as it is an invalid expression.
Input string with unmatched parenthesesThrow an exception indicating invalid expression syntax because proper nesting is essential.
Input string with consecutive operators without operandsThrow an exception indicating invalid expression syntax since operators need operands.
Very long input string (potential stack overflow with recursion)Consider an iterative approach to avoid stack overflow errors for deeply nested expressions.
Division by zeroThe evaluation of the expression tree should handle this case by returning an error or appropriate value (e.g., infinity) to prevent runtime exceptions.
Input string containing unsupported operators or operandsThrow an exception or return null to indicate an invalid expression with unsupported characters.