Taro Logo

Evaluate Division

Medium
Rippling logo
Rippling
1 view
Topics:
Graphs

You are given an array of variable pairs equations and an array of real numbers values, where equations[i] = [Ai, Bi] and values[i] represent the equation Ai / Bi = values[i]. Each Ai or Bi is a string that represents a single variable.

You are also given some queries, where queries[j] = [Cj, Dj] represents the jth query where you must find the answer for Cj / Dj = ?.

Return the answers to all queries. If a single answer cannot be determined, return -1.0.

Note: The input is always valid. You may assume that evaluating the queries will not result in division by zero and that there is no contradiction.

Note: The variables that do not occur in the list of equations are undefined, so the answer cannot be determined for them.

Example 1:

Input: equations = [["a","b"],["b","c"]], values = [2.0,3.0], queries = [["a","c"],["b","a"],["a","e"],["a","a"],["x","x"]]
Output: [6.00000,0.50000,-1.00000,1.00000,-1.00000]
Explanation: 
Given: a / b = 2.0, b / c = 3.0
queries are: a / c = ?, b / a = ?, a / e = ?, a / a = ?, x / x = ? 
return: [6.0, 0.5, -1.0, 1.0, -1.0 ]
note: x is undefined => -1.0

Example 2:

Input: equations = [["a","b"],["b","c"],["bc","cd"]], values = [1.5,2.5,5.0], queries = [["a","c"],["c","b"],["bc","cd"],["cd","bc"]]
Output: [3.75000,0.40000,5.00000,0.20000]

Example 3:

Input: equations = [["a","b"]], values = [0.5], queries = [["a","b"],["b","a"],["a","c"],["x","y"]]
Output: [0.50000,2.00000,-1.00000,-1.00000]

Constraints:

  • 1 <= equations.length <= 20
  • equations[i].length == 2
  • 1 <= Ai.length, Bi.length <= 5
  • values.length == equations.length
  • 0.0 < values[i] <= 20.0
  • 1 <= queries.length <= 20
  • queries[i].length == 2
  • 1 <= Cj.length, Dj.length <= 5
  • Ai, Bi, Cj, Dj consist of lower case English letters and digits.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected behavior if a query involves a variable not present in any of the equations?
  2. Can the divisor in any equation be zero? If so, how should that be handled?
  3. Are the input values (e.g., the values in `values`) guaranteed to be non-negative?
  4. Can the same equation (e.g., A / B) appear multiple times with potentially different values? If so, which value should be considered?
  5. What is the maximum number of equations and queries that I should expect? I want to understand the scale of the input to consider potential performance implications.

Brute Force Solution

Approach

We're given equations like a/b = 2.0 and we need to figure out the value of other divisions, like c/a. The brute force method involves trying to find a path between the two variables in the question, by exploring all possible combinations of equations we're given.

Here's how the algorithm would work step-by-step:

  1. For each question we're asked (e.g., what is c/a?), we start from the first variable (e.g., c).
  2. We look through all the given equations to see if any equation directly involves 'c'.
  3. If we find one, say c/d = 3.0, we remember this connection.
  4. Now we look for equations involving 'd'. If we find d/a = 4.0, we know c/a = (c/d) * (d/a) = 3.0 * 4.0 = 12.0.
  5. If we don't directly find 'a' from 'c', we explore other variables connected to 'c' through the equations and keep going until we either find a path to 'a' or exhaust all possibilities from c.
  6. If we can't find a path from the first variable to the second using the given equations, the answer is 'cannot be determined' (usually represented as -1.0).
  7. Repeat this process for every question that we are asked.

Code Implementation

def evaluate_division_brute_force(equations, values, queries):
    graph = {}
    for (numerator, denominator), value in zip(equations, values):
        if numerator not in graph:
            graph[numerator] = []
        if denominator not in graph:
            graph[denominator] = []
        graph[numerator].append((denominator, value))
        graph[denominator].append((numerator, 1.0 / value))

    def find_path(start_node, end_node, visited):
        # If we are looking for a node that does not exist.
        if start_node not in graph or end_node not in graph:
            return -1.0

        if start_node == end_node:
            return 1.0

        visited.add(start_node)
        for neighbor, value in graph[start_node]:
            if neighbor not in visited:
                result = find_path(neighbor, end_node, visited)
                # If the path exists, return its value
                if result != -1.0:
                    return value * result

        return -1.0

    results = []
    for numerator, denominator in queries:
        # Use a set to keep track of visited nodes to prevent loops
        visited_nodes = set()
        result = find_path(numerator, denominator, visited_nodes)
        results.append(result)

    return results

Big(O) Analysis

Time Complexity
O(V * (E + V)) – For each query, we perform a Depth-First Search (DFS) or Breadth-First Search (BFS) on the graph formed by the equations. In the worst case, we might visit all vertices (variables, V) and edges (equations, E) in the graph to find a path. For each vertex, we potentially explore all adjacent vertices (again up to V in a dense graph). Thus, a single query takes O(E + V) time to explore the graph. Since we have V queries in the worst case (related to the input equations), the total time complexity is O(V * (E + V)).
Space Complexity
O(N) – The space complexity is dominated by the graph representation implicitly built when processing the equations. In the worst-case scenario, each unique variable in the equations forms a node in the graph, and the equations define the edges. The algorithm explores the graph using depth-first search, implicitly using a call stack that can grow up to the number of unique variables (nodes) in the graph, representing the maximum depth of the graph. Also, a set to keep track of visited nodes is required during the DFS traversal and can store up to N variables. Therefore, the auxiliary space complexity is O(N), where N is the number of unique variables in the equations.

Optimal Solution

Approach

This problem is about figuring out relationships between things when you're only given some direct connections. We represent the connections as a kind of map and then efficiently explore that map to answer questions about how different things relate to each other.

Here's how the algorithm would work step-by-step:

  1. Think of each unique thing (like 'a', 'b', 'c') as a location on a map.
  2. For each direct relationship you're given (like 'a / b = 2.0'), create a route between the corresponding locations on your map, and label that route with the value (2.0).
  3. Also create a route in the opposite direction (from 'b' to 'a'), and label it with the inverse of the value (1.0 / 2.0 = 0.5).
  4. Now, when you're asked to find the relationship between two things (like 'a / c'), start at the first location ('a') on your map.
  5. Explore all possible routes from that location until you reach the second location ('c').
  6. If you find a route, multiply the values along that route to get the relationship between the two things.
  7. If you cannot find any route at all return -1.0 meaning there is no know relationship.
  8. To be efficient, keep track of the places you've already visited on the map so you don't go around in circles and waste time.

Code Implementation

def evaluate_division(equations, values, queries):
    graph = {}
    
    # Build the graph representing the relationships.
    for (numerator, denominator), value in zip(equations, values):
        if numerator not in graph:
            graph[numerator] = {}
        if denominator not in graph:
            graph[denominator] = {}
        graph[numerator][denominator] = value
        graph[denominator][numerator] = 1.0 / value

    def dfs(start_node, end_node, visited):
        if start_node not in graph or end_node not in graph:
            return -1.0
        if start_node == end_node:
            return 1.0

        visited.add(start_node)
        
        # Explore paths from start_node to find end_node.
        for neighbor, value in graph[start_node].items():
            if neighbor not in visited:
                result = dfs(neighbor, end_node, visited)
                if result != -1.0:
                    return value * result
        return -1.0

    results = []
    for numerator, denominator in queries:
        # Use DFS to find the path and calculate the result.
        visited_nodes = set()
        result = dfs(numerator, denominator, visited_nodes)
        results.append(result)

    return results

Big(O) Analysis

Time Complexity
O(Q * E) – The time complexity is driven by answering Q queries, where each query involves a Depth-First Search (DFS) or Breadth-First Search (BFS) on a graph. In the worst case, the graph can be densely connected, requiring us to explore nearly all edges (E) to determine if a path exists between the start and end nodes for each query. Thus, each query could take O(E) time in the worst case, where E represents the number of edges in the constructed graph. With Q queries, the overall time complexity becomes O(Q * E).
Space Complexity
O(N) – The primary space complexity comes from the graph representation, which, in the worst case, stores relationships between all unique variables. Let N be the number of unique variables (e.g., 'a', 'b', 'c'...). The algorithm also keeps track of visited locations during the search process, typically using a set or similar data structure, which can grow up to the size of N. Therefore, the auxiliary space used is proportional to the number of unique variables, which is O(N).

Edge Cases

CaseHow to Handle
Empty equations listReturn a list of -1.0 for each query, as no relationships are defined.
Empty queries listReturn an empty list, as there are no queries to process.
Equation with division by zero (e.g., A / B = 0)Treat the result of the equation as 0.0 if the value is explicity given or return -1.0 if encountered during evaluation.
Query with undefined variables (variables not in equations)Return -1.0 for such queries, as no path can be established.
Query with a variable dividing itself (e.g., A / A)Return 1.0 for such queries if the variable exists in the equations.
Cyclic equations leading to infinite recursion during evaluationUse a visited set during graph traversal (DFS or BFS) to prevent revisiting nodes and infinite loops.
Very large number of equations and queries, leading to potential memory issuesConsider optimizing graph representation (e.g., adjacency list instead of matrix) and using iterative BFS instead of recursive DFS to reduce memory footprint.
Floating-point precision issues leading to incorrect comparisonsUse a small tolerance (epsilon) when comparing floating-point numbers for equality (e.g., Math.abs(result - expected) < epsilon).