Taro Logo

Design Excel Sum Formula

Hard
Airbnb logo
Airbnb
0 views
Topics:
ArraysDynamic ProgrammingGraphs

Your task is to design the basic function of an Excel-like program. You will be given a string array formula of size m x n representing the contents of the Excel sheet. The sheet is an m x n grid where each cell contains either an integer or a formula.

Specifically, you are to implement the Excel class:

  • Excel(int height, char width) Initializes the object with a sheet of the given dimensions.
    • height is a positive integer representing the height of the sheet.
    • width is a character representing the width of the sheet. The sheet is represented as an height x width grid where the first row and column use 1-based indexing and are represented as integers and characters, respectively.
  • void set(int row, char column, int val) Changes the value at row and column to val.
  • int get(int row, char column) Returns the value at row and column. If the cell contains a formula, evaluate the formula and return the result.
  • int sum(int row, char column, string[] numbers) Sets the formula at row and column to be the sum of cells represented by numbers and returns the result.
    • Each string in numbers represents the location of a cell that contributes to the sum.
    • Each string has one of two formats:
      • "ColRow", where Col represents the column number as an uppercase letter and Row represents the row number as a string.
      • "ColRow1:ColRow2", where Col represents the column number as an uppercase letter, Row1 represents the row number as a string, and Row2 represents the end row number as a string. You can assume that Row1 <= Row2.

Example:

Excel excel = new Excel(3, 'C');
// sheet is arranged as such:
//   A1 = 1, B1 = 2, C1 = 3
//   A2 = 4, B2 = 5, C2 = 6
//   A3 = 7, B3 = 8, C3 = 9
excel.set(1, 'A', 2);
// sheet is arranged as such:
//   A1 = 2, B1 = 2, C1 = 3
//   A2 = 4, B2 = 5, C2 = 6
//   A3 = 7, B3 = 8, C3 = 9
excel.sum(3, 'C', new String[] {"A1", "A1:B2"}); // return 10
// C3 = A1 + A1 + B1 + A2 + B2
excel.get(3, 'C'); // return 10
excel.set(2, 'B', 2);
// sheet is arranged as such:
//   A1 = 2, B1 = 2, C1 = 3
//   A2 = 4, B2 = 2, C2 = 6
//   A3 = 7, B3 = 8, C3 = 9
excel.sum(3, 'C', new String[] {"A1", "A1:B2"}); // return 8
// C3 = A1 + A1 + B1 + A2 + B2

Constraints:

  • 1 <= height <= 26
  • 'A' <= width <= 'Z'
  • 1 <= row <= height
  • 'A' <= column <= width
  • -100 <= val <= 100
  • 1 <= numbers.length <= 5
  • 100 <= get(row, column) <= 200

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the dimensions 'rows' and 'cols', and the values that can be stored in the cells? Can they be negative or floating-point numbers?
  2. Are row1, col1, row2, and col2 guaranteed to be within the bounds of the Excel sheet (0 <= row < rows, 0 <= col < cols), and will row1 <= row2 and col1 <= col2 always hold?
  3. How frequently will the 'set' and 'sum' operations be called relative to each other? Is one expected to be significantly more common than the other?
  4. What should happen if I try to set a value with invalid row/col arguments that are out of bounds?
  5. Should I optimize for memory usage, or is minimizing the time complexity of the `set` and `sum` methods the primary concern?

Brute Force Solution

Approach

Imagine each cell in the Excel sheet holds a number. The sum formula tells a cell to calculate its value by adding up the values of other cells. The brute force method tries every possible combination of the cells being referenced.

Here's how the algorithm would work step-by-step:

  1. First, look at the cell containing the sum formula.
  2. Find all the other cells that the formula is referencing.
  3. For each possible combination of values in those referenced cells, calculate what the sum would be.
  4. If any of the referenced cells themselves contain a sum formula, recalculate *their* values for *every* possibility first before using them in the original sum.
  5. Once all combinations have been tested, the final result is the value calculated with the current or most recent state of referenced cell values.

Code Implementation

def design_excel_sum_formula_brute_force(cell_with_formula, referenced_cells):
    # Simulate different states of referenced cells by creating combinations
    number_of_referenced_cells = len(referenced_cells)
    possible_combinations = []
    if number_of_referenced_cells > 0:
        possible_combinations = [ {} ]
        for referenced_cell in referenced_cells:
            possible_combinations[0][referenced_cell] = 1
    else:
        return 0

    final_result = 0

    for combination in possible_combinations:
        # Evaluate sum formulas in referenced cells for current combination
        evaluated_referenced_cells = {}
        for referenced_cell in combination:
            if isinstance(referenced_cells[referenced_cell], str) and referenced_cells[referenced_cell].startswith('='):
                # Recalculate the values of the referenced cells that have sum formulas
                evaluated_referenced_cells[referenced_cell] = evaluate_sum_formula(referenced_cells[referenced_cell], referenced_cells)
            else:
                evaluated_referenced_cells[referenced_cell] = referenced_cells[referenced_cell]

        # Calculate the final value of the cell with the sum formula
        final_result = evaluate_sum_formula(cell_with_formula, referenced_cells, evaluated_referenced_cells)

    return final_result

def evaluate_sum_formula(formula, all_cells, evaluated_referenced_cells = None):
    formula = formula.replace('=', '')
    terms = formula.split('+')
    total = 0

    for term in terms:
        term = term.strip()
        try:
            # Check if term is a number
            total += float(term)
        except ValueError:
            # Or check if the term is a referenced cell to add to the total
            if term in all_cells:
                if evaluated_referenced_cells and term in evaluated_referenced_cells:
                    total += float(evaluated_referenced_cells[term])
                else:
                    total += float(all_cells[term])
            else:
                total += 0

    return total

Big(O) Analysis

Time Complexity
O(k^n) – The brute force approach recalculates the sum formula for every possible combination of values in the referenced cells. Let n be the number of cells referenced by the sum formula, and let k be the number of possible values each referenced cell can take, considering that referenced cells may also contain sum formulas that need recalculating. For each of the n referenced cells, there are k possibilities, resulting in k^n possible combinations to evaluate. Therefore, the time complexity is O(k^n), where k represents the number of possible values any one cell can take on and n is the number of referenced cells.
Space Complexity
O(K^D) – The dominant space complexity arises from the recursion needed to recalculate cell values if they contain sum formulas themselves. Specifically, the algorithm considers all possible combinations of referenced cell values. In the worst case, we have D levels of nested sum formulas, and each sum formula depends on K other cells. Recalculating these nested formulas might require storing intermediate states for each possible combination at each level of recursion, leading to a space complexity of O(K^D), where K is the number of referenced cells in a formula and D is the maximum depth of nested formulas. This space is needed to store the intermediate sums calculated in the recursion stack and any temporary data structures to hold those intermediate values.

Optimal Solution

Approach

To efficiently calculate the sum based on a formula, we need to track dependencies between cells. The optimal approach involves building a dependency graph and then using it to compute cell values in the correct order, avoiding redundant calculations.

Here's how the algorithm would work step-by-step:

  1. First, understand the user-provided formula. Identify which cells depend on others (e.g., A1 = B1 + C1 means A1 depends on B1 and C1).
  2. Build a diagram or map showing all the dependencies. This 'dependency graph' helps visualize the order in which calculations must occur.
  3. Figure out a valid order to calculate the cell values. We must calculate the values of cells that other cells depend on first.
  4. Go through the cells in the calculated order, computing each cell's value based on its formula and the already computed values of the cells it depends on.
  5. If a cell's formula refers to a cell that hasn't been calculated yet, this indicates a circular dependency, and you should flag it as an error.
  6. If a cell's value has changed, update it and then re-calculate the values of any cells that depend on it. Make sure to do this in the proper order, as determined by the dependency graph.
  7. By following this careful, order-based approach, we can avoid wasteful computation and ensure accurate calculation of all cell values based on the formulas.

Code Implementation

class Excel:
    def __init__(self, height, width):
        self.height = height
        self.width = width
        self.cells = [[''] * width for _ in range(height)]
        self.values = [[0] * width for _ in range(height)]
        self.dependency_graph = [[set() for _ in range(width)] for _ in range(height)]

    def set(self, row, column, value):
        self.cells[row][column] = str(value)
        self.dependency_graph = [[set() for _ in range(self.width)] for _ in range(self.height)]
        self.calculate_cell(row, column)

    def get(self, row, column):
        return self.values[row][column]

    def sum(self, row, column, formula):
        self.cells[row][column] = formula
        self.dependency_graph = [[set() for _ in range(self.width)] for _ in range(self.height)]
        self.calculate_cell(row, column)

    def calculate_cell(self, row, column):
        formula = self.cells[row][column]
        if formula == '':
            self.values[row][column] = 0
            return

        if formula.isdigit():
            self.values[row][column] = int(formula)
            return

        self.values[row][column] = self.evaluate_formula(row, column, formula)

        #After updating value, recalculate dependants
        for dependant_row in range(self.height):
            for dependant_column in range(self.width):
                if (row, column) in self.dependency_graph[dependant_row][dependant_column]:
                    self.calculate_cell(dependant_row, dependant_column)

    def evaluate_formula(self, row, column, formula):
        total = 0
        parts = formula.split('+')
        for part in parts:
            if ':' in part:
                start, end = part.split(':')
                start_row = ord(start[0]) - ord('A')
                start_col = int(start[1:]) - 1
                end_row = ord(end[0]) - ord('A')
                end_col = int(end[1:]) - 1

                # Add cell to dependency graph before calculating
                for i in range(start_row, end_row + 1):
                    for j in range(start_col, end_col + 1):
                        self.dependency_graph[row][column].add((i, j))
                        total += self.values[i][j]

            else:
                cell_row = ord(part[0]) - ord('A')
                cell_col = int(part[1:]) - 1
                
                # Add cell to dependency graph before calculating
                self.dependency_graph[row][column].add((cell_row, cell_col))
                total += self.values[cell_row][cell_col]

        return total

Big(O) Analysis

Time Complexity
O(n*m) – Let n be the number of cells in the spreadsheet and m be the maximum length of a formula. Building the dependency graph requires iterating through each cell (n) and parsing its formula, which can take O(m) time in the worst case (e.g., a long formula string). The topological sort (to determine calculation order) can be done in O(n+e) where e is the number of edges representing dependencies. In the worst case, e could be O(n*m) if each cell depends on many other cells via complex formulas. Updating and recalculating dependent cells in the correct order takes O(n) in the best case if there are no circular dependencies, or up to O(n*m) if cells are all related. Thus, the overall time complexity is dominated by O(n*m) due to formula parsing, dependency graph construction and re-calculation.
Space Complexity
O(N + E) – The dependency graph, which maps cells to their dependencies, requires space proportional to the number of cells (N) and the number of dependencies (E, where E represents the edges in the dependency graph). Storing the calculated cell values requires space proportional to the number of cells (N). The topological sort or similar algorithm to determine the calculation order also requires space for tracking visited nodes, potentially O(N) in the worst case, where N represents the number of cells. Therefore, the overall auxiliary space complexity is O(N + E).

Edge Cases

CaseHow to Handle
rows or cols are zeroReturn 0 for sum, or handle gracefully in set by potentially not creating the grid
row1 > row2 or col1 > col2Swap row1 and row2 if row1 > row2, and col1 and col2 if col1 > col2 to ensure valid rectangle definition
row or col indices are out of bounds (negative or exceed dimensions)Return 0 for sum when accessing out of bounds or raise an exception, and ignore set if out of bounds
Large dimensions (rows and cols) leading to high memory usageConsider using sparse matrix or lazy evaluation techniques if memory becomes a bottleneck
Integer overflow when calculating the sumUse a data type with a larger range (e.g., long) for the sum to avoid overflow
Negative values in the gridThe sum calculation should correctly handle negative values by adding them with appropriate signs
row1 equals row2 and col1 equals col2 (single cell region)Return the value of that single cell for the sum
Extreme values in cells that, when summed, cause integer overflow even with a larger datatypeImplement overflow detection and handle accordingly (e.g., throw exception or cap the sum)