Taro Logo

Design Excel Sum Formula

Hard
Amazon logo
Amazon
2 views
Topics:
Dynamic ProgrammingGraphs

Design an Excel-like application that supports setting cell values and calculating sums based on formulas. Implement the following functions:

  1. Excel(int height, char width): Initializes the Excel sheet with a given height and width. The width is represented by a character (e.g., 'C' for a width of 3).
  2. void set(int row, char column, int val): Sets the value of the cell at the given row and column to the specified value. Overwrites any existing formula.
  3. int get(int row, char column): Returns the value of the cell at the given row and column. If the cell contains a formula, the formula should be evaluated and the result returned. If the cell is empty, return 0.
  4. int sum(int row, char column, String[] formula): Sets the formula for the cell at the given row and column. The formula is an array of strings, where each string represents a cell or a range of cells to sum. For example, ["A1", "B2", "C3"] means the cell's value is the sum of cells A1, B2, and C3. The formula can also contain ranges like ["A1:A3", "B1:B3"], which means the cell's value is the sum of the cells in the ranges A1 to A3 and B1 to B3.

Example:

Excel excel = new Excel(3, 'C');
// sheet:
//   A   B   C
// 1 0   0   0
// 2 0   0   0
// 3 0   0   0

excel.set(1, 'A', 2);
// sheet:
//   A   B   C
// 1 2   0   0
// 2 0   0   0
// 3 0   0   0

excel.sum(3, 'C', ["A1", "A1:B2"]); //sets C3 to sum(A1, A1, B1, A2, B2)
// sheet:
//   A   B   C
// 1 2   0   0
// 2 0   0   0
// 3 0   0   2  <-- A1 + A1 + B1 + A2 + B2 = 2+2+0+0+0 = 4

excel.get(3, 'C'); // returns 2

excel.set(2, 'B', 2);
// sheet:
//   A   B   C
// 1 2   0   0
// 2 0   2   0
// 3 0   0   4 <-- A1 + A1 + B1 + A2 + B2 = 2+2+0+0+2 = 6

Implement the class efficiently, considering potential dependencies between cells and avoiding redundant calculations. How would you handle circular dependencies?

Solution


LeetCode Problem 631: Design Excel Sum Formula

Naive (Brute Force) Solution

A naive approach would involve storing the value of each cell and the formula associated with it. When a cell's value is requested, we would evaluate its formula recursively. This means that if a cell depends on other cells, we would have to calculate their values first, and so on. This approach is straightforward but can be inefficient due to redundant calculations.

Example:

Consider a 3x3 grid. Cell A1 has a value of 5, Cell A2 has a formula "=A1+3", and Cell A3 has a formula "=A1+A2". To get the value of A3, we would first need to get the value of A1 (which is 5) and A2 (which is 5+3 = 8). Then, A3 would be 5 + 8 = 13.

Code (Conceptual):

class Excel:
    def __init__(self, height: int, width: str):
        self.grid = [[None] * self.width_int(width) for _ in range(height)]
        self.height = height
        self.width = width

    def width_int(self, width: str) -> int:
        res = 0
        for c in width:
            res = res * 26 + (ord(c) - ord('A') + 1)
        return res

    def set(self, row: int, column: str, val: int) -> None:
        self.grid[row - 1][self.width_int(column) - 1] = val

    def get(self, row: int, column: str) -> int:
        cell = self.grid[row - 1][self.width_int(column) - 1]
        if cell is None:
            return 0
        elif isinstance(cell, int):
            return cell
        else:
            # Recursively evaluate the formula
            return self.evaluate(cell)

    def sum(self, row: int, column: str, formulas: List[str]) -> None:
        self.grid[row - 1][self.width_int(column) - 1] = formulas

    def evaluate(self, formula):
        # Parses the formula and calculates the sum
        # (Implementation omitted for brevity)
        pass

Big O Analysis (Naive):

  • Time Complexity: O(number of cells in the formula), but in worst case the formula evaluation can cascade and cause O(N^2) for N cells.
  • Space Complexity: O(depth of recursion) in the worst case, which can be O(N).

Optimal Solution

The optimal solution involves using a directed acyclic graph (DAG) to represent the dependencies between cells. Each cell is a node in the graph, and there is an edge from cell A to cell B if cell B's formula depends on cell A. We also use memoization to store the calculated value of each cell, avoiding redundant calculations. When a cell's value is updated, we need to update the values of all cells that depend on it, which can be done using a depth-first search (DFS) or topological sort.

Algorithm:

  1. Data Structures:

    • values: A 2D array to store the calculated values of each cell.
    • formulas: A 2D array to store the formula for each cell. If a cell has a formula, it's stored here; otherwise, it's None.
    • dependencies: A dictionary to store the dependencies between cells. The key is a cell (row, col), and the value is a list of cells that depend on it.
  2. set(row, column, val):

    • Update the values array with the new value.
    • Remove any existing formula for the cell.
    • Update cells that depend on this cell. Use a DFS or topological sort.
  3. get(row, column):

    • Return the value stored in the values array.
  4. sum(row, column, formula):

    • Parse the formula to identify the dependent cells.
    • Store the formula in the formulas array.
    • Update the dependencies in the dependencies dictionary.
    • Calculate and store the value of the cell in values.
    • Update cells that depend on this cell. Use a DFS or topological sort.

Code (Conceptual):

class Excel:
    def __init__(self, height: int, width: str):
        self.height = height
        self.width = self.width_int(width)
        self.values = [[0] * self.width for _ in range(height)]
        self.formulas = [[None] * self.width for _ in range(height)]
        self.dependencies = defaultdict(list)

    def width_int(self, width: str) -> int:
        res = 0
        for c in width:
            res = res * 26 + (ord(c) - ord('A') + 1)
        return res

    def set(self, row: int, column: str, val: int) -> None:
        row_idx = row - 1
        col_idx = self.width_int(column) - 1
        self.values[row_idx][col_idx] = val
        self.formulas[row_idx][col_idx] = None
        self.update_dependents(row_idx, col_idx)

    def get(self, row: int, column: str) -> int:
        row_idx = row - 1
        col_idx = self.width_int(column) - 1
        return self.values[row_idx][col_idx]

    def sum(self, row: int, column: str, terms: List[str]) -> None:
        row_idx = row - 1
        col_idx = self.width_int(column) - 1
        self.formulas[row_idx][col_idx] = terms
        self.values[row_idx][col_idx] = self.evaluate_sum(terms)
        self.update_dependencies(row_idx, col_idx, terms)
        self.update_dependents(row_idx, col_idx)

    def evaluate_sum(self, terms: List[str]) -> int:
        total = 0
        for term in terms:
            if ':' in term:
                start, end = term.split(':')
                start_row, start_col = self.parse_cell(start)
                end_row, end_col = self.parse_cell(end)
                for r in range(start_row, end_row + 1):
                    for c in range(start_col, end_col + 1):
                        total += self.values[r][c]
            else:
                row, col = self.parse_cell(term)
                total += self.values[row][col]
        return total

    def parse_cell(self, cell: str) -> Tuple[int, int]:
        col = ''
        row = ''
        for char in cell:
            if char.isalpha():
                col += char
            else:
                row += char
        return int(row) - 1, self.width_int(col) - 1

    def update_dependencies(self, row_idx: int, col_idx: int, terms: List[str]) -> None:
        for term in terms:
            if ':' in term:
                start, end = term.split(':')
                start_row, start_col = self.parse_cell(start)
                end_row, end_col = self.parse_cell(end)
                for r in range(start_row, end_row + 1):
                    for c in range(start_col, end_col + 1):
                         self.dependencies[(r, c)].append((row_idx, col_idx))
            else:
                r, c = self.parse_cell(term)
                self.dependencies[(r, c)].append((row_idx, col_idx))

    def update_dependents(self, row_idx: int, col_idx: int) -> None:
        for dependent_row, dependent_col in self.dependencies.get((row_idx, col_idx), []):
            if self.formulas[dependent_row][dependent_col]:
                self.values[dependent_row][dependent_col] = self.evaluate_sum(self.formulas[dependent_row][dependent_col])
            self.update_dependents(dependent_row, dependent_col)

Big O Analysis (Optimal):

  • Time Complexity:
    • set(): O(D), where D is the number of cells that depend on the updated cell. In the worst case, D can be all the cells, so O(H * W).
    • get(): O(1).
    • sum(): O(N + D), where N is the number of terms in the formula and D is the number of dependent cells.
  • Space Complexity: O(H * W + E), where H is the height, W is the width, and E is the number of dependencies.

Edge Cases:

  • Circular Dependencies: The code should handle circular dependencies gracefully, possibly by detecting them and throwing an error or returning a default value (e.g., 0).
  • Invalid Formula: The code should validate the formula format and handle invalid formulas appropriately.
  • Empty Grid: The code should handle cases where the grid is empty (0 height or 0 width).
  • Large Grid: The code should be optimized to handle large grids efficiently, considering memory usage and calculation time.
  • Invalid Cell References: Handling out-of-bounds cell references is also important.