Taro Logo

Design Spreadsheet

Medium
Rippling logo
Rippling
0 views
Topics:
ArraysStrings

A spreadsheet is a grid with 26 columns (labeled from 'A' to 'Z') and a given number of rows. Each cell in the spreadsheet can hold an integer value between 0 and 105.

Implement the Spreadsheet class:

  • Spreadsheet(int rows) Initializes a spreadsheet with 26 columns (labeled 'A' to 'Z') and the specified number of rows. All cells are initially set to 0.
  • void setCell(String cell, int value) Sets the value of the specified cell. The cell reference is provided in the format "AX" (e.g., "A1", "B10"), where the letter represents the column (from 'A' to 'Z') and the number represents a 1-indexed row.
  • void resetCell(String cell) Resets the specified cell to 0.
  • int getValue(String formula) Evaluates a formula of the form "=X+Y", where X and Y are either cell references or non-negative integers, and returns the computed sum.

Note: If getValue references a cell that has not been explicitly set using setCell, its value is considered 0.

Example 1:

Input:
["Spreadsheet", "getValue", "setCell", "getValue", "setCell", "getValue", "resetCell", "getValue"]
[[3], ["=5+7"], ["A1", 10], ["=A1+6"], ["B2", 15], ["=A1+B2"], ["A1"], ["=A1+B2"]]

Output:
[null, 12, null, 16, null, 25, null, 15]

Explanation

Spreadsheet spreadsheet = new Spreadsheet(3); // Initializes a spreadsheet with 3 rows and 26 columns
spreadsheet.getValue("=5+7"); // returns 12 (5+7)
spreadsheet.setCell("A1", 10); // sets A1 to 10
spreadsheet.getValue("=A1+6"); // returns 16 (10+6)
spreadsheet.setCell("B2", 15); // sets B2 to 15
spreadsheet.getValue("=A1+B2"); // returns 25 (10+15)
spreadsheet.resetCell("A1"); // resets A1 to 0
spreadsheet.getValue("=A1+B2"); // returns 15 (0+15)

Constraints:

  • 1 <= rows <= 103
  • 0 <= value <= 105
  • The formula is always in the format "=X+Y", where X and Y are either valid cell references or non-negative integers with values less than or equal to 105.
  • Each cell reference consists of a capital letter from 'A' to 'Z' followed by a row number between 1 and rows.
  • At most 104 calls will be made in total to setCell, resetCell, and getValue.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the data types of the values that can be stored in a cell (e.g., integers, floats, strings, or a combination)?
  2. What is the format for formulas? (e.g., Are they strings? What operators are supported? How are cell dependencies represented like 'A1+B2'?)
  3. What happens if there is a circular dependency in the formulas (e.g., A1 depends on B1 and B1 depends on A1)? Should I throw an error or handle it in a specific way?
  4. What are the maximum values for rows and cols, and how should I handle invalid row/col indices?
  5. Do I need to handle potential errors during formula evaluation (e.g., division by zero, invalid formula syntax)? How should I handle these?

Brute Force Solution

Approach

The brute force method for a spreadsheet involves directly calculating the value of each cell whenever its value is needed. We determine a cell's value by figuring out the values of any other cells it depends on. We repeat this process for every single cell in the spreadsheet, every time we need a value.

Here's how the algorithm would work step-by-step:

  1. Whenever you need the value of a particular cell, check if it directly contains a number or a formula.
  2. If it contains a number, you already have the cell's value.
  3. If it contains a formula, identify all the other cell references within that formula.
  4. Now, for each of those referenced cells, repeat these same steps to find *their* values.
  5. Once you have the values of all the referenced cells, perform the calculation in the original cell's formula using those values.
  6. The result of that calculation is the value of the original cell you were looking for.
  7. Repeat this entire process from the beginning every single time you need the value of any cell.

Code Implementation

class Spreadsheet:
    def __init__(self, data):
        self.data = data

    def get_cell_value(self, cell_name):
        cell_content = self.data.get(cell_name)

        if cell_content is None:
            return 0  # Or handle as an error

        try:
            # If the content is directly a number, return it
            return float(cell_content)
        except ValueError:
            # Content is not a number, so treat as a formula
            formula = cell_content
            referenced_cells = self._extract_cell_references(formula)
            cell_values = {}

            # Recursively get the values of referenced cells
            for referenced_cell in referenced_cells:
                cell_values[referenced_cell] = self.get_cell_value(referenced_cell)

            # Evaluate the formula with the cell values
            try:
                result = self._evaluate_formula(formula, cell_values)
                return result
            except Exception as evaluation_error:
                print(f"Error evaluating formula: {evaluation_error}")
                return None

    def _extract_cell_references(self, formula):
        # Simple extraction - improves with proper parsing
        return [part for part in formula.split() if part.isalpha()]

    def _evaluate_formula(self, formula, cell_values):
        # Replace cell references with their values
        for cell, value in cell_values.items():
            formula = formula.replace(cell, str(value))

        # Evaluate the expression.  Simple eval; improve with safer methods
        return eval(formula)

def two_sum_brute_force(numbers, target):
    # Go through every number and add every other number to it
    for first_index in range(len(numbers)):
        for second_index in range(first_index + 1, len(numbers)):
            if numbers[first_index] + numbers[second_index] == target:
                return [first_index, second_index]

    # If we've made it here, we couldn't find a match
    return []

Big(O) Analysis

Time Complexity
O(n^k) – The time complexity is exponential because of the recursive nature of formula evaluation. In the worst-case scenario, each cell's formula could depend on multiple other cells, and those cells could depend on even more cells. Let n be the number of cells in the spreadsheet and k be the maximum length of dependency chain (i.e., the longest chain of cells where each cell depends on the previous one). For each cell we request, we might have to evaluate up to 'k' nested levels of dependencies, each potentially referencing many other cells. In the worst case, imagine cell A1 depends on all other n-1 cells, and each of those cells depends on all other cells, this can lead to O(n^k) where k is related to the depth of these dependencies.
Space Complexity
O(D) – The brute force spreadsheet evaluation uses recursion to evaluate formulas. The depth of the recursion, D, depends on the longest chain of cell dependencies within the spreadsheet. In the worst case, each cell's formula might depend on another cell, leading to a recursion depth of D. Each level of recursion requires storing a new stack frame to track the function's state, including local variables and the return address. Therefore, the auxiliary space used is proportional to the maximum depth of the dependency chain, D, resulting in a space complexity of O(D).

Optimal Solution

Approach

Designing a spreadsheet involves efficiently storing and retrieving data, handling formulas, and managing dependencies between cells. The key is to represent the spreadsheet in a way that allows for quick access to cell values and efficient recalculation when a formula changes. This approach allows the spreadsheet to be both functional and performant.

Here's how the algorithm would work step-by-step:

  1. Represent the spreadsheet as a collection of cells, where each cell stores its content (either a value or a formula).
  2. When a cell contains a formula, store the formula's text and also create a dependency graph that shows which other cells the formula depends on.
  3. To get the value of a cell, first check if its value has already been calculated and stored. If so, return the stored value.
  4. If the cell contains a formula, recursively get the values of all the cells the formula depends on.
  5. Evaluate the formula using the retrieved values to calculate the cell's value and store it.
  6. When a cell's value changes, identify all the other cells that depend on it using the dependency graph.
  7. Recursively recalculate the values of all dependent cells to ensure the spreadsheet is up-to-date. Handle circular dependencies to prevent infinite loops by using temporary flags while calculating.
  8. Implement ways to quickly find cells, such as storing them in collections organized by rows and columns.

Code Implementation

class SpreadsheetCell:
    def __init__(self, content=""):
        self.content = content
        self.value = None
        self.formula = None
        self.dependencies = set()
        self.dependents = set()
        self.is_calculating = False

class Spreadsheet:
    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.cells = [[SpreadsheetCell() for _ in range(cols)] for _ in range(rows)]

    def set_cell_content(self, row, col, content):
        cell = self.cells[row][col]
        cell.content = content
        cell.value = None
        self.update_formula(row, col, content)
        self.recalculate_dependents(row, col)

    def update_formula(self, row, col, content):
        cell = self.cells[row][col]

        # Clear old dependencies.
        for dependency_row, dependency_col in cell.dependencies:
            self.cells[dependency_row][dependency_col].dependents.remove((row, col))

        cell.dependencies = set()
        cell.formula = None

        if content.startswith("="):
            cell.formula = content[1:]
            dependencies = self.parse_formula_dependencies(cell.formula)
            cell.dependencies = dependencies

            # Update dependents of dependencies.
            for dependency_row, dependency_col in dependencies:
                self.cells[dependency_row][dependency_col].dependents.add((row, col))

    def parse_formula_dependencies(self, formula):
        dependencies = set()
        parts = formula.split("+")
        for part in parts:
            part = part.strip()
            if self.is_cell_reference(part):
                row, col = self.parse_cell_reference(part)
                dependencies.add((row, col))
        return dependencies

    def is_cell_reference(self, part):
        if len(part) < 2:
            return False
        if not part[0].isalpha() or not part[1:].isdigit():
            return False
        col_str = part[0].upper()
        row_str = part[1:]
        if not row_str.isdigit():
            return False
        row = int(row_str) - 1
        col = ord(col_str) - ord('A')
        return 0 <= row < self.rows and 0 <= col < self.cols

    def parse_cell_reference(self, reference):
        col_str = reference[0].upper()
        row_str = reference[1:]
        row = int(row_str) - 1
        col = ord(col_str) - ord('A')
        return row, col

    def get_cell_value(self, row, col):
        cell = self.cells[row][col]
        if cell.value is not None:
            return cell.value

        if cell.formula:
            return self.evaluate_formula(row, col)
        else:
            try:
                return float(cell.content)
            except ValueError:
                return cell.content

    def evaluate_formula(self, row, col):
        cell = self.cells[row][col]

        # Detect circular dependencies.
        if cell.is_calculating:
            return "#CIRCULAR"
        
        cell.is_calculating = True
        
        try:
            formula = cell.formula
            dependencies = cell.dependencies
            values = {}

            for dependency_row, dependency_col in dependencies:
                dependency_value = self.get_cell_value(dependency_row, dependency_col)

                # Non-numeric values in formula
                if isinstance(dependency_value, str):
                    values[(dependency_row, dependency_col)] = 0  # Treat as zero to proceed.
                else:
                    values[(dependency_row, dependency_col)] = dependency_value

            for dependency_row, dependency_col in dependencies:
                cell_reference = chr(dependency_col + ord('A')) + str(dependency_row + 1)
                formula = formula.replace(cell_reference, str(values[(dependency_row, dependency_col)]))

            try:
                cell.value = eval(formula)
            except (SyntaxError, NameError, TypeError, ZeroDivisionError):
                cell.value = "#ERROR"

        finally:
            cell.is_calculating = False

        return cell.value

    def recalculate_dependents(self, row, col):
        cell = self.cells[row][col]
        dependents = cell.dependents.copy()

        # Iterate over a copy to allow modification.
        for dependent_row, dependent_col in dependents:
            self.cells[dependent_row][dependent_col].value = None

            # Recalculate dependent's value
            self.get_cell_value(dependent_row, dependent_col)
            self.recalculate_dependents(dependent_row, dependent_col)

    def print_spreadsheet(self):
        for row in range(self.rows): 
            row_values = []
            for col in range(self.cols):
                row_values.append(str(self.get_cell_value(row, col)))
            print(", ".join(row_values))

Big(O) Analysis

Time Complexity
O(n*m) – The time complexity is primarily driven by formula evaluation and dependency updates. Let n be the number of cells and m be the maximum length of a dependency chain. Getting the value of a cell might require recursively getting the values of dependent cells, potentially traversing a chain of length m. Updating a cell's value can trigger recalculation of dependent cells, which in the worst case can affect all other cells. Therefore, the overall time complexity can be approximated as O(n*m), where n is the number of cells and m represents the maximum dependency chain length or the maximum number of cells that need to be recomputed as a result of changing one cell.
Space Complexity
O(N) – The space complexity is primarily determined by the dependency graph and the storage of calculated cell values. The dependency graph, representing dependencies between cells with formulas, can potentially store a relationship for each cell, leading to a space complexity proportional to the number of cells, N. Additionally, storing calculated values for each cell also contributes O(N) space. The recursion stack used during formula evaluation can, in the worst case (e.g., a long chain of dependencies), reach a depth proportional to the number of cells, further contributing O(N) space. Therefore, the overall auxiliary space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty spreadsheet (rows = 0 or cols = 0)Return an appropriate default value (e.g., 0 or null) when accessing any cell, or throw an exception to signal invalid access.
Very large spreadsheet dimensions (rows and cols are very large)Use sparse data structures (e.g., dictionaries/hashmaps) to store only non-empty cell values to save memory.
Circular dependencies in formulas (e.g., A1 depends on A2, and A2 depends on A1)Detect circular dependencies during formula evaluation and throw an exception or return a specific error value to prevent infinite loops.
Formulas referencing invalid cell coordinates (e.g., referencing row -1 or column larger than spreadsheet size)Return a specific error value or throw an exception when a formula tries to access an out-of-bounds cell.
Formulas containing division by zeroHandle division by zero errors in formula evaluation by returning a specific error value (e.g., NaN) or throwing an exception.
Setting a cell's value to null or an empty stringTreat null or empty string as clearing the cell's value, defaulting to a predefined value (e.g., 0 or an empty string).
Complex formulas with nested dependencies and various arithmetic operationsImplement a robust formula parsing and evaluation mechanism that correctly handles operator precedence and function calls.
Integer overflow during formula evaluationUse appropriate data types (e.g., long or double) to prevent integer overflows during calculations, or detect overflows and return an error value.