Taro Logo

Number of Atoms

Hard
Amazon logo
Amazon
1 view
Topics:
StringsStacks

Given a string formula representing a chemical formula, return the count of each atom.

The atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.

One or more digits representing that element's count may follow if the count is greater than 1. If the count is 1, no digits will follow.

  • For example, "H2O" and "H2O2" are possible, but "H1O2" is impossible.

Two formulas are concatenated together to produce another formula.

  • For example, "H2O2He3Mg4" is also a formula.

A formula placed in parentheses, and a count (optionally added) is also a formula.

  • For example, "(H2O2)" and "(H2O2)3" are formulas.

Return the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1), followed by the second name (in sorted order), followed by its count (if that count is more than 1), and so on.

The test cases are generated so that all the values in the output fit in a 32-bit integer.

Example 1:

Input: formula = "H2O"
Output: "H2O"
Explanation: The count of elements are {'H': 2, 'O': 1}.

Example 2:

Input: formula = "Mg(OH)2"
Output: "H2MgO2"
Explanation: The count of elements are {'H': 2, 'Mg': 1, 'O': 2}.

Example 3:

Input: formula = "K4(ON(SO3)2)2"
Output: "K4N2O14S4"
Explanation: The count of elements are {'K': 4, 'N': 2, 'O': 14, 'S': 4}.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What characters are allowed in the formula string besides element symbols and numbers?
  2. What is the maximum length of the input formula string?
  3. Can the coefficient of an element or group be greater than 999? Is there an upper bound to the integers?
  4. If the formula is invalid, such as having mismatched parentheses, what should the function return?
  5. Is the case of element symbols significant, and should I assume the input follows standard chemical element naming (e.g., 'Si' is valid, but 'si' is not)?

Brute Force Solution

Approach

The brute force approach for counting atoms in a chemical formula involves exploring every possible combination of atom counts and groupings. We essentially break down the formula piece by piece, checking all possibilities. It's like trying every combination lock until you find the right one.

Here's how the algorithm would work step-by-step:

  1. Start by examining the formula character by character, one at a time.
  2. If you find an element symbol, assume it could potentially be a single atom or part of a larger group.
  3. If you encounter a parenthesis, explore all possible groupings within it, considering different multipliers that might be outside the parenthesis.
  4. Treat each potential combination as a possibility to track and calculate the total number of each atom.
  5. When you reach the end of the formula, you'll have checked every possible combination.
  6. Finally, organize the counts for each atom and present them in the required format.

Code Implementation

def count_atoms_brute_force(formula):
    element_counts = {}

    def parse_formula(formula_string, multiplier):
        index = 0
        while index < len(formula_string):
            character = formula_string[index]

            if character.isupper():
                start = index
                index += 1
                while index < len(formula_string) and formula_string[index].islower():
                    index += 1
                element = formula_string[start:index]
                count_start = index

                while index < len(formula_string) and formula_string[index].isdigit():
                    index += 1
                count_string = formula_string[count_start:index]

                element_count = 1
                if count_string:
                    element_count = int(count_string)

                if element not in element_counts:
                    element_counts[element] = 0
                element_counts[element] += element_count * multiplier

            elif character == '(': 
                # Recursively parse the subformula inside the parenthesis
                index += 1
                parenthesis_level = 1
                start = index

                while index < len(formula_string) and parenthesis_level > 0:
                    if formula_string[index] == '(': 
                        parenthesis_level += 1
                    elif formula_string[index] == ')':
                        parenthesis_level -= 1

                    index += 1

                sub_formula = formula_string[start:index - 1]
                count_start = index

                while index < len(formula_string) and formula_string[index].isdigit():
                    index += 1
                count_string = formula_string[count_start:index]

                sub_formula_multiplier = 1
                if count_string:
                    sub_formula_multiplier = int(count_string)

                parse_formula(sub_formula, multiplier * sub_formula_multiplier)

            else:
                index += 1

    parse_formula(formula, 1)
    sorted_elements = sorted(element_counts.keys())
    result = ""
    for element in sorted_elements:
        result += element
        if element_counts[element] > 1:
            result += str(element_counts[element])

    return result

Big(O) Analysis

Time Complexity
O(exponential)The brute force approach explores all possible combinations of groupings and multipliers within the chemical formula. In the worst-case scenario, with nested parentheses and varying multipliers, the number of combinations to explore grows exponentially with the length (n) of the formula. For each element or parenthesis, the algorithm explores numerous possibilities. This exponential growth in the number of explored paths leads to a time complexity of O(exponential).
Space Complexity
O(N)The brute force approach described relies on exploring all possible combinations of atom counts and groupings within the chemical formula. This requires storing intermediate counts for different atoms and potential groupings, likely using a hash map or similar data structure to keep track of these counts. The size of this data structure can grow linearly with the length of the formula (N), as the number of potential atom groupings increases with formula size. The maximum size of the call stack could also potentially grow linearly with N depending on the depth of nested parentheses leading to O(N) auxiliary space complexity.

Optimal Solution

Approach

The best way to solve this problem is to process the chemical formula from right to left using a stack. This allows us to efficiently keep track of element counts and their nesting within parentheses.

Here's how the algorithm would work step-by-step:

  1. Start by processing the chemical formula from the very end to the beginning.
  2. Keep track of counts of each atom encountered so far using a temporary storage.
  3. When you see a number, it means that the preceding atom or group of atoms should be multiplied by that number. Multiply accordingly.
  4. When you see a closing parenthesis, it indicates the beginning of a nested group. Save the current atom counts onto a main storage.
  5. When you see an opening parenthesis, it means you are exiting a nested group. Merge the saved atom counts back into the current counts, applying any multipliers encountered.
  6. After processing the entire formula, format the result by listing each atom in alphabetical order along with its count (if the count is greater than 1).

Code Implementation

def number_of_atoms(formula):
    element_counts = {}
    stack = []
    index = len(formula) - 1

    while index >= 0:
        character = formula[index]

        if character.isdigit():
            number_end_index = index
            while index >= 0 and formula[index].isdigit():
                index -= 1
            
            multiplier = int(formula[index + 1: number_end_index + 1])

            for element, count in element_counts.items():
                element_counts[element] = count * multiplier

        elif character == ')':
            # Save current counts when entering a nested group.
            stack.append(element_counts)
            element_counts = {}
            index -= 1

        elif character == '(':           
            # Merge saved counts back when exiting a group.
            previous_element_counts = stack.pop()
            for element, count in element_counts.items():
                previous_element_counts[element] = previous_element_counts.get(element, 0) + count
            element_counts = previous_element_counts
            index -= 1
        
        elif character.isupper():
            element = character
            index -= 1

            while index >= 0 and formula[index].islower():
                element = formula[index] + element
                index -= 1
            
            # Atom count = 1 if not already present
            element_counts[element] = element_counts.get(element, 0) + 1

        else:
            index -= 1

    sorted_elements = sorted(element_counts.keys())
    result = ""
    for element in sorted_elements:
        result += element
        count = element_counts[element]
        if count > 1:
            result += str(count)

    return result

Big(O) Analysis

Time Complexity
O(n log n)The dominant operation is iterating through the formula string of length n from right to left. Inside the loop, we perform operations like parsing numbers and updating counts in a dictionary (or hash map). The count updates generally take O(1) on average. However, at the end, to format the result, we sort the atoms alphabetically, which takes O(k log k) time, where k is the number of distinct atoms. In the worst case, k can be proportional to n. Thus sorting dominates and the algorithm runs in O(n log n) time.
Space Complexity
O(N)The auxiliary space complexity is dominated by the stack used to store atom counts at each parenthesis level and the temporary storage (likely a hash map) used to keep track of atom counts at the current level. In the worst-case scenario with deeply nested parentheses, the stack could grow to a depth proportional to the length of the formula, N. Similarly, the hash map can store a number of elements up to N in some cases. Thus, the space complexity is O(N), where N is the length of the input formula.

Edge Cases

CaseHow to Handle
Null or empty input stringReturn an empty string as there are no atoms to count
Formula with no atoms (e.g., '()')Return an empty string as the parsing logic will result in an empty counts map.
Single atom formula (e.g., 'H')Handle as the base case where atom count is implicitly 1.
Deeply nested formula (e.g., '((H))')Ensure the stack handling the parenthesis doesn't overflow or become inefficient.
Large counts (e.g., 'H999999999')The solution uses integers to store counts, so integer overflow should be considered; however, the problem statement should specify if such cases are expected and if so use a different data structure.
Unclosed parenthesis (e.g., '(H')Throw an error or return an appropriate message indicating invalid input string.
Formula with zero count for an atom (e.g., 'H0')Ignore the atom when a 0 follows a letter so as not to include an empty result.
Atoms appearing in reverse order of atomic number (e.g. 'OsHe')The sorting step should ensure the output is always in alphabetical order regardless of input sequence.