Given a string formula
representing a chemical formula, return the count of each atom.
The atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.
One or more digits representing that element's count may follow if the count is greater than 1
. If the count is 1
, no digits will follow.
"H2O"
and "H2O2"
are possible, but "H1O2"
is impossible.Two formulas are concatenated together to produce another formula.
"H2O2He3Mg4"
is also a formula.A formula placed in parentheses, and a count (optionally added) is also a formula.
"(H2O2)"
and "(H2O2)3"
are formulas.Return the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1
), followed by the second name (in sorted order), followed by its count (if that count is more than 1
), and so on.
The test cases are generated so that all the values in the output fit in a 32-bit integer.
Example 1:
Input: formula = "H2O"
Output: "H2O"
Explanation: The count of elements are {'H': 2, 'O': 1}.
Example 2:
Input: formula = "Mg(OH)2"
Output: "H2MgO2"
Explanation: The count of elements are {'H': 2, 'Mg': 1, 'O': 2}.
Example 3:
Input: formula = "K4(ON(SO3)2)2"
Output: "K4N2O14S4"
Explanation: The count of elements are {'K': 4, 'N': 2, 'O': 14, 'S': 4}.
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force approach for counting atoms in a chemical formula involves exploring every possible combination of atom counts and groupings. We essentially break down the formula piece by piece, checking all possibilities. It's like trying every combination lock until you find the right one.
Here's how the algorithm would work step-by-step:
def count_atoms_brute_force(formula):
element_counts = {}
def parse_formula(formula_string, multiplier):
index = 0
while index < len(formula_string):
character = formula_string[index]
if character.isupper():
start = index
index += 1
while index < len(formula_string) and formula_string[index].islower():
index += 1
element = formula_string[start:index]
count_start = index
while index < len(formula_string) and formula_string[index].isdigit():
index += 1
count_string = formula_string[count_start:index]
element_count = 1
if count_string:
element_count = int(count_string)
if element not in element_counts:
element_counts[element] = 0
element_counts[element] += element_count * multiplier
elif character == '(':
# Recursively parse the subformula inside the parenthesis
index += 1
parenthesis_level = 1
start = index
while index < len(formula_string) and parenthesis_level > 0:
if formula_string[index] == '(':
parenthesis_level += 1
elif formula_string[index] == ')':
parenthesis_level -= 1
index += 1
sub_formula = formula_string[start:index - 1]
count_start = index
while index < len(formula_string) and formula_string[index].isdigit():
index += 1
count_string = formula_string[count_start:index]
sub_formula_multiplier = 1
if count_string:
sub_formula_multiplier = int(count_string)
parse_formula(sub_formula, multiplier * sub_formula_multiplier)
else:
index += 1
parse_formula(formula, 1)
sorted_elements = sorted(element_counts.keys())
result = ""
for element in sorted_elements:
result += element
if element_counts[element] > 1:
result += str(element_counts[element])
return result
The best way to solve this problem is to process the chemical formula from right to left using a stack. This allows us to efficiently keep track of element counts and their nesting within parentheses.
Here's how the algorithm would work step-by-step:
def number_of_atoms(formula):
element_counts = {}
stack = []
index = len(formula) - 1
while index >= 0:
character = formula[index]
if character.isdigit():
number_end_index = index
while index >= 0 and formula[index].isdigit():
index -= 1
multiplier = int(formula[index + 1: number_end_index + 1])
for element, count in element_counts.items():
element_counts[element] = count * multiplier
elif character == ')':
# Save current counts when entering a nested group.
stack.append(element_counts)
element_counts = {}
index -= 1
elif character == '(':
# Merge saved counts back when exiting a group.
previous_element_counts = stack.pop()
for element, count in element_counts.items():
previous_element_counts[element] = previous_element_counts.get(element, 0) + count
element_counts = previous_element_counts
index -= 1
elif character.isupper():
element = character
index -= 1
while index >= 0 and formula[index].islower():
element = formula[index] + element
index -= 1
# Atom count = 1 if not already present
element_counts[element] = element_counts.get(element, 0) + 1
else:
index -= 1
sorted_elements = sorted(element_counts.keys())
result = ""
for element in sorted_elements:
result += element
count = element_counts[element]
if count > 1:
result += str(count)
return result
Case | How to Handle |
---|---|
Null or empty input string | Return an empty string as there are no atoms to count |
Formula with no atoms (e.g., '()') | Return an empty string as the parsing logic will result in an empty counts map. |
Single atom formula (e.g., 'H') | Handle as the base case where atom count is implicitly 1. |
Deeply nested formula (e.g., '((H))') | Ensure the stack handling the parenthesis doesn't overflow or become inefficient. |
Large counts (e.g., 'H999999999') | The solution uses integers to store counts, so integer overflow should be considered; however, the problem statement should specify if such cases are expected and if so use a different data structure. |
Unclosed parenthesis (e.g., '(H') | Throw an error or return an appropriate message indicating invalid input string. |
Formula with zero count for an atom (e.g., 'H0') | Ignore the atom when a 0 follows a letter so as not to include an empty result. |
Atoms appearing in reverse order of atomic number (e.g. 'OsHe') | The sorting step should ensure the output is always in alphabetical order regardless of input sequence. |