You are given a list of accounts, where each element accounts[i] is a list of strings: the first element accounts[i][0] is a name, and the rest of the elements are emails belonging to that account.

Now, we would like to merge these accounts. Two accounts definitely belong to the same person if there is some email common to both accounts. Note that even if two accounts have the same name, they may belong to different people, as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.

After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails in sorted order. The accounts themselves can be returned in any order.

Let's look at an example. Given the input

accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]

the expected output would be

[["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]

The first and second Johns are the same person, as they share the email "johnsmith@mail.com". The third John and Mary are different people, as none of their email addresses are used by other accounts.

How would you implement an efficient algorithm to solve this problem?

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x != root_y:
            if self.rank[root_x] < self.rank[root_y]:
                self.parent[root_x] = root_y
            elif self.rank[root_x] > self.rank[root_y]:
                self.parent[root_y] = root_x
            else:
                self.parent[root_y] = root_x
                self.rank[root_x] += 1

class Solution:
    def accountsMerge(self, accounts: list[list[str]]) -> list[list[str]]:
        email_to_index = {}
        email_to_name = {}
        email_count = 0

        # Give every distinct email an integer index and remember its owner's name.
        for account in accounts:
            name = account[0]
            for email in account[1:]:
                if email not in email_to_index:
                    email_to_index[email] = email_count
                    email_to_name[email] = name
                    email_count += 1

        # Union every email in an account with that account's first email.
        # Assumes each account lists at least one email.
        uf = UnionFind(email_count)
        for account in accounts:
            first_email = account[1]
            first_email_index = email_to_index[first_email]
            for email in account[2:]:
                email_index = email_to_index[email]
                uf.union(first_email_index, email_index)

        # Group emails by the root of their set.
        index_to_emails = {}
        for email, index in email_to_index.items():
            root = uf.find(index)
            if root not in index_to_emails:
                index_to_emails[root] = []
            index_to_emails[root].append(email)

        # Emit one merged account per group: name first, then sorted emails.
        result = []
        for root, emails in index_to_emails.items():
            name = email_to_name[emails[0]]
            sorted_emails = sorted(emails)
            result.append([name] + sorted_emails)
        return result

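To see the two Union-Find operations in isolation, here is a minimal sketch that restates the class in compact form (the same path compression and union by rank) and runs it on five elements. The element numbering and the helper name same_set are illustrative assumptions, not part of the solution.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return
        # Make root_x the root with the larger (or equal) rank.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        # On a tie, the surviving root's rank grows by one.
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        self.parent[root_y] = root_x


def same_set(uf, a, b):
    """Hypothetical helper: True if a and b currently share a root."""
    return uf.find(a) == uf.find(b)


uf = UnionFind(5)
uf.union(0, 1)
uf.union(3, 4)
print(same_set(uf, 0, 1))  # True
print(same_set(uf, 1, 3))  # False

uf.union(1, 4)             # joins {0, 1} with {3, 4}
print(same_set(uf, 0, 3))  # True
print(same_set(uf, 2, 0))  # False: element 2 was never merged
```

In accountsMerge the elements are email indices rather than arbitrary integers, but the merging behaviour is the same.
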
Union-Find Data Structure:

The UnionFind class is a data structure that keeps track of a set of elements partitioned into a number of disjoint subsets. It provides near-constant-time (amortized) operations to add new sets, merge existing sets, and determine whether elements are in the same set.

find(x): Determines which subset a particular element x is in. It also implements path compression to optimize future find operations.

union(x, y): Merges the subsets that x and y are in. It uses rank-based union to keep the tree structure relatively flat for better performance.

Algorithm Overview:

The accountsMerge function uses the Union-Find data structure to merge accounts that share common emails. It assigns each distinct email an index, unions the emails within each account, groups the emails by the root of their set, and returns each group sorted and prefixed with the owner's name.

Example Usage:

accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],
            ["John","johnsmith@mail.com","john00@mail.com"],
            ["Mary","mary@mail.com"],
            ["John","johnnybravo@mail.com"]]
solution = Solution()
merged_accounts = solution.accountsMerge(accounts)
print(merged_accounts)
# Output: [['John', 'john00@mail.com', 'john_newyork@mail.com', 'johnsmith@mail.com'], ['Mary', 'mary@mail.com'], ['John', 'johnnybravo@mail.com']]

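Because the problem allows the merged accounts to be returned in any order, comparing a result against the expected output literally can fail even when the answer is correct. A small sketch of an order-insensitive check follows; the helper name normalize is a hypothetical addition, and the two lists are written out by hand rather than produced by the solution.

```python
def normalize(merged):
    """Hypothetical helper: sort the accounts so their order no longer matters.

    The emails inside each account are already required to be sorted,
    so only the outer list needs sorting.
    """
    return sorted(merged)


expected = [
    ["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"],
    ["Mary", "mary@mail.com"],
    ["John", "johnnybravo@mail.com"],
]

# The same accounts in a different (equally valid) order.
actual = [
    ["Mary", "mary@mail.com"],
    ["John", "johnnybravo@mail.com"],
    ["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"],
]

print(actual == expected)                        # False
print(normalize(actual) == normalize(expected))  # True
```
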
Space is taken up by the email_to_index, email_to_name, and index_to_emails dictionaries, as well as the parent and rank arrays in the Union-Find data structure. Each of these grows linearly with the number of distinct emails.
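
As a rough illustration of how these structures scale, the sketch below repeats the index-assignment loop from accountsMerge on the example input and compares the number of email occurrences with the number of distinct emails. The variable names total_entries and distinct are assumptions introduced here for illustration.

```python
accounts = [
    ["John", "johnsmith@mail.com", "john_newyork@mail.com"],
    ["John", "johnsmith@mail.com", "john00@mail.com"],
    ["Mary", "mary@mail.com"],
    ["John", "johnnybravo@mail.com"],
]

# Every email occurrence across all accounts, duplicates included.
total_entries = sum(len(account) - 1 for account in accounts)

# Same index assignment as in accountsMerge: one entry per distinct email.
email_to_index = {}
for account in accounts:
    for email in account[1:]:
        if email not in email_to_index:
            email_to_index[email] = len(email_to_index)

distinct = len(email_to_index)

# The Union-Find arrays are sized by the number of distinct emails.
parent = list(range(distinct))
rank = [0] * distinct

print(total_entries)           # 6
print(distinct)                # 5
print(len(parent), len(rank))  # 5 5
```

The shared address "johnsmith@mail.com" is counted twice in total_entries but occupies a single slot in every structure.
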