
Accounts Merge

Medium
Meta
Topics: Arrays, Graphs

Given a list accounts, where each element accounts[i] is a list of strings: the first element accounts[i][0] is a name, and the remaining elements are the emails of that account.

Now, we would like to merge these accounts. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.

After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails in sorted order. The accounts themselves can be returned in any order.

Example 1:

Input: accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]
Output: [["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]
Explanation:
The first and second Johns are the same person, as they share the email "johnsmith@mail.com".
The third John and Mary are different people as none of their email addresses are used by other accounts.
We could return these lists in any order, for example the answer [['Mary', 'mary@mail.com'], ['John', 'johnnybravo@mail.com'], ['John', 'john00@mail.com', 'john_newyork@mail.com', 'johnsmith@mail.com']] would still be accepted.
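Because the accounts may come back in any order, comparing two answers directly with == can fail even when both are correct. The sketch below shows one way to compare results independently of account order. The helper name canonical is hypothetical and not part of the problem statement; it deliberately leaves the email order inside each account untouched, since that order is required to be sorted.

```python
def canonical(accounts):
    """Return an order-independent view of a merged-accounts result."""
    # Each account becomes (name, tuple_of_emails). Sorting the outer list
    # removes the dependence on account order; the emails inside each
    # account are kept as given, so an unsorted email list still differs.
    return sorted((account[0], tuple(account[1:])) for account in accounts)


expected = [
    ["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"],
    ["Mary", "mary@mail.com"],
    ["John", "johnnybravo@mail.com"],
]
reordered = [expected[1], expected[2], expected[0]]

print(canonical(expected) == canonical(reordered))  # True
```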

Example 2:

Input: accounts = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],["Kevin","Kevin3@m.co","Kevin5@m.co","Kevin0@m.co"],["Ethan","Ethan5@m.co","Ethan4@m.co","Ethan0@m.co"],["Hanzo","Hanzo3@m.co","Hanzo1@m.co","Hanzo0@m.co"],["Fern","Fern5@m.co","Fern1@m.co","Fern0@m.co"]]
Output: [["Ethan","Ethan0@m.co","Ethan4@m.co","Ethan5@m.co"],["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe3@m.co"],["Hanzo","Hanzo0@m.co","Hanzo1@m.co","Hanzo3@m.co"],["Kevin","Kevin0@m.co","Kevin3@m.co","Kevin5@m.co"],["Fern","Fern0@m.co","Fern1@m.co","Fern5@m.co"]]

Solution


Naive Approach: Brute Force

One straightforward approach is to iterate through all pairs of accounts and check whether they share an email. If they do, we merge them, and we repeat this process until no more merges are possible. The downside is cost: every pass compares every pair of accounts, and a single merge can make new merges possible, so many passes may be needed.

Algorithm:

  1. For each account, compare it with all other accounts.
  2. If two accounts share a common email, merge them into one.
  3. Repeat until no more merges occur.
  4. Sort the emails in each merged account and return the result.
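The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name accounts_merge_brute_force is hypothetical, each account's emails are held in a set (an assumption that makes the overlap check cheap), and the scan restarts after every merge.

```python
def accounts_merge_brute_force(accounts):
    # Working form: one (name, set_of_emails) pair per account.
    merged = [(account[0], set(account[1:])) for account in accounts]

    changed = True
    while changed:  # step 3: repeat until a full pass finds nothing to merge
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                # Steps 1-2: a shared email means both belong to one person.
                if not merged[i][1].isdisjoint(merged[j][1]):
                    merged[i][1].update(merged[j][1])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since indices have shifted

    # Step 4: emit the name followed by the sorted emails.
    return [[name] + sorted(emails) for name, emails in merged]


example = [
    ["John", "johnsmith@mail.com", "john_newyork@mail.com"],
    ["John", "johnsmith@mail.com", "john00@mail.com"],
    ["Mary", "mary@mail.com"],
    ["John", "johnnybravo@mail.com"],
]
for account in accounts_merge_brute_force(example):
    print(account)
```

Restarting after each merge keeps the loop simple at the price of extra passes, which is part of why this approach scales poorly.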

Big O Analysis:

  • Time Complexity: O(N^2 * M) per pass, where N is the number of accounts and M is the maximum number of emails per account. The N^2 comes from the nested loops that compare each pair of accounts, and the M from scanning email lists to detect an overlap and perform a merge. Because passes repeat until no merge occurs, and a pass may perform only one merge, a simple implementation can reach O(N^3 * M) overall in the worst case.
  • Space Complexity: O(N * M) in the worst case, to store the merged accounts.

Edge Cases:

  • Empty input list.
  • Accounts with no emails.
  • Duplicate emails within the same account.

Optimal Approach: Disjoint Set Union (DSU)

A much more efficient solution uses the Disjoint Set Union (DSU) data structure, also known as Union-Find. Each email address is treated as a node, and each account connects all of its emails to one another. We union all emails within an account, then group emails by their root (representative) email and look up the name for each group.
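To make the idea concrete, here is a small sketch that links the emails from Example 1 and checks which ones end up in the same set. It assumes a dictionary-based union-find keyed directly by email string, chosen for brevity; the names parent, find, and union are illustrative only.

```python
parent = {}  # email -> parent email; a root maps to itself


def find(email):
    parent.setdefault(email, email)  # register unseen emails as their own root
    while parent[email] != email:
        parent[email] = parent[parent[email]]  # path halving
        email = parent[email]
    return email


def union(a, b):
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_a] = root_b


accounts = [
    ["John", "johnsmith@mail.com", "john_newyork@mail.com"],
    ["John", "johnsmith@mail.com", "john00@mail.com"],
    ["Mary", "mary@mail.com"],
    ["John", "johnnybravo@mail.com"],
]

for account in accounts:
    emails = account[1:]
    for email in emails:
        find(email)  # make sure single-email accounts are registered too
    for other in emails[1:]:
        union(emails[0], other)  # linking to the first email links them all

# The two Johns sharing johnsmith@mail.com collapse into one set.
print(find("john_newyork@mail.com") == find("john00@mail.com"))  # True
print(find("mary@mail.com") == find("johnsmith@mail.com"))       # False
```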

Algorithm:

  1. Initialization: Assign each distinct email an integer index and create a DSU data structure where each email is initially in its own set. Also create a map from each email to the account name it appears under.
  2. Union: Iterate through the accounts. For each account, union every remaining email with the first email of that account; this is enough to place all of the account's emails in one set. Since all emails in a set belong to the same person, any of them can be used to look up the name.
  3. Collect: After unioning, create a map to store the emails associated with each root (representative) email. Traverse the emails, find the root for each email, and add the email to the list of emails for that root.
  4. Format: Finally, create the result list. For each root email, create an account with the account name (retrieved from the name map) and the sorted list of emails associated with that root.

Big O Analysis:

  • Time Complexity: O(N * M * α(N * M) + S log S), where N is the number of accounts, M is the maximum number of emails per account, α is the inverse Ackermann function (which grows so slowly that it can be treated as a constant in practice), and S is the total number of distinct emails (at most N * M). The N * M * α(N * M) comes from the DSU operations (union and find) and assumes path compression combined with union by size or rank; with path compression alone, the amortized cost per operation is logarithmic instead. The S log S comes from sorting the emails for each account.
  • Space Complexity: O(N * M), to store the DSU data structure, email-to-name map, and the merged email lists.

Edge Cases:

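  • Empty input list (the result is an empty list).
  • Accounts with no emails (the access to account[1] must be guarded; otherwise it raises an IndexError).
  • Duplicate emails within the same account (handled by the email_to_index lookup, which registers each email only once).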

Code (Python):

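class DSU:
    """Disjoint Set Union over the integer ids 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))  # every id starts as its own root
        self.size = [1] * n           # tree sizes, used for union by size

    def find(self, x):
        # Path compression: point each visited node directly at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return
        # Union by size: attach the smaller tree under the larger one, which
        # keeps trees shallow and gives the near-constant amortized bound.
        if self.size[root_x] < self.size[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        self.size[root_x] += self.size[root_y]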


def accountsMerge(accounts):
    email_to_name = {}
    email_to_index = {}
    index_to_email = {}
    email_count = 0

    for account in accounts:
        name = account[0]
        for email in account[1:]:
            if email not in email_to_index:
                email_to_name[email] = name
                email_to_index[email] = email_count
                index_to_email[email_count] = email
                email_count += 1

    dsu = DSU(email_count)

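    # Union every email in an account with that account's first email.
    for account in accounts:
        if len(account) < 2:
            continue  # name-only account: there are no emails to union
        first_index = email_to_index[account[1]]
        for email in account[2:]:
            dsu.union(first_index, email_to_index[email])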

    email_groups = {}
    for email in email_to_index:
        index = email_to_index[email]
        root_index = dsu.find(index)
        root_email = index_to_email[root_index]
        if root_email not in email_groups:
            email_groups[root_email] = []
        email_groups[root_email].append(email)

    result = []
    for root_email in email_groups:
        name = email_to_name[root_email]
        emails = sorted(email_groups[root_email])
        result.append([name] + emails)

    return result