Accounts Merge

Medium

You are given a list of accounts where each element accounts[i] is a list of strings: the first element accounts[i][0] is a name, and the remaining elements are the emails of that account.

We would like to merge these accounts. Two accounts definitely belong to the same person if they share at least one email. Note that even if two accounts have the same name, they may belong to different people, since different people can have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.

After merging the accounts, return them in the following format: the first element of each account is the name, and the rest of the elements are emails in sorted order. The accounts themselves can be returned in any order.

Let's look at an example. Given the input accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]], the expected output would be [["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]. The first and second Johns are the same person because they share the email "johnsmith@mail.com". The third John and Mary are different people because none of their email addresses are used by other accounts.

How would you implement an efficient algorithm to solve this problem?

Sample Answer
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x != root_y:
            if self.rank[root_x] < self.rank[root_y]:
                self.parent[root_x] = root_y
            elif self.rank[root_x] > self.rank[root_y]:
                self.parent[root_y] = root_x
            else:
                self.parent[root_y] = root_x
                self.rank[root_x] += 1


class Solution:
    def accountsMerge(self, accounts: list[list[str]]) -> list[list[str]]:
        email_to_index = {}
        email_to_name = {}
        email_count = 0

        for account in accounts:
            name = account[0]
            for email in account[1:]:
                if email not in email_to_index:
                    email_to_index[email] = email_count
                    email_to_name[email] = name
                    email_count += 1

        uf = UnionFind(email_count)

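        for account in accounts:
            # Assumes every account has at least one email, so account[1] exists.
            first_email = account[1]
            first_email_index = email_to_index[first_email]
            # Link every other email in this account to the first one.
            for email in account[2:]:
                email_index = email_to_index[email]
                uf.union(first_email_index, email_index)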

        index_to_emails = {}
        for email, index in email_to_index.items():
            root = uf.find(index)
            if root not in index_to_emails:
                index_to_emails[root] = []
            index_to_emails[root].append(email)

        result = []
        for root, emails in index_to_emails.items():
            name = email_to_name[emails[0]]
            sorted_emails = sorted(emails)
            result.append([name] + sorted_emails)

        return result

Explanation:

  1. Union-Find Data Structure:

    • The UnionFind class keeps track of a fixed set of n elements partitioned into disjoint subsets. With path compression and union by rank, it merges subsets and determines whether two elements are in the same subset in amortized near-constant time.
    • find(x): Returns the root (representative) of the subset that contains x. It also applies path compression, pointing each visited node directly at the root to speed up future find operations.
    • union(x, y): Merges the subsets that contain x and y. It uses union by rank, attaching the lower-rank root under the higher-rank root, to keep the trees shallow.
  2. Algorithm Overview:

    • The accountsMerge function uses the Union-Find data structure to merge accounts that share common emails.
    • First, it builds mappings from emails to unique integer indices and from emails to account names.
    • Then, it iterates through the accounts and unions the first email of each account with every other email in that account. Because a shared email has a single index, accounts that have an email in common end up in the same set.
    • Finally, it constructs the result by grouping emails belonging to the same set and associating them with the corresponding name.
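
To make the grouping step concrete, here is a minimal sketch that runs only the union phase on the example input and returns the resulting email groups. It is a simplification, not the sample answer itself: the helper name group_emails is hypothetical, the union-find is keyed directly by email string in a dict, and it uses path halving without ranks. It also assumes every account has at least one email.

```python
def group_emails(accounts):
    """Group emails into connected sets (hypothetical helper for illustration)."""
    parent = {}  # email -> parent email; a root points to itself

    def find(x):
        parent.setdefault(x, x)  # register unseen emails as their own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        root_x = find(x)
        root_y = find(y)
        parent[root_y] = root_x  # no rank: simply attach y's root under x's root

    # Assumes each account is [name, first_email, *other_emails].
    for _name, first, *rest in accounts:
        find(first)  # make sure single-email accounts are registered too
        for email in rest:
            union(first, email)

    groups = {}
    for email in list(parent):
        groups.setdefault(find(email), []).append(email)
    return groups


accounts = [["John", "johnsmith@mail.com", "john_newyork@mail.com"],
            ["John", "johnsmith@mail.com", "john00@mail.com"],
            ["Mary", "mary@mail.com"],
            ["John", "johnnybravo@mail.com"]]

for root, emails in group_emails(accounts).items():
    print(root, "->", sorted(emails))
```

The first two John accounts collapse into one group because both are unioned with "johnsmith@mail.com", while Mary and the last John each remain in a group of their own.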

Example:

accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],
            ["John","johnsmith@mail.com","john00@mail.com"],
            ["Mary","mary@mail.com"],
            ["John","johnnybravo@mail.com"]]

solution = Solution()
merged_accounts = solution.accountsMerge(accounts)
print(merged_accounts)
# Output: [['John', 'john00@mail.com', 'john_newyork@mail.com', 'johnsmith@mail.com'], ['Mary', 'mary@mail.com'], ['John', 'johnnybravo@mail.com']]
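
Because the accounts may be returned in any order, comparing a result directly against an expected list can fail even when the answer is correct. One possible way to check a result is to sort both lists first. The helper name same_accounts below is an assumption made for illustration, and the sketch assumes the emails inside each account are already sorted, as the problem requires.

```python
def same_accounts(actual, expected):
    """Compare two merged-account lists while ignoring the order of the accounts."""
    # Lists of strings compare lexicographically, so sorting the outer list
    # puts both results into the same canonical order.
    return sorted(actual) == sorted(expected)


expected = [["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"],
            ["Mary", "mary@mail.com"],
            ["John", "johnnybravo@mail.com"]]

# The same accounts listed in a different order.
reordered = [expected[2], expected[0], expected[1]]

print(same_accounts(reordered, expected))
```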

Time Complexity:

  • O(N * K * log(N * K)), where N is the number of accounts and K is the maximum number of emails in an account. Building the mappings and performing the union and find operations takes O(N * K * α(N * K)), where α is the inverse Ackermann function, which grows so slowly that it can be treated as almost constant. Sorting the emails of each merged account costs up to O(N * K * log(N * K)) in total, and this sorting step dominates the running time.

Space Complexity:

  • O(N * K), where N is the number of accounts, and K is the maximum number of emails in an account. This space is used to store the email_to_index, email_to_name, and index_to_emails dictionaries, as well as the parent and rank arrays in the Union-Find data structure.