Taro Logo

Accounts Merge

Medium
Meta logo
Meta
4 views
Topics:
ArraysGraphs

Given a list of accounts where each element accounts[i] is a list of strings, where the first element accounts[i][0] is a name, and the rest of the elements are emails representing emails of the account.

Now, we would like to merge these accounts. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.

After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails in sorted order. The accounts themselves can be returned in any order.

Example 1:

Input: accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]
Output: [["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]
Explanation:
The first and second John's are the same person as they have the common email "johnsmith@mail.com".
The third John and Mary are different people as none of their email addresses are used by other accounts.
We could return these lists in any order, for example the answer [['Mary', 'mary@mail.com'], ['John', 'johnnybravo@mail.com'], 
['John', 'john00@mail.com', 'john_newyork@mail.com', 'johnsmith@mail.com']] would still be accepted.

Example 2:

Input: accounts = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],["Kevin","Kevin3@m.co","Kevin5@m.co","Kevin0@m.co"],["Ethan","Ethan5@m.co","Ethan4@m.co","Ethan0@m.co"],["Hanzo","Hanzo3@m.co","Hanzo1@m.co","Hanzo0@m.co"],["Fern","Fern5@m.co","Fern1@m.co","Fern0@m.co"]]
Output: [["Ethan","Ethan0@m.co","Ethan4@m.co","Ethan5@m.co"],["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe3@m.co"],["Hanzo","Hanzo0@m.co","Hanzo1@m.co","Hanzo3@m.co"],["Kevin","Kevin0@m.co","Kevin3@m.co","Kevin5@m.co"],["Fern","Fern0@m.co","Fern1@m.co","Fern5@m.co"]]

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can an account have multiple identical email addresses?
  2. If two accounts share an email, but have different names, should those accounts be merged? (Or are the names guaranteed to be the same for accounts sharing an email?)
  3. If multiple merge paths exist, is the order of the final merged accounts important? If so, what is the desired order (e.g., lexicographical by account name or first email)?
  4. Are the email addresses within a single account guaranteed to be unique?
  5. If there are no accounts, or none of the accounts can be merged, what should the output be? An empty list?

Brute Force Solution

Approach

The goal is to combine email accounts that belong to the same person. The brute force way involves checking every possible pair of accounts and merging them if they share an email. We continue merging until no more accounts can be combined.

Here's how the algorithm would work step-by-step:

  1. Start with a list of all the accounts.
  2. Pick the first account in the list.
  3. Compare it to every other account in the list.
  4. If the two accounts share any of the same email addresses, merge them into a single account.
  5. Now, take this newly merged account and compare it to every other account again.
  6. Keep merging accounts that share emails until you've compared the current account with all the other accounts and can't merge it any further.
  7. Once the current account can no longer be merged, move on to the next account in the original list that hasn't been processed.
  8. Repeat the process of comparing and merging with all other accounts until you've gone through every original account in the list.
  9. At the end, you'll have a final list of accounts, where each account contains a unique set of email addresses belonging to the same person.

Code Implementation

def accounts_merge_brute_force(accounts):
    merged_accounts = accounts[:] # Start with a copy

    number_of_accounts = len(accounts)

    for first_index in range(number_of_accounts):
        for second_index in range(number_of_accounts):
            # Ensure we don't compare the same account to itself
            if first_index == second_index:
                continue

            first_account = merged_accounts[first_index]
            second_account = merged_accounts[second_index]

            first_account_emails = set(first_account[1:])
            second_account_emails = set(second_account[1:])

            # Check if the accounts share any email addresses
            if not first_account_emails.isdisjoint(second_account_emails):
                # Merge the accounts
                merged_name = first_account[0]
                merged_emails = set(first_account[1:] + second_account[1:])
                merged_account = [merged_name] + sorted(list(merged_emails))

                merged_accounts[first_index] = merged_account

                # Mark the second account as empty to avoid reprocessing
                merged_accounts[second_index] = [""]

    # Remove any empty accounts from the final result
    final_accounts = [account for account in merged_accounts if account != [""]]

    # Need to remove dupes after all merging is complete
    final_result = []
    for account in final_accounts:
        if account not in final_result:
            final_result.append(account)

    return final_result

Big(O) Analysis

Time Complexity
O(n² * m)The algorithm iterates through each of the n accounts. For each account, it compares it with every other account, resulting in a nested loop structure. Within the comparison of two accounts, it needs to check for shared emails. Assume each account has at most m emails. Comparing these accounts takes O(m) time. Therefore, the time complexity becomes O(n * n * m), which simplifies to O(n² * m).
Space Complexity
O(N^2)The provided solution iteratively merges accounts based on shared emails. In the worst-case scenario, each account may need to store a combined list of emails from all other accounts if they are progressively merged. This implies each of the N accounts might grow to contain close to N emails, leading to a list of lists structure where each list can have up to N elements. Therefore, the auxiliary space used to store the merged accounts and email lists scales quadratically with the number of accounts, resulting in a space complexity of O(N^2).

Optimal Solution

Approach

The optimal approach involves grouping emails belonging to the same person and then merging these groups. This is done by treating emails as connections in a network and identifying connected components to find all emails associated with each account. Finally, each group of emails is combined into a single account.

Here's how the algorithm would work step-by-step:

  1. Think of each email as a point and each account as defining connections between those points.
  2. Build a system to figure out which emails are connected to each other based on the accounts they appear in. If two emails appear in the same account, they are connected.
  3. Use the connections you've found to group all connected emails together. This means finding all emails that are reachable from any given email.
  4. For each group of connected emails, associate them with the account name that contains any of those emails.
  5. Finally, combine all the emails within each group, remove duplicates, and sort them. Include the account name with the sorted list of emails to create the merged account.

Code Implementation

def accounts_merge(accounts):
    email_to_name = {}
    email_to_account_index = {}
    graph = {}

    # Build the graph and email mappings
    for account_index, account in enumerate(accounts):
        account_name = account[0]
        for email_index in range(1, len(account)):
            email = account[email_index]
            email_to_name[email] = account_name
            email_to_account_index[email] = account_index
            if email not in graph:
                graph[email] = []
            if email_index > 1:
                previous_email = account[email_index - 1]
                graph[email].append(previous_email)
                graph[previous_email].append(email)

    merged_accounts = []
    visited = set()

    # Iterate through all emails
    for email in email_to_name:
        if email not in visited:
            # Perform DFS to find connected emails
            stack = [email]
            visited.add(email)
            connected_emails = [email]

            while stack:
                current_email = stack.pop()
                for neighbor in graph.get(current_email, []):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        stack.append(neighbor)
                        connected_emails.append(neighbor)

            # Sort and add the merged account
            connected_emails.sort()
            account_name = email_to_name[email]
            merged_accounts.append([account_name] + connected_emails)

    return merged_accounts

Big(O) Analysis

Time Complexity
O(n*m*log(m))Building the email graph involves iterating through each account (n accounts in total) and, for each account, iterating through the emails within that account (let the maximum number of emails in any account be m). The Union-Find operations (connecting emails) can be considered near-constant time in practice due to path compression and ranking. Finding connected components effectively visits each email and its neighbors. Sorting the emails within each connected component takes O(m*log(m)) time in the worst case as the number of distinct emails within a user's accounts is bounded by m. Therefore, the overall time complexity is dominated by the initial graph construction and the final sorting resulting in O(n*m*log(m)).
Space Complexity
O(N)The algorithm uses auxiliary space to store connections between emails and to group them. In the worst-case scenario, each email could be connected to every other email, requiring a graph or data structure like a disjoint set union to store these connections. Where N is the total number of emails across all accounts, this connectivity information could potentially occupy O(N) space. Additionally, storing the merged accounts in a new list also requires O(N) space in the worst case where each account has unique emails. Therefore, the dominant factor is the storage of email connections and merged accounts resulting in O(N) auxiliary space.

Edge Cases

CaseHow to Handle
Null or empty accounts listReturn an empty list if the input accounts list is null or empty, as there are no accounts to merge.
Single account with multiple email addressesThe algorithm should correctly group all emails belonging to that single account into a single merged account.
Multiple accounts with the same name and different email addresses, and accounts with same email and different namesThe algorithm relies on the email addresses for merging, so accounts with the same name but different emails should not be merged, accounts with same email should be merged regardless of name.
Accounts with no email addressesThese accounts should be treated as isolated accounts and included in the result without merging.
Maximum number of accounts and emails exceeding memory limitsConsider using data structures and algorithms that optimize memory usage, such as using a more efficient disjoint-set data structure or processing accounts in batches.
Cyclic email relationships (e.g., A linked to B, B linked to C, and C linked to A)The union-find data structure should correctly handle cycles and merge all connected components into a single set.
Email addresses in different casing but considered the same (e.g., 'Test@example.com' vs 'test@example.com')Normalize email addresses to lowercase before processing to ensure case-insensitive comparison and merging.
Accounts with very large number of emails in each accountAlgorithm should be efficient even with large lists in each account and should be tested to ensure no time out errors occur.