Taro Logo

Unique Email Addresses

Easy
Twitch logo
Twitch
1 view
Topics:
ArraysStrings

Every valid email consists of a local name and a domain name, separated by the '@' sign. Besides lowercase letters, the email may contain one or more '.' or '+'.

  • For example, in "alice@leetcode.com", "alice" is the local name, and "leetcode.com" is the domain name.

If you add periods '.' between some characters in the local name part of an email address, mail sent there will be forwarded to the same address without dots in the local name. Note that this rule does not apply to domain names.

  • For example, "alice.z@leetcode.com" and "alicez@leetcode.com" forward to the same email address.

If you add a plus '+' in the local name, everything after the first plus sign will be ignored. This allows certain emails to be filtered. Note that this rule does not apply to domain names.

  • For example, "m.y+name@email.com" will be forwarded to "my@email.com".

It is possible to use both of these rules at the same time.

Given an array of strings emails where we send one email to each emails[i], return the number of different addresses that actually receive mails.

Example 1:

Input: emails = ["test.email+alex@leetcode.com","test.e.mail+bob.cathy@leetcode.com","testemail+david@lee.tcode.com"]
Output: 2
Explanation: "testemail@leetcode.com" and "testemail@lee.tcode.com" actually receive mails.

Example 2:

Input: emails = ["a@leetcode.com","b@leetcode.com","c@leetcode.com"]
Output: 3

Constraints:

  • 1 <= emails.length <= 100
  • 1 <= emails[i].length <= 100
  • emails[i] consist of lowercase English letters, '+', '.' and '@'.
  • Each emails[i] contains exactly one '@' character.
  • All local and domain names are non-empty.
  • Local names do not start with a '+' character.
  • Domain names end with the ".com" suffix.
  • Domain names must contain at least one character before ".com" suffix.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can you provide an estimate for the maximum number of emails we might receive as input?
  2. Are the email addresses guaranteed to be well-formed according to standard email syntax (e.g., containing an '@' symbol and a domain)?
  3. Is the local name case-sensitive (i.e., does 'Test' equal 'test' before or after applying the rules)?
  4. If the input list is empty, what should the function return?
  5. By 'equivalent' emails, do you mean emails that resolve to the same address *after* applying both the '.' and '+' rules to the local name?

Brute Force Solution

Approach

The brute force approach for counting unique email addresses involves systematically processing each email and comparing it against all previously processed emails. The core idea is to simulate email sending and filter out duplicates based on the rules of the email system itself. It's like manually checking every email against every other email to see if they would end up at the same inbox.

Here's how the algorithm would work step-by-step:

  1. Take the first email address from the list.
  2. Clean it up by applying the email rules: Remove periods before the @ symbol, and remove anything after a '+' symbol before the @ symbol.
  3. Check if this cleaned-up email address is already in our list of unique addresses.
  4. If it's not, add it to the list of unique addresses.
  5. Take the next email address from the original list.
  6. Clean it up using the same email rules as before.
  7. Check if this cleaned-up email address is already in our list of unique addresses.
  8. If it's not, add it to the list of unique addresses.
  9. Repeat steps 5-8 for every email address in the original list.
  10. Once you have processed all the email addresses, count how many unique email addresses are in the unique addresses list. This count is your final answer.

Code Implementation

def num_unique_emails_brute_force(emails):
    unique_email_addresses = []

    for email in emails:
        # Split the email into local and domain parts
        local_name, domain_name = email.split('@')

        # Remove periods from the local name
        local_name = local_name.replace('.', '')

        # Split at the '+' sign and take the part before it
        if '+' in local_name:
            local_name = local_name.split('+')[0]

        cleaned_email = local_name + '@' + domain_name

        # Check if this cleaned email is already in the unique list
        if cleaned_email not in unique_email_addresses:

            # Only add if the cleaned email is truly unique
            unique_email_addresses.append(cleaned_email)

    return len(unique_email_addresses)

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through each of the n email addresses. For each email, it compares it with all the previously processed emails to check for duplicates. This comparison involves at most n-1 comparisons in the worst case for each of the n emails. Therefore, the total number of comparisons is approximately n * (n-1), which simplifies to n² - n. In Big O notation, we drop the lower order terms, resulting in a time complexity of O(n²).
Space Complexity
O(N)The algorithm maintains a list of unique email addresses. In the worst-case scenario, where all N email addresses in the input are unique after processing (removing periods and parts after '+'), the list of unique addresses will store all N cleaned email addresses. Therefore, the auxiliary space required grows linearly with the number of input email addresses. This results in a space complexity of O(N).

Optimal Solution

Approach

The goal is to count how many unique email addresses are valid after applying the special rules for local names (handling periods and plus signs). Instead of directly comparing every email, we standardize them first so emails that are actually the same are treated as one. This significantly reduces the number of comparisons needed.

Here's how the algorithm would work step-by-step:

  1. For each email address, split it into the part before the @ symbol (the local name) and the part after (the domain name).
  2. Take the local name and remove all the periods.
  3. Look for the plus sign in the local name. If there is one, remove everything from the plus sign to the end of the local name.
  4. Combine the modified local name with the domain name to form the standardized email address.
  5. Keep track of all the unique standardized email addresses you've seen so far. A simple way to do this is to put each one into a collection (like a set) that only stores unique values.
  6. After processing all the emails, the number of items in your collection of unique email addresses is the answer.

Code Implementation

def unique_email_addresses(emails):
    unique_emails = set()

    for email in emails:
        local_name, domain_name = email.split('@')

        # Remove all periods from the local name
        local_name = local_name.replace('.', '')

        # Handle the plus sign by truncating the local name
        if '+' in local_name:
            plus_index = local_name.find('+')
            local_name = local_name[:plus_index]

            # Standardize the email address
        standardized_email = local_name + '@' + domain_name
        
        # Track unique email addresses using a set
        unique_emails.add(standardized_email)

    return len(unique_emails)

Big(O) Analysis

Time Complexity
O(n*m)Let n be the number of email addresses and m be the maximum length of an email address. The code iterates through each email address once, which takes O(n) time. Inside the loop, the operations on the local name, such as removing periods and characters after the plus sign, take O(m) time, where m is the length of the email address. The set insertion takes on average O(1) time. Therefore, the overall time complexity is O(n*m) since we do these operations on each of the n email addresses.
Space Complexity
O(N)The primary space usage comes from storing the unique standardized email addresses in a collection, as described in step 5. In the worst case, all N email addresses in the input are unique after standardization. Therefore, the set of unique emails could grow up to size N, where N is the number of input email addresses. This results in an auxiliary space complexity of O(N).

Edge Cases

CaseHow to Handle
Empty email arrayReturn 0 if the input email array is empty.
Null email arrayThrow an IllegalArgumentException or return 0 if the input email array is null.
Email with only local nameHandle emails without a domain by treating the local name as the full email (e.g. 'test' becomes 'test').
Email with only domain nameReject emails without local names or treat them as invalid and exclude them from the count.
Email with consecutive dots in local nameCollapse consecutive dots into a single dot or remove them altogether as per the rule.
Email with plus sign at the end of local nameThe plus sign and all characters following it are dropped, including the ending character.
Extremely long email addresses causing potential memory issuesLimit the length of processed email strings to prevent excessive memory consumption and potential denial of service.
Emails with invalid characters (e.g., control characters, Unicode)Filter out or sanitize emails containing invalid characters to ensure correct processing.