
Masking Personal Information

Difficulty: Medium
Topics:
Strings

You are given a personal information string s, representing either an email address or a phone number. Return the masked personal information using the rules below.

Email address:

An email address is:

  • A name consisting of uppercase and lowercase English letters, followed by
  • The '@' symbol, followed by
  • The domain consisting of uppercase and lowercase English letters with a dot '.' somewhere in the middle (not the first or last character).

To mask an email:

  • The uppercase letters in the name and domain must be converted to lowercase letters.
  • The middle letters of the name (i.e., all but the first and last letters) must be replaced by 5 asterisks "*****".
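
The two email rules above can be sketched in a few lines. This is a minimal illustration, assuming the input is already a valid email as defined here; the function name `mask_email` is hypothetical.

```python
def mask_email(s: str) -> str:
    """Mask an email; assumes s is a valid email per the rules above."""
    # Lowercase first so both the name and the domain are normalised.
    name, domain = s.lower().split("@")
    # Always exactly five asterisks, regardless of the name's length.
    return name[0] + "*****" + name[-1] + "@" + domain


print(mask_email("LeetCode@LeetCode.com"))  # l*****e@leetcode.com
print(mask_email("AB@qq.com"))              # a*****b@qq.com
```

Note that the number of asterisks does not depend on the name's length, so the mask does not leak how long the original name was.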

Phone number:

A phone number is formatted as follows:

  • The phone number contains 10-13 digits.
  • The last 10 digits make up the local number.
  • The remaining 0-3 digits, in the beginning, make up the country code.
  • Separation characters from the set {'+', '-', '(', ')', ' '} separate the above digits in some way.

To mask a phone number:

  • Remove all separation characters.
  • The masked phone number should have the form:
    • "***-***-XXXX" if the country code has 0 digits.
    • "+*-***-***-XXXX" if the country code has 1 digit.
    • "+**-***-***-XXXX" if the country code has 2 digits.
    • "+***-***-***-XXXX" if the country code has 3 digits.
  • "XXXX" is the last 4 digits of the local number.
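
The phone rules above can be sketched as follows. This is a minimal illustration, assuming the input is a valid phone number with 10 to 13 digits; the function name `mask_phone` is hypothetical.

```python
def mask_phone(s: str) -> str:
    """Mask a phone number; assumes s holds 10 to 13 digits plus separators."""
    # Drop every separation character by keeping only the digits.
    digits = "".join(ch for ch in s if ch.isdigit())
    # Everything before the last 10 digits is the country code.
    country_len = len(digits) - 10
    local = "***-***-" + digits[-4:]
    if country_len == 0:
        return local
    # One asterisk per country code digit, e.g. "+**-" for two digits.
    return "+" + "*" * country_len + "-" + local


print(mask_phone("1(234)567-890"))    # ***-***-7890
print(mask_phone("86-(10)12345678"))  # +**-***-***-5678
```

Computing the prefix from the digit count avoids writing a separate branch for each country code length.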

Example 1:

Input: s = "LeetCode@LeetCode.com"
Output: "l*****e@leetcode.com"
Explanation: s is an email address.
The name and domain are converted to lowercase, and the middle of the name is replaced by 5 asterisks.

Example 2:

Input: s = "AB@qq.com"
Output: "a*****b@qq.com"
Explanation: s is an email address.
The name and domain are converted to lowercase, and the middle of the name is replaced by 5 asterisks.
Note that even though "ab" is 2 characters, it still must have 5 asterisks in the middle.

Example 3:

Input: s = "1(234)567-890"
Output: "***-***-7890"
Explanation: s is a phone number.
There are 10 digits, so the local number is 10 digits and the country code is 0 digits.
Thus, the resulting masked number is "***-***-7890".

Constraints:

  • s is either a valid email or a phone number.
  • If s is an email:
    • 8 <= s.length <= 40
    • s consists of uppercase and lowercase English letters and exactly one '@' symbol and '.' symbol.
  • If s is a phone number:
    • 10 <= s.length <= 20
    • s consists of digits, spaces, and the symbols '(', ')', '-', and '+'.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. How can I differentiate between an email address and a phone number in the input string?
  2. What is the expected format of the phone number input (e.g., are there spaces, dashes, or parentheses)? What characters are guaranteed to be there?
  3. If the phone number contains a country code, how many digits will it be, and what characters will those be? Is the + symbol guaranteed to appear?
  4. For email addresses, is the local part (before the @) and the domain part (after the @) guaranteed to have at least 5 characters and 3 characters respectively before masking, or should I handle shorter strings?
  5. If the input is neither a valid email address nor a valid phone number according to your definitions, what should the function return?
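
For the first question, the constraints stated in this problem happen to give a simple answer: only an email can contain '@'. A minimal sketch, assuming the input is already valid (the helper name `classify` is hypothetical):

```python
def classify(s: str) -> str:
    """Return 'email' if s contains '@', otherwise 'phone'.

    Assumes s is valid per the problem constraints, so no other
    shape of input needs to be considered here.
    """
    return "email" if "@" in s else "phone"


print(classify("AB@qq.com"))      # email
print(classify("1(234)567-890"))  # phone
```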

Brute Force Solution

Approach

The brute force method handles every input shape with its own explicit branch. It first decides whether the string is an email or a phone number, then writes out one case per possible country code length instead of computing the prefix.

Here's how the algorithm would work step-by-step:

  1. First, check if the input is an email address or a phone number by looking for the '@' symbol.
  2. If it's an email, convert it to lowercase and split it into the name and the domain.
  3. Keep the first and last letters of the name and put five asterisks between them, then reattach '@' and the domain.
  4. If it's a phone number, strip every separation character so only digits remain.
  5. Count the digits and branch on the count: 10, 11, 12, or 13.
  6. In each branch, return the hard-coded prefix for that country code length (none, '+*-', '+**-', or '+***-') followed by '***-***-' and the last four digits.

Code Implementation

def mask_pii(personal_information):
    if '@' in personal_information:
        # Handle email masking: lowercase everything, then keep only
        # the first and last letters of the name.
        username, domain = personal_information.lower().split('@')
        masked_username = username[0] + '*****' + username[-1]
        return masked_username + '@' + domain
    else:
        # Handle phone number masking.
        digits_only = ''.join(character for character in personal_information if character.isdigit())
        local_number = '***-***-' + digits_only[-4:]

        # We handle phone numbers that are 10 to 13 digits long,
        # with one explicit branch per country code length.
        if len(digits_only) == 10:
            # No country code, so no '+' prefix.
            return local_number
        elif len(digits_only) == 11:
            return '+*-' + local_number
        elif len(digits_only) == 12:
            return '+**-' + local_number
        else:
            # Length is 13
            return '+***-' + local_number

Big(O) Analysis

Time Complexity
O(n), where n is the length of the input string. Lowercasing and splitting an email, or filtering the digits out of a phone number, each take one pass over the string. Choosing among the four phone formats takes a constant number of comparisons. Because the constraints cap n at 40, the running time is effectively constant in practice.
Space Complexity
O(n). The method builds a few intermediate strings (the lowercased email or the digits-only phone number), each at most as long as the input, and uses no other auxiliary data structures. Since the constraints cap n at 40, this extra space is bounded by a small constant.

Optimal Solution

Approach

The goal is to mask parts of an email address or phone number based on its type. We'll figure out what kind of information we have, then apply the right masking steps. By handling email and phone numbers differently and directly, we avoid unnecessary work.

Here's how the algorithm would work step-by-step:

  1. First, check if we have an email address or a phone number.
  2. If it's an email address, convert it to lowercase and split it into the username and domain parts.
  3. Mask the username by keeping only the first and last characters, and putting five asterisks in the middle.
  4. Combine the masked username with the domain name using the @ symbol to get the masked email.
  5. If it's a phone number, remove all non-digit characters (like spaces and dashes).
  6. Treat the last ten digits of the cleaned phone number as the local number. Any digits before them are the country code.
  7. Mask the first six digits of the local number with asterisks in two groups of three separated by hyphens, and keep the last four digits visible.
  8. Add the country code prefix, if present. If the cleaned number has more than 10 digits, prepend '+', one asterisk per country code digit, and a hyphen. A 10-digit number gets no prefix.
  9. Combine all parts to form the final masked phone number string.

Code Implementation

def mask_personal_information(personal_info):
    if '@' in personal_info:
        # Handle email address: lowercase first, then mask the name
        local_name, domain_name = personal_info.lower().split('@')
        masked_local_name = local_name[0] + "*****" + local_name[-1]
        return masked_local_name + '@' + domain_name
    else:
        # Handle phone number
        digits_only = ''.join(filter(str.isdigit, personal_info))

        number_of_digits = len(digits_only)

        # Determine country code length and build the masked local number
        country_code_length = number_of_digits - 10
        formatted_number = "***-***-" + digits_only[-4:]

        if country_code_length == 0:
            # No country code: the result has no '+' prefix
            return formatted_number

        # One asterisk per country code digit, e.g. "+**-" for two digits
        country_code = '+' + '*' * country_code_length
        return country_code + '-' + formatted_number

Big(O) Analysis

Time Complexity
O(n), where n is the length of the input string. Checking for '@' scans the string once. If it's an email, lowercasing, splitting, and concatenation each touch every character a constant number of times. If it's a phone number, the algorithm iterates through the string once to remove non-digit characters, and the remaining masking and concatenation steps work on a fixed-size result. Every step is therefore linear or constant, giving a time complexity of O(n).
Space Complexity
O(n). The algorithm creates a fixed number of intermediate strings: the lowercased email with its username and domain parts, or the digits-only version of the phone number. Each of these can be as long as the input, so the auxiliary space grows linearly with n. Because the constraints cap the input at 40 characters, this is bounded by a small constant in practice, and the masked phone output itself never exceeds 17 characters.

Edge Cases

Null or empty input string
How to Handle:
Return an empty string or raise an error (for example, a ValueError in Python) since there is nothing to mask.
Email with less than 7 characters (e.g., 'a@b.c')
How to Handle:
The constraints guarantee at least 8 characters, so valid input never looks like this. If it does occur, the standard rule still works: a one-letter name serves as both the first and last letter, so 'a@b.c' becomes 'a*****a@b.c'.
Email with '+' or other special characters in username/domain
How to Handle:
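The constraints allow only letters, so confirm with the interviewer whether this can happen. Because the '@' check runs first, such a string is still treated as an email and masked with the same first-letter, five-asterisks, last-letter rule.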
Phone number with only 7 digits (no country code)
How to Handle:
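This falls outside the guaranteed 10-13 digits, so confirm the expected behavior with the interviewer. If such input must be accepted, treat it as having no country code and mask it as '***-***-XXXX'; otherwise reject it as invalid.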
Phone number with non-numeric characters besides '+', '-', '(', ')', and spaces
How to Handle:
Keep only the digits before masking, so any unexpected character is dropped along with the standard separators.
Phone number starting with '+0'
How to Handle:
The country code digits are never shown, so their value does not matter. A one-digit country code, 0 included, produces '+*-***-***-XXXX'.
Very long email address exceeding reasonable limits
How to Handle:
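The mask always has exactly five asterisks, so the result is the first letter, '*****', the last letter, '@', and the lowercased domain, whatever the name's length. Processing stays linear in the input length.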
Input string is neither a valid email nor a valid phone number
How to Handle:
Return an empty string or raise an error (for example, a ValueError in Python) indicating an invalid format, because no sensible masking is possible.
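
Several of the cases above come down to validating the input before masking. The sketch below is one way to do that, assuming the email and phone definitions from the problem statement and choosing to raise an error instead of returning an empty string. The names `validate`, `EMAIL_RE`, and `PHONE_CHARS_RE` are hypothetical.

```python
import re
from typing import Optional

# Letters, one '@', then letters with a single dot in the middle.
EMAIL_RE = re.compile(r"[A-Za-z]+@[A-Za-z]+\.[A-Za-z]+")
# Digits plus the allowed separation characters only.
PHONE_CHARS_RE = re.compile(r"[0-9+\-() ]+")


def validate(s: Optional[str]) -> str:
    """Return 'email' or 'phone'; raise ValueError for anything else."""
    if not s:
        raise ValueError("input is null or empty")
    if EMAIL_RE.fullmatch(s):
        return "email"
    if PHONE_CHARS_RE.fullmatch(s):
        digit_count = sum(ch.isdigit() for ch in s)
        # A valid phone number has 10 local digits plus 0-3 country code digits.
        if 10 <= digit_count <= 13:
            return "phone"
    raise ValueError("input is neither a valid email nor a valid phone number")


print(validate("LeetCode@LeetCode.com"))  # email
print(validate("1(234)567-890"))          # phone
```

Whether to raise or return an empty string is a design choice worth confirming with the interviewer; raising makes invalid input harder to overlook.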