Taro Logo

Subdomain Visit Count

Medium
Roblox logo
Roblox
6 views
Topics:
ArraysStrings

A website domain "discuss.leetcode.com" consists of various subdomains. At the top level, we have "com", at the next level, we have "leetcode.com" and at the lowest level, "discuss.leetcode.com". When we visit a domain like "discuss.leetcode.com", we will also visit the parent domains "leetcode.com" and "com" implicitly.

A count-paired domain is a domain that has one of the two formats "rep d1.d2.d3" or "rep d1.d2" where rep is the number of visits to the domain and d1.d2.d3 is the domain itself.

  • For example, "9001 discuss.leetcode.com" is a count-paired domain that indicates that discuss.leetcode.com was visited 9001 times.

Given an array of count-paired domains cpdomains, return an array of the count-paired domains of each subdomain in the input. You may return the answer in any order.

Example 1:

Input: cpdomains = ["9001 discuss.leetcode.com"]
Output: ["9001 leetcode.com","9001 discuss.leetcode.com","9001 com"]
Explanation: We only have one website domain: "discuss.leetcode.com".
As discussed above, the subdomain "leetcode.com" and "com" will also be visited. So they will all be visited 9001 times.

Example 2:

Input: cpdomains = ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]
Output: ["901 mail.com","50 yahoo.com","900 google.mail.com","5 wiki.org","5 org","1 intel.mail.com","951 com"]
Explanation: We will visit "google.mail.com" 900 times, "yahoo.com" 50 times, "intel.mail.com" once and "wiki.org" 5 times.
For the subdomains, we will visit "mail.com" 900 + 1 = 901 times, "com" 900 + 50 + 1 = 951 times, and "org" 5 times.

Constraints:

  • 1 <= cpdomain.length <= 100
  • 1 <= cpdomain[i].length <= 100
  • cpdomain[i] follows either the "repi d1i.d2i.d3i" format or the "repi d1i.d2i" format.
  • repi is an integer in the range [1, 104].
  • d1i, d2i, and d3i consist of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected range for the count values in each 'count domain' string?
  2. Can the input array `cpdomains` be empty or contain null/empty strings?
  3. What is the expected format for the returned list of strings? Is the order of the domains important?
  4. If a subdomain appears multiple times in the input with different counts, should the counts be summed?
  5. Are the domain names guaranteed to be valid, or do I need to handle invalid characters or formats?

Brute Force Solution

Approach

The brute force approach means we will go through the input data and process each record individually. For each domain, we will break it down into all its possible subdomains and then count the visits for each of them.

Here's how the algorithm would work step-by-step:

  1. Start with the first domain and its visit count.
  2. Separate the domain name into its parts based on the periods.
  3. Create all possible subdomains, starting from the rightmost part and moving left. For example, if the domain is 'discuss.leetcode.com', create 'discuss.leetcode.com', then 'leetcode.com', then 'com'.
  4. For each of these subdomains, increase its visit count by the original count for the entire domain. Keep track of all the subdomains and their visit counts.
  5. Repeat the above steps for all the other original domains provided as input.
  6. Finally, combine the visit counts for any identical subdomains that you saw in multiple input records. Then report all the subdomains and their total visit counts.

Code Implementation

def subdomain_visits(counts_and_domains):
    domain_counts = {}

    for count_and_domain in counts_and_domains:
        count, domain = count_and_domain.split()
        count = int(count)
        subdomains = domain.split('.')

        # Iterate through the subdomains from right to left
        for i in range(len(subdomains)):
            subdomain = '.'.join(subdomains[i:])

            # Update the count for this subdomain
            if subdomain in domain_counts:
                domain_counts[subdomain] += count

            else:
                domain_counts[subdomain] = count

    result = []
    for domain, count in domain_counts.items():
        result.append(str(count) + ' ' + domain)

    return result

Big(O) Analysis

Time Complexity
O(n*m)Let n be the number of domains in the input array, and m be the maximum number of parts in a domain (e.g., 'discuss.leetcode.com' has 3 parts). For each of the n domains, we iterate up to m times to create subdomains. In the worst case, m is a constant (usually a small number like 3 or 4), but it is still dependent on the input domain string length. Therefore, the time complexity is O(n*m) because, for each domain, we split it and process up to m subdomains. Because 'm' is the maximum number of parts in a domain name and not a constant, we consider it in the time complexity.
Space Complexity
O(N)The algorithm uses a hash map (or dictionary) to store the subdomains and their visit counts. In the worst-case scenario, each input domain could generate a unique set of subdomains. Therefore, the size of the hash map can grow proportionally to N, where N represents the total number of unique subdomains across all input domains. Thus, the auxiliary space used is directly related to the number of unique subdomains that are created and stored, leading to a space complexity of O(N).

Optimal Solution

Approach

The optimal approach involves processing each domain string and breaking it down into its subdomains. We then use a helpful tool to efficiently count the occurrences of each subdomain across all input strings.

Here's how the algorithm would work step-by-step:

  1. For each domain string provided, split it into the count and the domain part.
  2. Further split the domain into its subdomains by separating it at each period.
  3. Starting from the rightmost subdomain, generate all possible subdomains.
  4. For each generated subdomain, add its count to a running tally that we keep track of.
  5. After processing all domain strings, convert the tallies into the required output format: a list of strings, each showing subdomain and its count.

Code Implementation

def subdomain_visits(domain_counts):
    subdomain_counts = {}

    for domain_count in domain_counts:
        count, domain = domain_count.split() 
        count = int(count)

        # Split the domain into subdomains.
        subdomains = domain.split('.')

        for i in range(len(subdomains)):
            # Build each subdomain starting from the right.
            subdomain = '.'.join(subdomains[i:])
            if subdomain in subdomain_counts:
                subdomain_counts[subdomain] += count
            else:
                subdomain_counts[subdomain] = count

    result = []

    # Construct the output in the required format.
    for subdomain, count in subdomain_counts.items():
        result.append(str(count) + ' ' + subdomain)

    return result

Big(O) Analysis

Time Complexity
O(n*m)Let n be the number of domain strings in the input, and let m be the maximum length of a domain string. For each domain string, we split it into subdomains, which takes O(m) time. We then iterate through these subdomains to generate all possible subdomains, again taking O(m) time in the worst case for each domain string (as the maximum number of subdomains is dependent on the length of the domain). Since we perform these operations for each of the n domain strings, the overall time complexity is O(n*m).
Space Complexity
O(N)The primary auxiliary space is used by the hash map (or similar data structure) to store the counts of each subdomain. In the worst case, each domain string could have unique subdomains, leading to a number of unique subdomains proportional to the total number of characters across all input domain strings. Therefore, the space complexity is O(N), where N is the total length of all input strings including all the subdomains.

Edge Cases

CaseHow to Handle
Empty input array cpdomainsReturn an empty list, as there are no domains to process.
Null input array cpdomainsThrow an IllegalArgumentException or return an empty list to handle the invalid input.
Empty string in cpdomainsSkip the empty string and continue processing other valid entries to avoid errors.
String in cpdomains without a space separating count and domainReturn an empty list or throw an exception as the input is malformed and cannot be parsed.
String in cpdomains with multiple spacesSplit the string by spaces, taking only the first element as the count and the rest as the domain, handling potential extra spaces.
Non-integer count in cpdomainsThrow an exception or return an empty list or skip the record, logging an error, since the count must be a valid integer.
Domain with leading/trailing dotsTrim the domain string before processing subdomains to avoid invalid subdomain entries.
Integer overflow when summing countsUse a larger data type like `long` to store the counts to avoid potential integer overflow issues.