Taro Logo

People Whose List of Favorite Companies Is Not a Subset of Another List

Medium
Datadog logo
Datadog
7 views
Topics:
ArraysStringsTwo Pointers

Given the array favoriteCompanies where favoriteCompanies[i] is the list of favorites companies for the ith person (indexed from 0).

Return the indices of people whose list of favorite companies is not a subset of any other list of favorites companies. You must return the indices in increasing order.

Example 1:

Input: favoriteCompanies = [["leetcode","google","facebook"],["google","microsoft"],["google","facebook"],["google"],["amazon"]]
Output: [0,1,4] 
Explanation: 
Person with index=2 has favoriteCompanies[2]=["google","facebook"] which is a subset of favoriteCompanies[0]=["leetcode","google","facebook"] corresponding to the person with index 0. 
Person with index=3 has favoriteCompanies[3]=["google"] which is a subset of favoriteCompanies[0]=["leetcode","google","facebook"] and favoriteCompanies[1]=["google","microsoft"]. 
Other lists of favorite companies are not a subset of another list, therefore, the answer is [0,1,4].

Example 2:

Input: favoriteCompanies = [["leetcode","google","facebook"],["leetcode","amazon"],["facebook","google"]]
Output: [0,1] 
Explanation: In this case favoriteCompanies[2]=["facebook","google"] is a subset of favoriteCompanies[0]=["leetcode","google","facebook"], therefore, the answer is [0,1].

Example 3:

Input: favoriteCompanies = [["leetcode"],["google"],["facebook"],["amazon"]]
Output: [0,1,2,3]

Constraints:

  • 1 <= favoriteCompanies.length <= 100
  • 1 <= favoriteCompanies[i].length <= 500
  • 1 <= favoriteCompanies[i][j].length <= 20
  • All strings in favoriteCompanies[i] are distinct.
  • All lists of favorite companies are distinct, that is, If we sort alphabetically each list then favoriteCompanies[i] != favoriteCompanies[j].
  • All strings consist of lowercase English letters only.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the data types of the company names and the list of favorite companies? Can I assume they are strings?
  2. Can a person's list of favorite companies be empty? If so, what should be the output in that case?
  3. Are the lists of favorite companies guaranteed to be unique for each person, or can a person have duplicate companies in their list?
  4. If multiple people have lists that are not subsets of any other list, is the order of those people in the output important?
  5. What is the maximum number of people and the maximum number of companies in each person's list?

Brute Force Solution

Approach

We're given lists of favorite companies for different people, and we want to find people whose list isn't contained within anyone else's list. The brute force way is to simply compare each person's list to every other person's list to check for subsets.

Here's how the algorithm would work step-by-step:

  1. Take the first person's list of favorite companies.
  2. Compare this list to the second person's list. See if all the companies in the first person's list are also in the second person's list. If they are, it means the first person's list is a subset of the second person's.
  3. Then compare the first person's list to the third person's list, and so on, until you've compared it to everyone else's list.
  4. If after comparing the first person's list to everyone else's list, you find that it is NOT a subset of any other list, then the first person is one of the people we're looking for.
  5. Now, move on to the second person's list of favorite companies.
  6. Repeat the process of comparing the second person's list to every other person's list (including the first person's list).
  7. If the second person's list is not a subset of any other list, then the second person is also one of the people we're looking for.
  8. Continue this process for every person's list. By the end, you'll have identified all the people whose list of favorite companies is not a subset of anyone else's list.

Code Implementation

def find_non_subset_people(favorite_companies_lists):
    result = []
    number_of_people = len(favorite_companies_lists)

    for i in range(number_of_people):
        is_subset = False
        for j in range(number_of_people):
            # Skip comparing a list to itself
            if i == j:
                continue

            # Check if the current person's list is a subset of another person's list
            if set(favorite_companies_lists[i]).issubset(set(favorite_companies_lists[j])):
                is_subset = True

                # If it's a subset, no need to compare with other lists
                break

        # If the person's list is not a subset of any other list
        if not is_subset:
            # Add the person's index to the result
            result.append(i)

    return result

Big(O) Analysis

Time Complexity
O(n² * m)We iterate through each person's list of favorite companies (n lists). For each person, we compare their list to every other person's list (n-1 lists). The comparison of two lists to determine if one is a subset of the other takes O(m) time, where m is the average length of the company lists. Therefore, for each of the n people, we do n-1 comparisons, each taking O(m) time, resulting in roughly n * (n-1) * m operations. This simplifies to O(n² * m).
Space Complexity
O(1)The provided algorithm iterates through the input lists using index variables. It performs comparisons directly without creating any auxiliary data structures like temporary lists, hash maps, or sets to store intermediate results or mark visited lists. The space used is limited to a few index variables, and the number of these variables does not depend on the input size (N, where N is the number of lists and/or the total number of companies across all lists). Therefore, the auxiliary space complexity is constant.

Optimal Solution

Approach

The goal is to find people whose favorite companies are unique, meaning their list isn't fully contained within someone else's list. We can achieve this efficiently by comparing each person's list to all other lists and marking those that are subsets.

Here's how the algorithm would work step-by-step:

  1. Go through each person's list of favorite companies.
  2. For each person, compare their list with every other person's list.
  3. Check if one person's favorite companies are all found within another person's favorite companies. If so, we have found a subset.
  4. If we find that a person's list is a subset of another person's list, mark that person as not unique.
  5. After comparing a person's list against everyone else's, if they have not been marked as not unique, then their list of favorite companies is not a subset of any other list.
  6. Collect the people who were not marked as not unique; these are the people whose lists are not subsets of any other list.

Code Implementation

def find_non_subset_people(favorite_companies):
    number_of_people = len(favorite_companies)
    non_subset_people = []
    is_subset = [False] * number_of_people

    for i in range(number_of_people):
        # Compare the current person's list with everyone else's.
        for j in range(number_of_people):
            if i == j:
                continue

            # Check if favorite_companies[i] is a subset of favorite_companies[j].
            if set(favorite_companies[i]).issubset(set(favorite_companies[j])):

                # Mark person i as a subset
                is_subset[i] = True

                break

    # Collect people who are not subsets of any other list.
    for i in range(number_of_people):
        if not is_subset[i]:
            non_subset_people.append(i)

    return non_subset_people

Big(O) Analysis

Time Complexity
O(n*m^2)The outer loop iterates through each of the n people. For each person, the inner loop iterates through the remaining people, which takes O(n) time. Inside the inner loop, we are checking if one person's favorite company list is a subset of another person's. The subset check iterates through each element of both lists. Assuming each list on average has m elements, the subset check takes O(m^2) time. So we have n (outer loop) * n (inner loop) * m^2 (subset check), which simplifies to O(n^2 * m^2). However, in this case, the problem description does not indicate that the array length should be equal to the length of internal arrays (each person's favourite companies), so it should be more accurately represented as O(n*m^2) because for each of n people, we only go through the m elements to find whether each is a subset.
Space Complexity
O(N)The algorithm requires a data structure to mark people as 'not unique' if their list is a subset of another list. In the worst-case scenario, we might need to keep track of each person's 'uniqueness' status. This can be achieved using a boolean array or a set, where the size of the array or set is directly proportional to the number of people, N. Therefore, the auxiliary space complexity is O(N).

Edge Cases

CaseHow to Handle
Input list of people is null or emptyReturn an empty list indicating no people can be identified.
A person's favorite company list is null or emptyTreat a person with an empty favorite companies list as not being a subset of any other list, adding their ID to the result if no other person also has an empty list.
Two or more people have identical lists of favorite companiesThe problem doesn't specify distinct people, so treat them as distinct, and only keep one of the duplicated list from the result.
One person's list is exactly the same as all other people's favorite companies.This person will not be in the result, because their list *is* a subset of everyone else's list, unless everyone else has this same list.
Company names contain unusual characters or are case-sensitive but should be treated case-insensitivelyNormalize the company names to lowercase during processing, or use a case-insensitive comparison method.
Maximum number of people or maximum number of companies in a person's favorite list causing memory overflow or performance issuesOptimize for space complexity using appropriate data structures (e.g. sets) and algorithm (avoiding unnecessary comparisons).
Integer overflow when computing hashes of the company listsUse a sufficiently large hash function, consider using modulo operator or cryptographic hash function to prevent collisions
No one's list is a subset of another person's list.Return the list of all people's IDs because every person satisfies the condition.