Taro Logo

Find the Most Common Response

Medium
Meta logo
Meta
4 views
Topics:
ArraysStrings

You are given a 2D string array responses where each responses[i] is an array of strings representing survey responses from the ith day.

Return the most common response across all days after removing duplicate responses within each responses[i]. If there is a tie, return the lexicographically smallest response.

Example 1:

Input: responses = [["good","ok","good","ok"],["ok","bad","good","ok","ok"],["good"],["bad"]]

Output: "good"

Explanation:

  • After removing duplicates within each list, responses = [["good", "ok"], ["ok", "bad", "good"], ["good"], ["bad"]].
  • "good" appears 3 times, "ok" appears 2 times, and "bad" appears 2 times.
  • Return "good" because it has the highest frequency.

Example 2:

Input: responses = [["good","ok","good"],["ok","bad"],["bad","notsure"],["great","good"]]

Output: "bad"

Explanation:

  • After removing duplicates within each list we have responses = [["good", "ok"], ["ok", "bad"], ["bad", "notsure"], ["great", "good"]].
  • "bad", "good", and "ok" each occur 2 times.
  • The output is "bad" because it is the lexicographically smallest amongst the words with the highest frequency.

Constraints:

  • 1 <= responses.length <= 1000
  • 1 <= responses[i].length <= 1000
  • 1 <= responses[i][j].length <= 10
  • responses[i][j] consists of only lowercase English letters

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What data type will the responses be (e.g., strings, integers, objects)?
  2. What is considered a 'response'? Can it be an empty string or null?
  3. If there are multiple responses with the same highest frequency, which one should I return? Should I return one arbitrarily, or all of them in a specific order?
  4. What is the maximum number of unique responses expected? What is the maximum length of each individual response (assuming strings)?
  5. Is the input guaranteed to be non-empty? If the input is empty, what should I return (e.g., null, empty string, or throw an exception)?

Brute Force Solution

Approach

The brute force approach to finding the most common response involves exhaustively checking the frequency of each response. It's like manually counting how many times each response appears in the whole set of responses. Finally, we identify the one that showed up the most.

Here's how the algorithm would work step-by-step:

  1. Take the first response in the list.
  2. Go through the entire list and count how many times this first response appears.
  3. Remember that count.
  4. Now, take the second response in the list.
  5. Again, go through the entire list and count how many times the second response appears.
  6. Remember that count.
  7. Do this for every single response in the list.
  8. After counting all the responses, compare all the counts you remembered.
  9. The response with the highest count is the most common response.

Code Implementation

def find_most_common_response_brute_force(responses):

    most_common_response = None
    maximum_count = 0

    # Iterate through each response in the input list.
    for current_response in responses:

        current_response_count = 0

        # Count the occurrences of the current response.
        for response in responses:
            if response == current_response:
                current_response_count += 1

        # Update the most common response if the current response's count is higher.
        if current_response_count > maximum_count:
            maximum_count = current_response_count
            most_common_response = current_response

    return most_common_response

Big(O) Analysis

Time Complexity
O(n²)The brute force approach iterates through the list of responses. For each response, it iterates through the entire list again to count its occurrences. If n is the number of responses, the outer loop runs n times, and the inner loop also runs n times for each iteration of the outer loop. Therefore, the total number of operations is proportional to n * n, resulting in O(n²).
Space Complexity
O(1)The brute force approach described keeps track of only a few counts at a time: the current count for a given response and the highest count found so far. It doesn't store any extra data structures that scale with the input size N (the number of responses). Thus, the auxiliary space required remains constant irrespective of the input size. This implies that the space complexity is O(1).

Optimal Solution

Approach

The goal is to figure out which answer appears most often in a list. A clever approach is to count how many times each answer occurs, and then simply pick the answer with the highest count. This avoids comparing every answer to every other answer.

Here's how the algorithm would work step-by-step:

  1. Start by making a tally board for each unique answer.
  2. Go through the list of answers, and for each answer, add one to its tally.
  3. Once you've tallied all the answers, look at the tally board to find the answer with the biggest number.
  4. The answer with the biggest number is the most common response.

Code Implementation

def find_most_common_response(list_of_responses):
    response_tallies = {}

    # Tally each response to count occurrences.
    for response in list_of_responses:
        if response in response_tallies:
            response_tallies[response] += 1
        else:
            response_tallies[response] = 1

    most_common_response = None
    highest_tally = 0

    # Find the response with the highest tally.
    for response, tally in response_tallies.items():

        # Update the most common response.
        if tally > highest_tally:

            most_common_response = response
            highest_tally = tally

    return most_common_response

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the list of answers once to create a tally board (hash map or dictionary) where each unique answer's count is stored. In the worst-case scenario, if all answers are unique, the size of the tally board will be proportional to n, where n is the number of answers. After creating the tally board, it iterates through the board to find the answer with the highest count, which takes time proportional to the number of unique answers, but still bounded by n. Therefore, the overall time complexity is dominated by the single pass through the input list, making it O(n).
Space Complexity
O(N)The algorithm's space complexity is determined by the tally board, which stores counts for each unique answer. In the worst case, all N answers in the input list are unique. Therefore, the tally board will store N key-value pairs, where keys are the unique answers and values are their counts. This auxiliary data structure requires space proportional to the number of unique answers, which can be up to N. Thus, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Null or undefined input listReturn an appropriate error value or throw an exception if the input is null or undefined, as no processing is possible.
Empty input listReturn a default value indicating no most common response was found such as null, empty string or appropriate error.
List with all identical responsesThe solution should correctly identify and return the repeated response as the most common.
Maximum-sized input list exceeding available memoryConsider a streaming approach or chunking the data if the input list is too large to fit in memory.
Multiple responses with the same highest frequencyReturn one of the responses arbitrarily or all responses if problem requires it (e.g., return a list).
Input list contains non-comparable data typesEnsure input is homogenous and comparable or convert/filter as appropriate.
Case sensitivity for string responsesNormalize the casing (e.g., to lowercase) before counting occurrences to treat different cases as the same response if needed.
Input contains very long strings or very large numbersEnsure the counting mechanism can handle large strings/numbers to prevent hash collisions or overflow errors.