Taro Logo

High Five

Easy
Goldman Sachs logo
Goldman Sachs
5 views
Topics:
ArraysGreedy AlgorithmsStacks

Given a 2D binary matrix matrix of size m x n, return the number of submatrices that sum to target.

A submatrix x1, y1, x2, y2 is the set of all cells matrix[x][y] with x1 <= x <= x2 and y1 <= y <= y2.

If two submatrices differ in any coordinate, they are considered different.

Example 1:

Input: matrix = [[0,1,0],[1,1,0],[0,1,0]], target = 0
Output: 4
Explanation: The four 1x1 submatrices that only contain 0.

Example 2:

Input: matrix = [[1,-1],[-1,1]], target = 0
Output: 5
Explanation: The two 1x2 submatrices, plus the two 2x1 submatrices, plus the 2x2 submatrix.

Example 3:

Input: matrix = [[904]], target = 0
Output: 0

Constraints:

  • m == matrix.length
  • n == matrix[i].length
  • 1 <= m, n <= 100
  • -1000 <= matrix[i][j] <= 1000
  • -108 <= target <= 108

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible score ranges for each student, and what data type should I use to store them?
  2. What should I return if a student has fewer than 5 scores, or if the input is empty or null?
  3. If there are multiple 'high five' scores for a student (more than five), how should I handle ties?
  4. Can I assume that student IDs are unique, and that each student will have at least one score?
  5. How should the output be formatted? Should it be a map/dictionary of student ID to average high five scores, or something else?

Brute Force Solution

Approach

The goal is to find the average of the top five scores for each student. The brute force method involves going through all the scores for each student and explicitly finding the five highest scores before calculating their average.

Here's how the algorithm would work step-by-step:

  1. For each student, collect all of their scores.
  2. Among the scores for that student, identify the five highest scores. If a student has fewer than five scores, select all available scores.
  3. Calculate the sum of these top five scores.
  4. Divide the sum by five (or by the actual number of scores if the student has fewer than five scores) to get the average.
  5. Store the student's ID and their calculated average.
  6. Repeat this process for every student in the dataset.

Code Implementation

def high_five_brute_force(student_scores):
    student_grades = {}
    
    for student_id, score in student_scores:
        if student_id not in student_grades:
            student_grades[student_id] = []
        student_grades[student_id].append(score)

    average_high_scores = []

    for student_id, scores in student_grades.items():
        # Sort scores to easily pick the top five

        scores.sort(reverse=True)
        
        # Limit to the top five scores or fewer
        top_five_scores = scores[:5]
        
        # Calculate the average of the top scores

        average_score = sum(top_five_scores) / len(top_five_scores)
        average_high_scores.append([student_id, int(average_score)])
    
    return average_high_scores

Big(O) Analysis

Time Complexity
O(n*k*log(k))Let n be the number of students and k be the number of scores per student (in the worst case, every entry in the input represents a different student). For each student, we need to find the top five scores. Collecting the scores for each student takes O(k) time. Identifying the top five scores for each student from a list of k scores takes O(k*log(k)) if we sort them (or use a heap). We repeat this for each student, hence O(n*k*log(k)).
Space Complexity
O(N)The described solution iterates through the input and, for each student, it collects all of their scores in a temporary data structure like a list or array. In the worst-case scenario, a single student could have all N scores, where N is the total number of scores in the input. Therefore, the auxiliary space required to store the scores for that student would be proportional to N. Hence the space complexity is O(N).

Optimal Solution

Approach

The task is to find the average of the top five scores for each student. We can keep track of each student's scores, sort them, and then sum the top five before dividing. Using a method to efficiently manage and extract the top scores leads to the optimal solution.

Here's how the algorithm would work step-by-step:

  1. First, organize the data by grouping all scores belonging to the same student together.
  2. Then, for each student, determine their top five scores.
  3. To do this efficiently, arrange the scores in order from highest to lowest. You only need to focus on the top five, not the entire list.
  4. Once you have the top five scores for a student, add them up to get the total.
  5. Finally, divide the total by five to find the average of those top five scores.
  6. Repeat this process for each student to get their individual average.

Code Implementation

def calculate_high_five(student_scores):
    student_grades = {}
    for student_id, score in student_scores:
        if student_id not in student_grades:
            student_grades[student_id] = []
        student_grades[student_id].append(score)

    result = []
    for student_id, scores in student_grades.items():
        # Sort scores in descending order to easily get top 5
        scores.sort(reverse=True)

        top_five_scores = scores[:5]

        # Sum the top five scores for averaging
        total_of_top_five = sum(top_five_scores)

        average_of_top_five = total_of_top_five // 5
        result.append([student_id, average_of_top_five])

    return result

Big(O) Analysis

Time Complexity
O(n log n)Assuming the input array has 'n' student-score pairs, the first step involves grouping scores by student. In the worst case, each student has a different number of scores. For each student, we sort their scores to find the top five. Sorting takes O(k log k) time where k is the number of scores for a student. In the worst case, one student has almost all 'n' scores, resulting in O(n log n) time complexity for sorting. The remaining steps of summing the top five scores and calculating the average take constant time, O(1), for each student. Thus, the dominant factor is the sorting, leading to an overall time complexity of O(n log n).
Space Complexity
O(N)The primary space usage comes from grouping scores by student. This requires a data structure, likely a hash map, to store each student's scores. In the worst-case scenario, each score in the input belongs to a different student, resulting in a hash map entry for each of the N scores. Therefore, the auxiliary space used grows linearly with the number of input scores, N, making the space complexity O(N).

Edge Cases

CaseHow to Handle
Null or empty input arrayReturn an empty list or throw an IllegalArgumentException as no high five scores exist.
Input array with fewer than 5 student scoresReturn an empty list or throw an exception indicating insufficient data for high five calculation.
A student has fewer than 5 scoresPad the student's scores with a default value (e.g., 0) or skip them, documenting the chosen behavior.
Scores contain negative valuesHandle negative scores by either considering them valid scores within a range or filtering/rejecting them based on problem specification.
Scores contain extremely large values leading to potential integer overflowUse data types that can accommodate large values (e.g., long) or normalize scores to a smaller range.
Multiple students have the same IDEnsure that the student ID is not treated as unique and aggregate all scores for each unique student when averaging.
All scores are identical for one or more studentsThe top 5 selection should still work correctly, returning these identical scores as the 'high five'.
A student only has scores of 0The algorithm should correctly identify and handle these zero values as valid scores within the context of finding the top 5.