Taro Logo

Reward Top K Students

Medium
Booking.com logo
Booking.com
3 views
Topics:
ArraysStrings

You are given two string arrays positive_feedback and negative_feedback, containing the words denoting positive and negative feedback, respectively. Note that no word is both positive and negative.

Initially every student has 0 points. Each positive word in a feedback report increases the points of a student by 3, whereas each negative word decreases the points by 1.

You are given n feedback reports, represented by a 0-indexed string array report and a 0-indexed integer array student_id, where student_id[i] represents the ID of the student who has received the feedback report report[i]. The ID of each student is unique.

Given an integer k, return the top k students after ranking them in non-increasing order by their points. In case more than one student has the same points, the one with the lower ID ranks higher.

Example 1:

Input: positive_feedback = ["smart","brilliant","studious"], negative_feedback = ["not"], report = ["this student is studious","the student is smart"], student_id = [1,2], k = 2
Output: [1,2]
Explanation: 
Both the students have 1 positive feedback and 3 points but since student 1 has a lower ID he ranks higher.

Example 2:

Input: positive_feedback = ["smart","brilliant","studious"], negative_feedback = ["not"], report = ["this student is not studious","the student is smart"], student_id = [1,2], k = 2
Output: [2,1]
Explanation: 
- The student with ID 1 has 1 positive feedback and 1 negative feedback, so he has 3-1=2 points. 
- The student with ID 2 has 1 positive feedback, so he has 3 points. 
Since student 2 has more points, [2,1] is returned.

Constraints:

  • 1 <= positive_feedback.length, negative_feedback.length <= 104
  • 1 <= positive_feedback[i].length, negative_feedback[j].length <= 100
  • Both positive_feedback[i] and negative_feedback[j] consists of lowercase English letters.
  • No word is present in both positive_feedback and negative_feedback.
  • n == report.length == student_id.length
  • 1 <= n <= 104
  • report[i] consists of lowercase English letters and spaces ' '.
  • There is a single space between consecutive words of report[i].
  • 1 <= report[i].length <= 100
  • 1 <= student_id[i] <= 109
  • All the values of student_id[i] are unique.
  • 1 <= k <= n

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the possible ranges for the student IDs and the reward values?
  2. Can the input arrays `positive_feedback`, `negative_feedback`, and `report` contain duplicate words?
  3. If multiple students have the same reward score, how should I rank them to select the top K? (e.g., by student ID, order in report array)
  4. What should the return type be? (e.g., a list of student IDs, ordered by rank)
  5. Is it possible for K to be larger than the number of students? If so, what should I return?

Brute Force Solution

Approach

The brute force approach to finding the top students is like checking every single student individually. We will calculate a score for each student based on the positive and negative feedback words and then compare everyone's score to find the top ones.

Here's how the algorithm would work step-by-step:

  1. Go through the list of student reports, one by one.
  2. For each student's report, check every word in their report to see if it is a positive feedback word or a negative feedback word.
  3. If a word is a positive feedback word, add a point to the student's score.
  4. If a word is a negative feedback word, subtract a point from the student's score.
  5. Once you've checked all the words in a student's report, you have their total score.
  6. Keep track of each student's score.
  7. After calculating the score for every student, sort the students based on their scores from highest to lowest.
  8. Take the top 'K' students from the sorted list. These are the students with the highest scores.

Code Implementation

def reward_top_k_students_brute_force(positive_feedback, negative_feedback, student_reports, k_students):

    student_scores = []
    for report in student_reports:
        score = 0
        words = report.split()

        for word in words:
            if word in positive_feedback:
                # Increment score for positive feedback
                score += 1

            if word in negative_feedback:
                # Decrement score for negative feedback
                score -= 1

        student_scores.append(score)

    # Create a list of (student_id, score) pairs
    student_ids_with_scores = list(enumerate(student_scores))

    # Sort students based on their scores in descending order
    student_ids_with_scores.sort(key=lambda x: x[1], reverse=True)

    # Extract the student IDs of the top K students
    top_k_students = [student_id for student_id, score in student_ids_with_scores[:k_students]]

    return top_k_students

Big(O) Analysis

Time Complexity
O(n log n)Iterating through the reports list takes O(n) time, where n is the number of student reports. For each report, checking the words against positive and negative feedbacks has a time complexity dependent on the report length, but in the worst case is bounded. After calculating the score for each student, we sort the scores, which takes O(n log n) time. Since sorting dominates the time complexity, the overall time complexity is O(n log n).
Space Complexity
O(N)The dominant space usage comes from keeping track of each student's score. We need to store the calculated score for every student report. Assuming there are N student reports, this requires an auxiliary array or dictionary to store N scores. Therefore, the space complexity is O(N).

Optimal Solution

Approach

To efficiently find the top students, we'll use a system to quickly rank them based on positive and negative feedback. The goal is to avoid comparing every single student directly to each other, saving a lot of time.

Here's how the algorithm would work step-by-step:

  1. First, give each positive word a certain score, and each negative word a different (negative) score.
  2. Then, for each student's feedback, calculate their total score by adding up the scores of the words they received.
  3. Now, put all the students into a system that automatically keeps track of the top K highest scores. Only maintain the top K students in this system.
  4. Finally, extract the students in the top K to get the students with the highest scores.

Code Implementation

import heapq

def reward_top_k_students(student_ids, student_reports, positive_feedback, negative_feedback, k):
    positive_words = set(positive_feedback)
    negative_words = set(negative_feedback)
    student_scores = {}

    for index, student_id in enumerate(student_ids):
        student_scores[student_id] = 0
        report_words = student_reports[index].split()

        for word in report_words:
            if word in positive_words:
                student_scores[student_id] += 3
            elif word in negative_words:
                student_scores[student_id] -= 1

    # Use a min-heap to keep track of the top K students by score
    top_k_heap = []

    for student_id, score in student_scores.items():
        heapq.heappush(top_k_heap, (score, student_id))

        # Only keep the k highest scoring students
        if len(top_k_heap) > k:
            heapq.heappop(top_k_heap)

    # Extract student IDs from the heap and reverse for descending order
    top_k_students = [student_id for score, student_id in sorted(top_k_heap, reverse=True)]

    return top_k_students

Big(O) Analysis

Time Complexity
O(n log k)The time complexity is determined by several factors. Let n be the number of students. The initial scoring of each student involves iterating through their feedback (which is assumed to be relatively small compared to the number of students) and the dictionaries of positive and negative words. This can be treated as O(1) per student. The dominant factor is inserting each student's score into a data structure (likely a min-heap) that maintains the top k scores. Each insertion takes O(log k) time. Since we insert n students, the total time complexity is O(n log k).
Space Complexity
O(K)The dominant space complexity comes from maintaining the top K students. The system described in the problem needs to store these K students, likely in a data structure like a min-heap or a simple array/list. Other variables, such as scores for individual students, take up constant space. Therefore, the auxiliary space required is proportional to K, resulting in O(K) space complexity.

Edge Cases

CaseHow to Handle
Empty positive_feedback or negative_feedback arrayInitialize the feedback words as empty sets to avoid errors when checking for inclusion.
Empty report arrayReturn an empty list if no student reports are available.
k is zeroReturn an empty list if k is zero, as no students need to be rewarded.
k is greater than the number of studentsReturn the sorted reward list of all students if k exceeds the total number of students.
Reports contain empty stringsHandle empty strings in reports by skipping them or treating them as not containing feedback words to avoid errors.
Large number of students or long reports causing memory issuesConsider using efficient data structures or algorithms (e.g., min-heap of fixed size k) to minimize memory usage.
Tie scores for top K students.Sort the scores based on student ID as a secondary key to ensure a deterministic result in the event of ties.
Very long reports which may cause timeouts due to excessive processing.Optimize the report processing by using efficient string search algorithms or limiting the number of words processed per report.