Taro Logo

High-Access Employees

Medium
Atlassian logo
Atlassian
2 views
Topics:
ArraysStringsSliding Windows

You are given a 2D 0-indexed array of strings, access_times, with size n. For each i where 0 <= i <= n - 1, access_times[i][0] represents the name of an employee, and access_times[i][1] represents the access time of that employee. All entries in access_times are within the same day.

The access time is represented as four digits using a 24-hour time format, for example, "0800" or "2250".

An employee is said to be high-access if he has accessed the system three or more times within a one-hour period.

Times with exactly one hour of difference are not considered part of the same one-hour period. For example, "0815" and "0915" are not part of the same one-hour period.

Access times at the start and end of the day are not counted within the same one-hour period. For example, "0005" and "2350" are not part of the same one-hour period.

Return a list that contains the names of high-access employees with any order you want.

Example 1:

Input: access_times = [["a","0549"],["b","0457"],["a","0532"],["a","0621"],["b","0540"]]
Output: ["a"]
Explanation: "a" has three access times in the one-hour period of [05:32, 06:31] which are 05:32, 05:49, and 06:21.
But "b" does not have more than two access times at all.
So the answer is ["a"].

Example 2:

Input: access_times = [["d","0002"],["c","0808"],["c","0829"],["e","0215"],["d","1508"],["d","1444"],["d","1410"],["c","0809"]]
Output: ["c","d"]
Explanation: "c" has three access times in the one-hour period of [08:08, 09:07] which are 08:08, 08:09, and 08:29.
"d" has also three access times in the one-hour period of [14:10, 15:09] which are 14:10, 14:44, and 15:08.
However, "e" has just one access time, so it can not be in the answer and the final answer is ["c","d"].

Example 3:

Input: access_times = [["cd","1025"],["ab","1025"],["cd","1046"],["cd","1055"],["ab","1124"],["ab","1120"]]
Output: ["ab","cd"]
Explanation: "ab" has three access times in the one-hour period of [10:25, 11:24] which are 10:25, 11:20, and 11:24.
"cd" has also three access times in the one-hour period of [10:25, 11:24] which are 10:25, 10:46, and 10:55.
So the answer is ["ab","cd"].

Constraints:

  • 1 <= access_times.length <= 100
  • access_times[i].length == 2
  • 1 <= access_times[i][0].length <= 10
  • access_times[i][0] consists only of English small letters.
  • access_times[i][1].length == 4
  • access_times[i][1] is in 24-hour time format.
  • access_times[i][1] consists only of '0' to '9'.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the data type of the timestamps? Are they strings, integers representing seconds since epoch, or something else?
  2. How is "high access" defined precisely? Is it a fixed threshold (e.g., accessed at least three times) or a percentage (e.g., accessed in at least half of the timestamps)?
  3. If multiple employees meet the high-access criteria, in what format should I return the list of high-access employees? Should it be sorted?
  4. Can the input list of employee-timestamp pairs be empty or contain null values?
  5. Are the timestamps guaranteed to be chronologically ordered within a single employee's access records, or can they be out of order?

Brute Force Solution

Approach

The brute force approach to finding high-access employees means we'll check every possible combination of employees and times. We'll look at each employee and see if they meet the access criteria by checking all their access times against everyone else's.

Here's how the algorithm would work step-by-step:

  1. For each employee, look at each of their access times one by one.
  2. For each access time of that employee, count how many other employees accessed the system within the specified time window.
  3. To do this, compare that access time with every other employee's access times.
  4. If the number of other employees who accessed the system within the window is high enough, mark that employee as a high-access employee.
  5. Repeat this process for every employee and every one of their access times.
  6. At the end, collect all the employees who were marked as high-access.

Code Implementation

def high_access_employees_brute_force(access_times):
   high_access_employees = set()
   
   for employee_access_data in access_times:
       employee_id = employee_access_data[0]
       access_time = int(employee_access_data[1])
       
       access_count = 0
       
       #Check every other employee and their access times
       for other_employee_access_data in access_times:
           other_employee_id = other_employee_access_data[0]
           other_access_time = int(other_employee_access_data[1])
           
           #We need to exclude comparing the same access time to itself.
           if employee_id == other_employee_id and access_time == other_access_time:
               continue
           
           #Check the access time window to count employees within it
           if abs(access_time - other_access_time) < 60:
               access_count += 1
       
       if access_count >= 2:
           #Mark employee if the access count meets criteria
           high_access_employees.add(employee_id)
\
   return list(high_access_employees)

Big(O) Analysis

Time Complexity
O(n^3)Let 'n' be the total number of access records across all employees. The algorithm iterates through each employee and each of their access times. For each access time, it compares it with the access times of all other employees to count those within the specified time window. This involves nested loops: one to iterate through employees and their access times, and another to compare each access time with every other access time. Therefore, the time complexity is proportional to n * n * k, where k is the average number of accesses per employee which is less than or equal to n. This results in O(n^3) time complexity.
Space Complexity
O(1)The described brute force approach iterates through the input but doesn't mention creating any auxiliary data structures that scale with the input size. There are no temporary arrays, hash maps, or other collections used to store intermediate results related to the number of employees or their access times. Therefore, the space complexity remains constant, regardless of the input size (N), where N is the total number of access records. The algorithm appears to only use a few variables for counting and comparisons, resulting in O(1) space.

Optimal Solution

Approach

The problem requires finding employees who have access to the system at the same time as other employees. We can efficiently solve it by tracking how many employees are actively logged in at any moment and pinpointing times when the count is high. This avoids checking every possible combination of employees.

Here's how the algorithm would work step-by-step:

  1. First, create a timeline of when employees log in and log out of the system, sorted by time.
  2. Then, go through the timeline step by step. Whenever an employee logs in, increase a counter that tracks the number of currently logged-in employees.
  3. Whenever an employee logs out, decrease the counter.
  4. While going through the timeline, check if the counter exceeds a given threshold. If it does, save the employee names that are currently logged in.
  5. Once you've processed the entire timeline, you will have the list of employees who were logged in when the number of active users was above the threshold.

Code Implementation

def find_high_access_employees(access_times, threshold):
    timeline_events = []
    for employee, start_time, end_time in access_times:
        timeline_events.append((start_time, 'start', employee))
        timeline_events.append((end_time, 'end', employee))

    timeline_events.sort()
    
    active_employees = set()
    high_access_employees = set()
    employee_count = 0

    for time, event_type, employee in timeline_events:
        if event_type == 'start':
            # Increment employee count when an employee logs in
            employee_count += 1
            active_employees.add(employee)

            if employee_count > threshold:
                # If count exceeds threshold, add logged in employees
                high_access_employees.update(active_employees)

        else:
            # Decrement employee count when an employee logs out
            employee_count -= 1
            active_employees.remove(employee)

    return list(high_access_employees)

Big(O) Analysis

Time Complexity
O(n log n)The dominant operation is sorting the timeline of login and logout events. If there are 'n' login/logout events (where 'n' is twice the number of employees if each employee logs in and out once), sorting this timeline takes O(n log n) time. Iterating through the sorted timeline to update the counter and identify high-access employees takes O(n) time. Since O(n log n) grows faster than O(n), the overall time complexity is O(n log n).
Space Complexity
O(N)The auxiliary space is dominated by the sorted timeline of login and logout events, which requires storing up to 2N events (N login and N logout events in the worst case), where N is the number of employees. Additionally, space is needed to store the list of currently logged-in employees when the threshold is exceeded; in the worst case, this list could contain all N employees. Therefore, the space complexity is O(N) due to the timeline and the potential storage of all employees logged in concurrently.

Edge Cases

CaseHow to Handle
Empty access_times listReturn an empty list since there are no employees to analyze.
Access times list with only one employeeReturn an empty list, as no pair of employees can be compared.
Very large access_times list (scalability)The solution should use efficient data structures (hash maps) and algorithms (sorting) to avoid exceeding time limits.
All employees have identical access timesThe code should correctly identify those employees who have high access given the same times.
Extreme access times (start or end of the day)Ensure the comparison logic handles wrap-around cases correctly, accounting for the 24-hour clock.
Multiple sets of employees exceed the access thresholdThe solution should return all employees who meet the high-access criteria, not just the first found set.
Input access times are not sortedSort the access times for each employee to make comparison efficient; if sort is not applied, then the solution would incorrectly report employees.
Access times are invalid or out of rangeValidate the access times to be within reasonable range (00:00 to 23:59) and throw or log an error if invalid.