Taro Logo

Analyze User Website Visit Pattern

Medium
Spotify logo
Spotify
2 views
Topics:
ArraysTwo PointersSliding Windows

You are given three arrays of strings username, timestamp and website of the same length n, where username[i], timestamp[i] and website[i] indicate that user username[i] visited the website website[i] at time timestamp[i].

A pattern is a list of three websites (not necessarily distinct) visited by one user.

  • For example, ["home", "away", "love"], ["leetcode", "love", "leetcode"], and ["luffy", "luffy", "luffy"] are all patterns.

The score of a pattern is the number of users that visited in order all the websites in the pattern.

  • For example, if the pattern is ["home", "away", "love"], and users ["alice","bob","alice","eve"] visited the websites ["home","away","away","home"], ["away","love","love","away"], ["home","away","love","love"], then the score of the pattern is 2 because both "alice" and "eve" visited in order the three websites in the pattern.

You are asked to find the most frequent pattern. If there is more than one most frequent pattern, return the lexicographically smallest one.

Return the most frequent pattern.

Example 1:

Input: username = ["joe","joe","joe","james","james","james","james","mary","mary","mary"], timestamp = [1,2,3,4,5,6,7,8,9,10], website = ["home","about","career","home","cart","maps","home","home","about","career"]
Output: ["home","about","career"]
Explanation: The pattern ["home","about","career"] has score 2 (joe and mary).
The pattern ["home","cart","maps"] has score 1 (james).
The pattern ["home","cart","home"] has score 1 (james).
The pattern ["home","maps","home"] has score 1 (james).
The pattern ["cart","maps","home"] has score 1 (james).
So the pattern ["home","about","career"] is the most frequent pattern, so you should return it.

Example 2:

Input: username = ["uaice","uaice","uaice","alice","alice","alice","alice","bob","bob","bob"], timestamp = [1,2,3,4,5,6,7,8,9,10], website = ["a","b","c","a","b","c","d","a","b","c"]
Output: ["a","b","c"]

Constraints:

  • n == username.length == timestamp.length == website.length
  • 3 <= n <= 50
  • 1 <= username[i].length <= 10
  • 0 <= timestamp[i] <= 109
  • 1 <= website[i].length <= 10
  • username[i] and website[i] consist of lowercase English letters.
  • It is guaranteed that there is at least one user who visited at least three websites.
  • All the timestamps are unique.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can you provide the constraints on the size of the input arrays (username, timestamp, website)?
  2. If multiple visiting patterns have the same highest similarity score, what is the criteria for lexicographical order? Specifically, how are strings compared?
  3. Are the timestamps unique for a given user, or is it possible for a user to visit multiple websites at the same timestamp? If so, how should I handle that?
  4. Is it possible for any of the input arrays to be empty or null? If so, what should the function return in those cases?
  5. Is it guaranteed that each user will visit at least 3 websites? If a user visits fewer than 3 websites, should they be considered at all?

Brute Force Solution

Approach

To find the most frequent user visit pattern on a website, the brute force approach considers every possible sequence of three website visits a user makes. We systematically examine all combinations of three websites visited by each user to see which combination occurs most often across all users.

Here's how the algorithm would work step-by-step:

  1. First, for each user, list out all the websites they visited and the order they visited them in.
  2. Then, for each user, find all possible combinations of three websites they visited in the order they visited them.
  3. Now, look at all the three-website sequences generated from all users.
  4. Count how many times each three-website sequence appears across all users.
  5. Finally, identify the three-website sequence that appears the most often. If there's a tie, pick the one that comes earlier alphabetically.

Code Implementation

def analyze_user_website_visit_pattern(username, timestamp, website):
    user_website_visits = {}
    for i in range(len(username)):
        user = username[i]
        time = timestamp[i]
        site = website[i]
        if user not in user_website_visits:
            user_website_visits[user] = []
        user_website_visits[user].append((time, site))

    for user in user_website_visits:
        user_website_visits[user].sort()

    all_sequences = {}

    for user, visits in user_website_visits.items():
        websites = [visit[1] for visit in visits]
        sequences = set()
        for i in range(len(websites)):
            for j in range(i + 1, len(websites)):
                for k in range(j + 1, len(websites)):
                    sequences.add((websites[i], websites[j], websites[k]))

        # Count the frequency of each sequence
        for sequence in sequences:
            if sequence not in all_sequences:
                all_sequences[sequence] = 0
            all_sequences[sequence] += 1

    most_frequent_sequence = None
    max_frequency = 0

    # Find the sequence with the highest frequency
    for sequence, frequency in all_sequences.items():
        if frequency > max_frequency:
            max_frequency = frequency
            most_frequent_sequence = sequence
        elif frequency == max_frequency:
            if sequence < most_frequent_sequence:
                most_frequent_sequence = sequence

    return list(most_frequent_sequence)

Big(O) Analysis

Time Complexity
O(N * U + U * C^3) – Let N be the total number of users and U be the maximum number of websites visited by any single user. For each user, we first iterate through their website visits which takes O(U) time. Then, for each user, we generate all possible combinations of three websites, which would be proportional to U choose 3 or U!/(3!(U-3)!), which simplifies to U * (U-1) * (U-2) / 6, approximating O(U^3). Since we have to repeat this for all N users, the combinations part becomes O(N * U^3). After we get all the three-website sequences, in worst case, we need to count the frequency of each, which could involve examining each sequence once, adding a linear contribution of O(N * U^3). However we are accumulating the sequences in a hashmap/dictionary, leading to lookup and update times considered negligible compared to the combination generation. Thus the final time complexity is dominated by the combination generation step performed for each user, yielding approximately O(N * U^3) which can be expressed as O(N * U + U * C^3) where C is the combinations of website visits for all users.
Space Complexity
O(N^3) – The algorithm generates all possible three-website sequences for each user. In the worst case, a user could have visited N websites, leading to O(N^3) combinations per user. Storing these combinations for all users requires space proportional to the total number of combinations generated across all users, which can be approximated as O(N^3) in the worst-case scenario where the average user visits a significant portion of the total website set, where N is the number of websites visited by a single user in the worst case. Furthermore, a hash map or similar data structure is used to count the frequency of these three-website sequences, which in the worst case, stores all the generated combinations, resulting in O(N^3) space. Therefore the overall auxiliary space complexity is O(N^3).

Optimal Solution

Approach

The goal is to find the most frequent sequence of three website pages visited by each user. The clever approach involves organizing visits chronologically and then efficiently identifying all possible three-page sequences for each person.

Here's how the algorithm would work step-by-step:

  1. First, organize all website visits by each user, making sure the visits are in the correct order based on time.
  2. For each user, create every possible sequence of three pages they visited. For example, if someone visited A, B, C, D, you would generate the sequences ABC, BCD.
  3. Keep track of how many times each three-page sequence appears across all users.
  4. Find the sequence that appears most often. If there are ties, sort them alphabetically to pick the very first one.
  5. Return the most frequent sequence, or the first one if there's a tie.

Code Implementation

def analyze_user_website_visit_pattern(username, timestamp, website):
    user_visits = {}
    for i in range(len(username)):
        user = username[i]
        time = timestamp[i]
        webpage = website[i]
        if user not in user_visits:
            user_visits[user] = []
        user_visits[user].append((time, webpage))

    three_sequence_counts = {}
    for user, visits in user_visits.items():
        # Sort visits chronologically for each user.
        visits.sort()
        webpages = [visit[1] for visit in visits]
        sequences = set()
        # Generate all possible three-page sequences.
        for i in range(len(webpages) - 2):
            sequence = (webpages[i], webpages[i+1], webpages[i+2])
            sequences.add(sequence)

        for sequence in sequences:
            if sequence not in three_sequence_counts:
                three_sequence_counts[sequence] = 0
            three_sequence_counts[sequence] += 1

    most_frequent_sequence = None
    max_count = 0

    #Find the sequence with the highest frequency,
    #or the smallest sequence if there are ties.
    for sequence, count in three_sequence_counts.items():
        if count > max_count:
            max_count = count
            most_frequent_sequence = sequence
        elif count == max_count:
            if sequence < most_frequent_sequence:
                most_frequent_sequence = sequence

    return list(most_frequent_sequence)

Big(O) Analysis

Time Complexity
O(N^3) – Let N be the total number of visits across all users. First, organizing visits by user involves sorting, which can take O(N log N) time, but this is dominated by subsequent steps. For each user, generating all possible three-page sequences from their visits (up to N visits in total) results in at most O(N^3) combinations. Finally, counting the frequency of each three-page sequence across all users requires iterating through these sequences, taking O(N^3) time in the worst case. Therefore, the overall time complexity is approximately O(N^3).
Space Complexity
O(N^3) – The primary space consumption arises from storing website visits organized by user and the count of three-page sequences. Organizing visits by user requires space proportional to N, where N is the total number of website visits. Generating and storing all possible three-page sequences for each user, in the worst case where a single user visits almost all N sites, requires storing combinations of size 3, leading to O(N^3) space for the count map. This count map dominates the auxiliary space usage, as it stores counts for each unique three-page sequence encountered across all users. Therefore, the overall auxiliary space complexity is O(N^3).

Edge Cases

CaseHow to Handle
Empty username, timestamp, or website arraysReturn an empty list immediately as no patterns can be formed.
Arrays of different lengthsThrow an IllegalArgumentException or return an empty list if array lengths differ.
Arrays with fewer than 3 visits by any single userIgnore users with fewer than 3 visits when generating patterns.
Large input arrays (performance concerns)Ensure the solution uses efficient data structures (e.g., HashMaps) and algorithms to handle large inputs without exceeding time limits.
Multiple patterns with the same highest similarity scoreStore all patterns with the highest score and return the lexicographically smallest one.
Timestamps not in ascending order for a given userSort the visits by timestamp for each user before generating patterns.
Users visiting the same website multiple times in a rowEnsure the pattern generation logic considers consecutive visits to the same website as part of valid 3-sequence patterns.
All users have the same visit patternsThe solution correctly identifies and returns the most frequent pattern even when all users share it.