Taro Logo

Tweet Counts Per Frequency

Medium
Google logo
Google
1 view

A social media company wants to monitor activity on their site by analyzing the number of tweets in select time periods. These periods are divided into smaller time chunks based on frequency (every minute, hour, or day).

For example, the period [10, 10000] (in seconds) partitions into:

  • Minute (60-second chunks): [10,69], [70,129], ..., [9970,10000]
  • Hour (3600-second chunks): [10,3609], [3610,7209], [7210,10000]
  • Day (86400-second chunks): [10,10000]

The last chunk can be shorter than the frequency's chunk size, ending at the period's end time.

Design an API to help the company analyze their tweets.

Implement the TweetCounts class:

  • TweetCounts(): Initializes the object.
  • void recordTweet(String tweetName, int time): Stores tweetName at time (in seconds).
  • List<Integer> getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime): Returns the number of tweets with tweetName in each time chunk for [startTime, endTime] and frequency freq.
    • freq is "minute", "hour", or "day".

Example:

Input:
["TweetCounts","recordTweet","recordTweet","recordTweet","getTweetCountsPerFrequency","getTweetCountsPerFrequency","recordTweet","getTweetCountsPerFrequency"]
[[],["tweet3",0],["tweet3",60],["tweet3",10],["minute","tweet3",0,59],["minute","tweet3",0,60],["tweet3",120],["hour","tweet3",0,210]]

Output:
[null,null,null,null,[2],[2,1],null,[4]]

How would you implement this class efficiently? What are the time and space complexities of the operations?

Solution


TweetCounts Class Design

This problem requires designing a TweetCounts class to record tweets and retrieve tweet counts within specified time chunks for different frequencies.

Naive Approach

A straightforward approach is to store the tweets in a hash map where the key is the tweet name and the value is a list of timestamps. For the getTweetCountsPerFrequency method, iterate through the list of timestamps for the given tweet name and count the tweets within each time chunk.

Implementation Details:

  • Data Structures:
    • Map<String, List<Integer>> tweets: Stores tweet names as keys and a sorted list of timestamps as values.
  • recordTweet(String tweetName, int time):
    • Adds the time to the list of timestamps associated with the tweetName in the tweets map. If the tweet name doesn't exist, create a new list.
  • getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime):
    • Determines the chunk size based on freq (minute: 60, hour: 3600, day: 86400).
    • Iterates through the time range [startTime, endTime] in chunks.
    • For each chunk, iterates through the timestamps of the given tweetName and counts the tweets that fall within the current chunk.

Code (Java):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TweetCounts {

    private Map<String, List<Integer>> tweets;

    public TweetCounts() {
        tweets = new HashMap<>();
    }

    public void recordTweet(String tweetName, int time) {
        tweets.computeIfAbsent(tweetName, k -> new ArrayList<>()).add(time);
    }

    public List<Integer> getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime) {
        int chunkSize = 60; // minute
        if (freq.equals("hour")) {
            chunkSize = 3600;
        } else if (freq.equals("day")) {
            chunkSize = 86400;
        }

        List<Integer> result = new ArrayList<>();
        for (int i = startTime; i <= endTime; i += chunkSize) {
            int chunkEnd = Math.min(i + chunkSize - 1, endTime);
            int count = 0;
            if (tweets.containsKey(tweetName)) {
                for (int time : tweets.get(tweetName)) {
                    if (time >= i && time <= chunkEnd) {
                        count++;
                    }
                }
            }
            result.add(count);
        }
        return result;
    }
}

Time and Space Complexity (Naive Approach):

  • recordTweet: O(1) (average case) for adding to the list.
  • getTweetCountsPerFrequency: O(N * M), where N is the number of chunks and M is the number of tweets for a given tweetName. This is because, for each chunk, we iterate through all the tweets for the given tweetName.
  • Space Complexity: O(T), where T is the total number of tweets recorded.

Optimal Approach

To optimize the getTweetCountsPerFrequency method, we can maintain the timestamps in a sorted order (e.g., using a TreeMap). This allows us to use binary search to efficiently count the tweets within each chunk.

Implementation Details:

  • Data Structures:
    • Map<String, List<Integer>> tweets: Stores tweet names as keys and a sorted list of timestamps as values.
  • recordTweet(String tweetName, int time):
    • Adds the time to the list of timestamps associated with the tweetName in the tweets map. Keep the list sorted.
  • getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime):
    • Determines the chunk size based on freq (minute: 60, hour: 3600, day: 86400).
    • Iterates through the time range [startTime, endTime] in chunks.
    • For each chunk, use binary search (or two pointers) to count the tweets within the current chunk.

Code (Java):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TweetCounts {

    private Map<String, List<Integer>> tweets;

    public TweetCounts() {
        tweets = new HashMap<>();
    }

    public void recordTweet(String tweetName, int time) {
        List<Integer> times = tweets.computeIfAbsent(tweetName, k -> new ArrayList<>());
        times.add(time);
        times.sort(Integer::compare); // Keep the list sorted
    }

    public List<Integer> getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime) {
        int chunkSize = 60;
        if (freq.equals("hour")) {
            chunkSize = 3600;
        } else if (freq.equals("day")) {
            chunkSize = 86400;
        }

        List<Integer> result = new ArrayList<>();
        for (int i = startTime; i <= endTime; i += chunkSize) {
            int chunkEnd = Math.min(i + chunkSize - 1, endTime);
            int count = 0;
            if (tweets.containsKey(tweetName)) {
                List<Integer> times = tweets.get(tweetName);
                int left = 0;
                int right = times.size() - 1;
                int firstIndex = -1;

                // Find the first index greater or equal to i using binary search
                while (left <= right) {
                    int mid = left + (right - left) / 2;
                    if (times.get(mid) >= i) {
                        firstIndex = mid;
                        right = mid - 1;
                    } else {
                        left = mid + 1;
                    }
                }

                if (firstIndex != -1) {
                    for (int k = firstIndex; k < times.size(); k++) {
                        if (times.get(k) <= chunkEnd) {
                            count++;
                        } else {
                            break;
                        }
                    }
                }
            }
            result.add(count);
        }
        return result;
    }
}

Time and Space Complexity (Optimal Approach):

  • recordTweet: O(log T) (due to the sort which happens after insertion), where T is the total number of tweets for that tweetName. If we used a balanced tree structure, this would be O(log T) for insertion.
  • getTweetCountsPerFrequency: O(N * log M), where N is the number of chunks and M is the number of tweets for a given tweetName. Binary search takes log(M) time, and it is performed for each of the N chunks.
  • Space Complexity: O(T), where T is the total number of tweets recorded.

Edge Cases:

  • Empty tweet name.
  • Invalid frequency.
  • startTime greater than endTime.
  • No tweets recorded for a given tweet name.

These edge cases are handled in the provided code by checking for null or empty values and providing default return values or throwing exceptions as necessary. Binary search is used to find the lower bound, and the loop exists early.