A social media company wants to monitor activity on their site by analyzing the number of tweets in select time periods. These periods are divided into smaller time chunks based on frequency (every minute, hour, or day).
For example, the period [10, 10000]
(in seconds) partitions into:
[10,69]
, [70,129]
, ..., [9970,10000]
[10,3609]
, [3610,7209]
, [7210,10000]
[10,10000]
The last chunk can be shorter than the frequency's chunk size, ending at the period's end time.
Design an API to help the company analyze their tweets.
Implement the TweetCounts
class:
TweetCounts()
: Initializes the object.void recordTweet(String tweetName, int time)
: Stores tweetName
at time
(in seconds).List<Integer> getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime)
: Returns the number of tweets with tweetName
in each time chunk for [startTime, endTime]
and frequency freq
.
freq
is "minute", "hour", or "day".Example:
Input:
["TweetCounts","recordTweet","recordTweet","recordTweet","getTweetCountsPerFrequency","getTweetCountsPerFrequency","recordTweet","getTweetCountsPerFrequency"]
[[],["tweet3",0],["tweet3",60],["tweet3",10],["minute","tweet3",0,59],["minute","tweet3",0,60],["tweet3",120],["hour","tweet3",0,210]]
Output:
[null,null,null,null,[2],[2,1],null,[4]]
How would you implement this class efficiently? What are the time and space complexities of the operations?
This problem requires designing a TweetCounts
class to record tweets and retrieve tweet counts within specified time chunks for different frequencies.
A straightforward approach is to store the tweets in a hash map where the key is the tweet name and the value is a list of timestamps. For the getTweetCountsPerFrequency
method, iterate through the list of timestamps for the given tweet name and count the tweets within each time chunk.
Map<String, List<Integer>> tweets
: Stores tweet names as keys and a sorted list of timestamps as values.recordTweet(String tweetName, int time)
:
time
to the list of timestamps associated with the tweetName
in the tweets
map. If the tweet name doesn't exist, create a new list.getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime)
:
freq
(minute: 60, hour: 3600, day: 86400).[startTime, endTime]
in chunks.tweetName
and counts the tweets that fall within the current chunk.import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
class TweetCounts {
private Map<String, List<Integer>> tweets;
public TweetCounts() {
tweets = new HashMap<>();
}
public void recordTweet(String tweetName, int time) {
tweets.computeIfAbsent(tweetName, k -> new ArrayList<>()).add(time);
}
public List<Integer> getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime) {
int chunkSize = 60; // minute
if (freq.equals("hour")) {
chunkSize = 3600;
} else if (freq.equals("day")) {
chunkSize = 86400;
}
List<Integer> result = new ArrayList<>();
for (int i = startTime; i <= endTime; i += chunkSize) {
int chunkEnd = Math.min(i + chunkSize - 1, endTime);
int count = 0;
if (tweets.containsKey(tweetName)) {
for (int time : tweets.get(tweetName)) {
if (time >= i && time <= chunkEnd) {
count++;
}
}
}
result.add(count);
}
return result;
}
}
recordTweet
: O(1) (average case) for adding to the list.getTweetCountsPerFrequency
: O(N * M), where N is the number of chunks and M is the number of tweets for a given tweetName. This is because, for each chunk, we iterate through all the tweets for the given tweetName.To optimize the getTweetCountsPerFrequency
method, we can maintain the timestamps in a sorted order (e.g., using a TreeMap
). This allows us to use binary search to efficiently count the tweets within each chunk.
Map<String, List<Integer>> tweets
: Stores tweet names as keys and a sorted list of timestamps as values.recordTweet(String tweetName, int time)
:
time
to the list of timestamps associated with the tweetName
in the tweets
map. Keep the list sorted.getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime)
:
freq
(minute: 60, hour: 3600, day: 86400).[startTime, endTime]
in chunks.import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
class TweetCounts {
private Map<String, List<Integer>> tweets;
public TweetCounts() {
tweets = new HashMap<>();
}
public void recordTweet(String tweetName, int time) {
List<Integer> times = tweets.computeIfAbsent(tweetName, k -> new ArrayList<>());
times.add(time);
times.sort(Integer::compare); // Keep the list sorted
}
public List<Integer> getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime) {
int chunkSize = 60;
if (freq.equals("hour")) {
chunkSize = 3600;
} else if (freq.equals("day")) {
chunkSize = 86400;
}
List<Integer> result = new ArrayList<>();
for (int i = startTime; i <= endTime; i += chunkSize) {
int chunkEnd = Math.min(i + chunkSize - 1, endTime);
int count = 0;
if (tweets.containsKey(tweetName)) {
List<Integer> times = tweets.get(tweetName);
int left = 0;
int right = times.size() - 1;
int firstIndex = -1;
// Find the first index greater or equal to i using binary search
while (left <= right) {
int mid = left + (right - left) / 2;
if (times.get(mid) >= i) {
firstIndex = mid;
right = mid - 1;
} else {
left = mid + 1;
}
}
if (firstIndex != -1) {
for (int k = firstIndex; k < times.size(); k++) {
if (times.get(k) <= chunkEnd) {
count++;
} else {
break;
}
}
}
}
result.add(count);
}
return result;
}
}
recordTweet
: O(log T) (due to the sort
which happens after insertion), where T is the total number of tweets for that tweetName. If we used a balanced tree structure, this would be O(log T) for insertion.getTweetCountsPerFrequency
: O(N * log M), where N is the number of chunks and M is the number of tweets for a given tweetName. Binary search takes log(M) time, and it is performed for each of the N chunks.startTime
greater than endTime
.These edge cases are handled in the provided code by checking for null or empty values and providing default return values or throwing exceptions as necessary. Binary search is used to find the lower bound, and the loop exists early.