Taro Logo

Design an Ordered Stream

Easy
Asked by:
Profile picture
Profile picture
11 views
Topics:
ArraysStrings

There is a stream of n (idKey, value) pairs arriving in an arbitrary order, where idKey is an integer between 1 and n and value is a string. No two pairs have the same id.

Design a stream that returns the values in increasing order of their IDs by returning a chunk (list) of values after each insertion. The concatenation of all the chunks should result in a list of the sorted values.

Implement the OrderedStream class:

  • OrderedStream(int n) Constructs the stream to take n values.
  • String[] insert(int idKey, String value) Inserts the pair (idKey, value) into the stream, then returns the largest possible chunk of currently inserted values that appear next in the order.

Example:

Input
["OrderedStream", "insert", "insert", "insert", "insert", "insert"]
[[5], [3, "ccccc"], [1, "aaaaa"], [2, "bbbbb"], [5, "eeeee"], [4, "ddddd"]]
Output
[null, [], ["aaaaa"], ["bbbbb", "ccccc"], [], ["ddddd", "eeeee"]]

Explanation
// Note that the values ordered by ID is ["aaaaa", "bbbbb", "ccccc", "ddddd", "eeeee"].
OrderedStream os = new OrderedStream(5);
os.insert(3, "ccccc"); // Inserts (3, "ccccc"), returns [].
os.insert(1, "aaaaa"); // Inserts (1, "aaaaa"), returns ["aaaaa"].
os.insert(2, "bbbbb"); // Inserts (2, "bbbbb"), returns ["bbbbb", "ccccc"].
os.insert(5, "eeeee"); // Inserts (5, "eeeee"), returns [].
os.insert(4, "ddddd"); // Inserts (4, "ddddd"), returns ["ddddd", "eeeee"].
// Concatentating all the chunks returned:
// [] + ["aaaaa"] + ["bbbbb", "ccccc"] + [] + ["ddddd", "eeeee"] = ["aaaaa", "bbbbb", "ccccc", "ddddd", "eeeee"]
// The resulting order is the same as the order above.

Constraints:

  • 1 <= n <= 1000
  • 1 <= id <= n
  • value.length == 5
  • value consists only of lowercase letters.
  • Each call to insert will have a unique id.
  • Exactly n calls will be made to insert.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible values for the `idKey`?
  2. Is the input stream guaranteed to contain all IDs from 1 to n, or could there be gaps?
  3. If an `idKey` is received out of order but would eventually complete a chunk (e.g., chunk starts at 3, but we first receive 5, then 3, then 4), should we buffer those until the chunk is complete?
  4. What should be returned if no new chunk can be formed with the current data in the stream?
  5. If the input contains duplicate `idKey` values, what should the behaviour be?

Brute Force Solution

Approach

The Ordered Stream problem requires us to assemble a sequence of values in the correct order based on their positions. A brute-force approach involves repeatedly checking for the next expected value in the stream and adding it to our result if it's available.

Here's how the algorithm would work step-by-step:

  1. We receive pieces of the stream one at a time.
  2. We keep track of the next position we expect to see.
  3. When a new piece arrives, we see if it's the piece we were expecting.
  4. If it is, we add it to our ordered sequence.
  5. Then, we check if the next several pieces after that are also in the correct order.
  6. We continue adding pieces as long as they are in the correct sequence following the last added piece.
  7. If we receive a piece that is not the next expected piece, we simply store it for later.
  8. When asked to provide the output, we return the sequence we've built so far.

Code Implementation

class OrderedStream:

    def __init__(self, stream_size: int):
        self.stream_array = [None] * (stream_size + 1)
        self.pointer = 1

    def insert(self, id_key: int, value: str) -> list[str]:
        self.stream_array[id_key] = value
        result = []

        # Keep adding values to result while in order
        while self.pointer < len(self.stream_array) and self.stream_array[self.pointer] is not None:
            result.append(self.stream_array[self.pointer])
            self.pointer += 1

        return result

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input stream. While receiving new stream pieces, it adds them to the ordered sequence if they are in the correct order. The while loop for checking the consecutive ordered pieces will not run more than n times in total, given that there are n elements in the stream. Therefore, the time complexity is O(n).
Space Complexity
O(N)The Ordered Stream implementation requires a data structure to store the received pieces that are not yet part of the ordered sequence. In the worst-case scenario, we might receive all N pieces out of order before we get the piece with the lowest index. These out-of-order pieces are stored for later processing which uses a space proportional to N. Therefore, the auxiliary space complexity is O(N).

Optimal Solution

Approach

The goal is to process a stream of data in order. Instead of immediately returning data as it arrives, we hold onto it until we have a continuous sequence starting from the expected position. This allows us to efficiently output the ordered stream.

Here's how the algorithm would work step-by-step:

  1. Keep track of the next expected position in the ordered stream.
  2. As data chunks arrive, store them in a temporary holding place.
  3. Check if the current chunk's position matches the next expected position.
  4. If it matches, extract this chunk and any immediately following chunks from the holding place that form a continuous sequence.
  5. Output the extracted continuous sequence.
  6. Update the next expected position to reflect the end of the outputted sequence.
  7. If the current chunk's position doesn't match the next expected position, do nothing and keep storing the chunk until it is needed.

Code Implementation

class OrderedStream:
    def __init__(self, number_of_values):
        self.data_store = {}
        self.current_position = 1

    def insert(self, id_key, value):
        self.data_store[id_key] = value
        output = []

        # Check if the current id matches expected position
        if id_key == self.current_position:

            # Extract and append continuous sequence
            while self.current_position in self.data_store:
                output.append(self.data_store[self.current_position])
                self.current_position += 1

        # We increment the position only when a continuous sequence is extracted
        return output

Big(O) Analysis

Time Complexity
O(n)The OrderedStream processes n chunks of data. Inserting each chunk into the holding place (if needed) takes constant time O(1) on average, assuming a hash map or similar data structure is used. Extracting the continuous sequence involves iterating through a portion of the stored data. In the worst-case scenario, a significant portion of the stored data might be part of the outputted sequence in a single call to the insert method but across all calls, each chunk is considered for output at most once. Thus, extracting and outputting has a cumulative cost of O(n) since we only iterate and process each inserted chunk once across all invocations of the insert method.
Space Complexity
O(N)The described solution uses a temporary holding place to store data chunks. In the worst case, if the incoming chunks are all out of order and none can be immediately outputted, this holding place might need to store all N data chunks, where N represents the total number of chunks inserted into the Ordered Stream. The holding place acts as an auxiliary data structure that grows linearly with the number of inserted chunks. Therefore, the auxiliary space complexity is O(N).

Edge Cases

Empty input stream
How to Handle:
Initialize the internal data structure to an appropriate empty state and return an empty list for any `insert` call.
Input `idKey` less than 1
How to Handle:
Throw an IllegalArgumentException or return an empty list as invalid idKey leads to out-of-bounds access if used as index.
Input `idKey` greater than stream length
How to Handle:
Throw an IllegalArgumentException or return an empty list because this is an invalid insertion location beyond the stream's capacity.
Maximum stream length reached (consider memory constraints)
How to Handle:
Check if the stream has reached its maximum capacity before inserting; if so, either resize (if allowed) or throw an exception to prevent memory overflow.
Inserting out-of-order, but contiguous elements
How to Handle:
Store the value at the given index; when the pointer reaches the stored index, start returning the contiguous chunk.
Inserting out-of-order and non-contiguous elements (gaps in the ordered stream)
How to Handle:
Store the inserted values and only return a chunk when the current pointer finds the start of a contiguous block of non-null values.
Null value being inserted
How to Handle:
Store the null value in the array and skip it when returning chunks as per expected behavior (depending on requirements, it might be better to throw an exception).
Large number of inserts that don't release any data.
How to Handle:
The solution should handle a continuous stream of inserts where `ptr` never advances, potentially indicating a problem with the input order; this can lead to memory buildup and the solution should scale efficiently by managing memory appropriately.