Taro Logo

Data Stream as Disjoint Intervals

Hard
Amazon logo
Amazon
2 views
Topics:
ArraysBinary Search

Given a data stream input of non-negative integers a1, a2, ..., an, summarize the numbers seen so far as a list of disjoint intervals.

Implement the SummaryRanges class:

  • SummaryRanges() Initializes the object with an empty stream.
  • void addNum(int value) Adds the integer value to the stream.
  • int[][] getIntervals() Returns a summary of the integers in the stream currently as a list of disjoint intervals [starti, endi]. The answer should be sorted by starti.

Example 1:

Input
["SummaryRanges", "addNum", "getIntervals", "addNum", "getIntervals", "addNum", "getIntervals", "addNum", "getIntervals", "addNum", "getIntervals"]
[[], [1], [], [3], [], [7], [], [2], [], [6], []]
Output
[null, null, [[1, 1]], null, [[1, 1], [3, 3]], null, [[1, 1], [3, 3], [7, 7]], null, [[1, 3], [7, 7]], null, [[1, 3], [6, 7]]]

Explanation
SummaryRanges summaryRanges = new SummaryRanges();
summaryRanges.addNum(1);      // arr = [1]
summaryRanges.getIntervals(); // return [[1, 1]]
summaryRanges.addNum(3);      // arr = [1, 3]
summaryRanges.getIntervals(); // return [[1, 1], [3, 3]]
summaryRanges.addNum(7);      // arr = [1, 3, 7]
summaryRanges.getIntervals(); // return [[1, 1], [3, 3], [7, 7]]
summaryRanges.addNum(2);      // arr = [1, 2, 3, 7]
summaryRanges.getIntervals(); // return [[1, 3], [7, 7]]
summaryRanges.addNum(6);      // arr = [1, 2, 3, 6, 7]
summaryRanges.getIntervals(); // return [[1, 3], [6, 7]]

Constraints:

  • 0 <= value <= 104
  • At most 3 * 104 calls will be made to addNum and getIntervals.
  • At most 102 calls will be made to getIntervals.

Follow up: What if there are lots of merges and the number of disjoint intervals is small compared to the size of the data stream?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible integer values in the data stream? Could there be negative numbers?
  2. Will the input integers always be unique, or could there be duplicates in the stream?
  3. If a number in the stream creates a new interval that merges with existing adjacent intervals, should I merge them into one larger interval immediately?
  4. What is the expected format of the output for the getIntervals() function? Should the intervals be sorted, and if so, by which value (start or end)?
  5. If the data stream contains extremely large numbers, is memory usage a concern, and are there any limitations on the number of intervals I can store?

Brute Force Solution

Approach

The brute force approach to maintaining disjoint intervals from a data stream involves exhaustively checking every possible interval combination. We repeatedly merge overlapping intervals as new data arrives, ensuring a disjointed representation. The process prioritizes exploring all merging scenarios at each step.

Here's how the algorithm would work step-by-step:

  1. Start with an empty collection of intervals.
  2. When a new number arrives from the data stream, create a new interval containing only that number.
  3. Compare the new interval with every interval currently in the collection.
  4. If the new interval overlaps with an existing interval, merge them into a single larger interval.
  5. After merging, compare the newly created larger interval with all other intervals in the collection to see if further merging is needed.
  6. Repeat the merging process until the newly created larger interval is disjoint from all other intervals in the collection.
  7. If the new interval does not overlap with any existing intervals, simply add it to the collection.
  8. Repeat this process for every new number that arrives from the data stream, ensuring that the collection always contains disjoint intervals.

Code Implementation

class SummaryRanges:
    def __init__(self):
        self.intervals_list = []

    def addNum(self, value):
        new_interval = [value, value]
        merged = True

        while merged:
            merged = False
            interval_index = 0

            while interval_index < len(self.intervals_list):
                existing_interval = self.intervals_list[interval_index]

                # Check if the new interval overlaps with an existing interval
                if (new_interval[0] <= existing_interval[1] and new_interval[1] >= existing_interval[0]):
                    # Merge the intervals because they overlap
                    new_interval[0] = min(new_interval[0], existing_interval[0])
                    new_interval[1] = max(new_interval[1], existing_interval[1])
                    self.intervals_list.pop(interval_index)
                    merged = True
                    # Reset index to start checking from the beginning after merging
                    interval_index = 0

                    continue

                interval_index += 1

        # Insert the new interval into the sorted list
        inserted = False
        for index, interval in enumerate(self.intervals_list):
            if new_interval[0] < interval[0]:
                self.intervals_list.insert(index, new_interval)
                inserted = True
                break

        #Add new interval to end if it's larger than any existing
        if not inserted:
            self.intervals_list.append(new_interval)

    def getIntervals(self):
        return self.intervals_list

Big(O) Analysis

Time Complexity
O(n²)For each incoming number from the data stream, we create a new interval. Then, we iterate through the existing collection of intervals (which can grow up to size n, where n is the number of incoming numbers) to check for overlaps and potentially merge. In the worst-case scenario, after each merge, we need to re-iterate through the entire collection again to check if further merging is required, potentially leading to nested iterations. Therefore, for n incoming numbers, the time complexity can approach n * n which simplifies to O(n²).
Space Complexity
O(N)The primary auxiliary space is used to store the collection of disjoint intervals. In the worst-case scenario, where no intervals can be merged, each incoming number from the data stream will result in a new interval being added to the collection. Therefore, if N is the number of incoming numbers (data stream size), the collection could potentially store N intervals. Consequently, the space complexity is O(N).

Optimal Solution

Approach

The goal is to efficiently maintain a collection of non-overlapping number ranges as new numbers arrive. We keep our ranges organized so we can quickly find where a new number belongs and merge ranges when necessary. This avoids checking every possible range combination each time a new number is added.

Here's how the algorithm would work step-by-step:

  1. Imagine you have a sorted list of number ranges that don't overlap. When a new number comes in, the first thing you need to do is quickly find the range, if any, that the number might belong to.
  2. If the new number fits perfectly inside an existing range, you don't need to do anything; the number is already covered.
  3. If the new number is just outside an existing range and can extend that range, then extend the range to include it.
  4. If the new number is between two existing ranges and can connect them into a single larger range, merge those ranges.
  5. If the new number doesn't fit into any existing ranges or connect any ranges, then create a new range containing just that number, and insert it into the correct sorted position.
  6. By keeping the ranges sorted, you only need to check nearby ranges when a new number comes in, which is much faster than checking all possible range combinations.

Code Implementation

class SummaryRanges:

    def __init__(self):
        self.ranges = []

    def addNum(self, value):
        new_interval = [value, value]
        index_to_insert = 0

        while index_to_insert < len(self.ranges):
            interval = self.ranges[index_to_insert]

            # Case 1: New interval is completely disjoint and to the left
            if new_interval[1] < interval[0] - 1:
                self.ranges.insert(index_to_insert, new_interval)
                return
            # Case 2: New interval is completely disjoint and to the right
            elif new_interval[0] > interval[1] + 1:
                index_to_insert += 1
                continue
            # Case 3: Intervals overlap, so we merge
            else:
                # Merge the intervals
                new_interval[0] = min(new_interval[0], interval[0])
                new_interval[1] = max(new_interval[1], interval[1])
                self.ranges.pop(index_to_insert)

        # If the new interval wasn't inserted, add it to the end
        self.ranges.insert(index_to_insert, new_interval)

        # Now, we must merge overlapping intervals
        index = 0
        while index < len(self.ranges) - 1:
            first_interval = self.ranges[index]
            second_interval = self.ranges[index+1]

            # We merge intervals that overlap or are adjacent
            if first_interval[1] >= second_interval[0] -1:
                # This handles overlapping / adjacent intervals
                first_interval[1] = max(first_interval[1], second_interval[1])
                self.ranges.pop(index+1)
            else:
                index += 1

    def getIntervals(self):
        return self.ranges

Big(O) Analysis

Time Complexity
O(n)The critical operation is inserting a new number and merging intervals, which requires finding the appropriate position in the sorted list of intervals. In the worst-case scenario, we might have to traverse the entire list of intervals to find the insertion point and potentially merge intervals. With 'n' being the number of intervals, finding the correct position involves iterating through the list, taking O(n) time per insertion. Consequently, the overall complexity, considering the sorted maintenance of intervals, remains O(n).
Space Complexity
O(N)The algorithm maintains a collection of non-overlapping number ranges. In the worst-case scenario, where each added number is distinct and doesn't merge with existing ranges, we would need to store a new range for each number. This could result in a data structure (e.g., a list or tree) holding up to N ranges, where N is the number of numbers added to the data stream. Therefore, the auxiliary space used grows linearly with the input size N.

Edge Cases

CaseHow to Handle
Empty input streamInitialize the intervals data structure as empty and handle empty ranges appropriately.
Stream contains duplicate valuesHandle duplicates by merging intervals they fall into or ignoring if already covered.
Stream contains negative numbersThe interval representation should correctly handle negative start and end points.
Stream with large, consecutive values causing integer overflow when determining interval boundariesUse appropriate data types (e.g., long) or range checks to prevent integer overflow.
Maximum number of intervals causes memory exhaustionConsider alternative data structures or strategies if memory usage becomes excessive, such as summarization of intervals.
Input number is already contained within an existing intervalNo new interval is added and the existing interval remains unchanged.
Input number can merge two existing intervalsCorrectly merge the two adjacent intervals and remove the original intervals from the data structure.
Interval merges can cascade resulting in multiple interval mergesEnsure the interval merging logic recursively or iteratively handles the necessary combinations for merging.