Taro Logo

Online Majority Element In Subarray

Hard
Nutanix logo
Nutanix
0 views
Topics:
Arrays

Design a data structure that efficiently finds the majority element of a given subarray.

The majority element of a subarray is an element that occurs threshold times or more in the subarray.

Implementing the MajorityChecker class:

  • MajorityChecker(int[] arr) Initializes the instance of the class with the given array arr.
  • int query(int left, int right, int threshold) returns the element in the subarray arr[left...right] that occurs at least threshold times, or -1 if no such element exists.

Example 1:

Input
["MajorityChecker", "query", "query", "query"]
[[[1, 1, 2, 2, 1, 1]], [0, 5, 4], [0, 3, 3], [2, 3, 2]]
Output
[null, 1, -1, 2]

Explanation
MajorityChecker majorityChecker = new MajorityChecker([1, 1, 2, 2, 1, 1]);
majorityChecker.query(0, 5, 4); // return 1
majorityChecker.query(0, 3, 3); // return -1
majorityChecker.query(2, 3, 2); // return 2

Constraints:

  • 1 <= arr.length <= 2 * 104
  • 1 <= arr[i] <= 2 * 104
  • 0 <= left <= right < arr.length
  • threshold <= right - left + 1
  • 2 * threshold > right - left + 1
  • At most 104 calls will be made to query.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the size of the input array `nums` and the number of queries?
  2. What is the range of values for the integers in the `nums` array? Can they be negative, zero, or very large?
  3. If multiple elements appear more than half the time in the subarray [left, right], which one should I return? Or is it guaranteed that there will be at most one majority element?
  4. What should I return if `left` is greater than `right`?
  5. How should the class be initialized? Can I preprocess the array to optimize query performance?

Brute Force Solution

Approach

To solve the problem using the brute force approach, we essentially look at every possible subarray within the given range. For each subarray, we count the occurrences of each number and check if any number appears more than half the time, which would make it the majority element.

Here's how the algorithm would work step-by-step:

  1. Consider all possible subarrays defined by the start and end positions within the given range.
  2. For each subarray, count how many times each number appears.
  3. For each subarray, check if any number's count is greater than half the length of the subarray.
  4. If a number appears more than half the time in a subarray, that number is a potential majority element.
  5. If no number meets the majority condition within a subarray, then there is no majority element for that subarray.
  6. Repeat these steps for every possible subarray to find all potential majority elements and determine if one exists.

Code Implementation

def online_majority_element_in_subarray_brute_force(numbers, left, right):
    for start_index in range(left, right + 1): 
        for end_index in range(start_index, right + 1):
            subarray_length = end_index - start_index + 1
            element_counts = {}

            for index in range(start_index, end_index + 1):
                element = numbers[index]
                element_counts[element] = element_counts.get(element, 0) + 1

            # Determine if any element is the majority element
            for element, count in element_counts.items():
                if count > subarray_length // 2:

                    #Found majority element for this subarray
                    return element

    # If no majority element was found return -1
    return -1

Big(O) Analysis

Time Complexity
O(n^3)The brute force approach considers all possible subarrays. There are O(n^2) subarrays within the given range. For each subarray, which has a maximum length of n, we iterate through the elements to count the occurrences of each number. This counting process takes O(n) time for each subarray. Therefore, the overall time complexity is O(n^2) * O(n), which simplifies to O(n^3).
Space Complexity
O(N)The brute force approach iterates through every possible subarray. For each subarray, a data structure (implicitly a hash map or similar counting mechanism) is used to store the counts of each number to determine the majority element. In the worst case, the counting structure may need to store the count of each unique element in the subarray, which could be up to N elements in the original array, where N is the size of the input array. Therefore, the auxiliary space is O(N).

Optimal Solution

Approach

The challenge involves efficiently identifying the most frequent number within different parts of a given list. Instead of counting every number each time we're asked, we'll pre-process the list to make answering these requests much faster.

Here's how the algorithm would work step-by-step:

  1. First, divide the list into smaller, equal-sized chunks or blocks.
  2. For each chunk, find the most frequent number and store it along with its count within that chunk. This is pre-calculation.
  3. When someone asks about a section of the list, we only need to look at the relevant chunks and the numbers at the edges of that section.
  4. Count the occurrences of the most frequent numbers from the fully contained chunks within the given section. Also count the occurrences of any numbers that exist at the start or end of the given section, because their respective chunks were not fully contained.
  5. The number with the highest count amongst those considered is the most likely majority element in that section.
  6. Finally, to be absolutely sure, double-check that the most likely candidate appears more than half the time within the specified section of the list. If it does, we've found our answer.

Code Implementation

class MajorityChecker:
    def __init__(self, array):
        self.array = array
        self.block_size = int((len(array))**0.5)
        self.blocks = {}

        # Precompute majority element for each block
        for i in range(0, len(array), self.block_size):
            block = array[i:i + self.block_size]
            counts = {}
            majority_element = None
            max_count = 0
            for number in block:
                counts[number] = counts.get(number, 0) + 1
                if counts[number] > max_count:
                    max_count = counts[number]
                    majority_element = number
            self.blocks[i // self.block_size] = (majority_element, max_count)

    def query(self, left, right, threshold):
        block_start = left // self.block_size
        block_end = right // self.block_size
        counts = {}

        # Count majority elements from relevant blocks
        for i in range(block_start + 1, block_end):
            start_index = i * self.block_size
            majority_element, _ = self.blocks[i]
            counts[majority_element] = counts.get(majority_element, 0) + self.block_size

        # Count elements from partial blocks on the edges
        for i in range(left, min((block_start + 1) * self.block_size, right + 1)): 
            counts[self.array[i]] = counts.get(self.array[i], 0) + 1

        for i in range(block_end * self.block_size, right + 1): 
            # Ensures right boundary is fully accounted for
            counts[self.array[i]] = counts.get(self.array[i], 0) + 1

        # Determine potential majority element and verify count
        potential_majority_element = None
        max_count = 0
        for number, count in counts.items():
            if count > max_count:
                max_count = count
                potential_majority_element = number

        if potential_majority_element is None:
            return -1

        actual_count = 0
        for i in range(left, right + 1):
            if self.array[i] == potential_majority_element:
                actual_count += 1

        # Verify if the element occurs more than threshold times
        if actual_count > threshold:
            return potential_majority_element
        else:
            return -1

Big(O) Analysis

Time Complexity
O(sqrt(n))The preprocessing step to divide the list of size n into blocks of size sqrt(n) takes O(n) time. Finding the majority element within each block of size sqrt(n) also takes O(sqrt(n)) time, and since we have n/sqrt(n) = sqrt(n) blocks, the total preprocessing time remains O(n). When querying a subarray, we iterate through at most sqrt(n) fully contained blocks, taking O(sqrt(n)) time. Additionally, we iterate through the elements at the edges of the subarray, which is at most 2 * sqrt(n) elements, also taking O(sqrt(n)) time. Finally, we verify the candidate majority element in the given subarray, which has a length of at most n and therefore takes O(n) time in the worst case, which dominates the query time if we only do a single verification. However, we only need to verify a few candidates for majority element (at most 2 * sqrt(n) including the edge elements and the block majority elements), so the total verification time across all candidates is O(sqrt(n) * n). But because the problem asks for an online algorithm with possibly multiple queries, the bottleneck becomes the processing of the individual queries which consist of checking the precomputed blocks of size sqrt(n) to see if one is the majority. In a single query, we iterate the at most sqrt(n) candidate majorities and compare to the edges, thus query is sqrt(n). Each preprocessing step only happens once so it doesn't influence the complexity of an online query.
Space Complexity
O(sqrt(N))The algorithm divides the input list of size N into chunks of size sqrt(N). For each chunk, it stores the most frequent number and its count. This pre-calculation results in an auxiliary array of size sqrt(N) to store the majority element of each chunk. Therefore, the auxiliary space used is proportional to the number of chunks, which is sqrt(N), giving a space complexity of O(sqrt(N)).

Edge Cases

CaseHow to Handle
Null or empty input arrayReturn -1 immediately, indicating no majority element exists.
Left index is out of bounds (less than 0)Treat left index as 0, clamping it to the array's start.
Right index is out of bounds (greater than array length - 1)Treat right index as array length - 1, clamping it to the array's end.
Left index is greater than the right index (invalid range)Return -1 immediately, as an invalid range cannot contain a majority element.
Array contains all identical valuesAny element will be the majority element, so return that value.
Very large array size exceeding available memory for auxiliary data structuresChoose a data structure (like segment tree or sqrt decomposition) with efficient memory usage and consider limiting the input size within practical bounds.
Subarray has even length and no true majority exists (e.g., two elements appear exactly half the time)The algorithm should return -1 in this case since the majority element must appear strictly more than half the time.
Integer overflow during counting if the number of occurrences is extremely high.Use a data type capable of holding large counts, such as a 64-bit integer, or a probabilistic data structure if approximate results are acceptable and overflows are likely.