Taro Logo

Range Frequency Queries

Medium
Quora logo
Quora
1 view
Topics:
Arrays

Design a data structure to find the frequency of a given value in a given subarray.

The frequency of a value in a subarray is the number of occurrences of that value in the subarray.

Implement the RangeFreqQuery class:

  • RangeFreqQuery(int[] arr) Constructs an instance of the class with the given 0-indexed integer array arr.
  • int query(int left, int right, int value) Returns the frequency of value in the subarray arr[left...right].

A subarray is a contiguous sequence of elements within an array. arr[left...right] denotes the subarray that contains the elements of nums between indices left and right (inclusive).

Example 1:

Input
["RangeFreqQuery", "query", "query"]
[[[12, 33, 4, 56, 22, 2, 34, 33, 22, 12, 34, 56]], [1, 2, 4], [0, 11, 33]]
Output
[null, 1, 2]

Explanation
RangeFreqQuery rangeFreqQuery = new RangeFreqQuery([12, 33, 4, 56, 22, 2, 34, 33, 22, 12, 34, 56]);
rangeFreqQuery.query(1, 2, 4); // return 1. The value 4 occurs 1 time in the subarray [33, 4]
rangeFreqQuery.query(0, 11, 33); // return 2. The value 33 occurs 2 times in the whole array.

Constraints:

  • 1 <= arr.length <= 105
  • 1 <= arr[i], value <= 104
  • 0 <= left <= right < arr.length
  • At most 105 calls will be made to query

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the size of the input array `arr`, and the range of values within it?
  2. Are the `left` and `right` indices guaranteed to be within the bounds of the array?
  3. What should I return if `left` is greater than `right`?
  4. Is the input array `arr` mutable, or should I treat it as immutable?
  5. Will the frequency queries be significantly more numerous than the number of elements in the array, justifying pre-computation or caching strategies?

Brute Force Solution

Approach

The brute force approach to answering frequency queries within a range is straightforward. For each query, we'll simply examine every number within the specified range. We'll count how many times the target number appears within that range to provide the answer.

Here's how the algorithm would work step-by-step:

  1. When you get a question asking how many times a specific number appears within a certain range, you need to look at each number within that range one by one.
  2. Start at the beginning of the range and check if the number you're looking at is the number you're searching for.
  3. If it is, add one to your running count.
  4. Move to the next number in the range and repeat the checking and counting process.
  5. Keep doing this until you reach the end of the range.
  6. Once you've checked every number in the range, the final count is your answer to how many times that specific number appeared within that range.

Code Implementation

class RangeFrequencyQueries:

    def __init__(self, array_of_numbers):
        self.array_of_numbers = array_of_numbers

    def query(self, left_index, right_index, value_to_find):
        frequency_count = 0

        # Iterate through the specified range.
        for index in range(left_index, right_index + 1):
            # Count occurrences of the target value.
            if self.array_of_numbers[index] == value_to_find:

                # Increment count only when target is found
                frequency_count += 1

        return frequency_count

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through a specified range within the input array for each query. The size of this range is at most n, where n is the size of the input array. Therefore, for each query, the algorithm performs a single loop that iterates up to n times to count the occurrences of the target value. Thus, the time complexity is O(n).
Space Complexity
O(1)The provided solution iterates through a given range of the input array and counts occurrences of a specific value. The plain English explanation only describes iterating and counting within the existing array. No auxiliary data structures like lists, hash maps, or additional arrays are created to store intermediate results; only a counter variable is used. Therefore, the space used beyond the input array is constant and does not depend on the input size (N, the number of elements in the array) which means the space complexity is O(1).

Optimal Solution

Approach

The core idea is to store, for each number, where it appears in the original list. Then, when answering a question, we can quickly find how many times a number appears within a given range. This avoids checking every single number in the range each time.

Here's how the algorithm would work step-by-step:

  1. First, create a way to remember the position(s) of each distinct number from the list.
  2. When asked how many times a particular number appears between two positions, use the memory created in the first step to find all positions of that number.
  3. Figure out which of those positions fall between the start and end positions of the range.
  4. Count how many positions are between the start and end positions; that's the answer.

Code Implementation

class RangeFrequencyQuery:

    def __init__(self, array_of_numbers):
        self.number_positions = {}
        for index, number in enumerate(array_of_numbers):
            if number not in self.number_positions:
                self.number_positions[number] = []
            self.number_positions[number].append(index)

    def query(self, left_index, right_index, value_to_find):
        # If the value is not in the array, it appears zero times.
        if value_to_find not in self.number_positions:
            return 0

        positions = self.number_positions[value_to_find]
        count = 0

        # Binary search finds the first position >= left_index
        left_bound = self.binary_search_left(positions, left_index)

        # Binary search finds the last position <= right_index
        right_bound = self.binary_search_right(positions, right_index)

        # Only count if both bounds are valid indices.
        if left_bound <= right_bound:
            count = right_bound - left_bound + 1

        return count

    def binary_search_left(self, positions, target):
        low_index = 0
        high_index = len(positions) - 1
        result_index = len(positions)

        # Standard binary search to find the leftmost valid index.
        while low_index <= high_index:
            middle_index = (low_index + high_index) // 2
            if positions[middle_index] >= target:
                result_index = middle_index
                high_index = middle_index - 1
            else:
                low_index = middle_index + 1

        return result_index

    def binary_search_right(self, positions, target):
        low_index = 0
        high_index = len(positions) - 1
        result_index = -1

        # Standard binary search to find the rightmost valid index.
        while low_index <= high_index:
            middle_index = (low_index + high_index) // 2
            if positions[middle_index] <= target:
                result_index = middle_index
                low_index = middle_index + 1
            else:
                high_index = middle_index - 1

        return result_index

Big(O) Analysis

Time Complexity
O(n + m log k)The preprocessing step involves iterating through the input array of size n to build the index map, taking O(n) time. During query processing, finding the indices of the value takes O(k) where k is the number of occurrences of the value. Then, performing two binary searches (using the sorted positions) to find the first index greater than or equal to left and the first index greater than right takes O(log k) each, and the difference of those indices determines the frequency. With m queries in total, the total query time is O(m log k). Therefore the overall runtime is O(n + m log k).
Space Complexity
O(N)The algorithm stores the indices of each distinct number from the input list in a data structure (e.g., a hash map or dictionary). In the worst-case scenario, all numbers in the input list are distinct, resulting in storing N indices, where N is the length of the input list. Therefore, the auxiliary space required to store these indices is proportional to N. Thus, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Null or empty input arrayThrow an IllegalArgumentException or return 0, indicating no elements to query from.
left > rightSwap left and right or return 0, as the query range is invalid.
left < 0 or right >= arr.lengthThrow an IndexOutOfBoundsException or adjust the bounds to fit the array.
Value not present in the array at allReturn 0, as the frequency is zero.
Array containing Integer.MAX_VALUE or Integer.MIN_VALUEEnsure that comparisons do not result in integer overflow or unexpected behavior.
Array with all elements equal to the valueThe algorithm should correctly count all elements within the range.
Large array size affecting memory or performanceConsider using efficient data structures like a hash map of values to sorted lists of indices for logarithmic search, precompute frequency counts or divide array into blocks.
Frequency of the value exceeds Integer.MAX_VALUEUse long instead of integer to store frequency counts to handle potential overflow