Taro Logo

Random Pick with Weight

Medium
Google logo
Google
Topics:
ArraysBinary Search

You are given a 0-indexed array of positive integers w where w[i] describes the weight of the i<sup>th</sup> index.

You need to implement the function pickIndex(), which randomly picks an index in the range [0, w.length - 1] (inclusive) and returns it. The probability of picking an index i is w[i] / sum(w).

  • For example, if w = [1, 3], the probability of picking index 0 is 1 / (1 + 3) = 0.25 (i.e., 25%), and the probability of picking index 1 is 3 / (1 + 3) = 0.75 (i.e., 75%).

Could you provide a solution that efficiently implements the pickIndex() function, considering the constraints on the input size and the number of calls to pickIndex()?

Solution


Solution to Weighted Random Index Selection

Problem Description

The problem requires implementing a pickIndex() function that randomly selects an index from an array of positive integers, where each integer represents the weight of its corresponding index. The probability of selecting an index i is proportional to its weight w[i] divided by the sum of all weights. This ensures indices with higher weights are more likely to be selected.

1. Brute Force Approach

A naive approach involves creating a new array where each index i from the original array w is repeated w[i] times. Then, a random index is selected from this expanded array. This selected index corresponds to an index in the original array.

Pros:

  • Simple to understand and implement.

Cons:

  • Inefficient for large weights, as it requires significant memory to store the expanded array.

Example:

If w = [1, 3], the expanded array would be [0, 1, 1, 1]. A random number between 0 and 3 is selected. If 0 is selected, we return 0. If 1, 2, or 3 is selected, we return 1.

Code:

import random

class Solution:
    def __init__(self, w):
        self.expanded_array = []
        for i, weight in enumerate(w):
            self.expanded_array.extend([i] * weight)

    def pickIndex(self):
        return random.choice(self.expanded_array)

Complexity Analysis:

  • Time Complexity:
    • __init__: O(Sum of weights) because we are constructing a new array that contains each index repeated by its weight.
    • pickIndex: O(1) since picking a random element from an array is a constant-time operation.
  • Space Complexity: O(Sum of weights) due to the storage of the expanded array.

2. Optimal Solution: Prefix Sum and Binary Search

A more efficient solution involves calculating the prefix sum of the weights. The prefix sum array represents the cumulative weights up to each index. A random number between 1 and the total sum of weights is generated. Binary search is then used to find the index in the prefix sum array where the random number falls. This index is the selected index.

Algorithm:

  1. Compute the prefix sum array prefix_sums. prefix_sums[i] stores the sum of weights from w[0] to w[i].
  2. Generate a random integer target between 1 and the total sum of weights (inclusive).
  3. Use binary search to find the smallest index idx in prefix_sums such that prefix_sums[idx] >= target. This idx is the index to be returned.

Example:

If w = [1, 3, 2], the prefix sum array would be [1, 4, 6]. The total sum is 6. Let's say the random number generated is 3. We perform binary search on [1, 4, 6] to find the smallest index where the value is greater than or equal to 3. This would be index 1 (value 4), so we return 1.

Code:

import random
import bisect

class Solution:
    def __init__(self, w):
        self.prefix_sums = []
        total_sum = 0
        for weight in w:
            total_sum += weight
            self.prefix_sums.append(total_sum)
        self.total_sum = total_sum

    def pickIndex(self):
        target = random.randint(1, self.total_sum)
        index = bisect.bisect_left(self.prefix_sums, target)
        return index

Complexity Analysis:

  • Time Complexity:
    • __init__: O(N), where N is the length of the input array w, to compute the prefix sums.
    • pickIndex: O(log N), where N is the length of the prefix sums array, due to the binary search.
  • Space Complexity: O(N) to store the prefix sums.

Edge Cases

  • Empty Input: The problem statement specifies 1 <= w.length <= 10^4, so the input array w will not be empty.
  • Single Element: If the input array contains only one element (e.g., w = [1]), the algorithm should return 0 every time. The prefix sum approach correctly handles this case.

Conclusion

The prefix sum and binary search approach is more efficient than the brute-force approach, especially for large weights. It provides a good balance between time and space complexity, making it a suitable solution for this problem.