Taro Logo

Random Pick with Weight

Medium
Two Sigma logo
Two Sigma
0 views
Topics:
ArraysBinary Search

You are given a 0-indexed array of positive integers w where w[i] describes the weight of the ith index.

You need to implement the function pickIndex(), which randomly picks an index in the range [0, w.length - 1] (inclusive) and returns it. The probability of picking an index i is w[i] / sum(w).

  • For example, if w = [1, 3], the probability of picking index 0 is 1 / (1 + 3) = 0.25 (i.e., 25%), and the probability of picking index 1 is 3 / (1 + 3) = 0.75 (i.e., 75%).

Example 1:

Input
["Solution","pickIndex"]
[[[1]],[]]
Output
[null,0]

Explanation
Solution solution = new Solution([1]);
solution.pickIndex(); // return 0. The only option is to return 0 since there is only one element in w.

Example 2:

Input
["Solution","pickIndex","pickIndex","pickIndex","pickIndex","pickIndex"]
[[[1,3]],[],[],[],[],[]]
Output
[null,1,1,1,1,0]

Explanation
Solution solution = new Solution([1, 3]);
solution.pickIndex(); // return 1. It is returning the second element (index = 1) that has a probability of 3/4.
solution.pickIndex(); // return 1
solution.pickIndex(); // return 1
solution.pickIndex(); // return 1
solution.pickIndex(); // return 0. It is returning the first element (index = 0) that has a probability of 1/4.

Since this is a randomization problem, multiple answers are allowed.
All of the following outputs can be considered correct:
[null,1,1,1,1,0]
[null,1,1,1,1,1]
[null,1,1,1,0,0]
[null,1,1,1,0,1]
[null,1,0,1,0,0]
......
and so on.

Constraints:

  • 1 <= w.length <= 104
  • 1 <= w[i] <= 105
  • pickIndex will be called at most 104 times.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected range of values for the weights in the `weights` array?
  2. Can the `weights` array be empty or null? If so, what should I return?
  3. Is the `weights` array guaranteed to only contain positive integers?
  4. If the random number generation results in a fractional index, how should I map that to an integer index for the return value?
  5. How many times will the pickIndex() function be called after the initial setup? This will help me understand the performance requirements.

Brute Force Solution

Approach

Imagine you have several boxes, each with a different size. The goal is to pick one box at random, but bigger boxes should have a higher chance of being picked. The brute force approach involves explicitly representing the larger boxes more often to reflect their higher probability.

Here's how the algorithm would work step-by-step:

  1. Create a very long list where each box is represented a number of times equal to its size.
  2. For example, a box of size 3 would be listed three times.
  3. Once this long list is created, pick a position at random from that list.
  4. The box at that randomly selected position is the box you choose.

Code Implementation

import random

class RandomPickWithWeight:

    def __init__(self, weights):
        self.expanded_list = []
        # Create the expanded list based on weights
        for index, weight in enumerate(weights):

            # Populate the list with index repeated by weight
            for _ in range(weight):
                self.expanded_list.append(index)

    def pickIndex(self):

        # Choose a random index from the expanded list
        random_index = random.randint(0, len(self.expanded_list) - 1)

        # Return element at randomly chosen index
        return self.expanded_list[random_index]

Big(O) Analysis

Time Complexity
O(W)The time complexity is determined by the process of creating the cumulative weight list. Let W be the sum of all weights in the input array. We need to iterate through the input array of weights and for each weight, we append its index a number of times equal to its weight to the cumulative weight list. The length of this cumulative weight list will be equal to W, so appending elements takes O(W) time. Picking a random index from the list of size W then takes O(1) time. Thus, the dominant operation is constructing the weighted list, which takes O(W) time.
Space Complexity
O(N)The described brute-force approach constructs a long list where each box is represented a number of times equal to its weight. Therefore, if the weights are w1, w2, ..., wN, the list's length will be w1 + w2 + ... + wN. The auxiliary space used is proportional to the sum of the weights, which in the worst case can be considered proportional to N if each weight has a value of 1 or the sum of the weights is limited by a factor of N. Thus, the space complexity is O(N).

Optimal Solution

Approach

We want to randomly pick something, but some things are more likely to be picked than others. The trick is to turn these different probabilities into a structure that makes picking easy and fast.

Here's how the algorithm would work step-by-step:

  1. First, add up all the weights to get a total. Think of it like finding the total number of lottery tickets.
  2. Then, for each item, find its cumulative weight. This means adding its weight to the weights of all the items before it. Imagine marking each lottery ticket with the total number of tickets sold up to that point.
  3. Now, generate a random number between zero and the total weight. This is like drawing a random number for the winning ticket.
  4. Use a special search (like a binary search) to find the item whose cumulative weight is just greater than or equal to the random number. This is like quickly finding the ticket that matches the winning number.
  5. The item you find is the one you randomly pick, based on its weight. The heavier something is, the more likely it is to have its range include the winning number.

Code Implementation

import random

class RandomPickWithWeight:

    def __init__(self, weights):
        self.cumulative_weights = []
        total_weight = 0

        # Calculating cumulative weights for each item
        for weight in weights:
            total_weight += weight
            self.cumulative_weights.append(total_weight)
        self.total_weight = total_weight

    def pickIndex(self):
        # Generate a random number within the total weight range
        random_number = random.random() * self.total_weight

        left, right = 0, len(self.cumulative_weights) - 1

        # Binary search to find the index corresponding to the random number
        while left <= right:
            mid = (left + right) // 2
            if self.cumulative_weights[mid] < random_number:
                left = mid + 1
            else:
                right = mid - 1

        # We return the left pointer, which will be the appropriate index
        return left

Big(O) Analysis

Time Complexity
O(log n)The initialization phase involves calculating the cumulative sums of the weights array, which takes O(n) time. However, the random pick operation dominates the time complexity. It involves generating a random number in O(1) and then performing a binary search on the cumulative weights array to find the appropriate index. Binary search has a time complexity of O(log n), where n is the number of weights. Therefore, the overall time complexity for the pick operation is O(log n).
Space Complexity
O(N)The algorithm constructs a cumulative weight array to store the cumulative sum of weights up to each index. This array has the same size as the input weights array, which we denote as N. Therefore, the auxiliary space required is directly proportional to the input size, N. The binary search itself uses constant space, but the dominant factor is the cumulative weight array.

Edge Cases

CaseHow to Handle
Null or empty weights arrayThrow an IllegalArgumentException or return -1 to indicate invalid input.
Weights array contains only one elementAlways return index 0 since it's the only valid choice.
Weights array with very large numbers leading to potential integer overflow in prefix sum calculationUse long data type to store prefix sums to avoid overflow.
Weights array contains all zero valuesThrow an IllegalArgumentException or return -1, as no index can be selected proportionally.
Weights array with very skewed distribution (one weight is significantly larger than others)The prefix sum and binary search approach will still work, but the large weight will be picked more often.
Maximum sized input array (limited by memory constraints)Ensure that the prefix sum array does not exceed available memory, potentially optimizing data structures.
Extreme boundary values for weights (e.g., Integer.MAX_VALUE)Handle potential overflow when calculating prefix sums, potentially using long data type or modular arithmetic.
Weights array contains negative numbersThrow an IllegalArgumentException as weights should be positive.