Taro Logo

Random Pick with Weight

Medium
Uber logo
Uber
3 views
Topics:
ArraysBinary Search

You are given a 0-indexed array of positive integers w where w[i] describes the weight of the i<sup>th</sup> index.

You need to implement the function pickIndex(), which randomly picks an index in the range [0, w.length - 1] (inclusive) and returns it. The probability of picking an index i is w[i] / sum(w).

  • For example, if w = [1, 3], the probability of picking index 0 is 1 / (1 + 3) = 0.25 (i.e., 25%), and the probability of picking index 1 is 3 / (1 + 3) = 0.75 (i.e., 75%).

Example 1:

Input: ["Solution","pickIndex"] [[[1]],[]] Output: [null,0]

Explanation: Solution solution = new Solution([1]); solution.pickIndex(); // return 0. The only option is to return 0 since there is only one element in w.

Example 2:

Input: ["Solution","pickIndex","pickIndex","pickIndex","pickIndex","pickIndex"] [[[1,3]],[],[],[],[],[]] Output: [null,1,1,1,1,0]

Explanation: Solution solution = new Solution([1, 3]); solution.pickIndex(); // return 1. It is returning the second element (index = 1) that has a probability of 3/4. solution.pickIndex(); // return 1 solution.pickIndex(); // return 1 solution.pickIndex(); // return 1 solution.pickIndex(); // return 0. It is returning the first element (index = 0) that has a probability of 1/4.

Since this is a randomization problem, multiple answers are allowed. All of the following outputs can be considered correct: [null,1,1,1,1,0] [null,1,1,1,1,1] [null,1,1,1,0,0] [null,1,1,1,0,1] [null,1,0,1,0,0] and so on.

Constraints:

  • 1 <= w.length <= 10^4
  • 1 <= w[i] <= 10^5
  • pickIndex will be called at most 10^4 times.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected range for the values within the `weights` array?
  2. Can the `weights` array be empty, or contain zero values?
  3. How many times will the pickIndex() function be called in a typical usage scenario, and what is the expected scale of the data?
  4. If the sum of all weights overflows, how should that be handled?
  5. Is the array of weights guaranteed to be non-null?

Brute Force Solution

Approach

The basic idea is to simulate the picking process many, many times. We will directly use the given weights to guide our random selection in each simulated pick, until we get our answer.

Here's how the algorithm would work step-by-step:

  1. Imagine we have a bag of items, where each item has a weight attached to it; a higher weight means the item is more likely to be picked.
  2. To simulate a random pick, we consider the weight of each item as the number of times that specific item appears in our metaphorical bag.
  3. We generate a random number that represents a position inside this bag.
  4. We then go through each item to figure out which item this randomly selected position corresponds to. It's like walking through the bag and counting items until we reach our selected position.
  5. The item whose weight range includes that random number is the item that gets picked.
  6. We repeat this process every time we need to make a random pick.

Code Implementation

import random

class RandomPickWithWeight:

    def __init__(self, weights):
        self.weights = weights

    def pickIndex(self):
        # Calculate the total weight to define the 'bag' size
        total_weight = sum(self.weights)

        # Generate a random number within the range of the total weight
        random_number = random.randint(1, total_weight)

        current_weight_sum = 0

        # Iterate through the weights to find the corresponding index
        for index, weight in enumerate(self.weights):
            current_weight_sum += weight

            # Check if the random number falls within the current weight range
            if random_number <= current_weight_sum:
                return index

Big(O) Analysis

Time Complexity
O(n)The constructor precalculates a prefix sum array of size n, where n is the number of weights. Picking an index involves generating a random number and then iterating through the prefix sum array to find the index whose range contains the random number. In the worst case, the iteration goes through all n elements of the prefix sum array. Therefore, the pick operation has a time complexity of O(n).
Space Complexity
O(N)The algorithm implicitly constructs a cumulative sum array representing the 'bag' of items. The size of this 'bag' conceptually corresponds to the sum of weights, which, in the worst-case scenario, can be represented by an array of size N, where N is the number of items (or weights). Therefore, the space complexity is proportional to the number of weights provided as input, which results in O(N) auxiliary space.

Optimal Solution

Approach

This problem asks us to pick a random item, but not with equal chances. Items with higher weights should be picked more often. We can solve it by creating a range for each item based on its weight and then figuring out which range a randomly generated number falls into.

Here's how the algorithm would work step-by-step:

  1. First, calculate the total weight of all the items by adding up their individual weights.
  2. Next, create ranges for each item. The size of each range corresponds to that item's weight. For example, if an item has a weight of 2 and the total weight is 10, its range would take up 2/10 of the total number line.
  3. To create the ranges, make a list of the cumulative sums of the weights. Each entry in this list represents the end of a range.
  4. Generate a random number between 1 and the total weight.
  5. Find the range that the random number falls into. This tells you which item to pick. We can use an efficient search method to quickly find the range.
  6. The item corresponding to that range is your randomly picked item, weighted appropriately.

Code Implementation

import random

class Solution:

    def __init__(self, weights):
        self.weights = weights
        self.cumulative_sum_of_weights = []
        total_weight = 0
        for weight in weights:
            total_weight += weight
            self.cumulative_sum_of_weights.append(total_weight)

        self.total_weight = total_weight

    def pickIndex(self):
        # Generate a random number within the total weight range.
        random_number = random.randint(1, self.total_weight)

        left = 0
        right = len(self.weights) - 1

        # Binary search to find the index corresponding to random number.
        while left < right:
            mid = left + (right - left) // 2
            if self.cumulative_sum_of_weights[mid] < random_number:
                left = mid + 1
            else:
                right = mid

        # Return the index where the random number falls.
        return left

Big(O) Analysis

Time Complexity
O(n)The initialization involves calculating the cumulative sum of weights, which requires iterating through the input array of weights once. This takes O(n) time, where n is the number of weights. The pickIndex function generates a random number in O(1) and then performs a binary search on the cumulative sum array. Binary search takes O(log n) time. Therefore, the overall time complexity for the initialization is O(n) and the pickIndex function is O(log n), however, we are only assessing the first time to initialize the data. The initialization dominates the complexity making it O(n).
Space Complexity
O(N)The solution creates a list of cumulative sums of the weights. This list stores the cumulative weight up to each item, so the size of this list grows linearly with the number of items (N). Therefore, the auxiliary space required is proportional to the number of items, N, resulting in O(N) space complexity.

Edge Cases

CaseHow to Handle
Null or empty weights arrayThrow an IllegalArgumentException or return -1 indicating an invalid state, as no index can be chosen.
weights array with a single elementReturn 0 since it's the only possible index with 100% probability.
All weights are zeroThrow an IllegalArgumentException or return -1 as it's impossible to pick an index proportionally.
Large weights causing potential integer overflow during cumulative sum calculationUse long data type for cumulative sums to avoid integer overflow.
Maximum-sized weights array (performance consideration)Use binary search on the cumulative weights to ensure logarithmic time complexity for picking an index.
Weights array containing extremely small positive values approaching zeroThe algorithm should work as intended since weights are positive, but consider possible floating-point precision issues if using floats/doubles (should be ints).
Weights array with identical valuesThe algorithm should still correctly pick an index with uniform probability, as the cumulative weights will increase linearly.
Edge case of input array exceeding memory limitsConsider using an alternative approach that doesn't require storing the entire cumulative sum array if memory is constrained.