Taro Logo

Degree of an Array #4 Most Asked

Easy
1 view
Topics:
Arrays

Given a non-empty array of non-negative integers nums, the degree of this array is defined as the maximum frequency of any one of its elements.

Your task is to find the smallest possible length of a (contiguous) subarray of nums, that has the same degree as nums.

Example 1:

Input: nums = [1,2,2,3,1]
Output: 2
Explanation: 
The input array has a degree of 2 because both elements 1 and 2 appear twice.
Of the subarrays that have the same degree:
[1, 2, 2, 3, 1], [1, 2, 2, 3], [2, 2, 3, 1], [1, 2, 2], [2, 2, 3], [2, 2]
The shortest length is 2. So return 2.

Example 2:

Input: nums = [1,2,2,3,1,4,2]
Output: 6
Explanation: 
The degree is 3 because the element 2 is repeated 3 times.
So [2,2,3,1,4,2] is the shortest subarray, therefore returning 6.

Constraints:

  • nums.length will be between 1 and 50,000.
  • nums[i] will be an integer between 0 and 49,999.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum possible length of the input array `nums`, and what is the maximum possible value of an element within `nums`?
  2. If multiple subarrays have the minimum length and the same degree as the original array, is any one of them acceptable as the solution?
  3. What should I return if the input array is empty or null? Can I assume the input array will always be non-empty as stated in the description?
  4. Does the array `nums` contain only non-negative integers as stated, or could it contain negative integers, floating-point numbers, or other data types?
  5. Could there be multiple elements that have the same maximum frequency (i.e., contribute to the degree), and if so, does the subarray need to contain all of them?

Brute Force Solution

Approach

The brute force method for the Degree of an Array problem involves checking every possible sub-section of the data. We identify the most frequent value, and then look at every possible sub-section to find the shortest one that contains all occurrences of that value.

Here's how the algorithm would work step-by-step:

  1. First, find the value that appears most often in the whole dataset.
  2. Next, consider every possible piece of the dataset, starting with the smallest pieces.
  3. For each piece, check if it contains all the instances of the most frequent value.
  4. If a piece does contain all instances, remember its size.
  5. Continue checking all the pieces until you've looked at every possibility.
  6. Finally, choose the smallest size you found among the pieces that contained all instances of the most frequent value. This is your answer.

Code Implementation

def degree_of_an_array_brute_force(numbers):
    most_frequent_value = 0
    max_frequency = 0
    frequency_map = {}

    for number in numbers:
        if number not in frequency_map:
            frequency_map[number] = 0
        frequency_map[number] += 1
        if frequency_map[number] > max_frequency:
            max_frequency = frequency_map[number]
            most_frequent_value = number

    shortest_sub_array_length = len(numbers)

    # Iterate through all possible sub arrays to find the shortest one containing the most frequent element
    for sub_array_start_index in range(len(numbers)):
        for sub_array_end_index in range(sub_array_start_index, len(numbers)): 
            sub_array = numbers[sub_array_start_index:sub_array_end_index + 1]

            sub_array_frequency_map = {}
            for number in sub_array:
                if number not in sub_array_frequency_map:
                    sub_array_frequency_map[number] = 0
                sub_array_frequency_map[number] += 1

            # Check if the sub array contains all instances of the most frequent value
            if most_frequent_value in sub_array_frequency_map and sub_array_frequency_map[most_frequent_value] == frequency_map[most_frequent_value]:
                #Keep track of the smallest length so far
                shortest_sub_array_length = min(shortest_sub_array_length, len(sub_array))

    return shortest_sub_array_length

Big(O) Analysis

Time Complexity
O(n³)Finding the most frequent value requires iterating through the array once, taking O(n) time. Generating all possible subarrays involves nested loops. The outer loop iterates from the start of the array (0 to n-1), and the inner loop iterates from the current position of the outer loop to the end of the array. For each subarray, we check if it contains all occurrences of the most frequent element, which itself takes O(n) time in the worst case. Thus, we have roughly n * n (for subarrays) * n (to check if the subarray contains all instances of the most frequent value), which results in O(n³).
Space Complexity
O(1)The brute force method, as described, primarily iterates through sub-arrays of the input array. It identifies the most frequent value and checks sub-arrays, but it does not explicitly create new data structures that scale with the input size N. The algorithm stores the size of the shortest sub-array found so far and potentially a counter, but these variables use constant space. Therefore, the auxiliary space complexity remains constant regardless of the input size.

Optimal Solution

Approach

To find the 'degree' of an array (the most frequent number) and the smallest part of the array that contains all instances of that number, we use a clever tracking system. We remember the first and last seen position of each number, and how many times each number appears. This lets us quickly check only the shortest parts that contain the most common number.

Here's how the algorithm would work step-by-step:

  1. First, walk through the array and count how many times each number shows up.
  2. While counting, also remember the first time each number appears and the last time it appears.
  3. Find out which number (or numbers) appear most often; this is the 'degree'.
  4. Look at only the sections of the array that start at the first appearance and end at the last appearance of the most frequent number.
  5. Out of all those sections, find the shortest one. This is the answer because it's the smallest part of the array that still contains all instances of the most frequent number.

Code Implementation

def find_shortest_subarray(nums):
    first_seen = {}
    last_seen = {}
    frequency_count = {}

    for index, number in enumerate(nums):
        if number not in first_seen:
            first_seen[number] = index
        last_seen[number] = index

        frequency_count[number] = frequency_count.get(number, 0) + 1

    degree = 0
    for number in frequency_count:
        degree = max(degree, frequency_count[number])

    most_frequent_numbers = [
        number for number, count in frequency_count.items() if count == degree
    ]

    minimum_length = float('inf')

    # We need to only check the most frequent numbers.
    for number in most_frequent_numbers:

        first_index = first_seen[number]
        last_index = last_seen[number]

        # Calculate the length of the subarray containing all occurrences
        current_length = last_index - first_index + 1

        # Update the minimum length if the current length is smaller
        minimum_length = min(minimum_length, current_length)

    return minimum_length

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input array of size n a few times. The first pass counts frequencies and tracks first/last occurrences, taking O(n) time. Finding the maximum frequency and corresponding numbers also takes O(n) time. Finally, iterating through the relevant subarrays (defined by first and last occurrences of most frequent numbers) to find the shortest length is also bounded by O(n), as the number of relevant subarrays and their lengths are limited by the original array size. Therefore, the overall time complexity is dominated by these linear operations, resulting in O(n).
Space Complexity
O(N)The algorithm uses three hash maps (or dictionaries): one to store the frequency of each number, one to store the first occurrence of each number, and one to store the last occurrence of each number. In the worst case, all N elements of the array are distinct, so each of these hash maps will store N entries. Therefore, the auxiliary space used is proportional to the number of elements in the input array, N. The total extra memory approximates 3N, which simplifies to O(N).

Edge Cases

Empty or null input array
How to Handle:
Return 0 if the input array is empty or null as there is no subarray with any degree.
Array with a single element
How to Handle:
Return 1 as the array itself is the shortest subarray with degree 1.
Array with all elements having the same value
How to Handle:
Return 1 as any single element constitutes the shortest subarray with the maximum degree.
Array with elements having a skewed distribution (one element dominates)
How to Handle:
The solution should correctly identify the degree and find the shortest subarray containing that dominant element.
Array with multiple elements sharing the same maximum frequency (same degree)
How to Handle:
The solution should iterate and check the shortest subarray for each element with the maximum frequency and return the minimum length amongst them.
Large input array with large integer values
How to Handle:
The solution's space complexity using hash maps should be considered for very large arrays, and avoid potential Integer overflow during length calculations.
Array containing only zeros
How to Handle:
The solution handles this like any other array where a value appears multiple times, determining the degree and shortest subarray length correctly.
Integer overflow potential when calculating lengths/indices
How to Handle:
Use long to store length and index differences to prevent potential overflow issues, especially with very large arrays.
0/0 completed