Minimum Operations to Make a Subsequence

Hard

Google

2 views

Topics:

ArraysBinary SearchDynamic Programming

You are given an array target that consists of distinct integers and another integer array arr that can have duplicates.

In one operation, you can insert any integer at any position in arr. For example, if arr = [1,4,1,2], you can add 3 in the middle and make it [1,4,3,1,2]. Note that you can insert the integer at the very beginning or end of the array.

Return the minimum number of operations needed to make target a subsequence of arr.

A subsequence of an array is a new array generated from the original array by deleting some elements (possibly none) without changing the remaining elements' relative order. For example, [2,7,4] is a subsequence of [4,2,3,7,2,1,4] (the underlined elements), while [2,4,2] is not.

Example 1:

target = [5,1,3], arr = [9,4,2,3,4]
Output: 2
Explanation: You can add 5 and 1 in such a way that makes arr = [5,9,4,1,2,3,4], then target will be a subsequence of arr.

Example 2:

target = [6,4,8,1,3,2], arr = [4,7,6,2,3,8,6,1]
Output: 3

Constraints:

1 <= target.length, arr.length <= 10^5
1 <= target[i], arr[i] <= 10^9
target contains no duplicates.

Minimum Operations to Make a Subsequence

Hard

Google

2 views

Topics:

ArraysBinary SearchDynamic Programming

You are given an array target that consists of distinct integers and another integer array arr that can have duplicates.

In one operation, you can insert any integer at any position in arr. For example, if arr = [1,4,1,2], you can add 3 in the middle and make it [1,4,3,1,2]. Note that you can insert the integer at the very beginning or end of the array.

Return the minimum number of operations needed to make target a subsequence of arr.

A subsequence of an array is a new array generated from the original array by deleting some elements (possibly none) without changing the remaining elements' relative order. For example, [2,7,4] is a subsequence of [4,2,3,7,2,1,4] (the underlined elements), while [2,4,2] is not.

Example 1:

target = [5,1,3], arr = [9,4,2,3,4]
Output: 2
Explanation: You can add 5 and 1 in such a way that makes arr = [5,9,4,1,2,3,4], then target will be a subsequence of arr.

Example 2:

target = [6,4,8,1,3,2], arr = [4,7,6,2,3,8,6,1]
Output: 3

Constraints:

1 <= target.length, arr.length <= 10^5
1 <= target[i], arr[i] <= 10^9
target contains no duplicates.

Solution

Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

What are the possible ranges for the integer values within `target` and `arr`?
Can either `target` or `arr` be empty or null?
Does the `target` sequence need to be a contiguous subsequence, or can the elements be non-contiguous but in the correct order?
Are there any duplicate values within `target` or `arr`, and if so, how should they be handled?
If no subsequence of `arr` matches `target`, what value should I return?

Brute Force Solution

Approach

We want to find the fewest changes needed in one list to make a specific smaller list appear within it. The brute force way is to try matching the small list at every possible place in the bigger list, even if it means skipping elements in the bigger list. We then count the differences for each try and pick the attempt with the least differences.

Here's how the algorithm would work step-by-step:

Start by lining up the first item in the small list with the first item in the big list.
See how many changes it would take to make the small list a matching part of the big list, allowing you to skip items in the big list.
Now, shift the small list over by one position, so its first item lines up with the second item of the big list.
Again, count the necessary changes, remembering you can skip items in the big list.
Keep shifting the small list one position at a time along the big list, repeating the counting process.
After trying every possible starting position, compare all the change counts you recorded.
The smallest change count represents the minimum operations needed to make the small list a subsequence of the big list.

Code Implementation

def min_operations_subsequence_brute_force(target_array, source_array):
    minimum_operations = float('inf')

    # Iterate through all possible starting positions in the source array
    for source_start_index in range(len(source_array)):
        operations = 0
        target_index = 0
        source_index = source_start_index

        # Attempt to match the target array as a subsequence
        while target_index < len(target_array) and source_index < len(source_array):
            if target_array[target_index] == source_array[source_index]:
                target_index += 1

            source_index += 1

        # Count operations if target array isn't fully matched
        if target_index < len(target_array):
            operations = len(target_array) - target_index

        minimum_operations = min(minimum_operations, operations)

    # We must remove the length of target array because those matches don't count as operations
    return minimum_operations

Big(O) Analysis

Time Complexity

O(m*n) – The algorithm iterates through the 'nums' array of size 'n' using a loop, effectively trying each possible starting position for the 'target' array of size 'm' as a subsequence. Within each of these 'n' iterations, we attempt to match elements of 'target' in 'nums', potentially scanning through 'nums' again in the worst case if the 'target' element is not found nearby. Therefore, in the worst-case scenario, for each starting position in 'nums', we perform a potentially linear search of 'nums' relative to size 'm' where the entire nums needs to be scanned. This leads to approximately m*n operations.

Space Complexity

O(1) – The provided plain English explanation describes a brute-force approach that primarily involves iterating and comparing elements. It doesn't explicitly mention any auxiliary data structures like lists, hash maps, or recursion. The algorithm seems to rely on a few counter variables to track changes and shifts, and possibly variables to store the minimum change count found so far. These variables occupy a constant amount of space, irrespective of the sizes of the input lists. Therefore, the auxiliary space complexity is O(1).

Optimal Solution

Approach

The goal is to find the longest possible sequence of numbers from the special list that also appears in the regular list. To do this efficiently, we'll focus on finding the longest increasing subsequence within the regular list, considering only the numbers that appear in the special list.

Here's how the algorithm would work step-by-step:

First, make a new list containing only the positions of the numbers from the special list as they appear in the regular list.
Next, find the longest increasing subsequence within this list of positions.
The length of this longest increasing subsequence tells you the maximum number of elements you can keep from the special list without changing their order in the regular list.
Finally, subtract the length of the longest increasing subsequence from the total length of the special list. This difference represents the minimum number of operations needed to make the special list a subsequence of the regular list.

Code Implementation

def minimum_operations_to_make_subsequence(special_array, regular_array):
    position_list = []
    
    # Build index array: pos[i] is index of special_array[i] in regular_array
    for special_element in special_array:
        try:
            position_list.append(regular_array.index(special_element))
        except ValueError:
            # If element isn't found, it doesn't contribute to LIS
            pass

    # Use patience sorting to find the longest increasing subsequence (LIS)
    tails = []
    for position in position_list:
        if not tails or position > tails[-1]:
            tails.append(position)
        else:
            # Binary search to find the smallest tail >= current number.
            left_index = 0
            right_index = len(tails) - 1
            while left_index <= right_index:
                middle_index = (left_index + right_index) // 2
                if tails[middle_index] < position:
                    left_index = middle_index + 1
                else:
                    right_index = middle_index - 1
            tails[left_index] = position

    # The length of LIS gives elements to keep without operations.
    longest_increasing_subsequence_length = len(tails)

    # Return the minimum operations needed.
    return len(special_array) - longest_increasing_subsequence_length

Big(O) Analysis

Time Complexity

O(n log n) – Creating the position list iterates through the target array once, taking O(n) time where n is the length of the target array. Finding the Longest Increasing Subsequence (LIS) in the position list, which has a maximum length of m (length of the source array), typically employs a binary search based approach, resulting in O(m log m) complexity. Since m <= n, the overall complexity is dominated by finding the LIS, hence O(n log n). Subtracting the LIS length and calculating the result takes constant time O(1).

Space Complexity

O(N) – The algorithm creates a list of positions based on the `special` list's elements found in the `regular` list; in the worst case, all elements of the `special` list are present in the `regular` list, resulting in a list of size N (where N is the length of the `special` list). The longest increasing subsequence algorithm then operates on this list of positions, requiring an auxiliary array of at most the same size N to store the subsequence. Therefore, the auxiliary space complexity is O(N).

Edge Cases

Case	How to Handle
Empty target array	Return 0, as an empty sequence can always be made with 0 operations.
Empty source array	Return the length of the target array, as every element needs to be inserted.
Target array longer than source array and no subsequence exists	Return the length of the target array because we would need to insert all elements in target into source.
Source and target arrays are identical	Return 0, since the target is already a subsequence.
All elements in target array are not present in the source array	Return the length of the target array, as no elements match and all must be inserted.
Source array contains duplicate elements while target array does not	The LIS algorithm should correctly handle duplicates in the source array while finding the longest increasing subsequence of target indices in source.
Large input arrays that may cause time limit exceeded	Use an efficient Longest Increasing Subsequence (LIS) algorithm with O(n log n) complexity to handle large inputs, specifically binary search.
Integer overflow if elements are too large and the difference between elements is used	The comparison operations should be carefully reviewed and cast to a larger data type when comparing, to prevent potential overflow issues if the elements can be large.