Taro Logo

Delete Columns to Make Sorted III

Hard
Google logo
Google
2 views
Topics:
ArraysStringsDynamic Programming

You are given an array of n strings strs, all of the same length.

We may choose any deletion indices, and we delete all the characters in those indices for each string.

For example, if we have strs = ["abcdef","uvwxyz"] and deletion indices {0, 2, 3}, then the final array after deletions is ["bef", "vyz"].

Suppose we chose a set of deletion indices answer such that after deletions, the final array has every string (row) in lexicographic order. (i.e., (strs[0][0] <= strs[0][1] <= ... <= strs[0][strs[0].length - 1]), and (strs[1][0] <= strs[1][1] <= ... <= strs[1][strs[1].length - 1]), and so on). Return the minimum possible value of answer.length.

Example 1:

Input: strs = ["babca","bbazb"]
Output: 3
Explanation: After deleting columns 0, 1, and 4, the final array is strs = ["bc", "az"].
Both these rows are individually in lexicographic order (ie. strs[0][0] <= strs[0][1] and strs[1][0] <= strs[1][1]).
Note that strs[0] > strs[1] - the array strs is not necessarily in lexicographic order.

Example 2:

Input: strs = ["edcba"]
Output: 4
Explanation: If we delete less than 4 columns, the only row will not be lexicographically sorted.

Example 3:

Input: strs = ["ghi","def","abc"]
Output: 0
Explanation: All rows are already lexicographically sorted.

Constraints:

  • n == strs.length
  • 1 <= n <= 100
  • 1 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.
  •  

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a string in the input array, and what is the maximum length of the input array itself?
  2. Are the strings in the input array guaranteed to consist of only lowercase English letters?
  3. If multiple sets of columns can be deleted to make the remaining columns sorted lexicographically, should I return the minimum number of columns to delete or just any valid solution?
  4. If the input array is empty or contains only empty strings, what should the function return?
  5. By 'sorted', do you mean non-decreasing lexicographical order for each remaining column after the deletion?

Brute Force Solution

Approach

Imagine you have a set of columns of letters, and you want to keep the fewest number of columns so that the remaining columns are sorted alphabetically from top to bottom. The brute force approach involves trying every possible combination of column removals. We meticulously check each combination to see if the remaining columns are sorted.

Here's how the algorithm would work step-by-step:

  1. Start by considering all possible sets of columns to keep, one at a time. This means looking at keeping all columns, then all but one column, then all but two columns, and so on, down to keeping only a single column.
  2. For each set of columns you're considering, check if the remaining letters in each row are in alphabetical order. This means, in each of your remaining columns, the letters should be in increasing order from top to bottom.
  3. If a set of columns results in all the rows being in alphabetical order, record the number of columns that were kept in this 'good' set. Note that a 'good' set satisfies the constraint of sorted rows.
  4. After checking all possible sets of columns, find the 'good' set with the largest number of columns kept. This 'good' set is the best one.
  5. Report the number of columns in the set with the largest number of columns.

Code Implementation

def minDeletionSizeBruteForce(strings):
    number_of_columns = len(strings[0])
    max_number_of_columns_to_keep = 0

    # Iterate through all possible subsets of columns
    for i in range(1 << number_of_columns):
        columns_to_keep = []
        for column_index in range(number_of_columns):
            if (i >> column_index) & 1:
                columns_to_keep.append(column_index)

        is_sorted = True
        # Verify that the selected columns are sorted
        for row_index in range(len(strings)):
            # Construct the string from selected indices
            constructed_string = ''
            for column_index in columns_to_keep:
                constructed_string += strings[row_index][column_index]

            # This checks if a specific row is sorted
            for char_index in range(len(constructed_string) - 1):
                if constructed_string[char_index] > constructed_string[char_index + 1]:
                    is_sorted = False
                    break

            if not is_sorted:
                break

        if is_sorted:
            # Update the maximum number of columns if valid
            max_number_of_columns_to_keep = max(max_number_of_columns_to_keep, len(columns_to_keep))

    # Return the number of columns to delete to get the result
    return number_of_columns - max_number_of_columns_to_keep

Big(O) Analysis

Time Complexity
O(2^n * m * n)The algorithm iterates through all possible subsets of columns, where n is the number of columns. This generates 2^n subsets. For each subset, we need to verify if the columns are sorted. Checking if a specific combination of columns is sorted involves iterating through the rows (m rows). Within each row, we have up to n comparisons (where n is the number of columns) because we need to make sure each of the considered columns are in sorted order from top to bottom. Therefore, the total time complexity is O(2^n * m * n).
Space Complexity
O(1)The provided brute force approach, as described, primarily involves iterating through combinations and performing comparisons. While the algorithm explores many subsets, it does not inherently create any auxiliary data structures whose size scales with the input size N (where N would likely represent the number of columns or the total number of characters in the input). The comparisons and tracking of the maximum 'good' set can be done using a fixed number of variables, independent of N. Therefore, the auxiliary space used is constant. Hence, the space complexity is O(1).

Optimal Solution

Approach

The problem is about finding the longest sequence of columns where the rows are sorted. We'll use a clever method called dynamic programming to avoid trying every single combination. This method remembers the best result so far and builds upon it, ultimately giving us the answer efficiently.

Here's how the algorithm would work step-by-step:

  1. Think of each column as a potential ending point for a sorted sequence.
  2. Start from the first column and see if adding it to a sequence keeps the rows in sorted order.
  3. If adding the column keeps everything sorted, then it extends the sequence. Remember how long this sorted sequence is.
  4. Repeat this for every column: check if adding it extends an existing sequence. If it does, update the length of the best sequence ending at that column.
  5. Keep track of the longest sorted sequence you've found so far. This is the key to the optimal solution.
  6. In the end, the length of the longest sorted sequence tells you the maximum number of columns you can keep, and you can easily calculate how many columns you need to remove.

Code Implementation

def min_deletion_size(strings):
    number_of_columns = len(strings[0])
    dp_table = [1] * number_of_columns

    # Iterate through each column to find the longest sorted sequence.
    for i in range(1, number_of_columns):
        for j in range(i):

            # Check if adding column i extends a sorted sequence ending at j.
            if all(strings[row][j] <= strings[row][i] for row in range(len(strings))):
                dp_table[i] = max(dp_table[i], dp_table[j] + 1)

    # The length of longest increasing subsequence.
    max_sorted_columns = max(dp_table)

    # We subtract the length from the total columns to know what to return.
    number_of_columns_to_delete = number_of_columns - max_sorted_columns

    return number_of_columns_to_delete

Big(O) Analysis

Time Complexity
O(n*m)We iterate through each of the 'n' columns. For each column, we compare it against all the preceding columns to check if adding it to an existing sorted sequence is possible. This check requires comparing 'm' rows in the columns to ensure sorted order. Therefore, the overall time complexity is O(n*m), where n is the number of columns and m is the number of rows.
Space Complexity
O(N)The algorithm uses dynamic programming, where we keep track of the length of the longest sorted sequence ending at each column. This requires an auxiliary array, dp, of size N, where N is the number of columns in the input array. Each element dp[i] stores the length of the longest sorted sequence ending at column i. Therefore, the space complexity is proportional to the number of columns, resulting in O(N) auxiliary space.

Edge Cases

CaseHow to Handle
Null or empty input arrayReturn 0 if the input array is null or empty as there are no columns to potentially delete.
Array with only one stringReturn 0 because a single string is always sorted.
Array with strings of different lengthsHandle by comparing characters only up to the shortest string's length, and ensure the overall longest increasing subsequence is calculated correctly.
All strings are identicalThe longest increasing subsequence will be the length of the input array since all columns are non-decreasing.
All strings are in strictly decreasing orderThe longest increasing subsequence will be 1 because no columns can be kept without deleting others.
Very long strings (potential for memory issues)The dynamic programming approach should be efficient enough with respect to memory; consider the max length of the string when allocating memory.
Large number of strings (scalability concerns)Ensure dynamic programming is optimized to avoid unnecessary computations, as the runtime is O(n^2) where n is the number of strings, potentially requiring further optimization for extremely large input.
Strings contain non-ASCII charactersThe character comparison logic needs to correctly handle Unicode characters, potentially using appropriate string comparison functions based on the language used.