Taro Logo

Longest Common Subsequence Between Sorted Arrays

Medium
Google logo
Google
9 views
Topics:
Arrays

You are given a list of sorted arrays, arrays, where each array contains distinct integers sorted in ascending order. Your task is to find the longest common subsequence that exists in all of the given arrays. A subsequence of an array is a sequence that can be derived from the array by deleting some or no elements without changing the order of the remaining elements.

For example, consider the following input:

arrays = [
  [1, 3, 4, 5, 6],
  [2, 3, 5, 6],
  [1, 2, 3, 6]
]

The longest common subsequence in this case is [3, 6] because both 3 and 6 appear in all three arrays, and no other subsequence longer than 2 exists.

Here are some additional examples to consider:

  1. arrays = [[1, 2, 3], [4, 5, 6]] (Output: [])
  2. arrays = [[1, 2, 3, 4, 5], [1, 2, 5, 7, 9], [1, 3, 4, 5, 8]] (Output: [1, 5])
  3. arrays = [[1, 2, 3], [1, 2, 3], [1, 2, 3]] (Output: [1, 2, 3])

Write a function that takes a list of sorted arrays as input and returns the longest common subsequence as a sorted list of integers. Ensure that the solution is efficient and handles edge cases gracefully, such as empty input arrays or arrays with no common subsequence.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum size of the arrays, and the maximum value of the integers within them?
  2. Can the input arrays be empty or contain null values? If so, how should I handle those cases?
  3. If there are multiple longest common subsequences, is any one acceptable, or is there a specific subsequence I should return (e.g., the lexicographically smallest)?
  4. Although the arrays are sorted individually, is it possible for an integer to appear multiple times within a single array, and if so, how should that impact the result?
  5. What should I return if there is no common subsequence between the arrays? Should I return an empty array, null, or throw an exception?

Brute Force Solution

Approach

The brute force approach for finding the longest common subsequence between sorted arrays involves checking every possible subsequence. We essentially explore all combinations of elements from the arrays to see if they form a common sequence. We then find the longest among these common sequences.

Here's how the algorithm would work step-by-step:

  1. Consider every possible sequence that can be formed from the elements of the first array.
  2. For each of these sequences, check if it is also a valid sequence present in the second array.
  3. If it's present in the second array, check if it's also a valid sequence in the third array, and so on, for all the arrays.
  4. If a sequence is present in all arrays, consider it a common subsequence.
  5. Keep track of all the common subsequences you find.
  6. After checking all possible sequences from the first array, identify the longest common subsequence among the ones you tracked.

Code Implementation

def longest_common_subsequence_brute_force(arrays):
    if not arrays:
        return []

    first_array = arrays[0]
    all_common_subsequences = []

    def is_subsequence(sequence, array):
        array_index = 0
        sequence_index = 0
        while array_index < len(array) and sequence_index < len(sequence):
            if array[array_index] == sequence[sequence_index]:
                sequence_index += 1
            array_index += 1
        return sequence_index == len(sequence)

    def generate_subsequences(array, index, current_subsequence):
        # Base case: When we have processed all elements of the array
        if index == len(array):
            if current_subsequence:
                is_common = True
                for other_array in arrays[1:]:
                    if not is_subsequence(current_subsequence, other_array):
                        is_common = False
                        break

                # Add to common subsequences if valid
                if is_common:
                    all_common_subsequences.append(current_subsequence)
            return

        # Recursive case 1: Exclude the current element
        generate_subsequences(array, index + 1, current_subsequence)

        # Recursive case 2: Include the current element
        generate_subsequences(array, index + 1, current_subsequence + [array[index]])

    generate_subsequences(first_array, 0, [])

    longest_subsequence = []
    for subsequence in all_common_subsequences:
        # Find the maximum common subsequence.
        if len(subsequence) > len(longest_subsequence):
            longest_subsequence = subsequence

    return longest_subsequence

Big(O) Analysis

Time Complexity
O(N * 2^N * M)The brute force approach involves generating all possible subsequences from the first array, which can have 2^N subsequences where N is the length of the first array. For each of these subsequences, we need to check if it exists as a subsequence in the other M arrays. Checking if a subsequence exists in another array of length N can take O(N) time in the worst case. Therefore, the overall time complexity is O(2^N * N * M), where M is the number of arrays and we are assuming all arrays have similar length N. We can approximate the length check with a more complete traversal taking O(N) time, leading to approximately O(N * 2^N * M).
Space Complexity
O(2^M)The brute force approach explores every possible subsequence from the first array. If the first array has M elements, there can be up to 2^M possible subsequences since each element can either be included or excluded in a subsequence. We need to keep track of all these potential subsequences, implying a space complexity of O(2^M) to store them. N here represents the total number of elements across all arrays; however the size of the first array (M) dictates the auxiliary space. Note: keeping track of common subsequences also contributes O(2^M).

Optimal Solution

Approach

The most efficient way to find the longest common subsequence is to focus on elements that appear in all input sets. By iterating through the sets simultaneously, we can quickly identify and extend any common subsequence, avoiding unnecessary comparisons. This method significantly reduces the number of checks required compared to brute-force approaches.

Here's how the algorithm would work step-by-step:

  1. Start by looking at the first element of each set.
  2. Check if this element is present in every set.
  3. If it is, then this element is part of the longest common subsequence, so include it and move to the next element in each set.
  4. If it isn't, find the smallest element among the current elements of all sets.
  5. Advance the set(s) containing that smallest element to their next element because that smallest element can't be part of any common subsequence.
  6. Repeat steps 2-5 until you reach the end of any of the sets.
  7. Finally, return the sequence of elements you included which represents the longest common subsequence found.

Code Implementation

def longest_common_subsequence_between_sorted_arrays(arrays):
    if not arrays:
        return []

    array_indices = [0] * len(arrays)
    longest_common_subsequence = []

    while True:
        # Get the current elements from each array.
        current_elements = []
        for i in range(len(arrays)):
            if array_indices[i] < len(arrays[i]):
                current_elements.append(arrays[i][array_indices[i]])
            else:
                # If any array is exhausted, stop.
                return longest_common_subsequence

        # Check if all elements are equal.
        if all(element == current_elements[0] for element in current_elements):
            # Found a common element, add to subsequence.
            longest_common_subsequence.append(current_elements[0])
            for i in range(len(arrays)):
                array_indices[i] += 1

        else:
            # Find the smallest element to advance.
            minimum_element = min(current_elements)

            # Advance the arrays with the smallest element.
            for i in range(len(arrays)):
                if arrays[i][array_indices[i]] == minimum_element:
                    # Smallest element cannot be in LCS
                    array_indices[i] += 1

    return longest_common_subsequence

Big(O) Analysis

Time Complexity
O(n*m)The algorithm iterates through the input arrays, advancing pointers within each array until one of them reaches the end. Let n be the length of the longest array and m be the number of arrays. In the worst-case scenario, each element of the longest array is compared against the current element in all the other m-1 arrays. Therefore, the number of comparisons is proportional to n * m. Thus, the overall time complexity is O(n*m).
Space Complexity
O(M)The algorithm constructs a new list, specifically the 'sequence of elements you included', to store the longest common subsequence found. The maximum possible length of this subsequence is bounded by the length of the shortest input list. Therefore, in the worst-case scenario, where 'M' represents the length of the shortest input list, the space required to store the subsequence is proportional to 'M'. All other variables used in the algorithm, like indices for iterating lists, consume constant space. Thus, the overall auxiliary space complexity is O(M).

Edge Cases

CaseHow to Handle
One or more input arrays are null or emptyReturn an empty list, as no common subsequence is possible.
All input arrays are identical and contain the same single element.Return a list containing that single element.
Input arrays have no common elements.Return an empty list, indicating no common subsequence exists.
The length of the shortest array is very small (e.g., 1 or 2)The algorithm should still function correctly and efficiently even with small array sizes, although the common sequence will be short.
Input arrays are extremely large, potentially exceeding available memory during intermediate calculations or data structures.Consider using a memory-efficient approach, such as comparing arrays in smaller chunks or using iterators to avoid loading the entire dataset into memory at once.
Input arrays contain large integer values that could lead to integer overflow during calculations (e.g., summing indices).Use appropriate data types (e.g., long) to store intermediate results, or consider using modular arithmetic if the overflow is acceptable.
The common subsequence is very long, approaching the size of the smallest input array.Ensure the data structure used to store the common subsequence can handle this length efficiently without performance degradation.
Input arrays contain duplicate values within the array, but some of these duplicate values are part of the longest common subsequence.The algorithm should correctly identify and include only the appropriate instances of the duplicate values to form the longest common subsequence.