Maximum Length of a Concatenated String with Unique Characters

Medium
Apple
Topics: Arrays, Strings, Recursion, Dynamic Programming, Bit Manipulation

You are given an array of strings arr. A string s is formed by the concatenation of a subsequence of arr that has unique characters.

Return the maximum possible length of s.

A subsequence is an array that can be derived from another array by deleting some or no elements without changing the order of the remaining elements.

Example 1:

Input: arr = ["un","iq","ue"]
Output: 4
Explanation: All the valid concatenations are:
- ""
- "un"
- "iq"
- "ue"
- "uniq" ("un" + "iq")
- "ique" ("iq" + "ue")
Maximum length is 4.
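
As a sanity check on the list above, the following minimal sketch enumerates every subsequence of the input and keeps the concatenations whose characters are all distinct. The helper name `valid_concatenations` is hypothetical and used only for illustration; it is not part of the problem statement.

```python
from itertools import combinations


def valid_concatenations(arr):
    """Return every concatenation of a subsequence of arr whose characters are all unique."""
    results = []
    for size in range(len(arr) + 1):
        # combinations() keeps elements in their original order, so each
        # tuple is a subsequence of arr.
        for picked in combinations(arr, size):
            candidate = "".join(picked)
            if len(set(candidate)) == len(candidate):
                results.append(candidate)
    return results


print(valid_concatenations(["un", "iq", "ue"]))
# ['', 'un', 'iq', 'ue', 'uniq', 'ique']
```

The longest entry has length 4, which matches the expected output.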

Example 2:

Input: arr = ["cha","r","act","ers"]
Output: 6
Explanation: Possible longest valid concatenations are "chaers" ("cha" + "ers") and "acters" ("act" + "ers").

Example 3:

Input: arr = ["abcdefghijklmnopqrstuvwxyz"]
Output: 26
Explanation: The only string in arr has all 26 characters.

Constraints:

  • 1 <= arr.length <= 16
  • 1 <= arr[i].length <= 26
  • arr[i] contains only lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of a string within the input array? Are there any limits on the total number of strings in the array?
  2. Can the input array contain empty strings or null values?
  3. If it's impossible to form a concatenated string with unique characters from any combination of strings in the array, what should be returned?
  4. Are the input strings guaranteed to contain only lowercase English letters, or can they contain other characters?
  5. If multiple concatenated strings with the maximum length and unique characters exist, can I return any one of them, or is there a specific criterion for choosing one?

Brute Force Solution

Approach

The brute force method tries every possible combination of strings to find the longest one with unique characters. It builds the concatenation for each subset of the input and checks whether that concatenation meets the unique character requirement. Only the choice of strings matters, not their order, because reordering the pieces changes neither the length nor the set of characters. Every subset is tested until the best one is found.

Here's how the algorithm would work step-by-step:

  1. Start with an empty combined string and a current longest length of zero.
  2. Consider each string in the given list individually. If a string has only unique characters, update the current longest length if needed.
  3. Now, pick the first string from the list and try to combine it with every other string in the list.
  4. For each combination, check if the combined string has only unique characters.
  5. If it does, calculate the length of the combined string and compare it with the current longest length. If the combined string is longer, update the current longest length.
  6. Repeat the previous two steps by starting with the second string in the list and combining it with every other string.
  7. Keep doing this for every string in the list, trying it as the starting point for a combination.
  8. Continue to combine strings in groups of three, four, and so on, checking for unique characters and updating the longest length each time a longer combination is found.
  9. Once all possible combinations have been checked, the current longest length will be the maximum length of a concatenated string with unique characters.

Code Implementation

def maximum_length_of_a_concatenated_string_with_unique_characters(strings):
    maximum_length = 0

    # Each integer i in [0, 2^n) encodes one subset:
    # bit j of i is set when strings[j] is part of that subset.
    for i in range(1 << len(strings)):
        combined_string = ""
        for j in range(len(strings)):
            # Include strings[j] when bit j of i is set
            if (i >> j) & 1:
                combined_string += strings[j]

        # Check if combined string has unique characters
        if len(set(combined_string)) == len(combined_string):
            # Update maximum_length if needed
            maximum_length = max(maximum_length, len(combined_string))

    return maximum_length

Big(O) Analysis

Time Complexity
O(2^n * (n + m))
The brute force approach considers every subset of the input. For n strings, each string is either included or excluded, which gives 2^n subsets. For each subset, the inner loop visits all n strings to build the concatenation, and checking that concatenation for unique characters takes O(m) time in the worst case, where m is the total length of all strings. The overall time complexity is therefore O(2^n * (n + m)). Under the stated constraints (n <= 16 and each string at most 26 characters), n and m are both small, so the 2^n factor dominates.
Space Complexity
O(m)
The brute force method builds one concatenated string per subset, plus a set of its characters for the uniqueness check. In the worst case the subset contains every input string, so the temporary string has length m, where m is the total number of characters across all input strings. Hence, the auxiliary space is O(m).
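
To put these bounds in perspective, here is a rough back-of-the-envelope calculation using only the limits from the Constraints section. The constant names are made up for this sketch, and the final figure is a loose upper bound on character visits, not a measured running time.

```python
MAX_STRINGS = 16         # upper bound on arr.length from the constraints
MAX_STRING_LENGTH = 26   # upper bound on arr[i].length from the constraints

subset_count = 1 << MAX_STRINGS                       # 2^16 subsets to try
longest_candidate = MAX_STRINGS * MAX_STRING_LENGTH   # every string concatenated
worst_case_character_visits = subset_count * longest_candidate

print(subset_count)                 # 65536
print(longest_candidate)            # 416
print(worst_case_character_visits)  # 27262976
```

A few tens of millions of elementary steps is small enough that brute force remains feasible for inputs within these limits.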

Optimal Solution

Approach

The best way to solve this is to use a method called backtracking. Imagine building the longest string piece by piece, but being smart about it. If a piece introduces duplicate letters, we immediately discard it and try something else.

Here's how the algorithm would work step-by-step:

  1. Start with an empty string.
  2. Consider the first available string in the list.
  3. Check if adding the string creates duplicate letters in our current string.
  4. If there are no duplicates, add the string to our current string and remember its length.
  5. Now consider the next string in the list and repeat steps 3 and 4.
  6. If at any point, adding a string *does* create duplicate letters, don't add it to our current string, and instead move on to the next string in the list.
  7. After trying all available strings, keep track of the longest string we were able to construct.
  8. Now go back to step 2, but this time, don't start with the very first string in the list. Start with the second string, and repeat the whole process.
  9. Continue this process, each time starting with a different string in the list.
  10. The longest string we ever found across all the different starting points is the answer.

Code Implementation

def maximum_length(strings):
    maximum_length_so_far = 0

    def has_duplicate_characters(current_string):
        character_set = set()
        for character in current_string:
            if character in character_set:
                return True
            character_set.add(character)
        return False

    def backtrack(index, current_concatenated_string):
        nonlocal maximum_length_so_far

        maximum_length_so_far = max(maximum_length_so_far, len(current_concatenated_string))

        # Iterate through the remaining strings.
        for i in range(index, len(strings)):
            string_to_add = strings[i]
            combined_string = current_concatenated_string + string_to_add

            # Check for duplicate characters before proceeding.
            if has_duplicate_characters(combined_string):
                continue

            # Explore the possibility of including the current string.
            backtrack(i + 1, combined_string)

    # Try each string as the first piece of the concatenation.
    for start_index in range(len(strings)):
        # A string that repeats a letter on its own can never be used.
        if has_duplicate_characters(strings[start_index]):
            continue

        # Initiate backtracking with the start string.
        backtrack(start_index + 1, strings[start_index])

    # If no string was usable, this is still 0 (the empty concatenation).
    return maximum_length_so_far

Big(O) Analysis

Time Complexity
O(2^n * m)
The backtracking algorithm explores subsets of the input array of strings, abandoning a branch as soon as adding a string would introduce a duplicate letter. In the worst case, when few branches are pruned, it still visits up to 2^n subsets, where n is the number of strings in the input array. Each step builds a candidate string and checks it for duplicate characters in O(m) time, where m is the length of that candidate. Because the current string always has unique characters, it holds at most 26 letters, so m is at most 52 under the stated constraints. Therefore, the overall time complexity is O(2^n * m).
Space Complexity
O(n)
The backtracking algorithm described uses recursion. In the worst-case scenario, where almost every string in the input array 'arr' of size n can be concatenated without creating duplicate characters, the recursion depth can reach n. Each recursive call creates a new stack frame that stores the index and the current string; that string has unique characters, so it holds at most 26 letters and counts as constant size. Therefore, the space complexity is proportional to the maximum depth of the recursion, which is O(n).
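
If recursion depth is a concern, the same search can be done without recursion. The sketch below is a swapped-in alternative rather than the implementation above: it assumes the input contains only lowercase English letters, represents each valid concatenation by a 26-bit mask of the letters it uses, and grows a set of reachable masks one input string at a time. The names `letter_mask` and `max_length_iterative` are hypothetical.

```python
def letter_mask(word):
    """Return a 26-bit mask of the letters in word, or None if a letter repeats."""
    mask = 0
    for character in word:
        bit = 1 << (ord(character) - ord("a"))  # assumes lowercase a-z only
        if mask & bit:
            return None
        mask |= bit
    return mask


def max_length_iterative(strings):
    """Iteratively track the letter sets of all valid concatenations."""
    reachable_masks = {0}  # 0 stands for the empty concatenation
    best_length = 0

    for word in strings:
        word_mask = letter_mask(word)
        if word_mask is None:
            continue  # this word repeats a letter and can never be used

        # Extend every known valid letter set that shares no letter with word.
        new_masks = set()
        for mask in reachable_masks:
            if (mask & word_mask) == 0:
                combined = mask | word_mask
                new_masks.add(combined)
                # The number of set bits equals the concatenation's length.
                best_length = max(best_length, bin(combined).count("1"))
        reachable_masks |= new_masks

    return best_length


print(max_length_iterative(["cha", "r", "act", "ers"]))  # 6
```

The set can hold up to 2^n masks, so this variant trades the recursion stack for extra memory.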

Edge Cases

Case | How to Handle
Empty input list | Return 0 immediately, as no string concatenation is possible.
Input list contains an empty string | Treat the empty string as a valid string that contributes nothing to the length.
Input list contains strings with duplicate characters within themselves | Filter out such strings early on, as they can never contribute to a valid unique concatenation.
Input list contains strings that, when concatenated with other strings, result in duplicate characters | The recursive or backtracking approach should explore all valid combinations and prune branches with duplicate characters.
Input list contains very long strings, approaching memory limits | Any string longer than 26 characters must repeat a letter and can be discarded up front. Beyond that, consider the memory usage of intermediate concatenated strings, and possibly switch to iterative DP if recursion stack overflow becomes an issue.
The concatenation of all strings in the input list results in unique characters | The algorithm should correctly identify this case and return the length of the concatenated string.
All strings in the input list have overlapping character sets | No two strings can be combined, so the algorithm should return the length of the longest single string that has no repeated characters, or 0 if there is none.
Large number of strings in the input list, combined with long average string length, leading to performance bottlenecks | Optimize character set checks using bit masks for faster duplicate detection and early pruning.
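
Two of the suggestions in the table, filtering out strings with internal duplicates and using bit masks for the duplicate check, can be combined in one sketch. It is an illustration under the assumption that the input contains only lowercase English letters, not a definitive implementation, and the function name `max_length_bitmask` is hypothetical.

```python
def max_length_bitmask(strings):
    """Backtracking where each usable string is reduced to (letter mask, length)."""
    candidates = []
    for word in strings:
        mask = 0
        for character in word:
            bit = 1 << (ord(character) - ord("a"))  # assumes lowercase a-z only
            if mask & bit:
                break  # the word repeats a letter, so it is filtered out
            mask |= bit
        else:
            # Reached only when the loop finished without hitting a repeat.
            candidates.append((mask, len(word)))

    best_length = 0

    def backtrack(index, used_mask, current_length):
        nonlocal best_length
        best_length = max(best_length, current_length)
        for i in range(index, len(candidates)):
            mask, word_length = candidates[i]
            # A single AND replaces the character-by-character duplicate scan.
            if used_mask & mask:
                continue
            backtrack(i + 1, used_mask | mask, current_length + word_length)

    backtrack(0, 0, 0)
    return best_length


print(max_length_bitmask(["un", "iq", "ue"]))  # 4
```

With this representation no intermediate strings are built, and each overlap check takes constant time.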