Taro Logo

Find the Original Typed String II

Hard
Asked by:
Profile picture
Profile picture
17 views
Topics:
ArraysStringsDynamic ProgrammingRecursion

Alice is attempting to type a specific string on her computer. However, she tends to be clumsy and may press a key for too long, resulting in a character being typed multiple times.

You are given a string word, which represents the final output displayed on Alice's screen. You are also given a positive integer k.

Return the total number of possible original strings that Alice might have intended to type, if she was trying to type a string of size at least k.

Since the answer may be very large, return it modulo 109 + 7.

Example 1:

Input: word = "aabbccdd", k = 7

Output: 5

Explanation:

The possible strings are: "aabbccdd", "aabbccd", "aabbcdd", "aabccdd", and "abbccdd".

Example 2:

Input: word = "aabbccdd", k = 8

Output: 1

Explanation:

The only possible string is "aabbccdd".

Example 3:

Input: word = "aaabbb", k = 3

Output: 8

Constraints:

  • 1 <= word.length <= 5 * 105
  • word consists only of lowercase English letters.
  • 1 <= k <= 2000

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected behavior if `s` cannot be transformed into `t` by removing characters (i.e., no valid original typed string exists)? Should I return an empty string, null, or throw an exception?
  2. Are the input strings `s` and `t` case-sensitive? For example, should 'a' be treated the same as 'A'?
  3. Can the input strings `s` and `t` be empty or null?
  4. If there are multiple possible original typed strings (due to different removal options), is any valid original typed string acceptable, or is there a specific criteria for choosing one?
  5. Could you please provide an example where there are multiple repeating characters in `s` that need to be handled?

Brute Force Solution

Approach

The core idea is to try out all possible ways the shorter string could have been created from the longer string. We will explore every combination by checking if removing characters from the longer string could result in the shorter string.

Here's how the algorithm would work step-by-step:

  1. Start by assuming the shorter string is formed by keeping the first letter of the longer string and removing the rest.
  2. Then, assume the shorter string is formed by keeping the second letter of the longer string and removing the rest, and so on. Continue until you have checked every possibility of keeping only one letter.
  3. Next, try keeping two letters of the longer string, one after the other, and removing the rest. Explore all possible pairs.
  4. Repeat this process by attempting to keep three letters, then four, and so on, up to the full length of the shorter string. Each time, you are checking all the possible combinations of characters that could form the shorter string.
  5. For each combination you try, carefully compare the characters that were supposedly kept against the characters of the shorter string. If the characters match and form the shorter string, then the longer string is considered a possible 'typed' version of the original string. If they don't match, ignore the combination.
  6. Ultimately, if you find a way that the longer string could become the shorter string by removing characters, you can conclude that the shorter string is the original string.

Code Implementation

def find_original_typed_string_brute_force(original_string, typed_string):
    def solve(original_index, typed_index, current_string):
        if original_index == len(original_string) and typed_index == len(typed_string):
            return True

        if original_index < len(original_string) and typed_index == len(typed_string):
            return False

        if original_index == len(original_string) and typed_index < len(typed_string):
            # We can only remove characters from typed_string at this point.
            return True

        # Option 1: Keep the character in typed_string
        if original_string[original_index] == typed_string[typed_index]:
            if solve(original_index + 1, typed_index + 1, current_string + typed_string[typed_index]):
                return True

        # Option 2: Remove the character in typed_string
        # We are skipping the current char
        if solve(original_index, typed_index + 1, current_string):
            return True

        return False

    if solve(0, 0, ""):
        return True
    else:
        return False

Big(O) Analysis

Time Complexity
O(2^m * n)The algorithm iterates through all possible subsequences of the longer string (length n), attempting to match the shorter string (length m). Generating all subsequences of a string of length n takes O(2^n) time. However, the outer loop iterates from subsequence length 1 up to the length of the shorter string 'm'. Within each subsequence generation, a comparison with the shorter string of length 'm' is performed, taking O(m) time in the best case, but O(n) in the worst case because it needs to confirm that the characters are the same and the subsequence is valid. Thus, considering the overall impact, the time complexity is approximately O(2^m * n), as we are primarily exploring subsequences whose length does not exceed m, and comparing each against n characters.
Space Complexity
O(N)The provided plain English explanation outlines an approach that explores combinations by iteratively keeping increasing numbers of characters from the longer string. While not explicitly stated, to compare kept characters with the shorter string, the algorithm needs to store the 'kept' characters. In the worst case, it keeps almost the entire longer string to compare against the shorter string, thus requiring temporary storage proportional to the length of the longer string. Let N be the length of the longer string; hence, the auxiliary space needed scales linearly with N.

Optimal Solution

Approach

This problem requires us to identify the original string given a potentially longer, typed string which might include duplications of characters. The most efficient strategy uses a technique to carefully compare the potential original string with the typed string, using a process that recognizes and skips over duplicated characters to determine the actual original string.

Here's how the algorithm would work step-by-step:

  1. Begin by comparing the first character of the potential original string with the first character of the typed string.
  2. If the characters match, advance to the next character in both strings.
  3. If the characters do not match, check if the current character in the typed string is the same as the previous character in the typed string. If it is, this indicates a possible duplication.
  4. If it's a duplication, advance to the next character in the typed string to skip over the duplicated character.
  5. Repeat this process of comparing and skipping over duplications until you reach the end of either string.
  6. If you successfully reach the end of the potential original string, it means it is indeed a valid original string of the typed string. Return true.
  7. If you reach the end of the typed string before reaching the end of the potential original string, it means the potential original string could not be formed from the typed string. Return false.

Code Implementation

def find_original_string(original_string, typed_string):
    original_index = 0
    typed_index = 0

    while typed_index < len(typed_string):
        if original_index < len(original_string) and \
           original_string[original_index] == typed_string[typed_index]:
            original_index += 1
            typed_index += 1
        else:
            # Check for repeated characters (typos)
            if typed_index > 0 and \
               typed_string[typed_index] == typed_string[typed_index - 1]:

                # Skip the repeated character in typed_string
                typed_index += 1

            else:
                # Characters don't match, not a repetition, not in original
                return False

    # Ensure we've reached the end of the original string
    if original_index != len(original_string):
        return False

    return True

Big(O) Analysis

Time Complexity
O(m + n)The algorithm iterates through both the original string of length 'm' and the typed string of length 'n' using two pointers. In the best-case scenario where the strings are identical, the algorithm checks each character once resulting in m operations. In the worst-case where the typed string contains many repeated characters or is significantly different, the typed string is traversed to its end. In this case, the original string is traversed once and typed string is traversed once. Thus the number of operations can be approximated as m + n, which simplifies to O(m + n).
Space Complexity
O(1)The described algorithm operates with two index variables to traverse the potential original string and the typed string. No additional data structures such as arrays, hash maps, or sets are used to store intermediate results. The memory footprint remains constant regardless of the lengths of the input strings because only a fixed number of integer variables are employed. Thus, the auxiliary space complexity is O(1).

Edge Cases

s is null or empty
How to Handle:
Return an empty string since there's nothing to derive from.
t is null or empty
How to Handle:
Return s as the solution since anything in s that's extra can simply be removed when t is nothing.
s is shorter than t
How to Handle:
Return an empty string or null, as s cannot contain all characters of t with added repetitions.
s and t are identical
How to Handle:
Return s (or t) directly, since no characters need to be removed.
t is a substring of s, but with extra characters in s
How to Handle:
Iterate through s and t, removing extra repeated characters in s while maintaining the order of characters in t.
Characters in t are not in s
How to Handle:
Return an empty string or null, as s does not contain all the characters from t.
s consists of repeating characters, but t does not
How to Handle:
Iterate through s and only keep characters when it matches the corresponding position in t, otherwise remove repeating characters in s.
Consecutive repeating characters in 's' are more than what's allowable based on 't'
How to Handle:
When s[i] matches t[j] check consecutive occurences of s[i] and t[j] to ensure the number of occurrences in 's' is not exceeding allowable count in 't' + 1. For example, if t[j] repeats 'k' times, then s[i] can at most repeat 'k+1' times consecutively.