Taro Logo

Find All Good Strings

Hard
Asked by:
Profile picture
Profile picture
8 views
Topics:
StringsDynamic Programming

Given the strings s1 and s2 of size n and the string evil, return the number of good strings.

A good string has size n, it is alphabetically greater than or equal to s1, it is alphabetically smaller than or equal to s2, and it does not contain the string evil as a substring. Since the answer can be a huge number, return this modulo 109 + 7.

Example 1:

Input: n = 2, s1 = "aa", s2 = "da", evil = "b"
Output: 51 
Explanation: There are 25 good strings starting with 'a': "aa","ac","ad",...,"az". Then there are 25 good strings starting with 'c': "ca","cc","cd",...,"cz" and finally there is one good string starting with 'd': "da". 

Example 2:

Input: n = 8, s1 = "leetcode", s2 = "leetgoes", evil = "leet"
Output: 0 
Explanation: All strings greater than or equal to s1 and smaller than or equal to s2 start with the prefix "leet", therefore, there is not any good string.

Example 3:

Input: n = 2, s1 = "gx", s2 = "gz", evil = "x"
Output: 2

Constraints:

  • s1.length == n
  • s2.length == n
  • s1 <= s2
  • 1 <= n <= 500
  • 1 <= evil.length <= 50
  • All strings consist of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the constraints on the lengths of `s1`, `s2`, and `evil`? Specifically, what are the maximum lengths and are they empty strings permissible?
  2. What characters are allowed in `s1`, `s2`, and `evil`? Are they guaranteed to be lowercase English letters, or should I account for other characters?
  3. What does it mean for a string to be "evil"? Is it sufficient for `evil` to be a substring, or does it need to be a contiguous substring (a sequence of consecutive characters)?
  4. If there are multiple "good" strings within the lexicographical range [s1, s2], should I return the count modulo some value? If so, what is that value?
  5. Are `s1` and `s2` guaranteed to be in lexicographical order (i.e., `s1` <= `s2`)?

Brute Force Solution

Approach

The brute force approach involves generating every possible string within given bounds and checking if each generated string meets certain criteria. We essentially try every single possibility. This is like testing every key on a keychain until we find the one that opens the lock.

Here's how the algorithm would work step-by-step:

  1. Start by creating all possible strings that have the specified length using the available characters.
  2. For each generated string, compare it to the two boundary strings to ensure that the generated string is within the specified range.
  3. Also, for each generated string, check if it contains the 'evil' string as a substring.
  4. If the generated string is within the range and does not contain the 'evil' string, then we count it as a 'good' string.
  5. After checking all possible strings, the total count of 'good' strings is our answer.

Code Implementation

def find_all_good_strings_brute_force(
    string_length, evil_string, string_one, string_two
):
    number_of_good_strings = 0
    
    # Iterate through all possible strings of given length.
    for generated_string_index in range(26 ** string_length):
        generated_string = ""
        temp_index = generated_string_index

        for _ in range(string_length):
            generated_string = chr(ord('a') + (temp_index % 26)) + \
                generated_string
            temp_index //= 26

        # Check range conditions before checking substring
        if not (string_one <= generated_string <= string_two):
            continue

        # Check if generated string contains the evil string
        if evil_string in generated_string:
            continue

        number_of_good_strings += 1

    return number_of_good_strings

Big(O) Analysis

Time Complexity
O(2^n * n * m)The brute force approach generates all possible strings of length n. Since each character can be one of the available characters (assumed to be a constant-sized alphabet), there are approximately 2^n such strings. For each generated string, we need to compare it with the boundary strings (length n) and also check if it contains the evil string as a substring (length m). The substring check takes O(n * m) time. Therefore, the overall time complexity is O(2^n * n * m), where n is the length of the strings being generated, and m is the length of the evil string.
Space Complexity
O(1)The brute force approach, as described, generates and checks each string independently. It does not store a collection of generated strings, nor does it create any auxiliary data structures whose size scales with the input length. The variables used for comparisons and counting are constant in size. Therefore, the auxiliary space complexity is constant.

Optimal Solution

Approach

The problem asks us to count strings within a range that do not contain a forbidden pattern. Instead of checking every possible string, we use a clever method to build acceptable strings incrementally and avoid unnecessary calculations by remembering intermediate results. This optimized approach relies on pre-computing information about the forbidden pattern and using it to guide our string construction.

Here's how the algorithm would work step-by-step:

  1. First, analyze the forbidden pattern to understand how it could potentially overlap with itself. This analysis is used to calculate how much of a partial match of the forbidden pattern we can reuse when creating longer strings.
  2. Next, start building strings one character at a time. Keep track of how much of the forbidden pattern has been matched so far.
  3. When adding a character, check if it leads to creating the forbidden pattern. Also, consider whether the newly built string is still within the allowed range.
  4. If the new character doesn't create the forbidden pattern and the string is in the range, we continue adding characters to the string.
  5. To avoid recomputing results, remember the count of valid strings for each length and each stage of matching the forbidden pattern. This technique avoids trying the same possibilities multiple times.
  6. The final result is the number of valid strings generated within the specified range.

Code Implementation

def find_all_good_strings(string_length, evil_string, string1, string2):
    modulo = 10**9 + 7
    evil_string_length = len(evil_string)

    # Compute the prefix function for the evil string.
    prefix_function = [0] * evil_string_length
    for i in range(1, evil_string_length):
        j = prefix_function[i - 1]
        while j > 0 and evil_string[i] != evil_string[j]:
            j = prefix_function[j - 1]
        if evil_string[i] == evil_string[j]:
            j += 1
        prefix_function[i] = j

    # Function to calculate the number of good strings <= a given string.
    def count_good_strings_up_to(limit_string):
        dp_table = {}  # (index, evil_match_length, is_tight) -> count

        def calculate_good_strings(index, evil_match_length, is_tight):
            if evil_match_length == evil_string_length:
                return 0 # Avoid strings that contain evil_string.

            if index == string_length:
                return 1 # A valid string is found.

            if (index, evil_match_length, is_tight) in dp_table:
                return dp_table[(index, evil_match_length, is_tight)]

            count = 0
            upper_bound = ord(limit_string[index]) - ord('a') if is_tight else 25

            for char_index in range(upper_bound + 1):
                character = chr(char_index + ord('a'))
                new_evil_match_length = evil_match_length
                while new_evil_match_length > 0 and character != evil_string[new_evil_match_length]:
                    new_evil_match_length = prefix_function[new_evil_match_length - 1]

                if character == evil_string[new_evil_match_length]:
                    new_evil_match_length += 1

                new_is_tight = is_tight and (char_index == upper_bound)
                count = (count + calculate_good_strings(index + 1, new_evil_match_length, new_is_tight)) % modulo

            dp_table[(index, evil_match_length, is_tight)] = count
            return count

        return calculate_good_strings(0, 0, True)

    # Count strings <= string2 - Count strings < string1, adjusting for edge case.
    result = (count_good_strings_up_to(string2) - count_good_strings_up_to(string1) + modulo) % modulo

    # Check if string1 itself is a "good" string.
    evil_match_length_for_string1 = 0
    for character in string1:
        while evil_match_length_for_string1 > 0 and character != evil_string[evil_match_length_for_string1]:
            evil_match_length_for_string1 = prefix_function[evil_match_length_for_string1 - 1]
        if character == evil_string[evil_match_length_for_string1]:
            evil_match_length_for_string1 += 1
        if evil_match_length_for_string1 == evil_string_length:
            return result # string1 contains evil_string, so return

    # Account for string1 since we subtracted strings *less than* string1.
    return (result + 1) % modulo

Big(O) Analysis

Time Complexity
O(m * n * k)Let m be the length of the range (high - low), n be the length of the forbidden string, and k be the size of the character set. The dynamic programming solution iterates through each possible string length up to m. For each length, it considers all possible states of matching the forbidden string (up to length n). In each state, it tries all possible characters (k) to extend the string. Therefore, the time complexity is O(m * n * k).
Space Complexity
O(M)The algorithm uses dynamic programming to store intermediate results for previously computed subproblems. Specifically, it remembers the count of valid strings for each length and for each stage of matching the forbidden pattern (evil). Let M be the length of the forbidden pattern. The algorithm stores these counts in a DP table. The size of this table is dependent on the range length and the length of the forbidden pattern. Therefore, the auxiliary space used by the DP table is proportional to M. Thus, the space complexity is O(M).

Edge Cases

Empty s1 or s2 string
How to Handle:
Return 0 as there are no possible 'good' strings.
s1 and s2 are identical
How to Handle:
The number of strings <= s2 should be one; return 1.
evil string is empty
How to Handle:
Every string between s1 and s2 is a 'good' string so the answer would be the total number of such strings.
Length of s1, s2, or evil is 0
How to Handle:
Handle cases when s1, s2 or evil string lengths are zero by adapting the base cases.
s1, s2, evil contain non-ASCII characters
How to Handle:
Ensure the code can handle unicode characters if the problem doesn't specify only ASCII characters, otherwise, explicitly state the input character set.
Integer overflow during counting of good strings (especially with large n)
How to Handle:
Use modulo operation throughout calculations to prevent integer overflow, and store intermediate results as long to hold larger values.
evil string is longer than s1 or s2
How to Handle:
If evil is longer than s1 or s2, there will be no 'good' strings, so return 0.
s1 > s2 lexicographically
How to Handle:
If s1 > s2 lexicographically, there will be no valid range and therefore no possible solutions; return 0.