Taro Logo

Word Pattern II

Medium
Amazon logo
Amazon
2 views
Topics:
StringsRecursionGraphsDynamic Programming

You are given a pattern and a string s. You need to determine if s matches the pattern. Here, match means there is a one-to-one mapping between letters in the pattern and non-empty substrings in s.

Examples:

  1. pattern = "abab", s = "redblueredblue" should return true because 'a' -> "redblue", 'b' -> "redblue".
  2. pattern = "aaaa", s = "asdasdasdasd" should return true because 'a' -> "asd".
  3. pattern = "aabb", s = "xyzabcxzyabc" should return false.

Clarifications:

  • Can the input string s or the pattern string be null or empty? If so, what should I return?
  • Are the pattern characters limited to lowercase English letters?
  • Is the string s limited to lowercase English letters?
  • Can a single character in the pattern map to an empty string in s? (No, the problem states non-empty substrings)
  • If a character in the pattern has already been mapped to a substring in s, can it be mapped to a different substring later? (No, the mapping must be one-to-one and consistent).

Write a function that takes the pattern and the string s as input and returns true if s matches the pattern, and false otherwise.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the pattern contain duplicate characters, and can the same word in the string be mapped to different characters in the pattern?
  2. Can the string contain words separated by multiple spaces, leading/trailing spaces, or empty words?
  3. Is the pattern guaranteed to contain only lowercase letters, and the string only lowercase words?
  4. If no mapping is possible, should I return false, or is there a specific error value I should return?
  5. Is the string guaranteed to contain at least as many words as the pattern contains characters?

Brute Force Solution

Approach

The brute force approach to this problem is like trying every possible secret code to see if it unlocks a treasure chest. We explore every possible way to match parts of one sequence to the other until we find a matching pattern.

Here's how the algorithm would work step-by-step:

  1. Take the first letter of the pattern and try to match it with the first chunk of the sequence. This chunk could be just the first character, the first two characters, or even the whole sequence.
  2. If that seems to work, move on to the next letter in the pattern. Try to match this letter with another chunk of the remaining sequence. It's important this new chunk doesn't overlap with the previous one, and that it matches a previously used chunk if the current pattern letter was used before.
  3. Keep going, matching each letter in the pattern to a part of the sequence. If at any point you realize the matching is impossible (because the sequence runs out or the chunks don't fit), backtrack and try a different chunk for one of the previous pattern letters.
  4. If you successfully match all the letters in the pattern to chunks in the sequence without any overlaps or mismatches, then you've found a solution!
  5. If you've exhausted all the possible ways to match letters to sequence chunks and haven't found a solution, then there is no valid pattern.

Code Implementation

def word_pattern_match(pattern, input_string):
    pattern_to_string = {}
    string_to_pattern = {}

    def backtrack(pattern_index, string_index):
        # Base case: both pattern and string are exhausted
        if pattern_index == len(pattern) and string_index == len(input_string):
            return True

        # If either is exhausted while the other isn't, it's a mismatch
        if pattern_index == len(pattern) or string_index == len(input_string):
            return False

        current_pattern_char = pattern[pattern_index]

        # Try all possible substrings for the current pattern character
        for k in range(string_index + 1, len(input_string) + 1):
            substring = input_string[string_index:k]

            # If the pattern char already exists, check if it matches
            if current_pattern_char in pattern_to_string:
                if pattern_to_string[current_pattern_char] != substring:
                    continue

                # Continue matching the rest of the string
                if backtrack(pattern_index + 1, k):
                    return True

            # If substring already associated with different pattern
            elif substring in string_to_pattern:
                continue

            # Create new mapping and continue the search
            else:
                # Add the new mapping to both dictionaries
                pattern_to_string[current_pattern_char] = substring

                string_to_pattern[substring] = current_pattern_char

                # Recurse to check if solution can be found
                if backtrack(pattern_index + 1, k):
                    return True

                # Backtrack: Remove the mapping for trying other possibilities
                del pattern_to_string[current_pattern_char]

                del string_to_pattern[substring]

        # If no mapping works, return False
        return False

    return backtrack(0, 0)

Big(O) Analysis

Time Complexity
O(n^m)The described brute force approach explores all possible mappings between pattern characters and non-overlapping substrings of the input string. Let n be the length of the input string and m be the length of the pattern. In the worst case, each character in the pattern can map to a substring of varying lengths in the string. The algorithm essentially tries all possible combinations of substring lengths for each pattern character such that their total length equals n. Since there are potentially n choices for the first pattern character, and then a reduced set of choices for the subsequent ones, this results in exponential growth. Therefore, the complexity is roughly O(n^m), where n is the length of the input string and m is the length of the pattern, reflecting the exponential nature of exploring all possible mappings.
Space Complexity
O(N)The algorithm employs recursion, where each level corresponds to processing a character in the pattern. In the worst-case scenario, where each character in the pattern maps to a single character in the input string, the depth of the recursion could reach the length of the input string which we can denote by N. Moreover, in each recursive call we may store mappings between pattern characters and substrings using a hash map (or dictionary) which grows as we explore possible solutions. Thus, the space complexity is proportional to the maximum depth of the recursion stack, along with the size of the stored mappings, leading to O(N) auxiliary space in the worst case.

Optimal Solution

Approach

The problem asks us to determine if a pattern of letters can be matched to segments of a string. The optimal approach uses a backtracking strategy, where we explore potential mappings between pattern characters and string segments. If a mapping leads to a dead end, we undo it and try a different one.

Here's how the algorithm would work step-by-step:

  1. Start by trying to match the first character in the pattern to the beginning of the string.
  2. Create a system to track what parts of the string each letter in the pattern represents.
  3. If a letter in the pattern has already been matched to a string segment, make sure you use that same matching again.
  4. If a letter in the pattern hasn't been matched yet, try to match it to a string segment that hasn't been used before.
  5. If you reach the end of the pattern and the entire string is used up, then you've found a match!
  6. If at any point you find a conflict or can't make a match, undo your last match and try a different option. This is called backtracking.
  7. Keep trying different combinations until you either find a successful mapping or exhaust all possibilities.

Code Implementation

def word_pattern_match(pattern, input_string):
    pattern_map = {}
    string_set = set()

    def backtrack(pattern_index, string_index):
        if pattern_index == len(pattern) and string_index == len(input_string):
            return True

        if pattern_index == len(pattern) or string_index == len(input_string):
            return False

        pattern_character = pattern[pattern_index]

        if pattern_character in pattern_map:
            mapped_string = pattern_map[pattern_character]
            if input_string[string_index:].startswith(mapped_string):
                return backtrack(pattern_index + 1, string_index + len(mapped_string))
            else:
                return False
        else:
            # Try all possible string segments for the current pattern character
            for i in range(string_index, len(input_string)):
                substring = input_string[string_index:i + 1]

                # Avoid ambiguous mapping to prevent incorrect solutions
                if substring in string_set:
                    continue

                # Record our choices before diving deeper
                pattern_map[pattern_character] = substring
                string_set.add(substring)

                if backtrack(pattern_index + 1, i + 1):
                    return True

                # Backtrack to explore other possibilities
                del pattern_map[pattern_character]
                string_set.remove(substring)

            return False

    return backtrack(0, 0)

Big(O) Analysis

Time Complexity
O(n^m)The dominant cost is the backtracking search, where m is the length of the pattern and n is the length of the string. In the worst case, each character in the pattern can map to potentially every substring of the string. This leads to an exponential branching factor during the backtracking process. Since we're exploring every possible mapping between the pattern and substrings of the string, and the depth of the recursion is determined by the pattern length (m), the number of recursive calls can grow to O(n^m), where n represents the string length. Therefore, the worst-case time complexity is O(n^m).
Space Complexity
O(N)The primary space complexity stems from the recursion depth, which, in the worst-case scenario, can reach N, where N is the length of the pattern string. This is because each character in the pattern could potentially map to a segment of the string, leading to a recursive call. Additionally, we maintain two hash maps (pattern to string and string to pattern) to track the mappings between pattern characters and string segments. The size of these hash maps can grow up to O(N) in the worst case if all pattern characters map to different string segments. Therefore, the overall space complexity is O(N) due to the recursion depth and the size of the hash maps used for backtracking.

Edge Cases

CaseHow to Handle
Empty pattern or stringReturn true only if both the pattern and the string are empty; otherwise, return false.
Pattern longer than the stringReturn false since each character in the pattern must map to a non-empty substring.
Pattern with repeating characters and string with inconsistent substringsBacktracking will eventually fail to find a consistent mapping, returning false.
A character in the pattern maps to an empty stringExplicitly forbid mapping to empty strings during the backtracking search.
One-to-many mapping; a string substring is mapped to multiple pattern charactersUse a hash map to track existing mappings and reject conflicting assignments.
Many-to-one mapping; a pattern character is mapped to different substringsUse a hash set to track substrings already mapped to, preventing reusing them for another character.
String contains only repeating characters and pattern demands multiple different substringsBacktracking explores all possibilities and will fail to find valid mapping to the pattern characters, thus returning false.
Maximum input size exceeding recursion depth limitsThe backtracking algorithm might hit the maximum recursion depth limit for large input sizes, so consider iterative approaches for larger input or optimize backtracking logic.