Taro Logo

Wildcard Matching

Hard
Two Sigma logo
Two Sigma
0 views
Topics:
StringsDynamic ProgrammingGreedy AlgorithmsTwo Pointers

Given an input string (s) and a pattern (p), implement wildcard pattern matching with support for '?' and '*' where:

  • '?' Matches any single character.
  • '*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

Example 1:

Input: s = "aa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input: s = "aa", p = "*"
Output: true
Explanation: '*' matches any sequence.

Example 3:

Input: s = "cb", p = "?a"
Output: false
Explanation: '?' matches 'c', but the second letter is 'a', which does not match 'b'.

Constraints:

  • 0 <= s.length, p.length <= 2000
  • s contains only lowercase English letters.
  • p contains only lowercase English letters, '?' or '*'.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input string `s` and pattern `p` be null or empty strings?
  2. What characters besides letters can be present in string `s`? Is it limited to ASCII characters?
  3. Is the matching case-sensitive?
  4. If the pattern `p` has consecutive '*', should they be treated as a single '*'?
  5. Does the pattern need to match the *entire* string `s`, or is a partial match sufficient?

Brute Force Solution

Approach

The brute force approach to wildcard matching is like trying to fit two jigsaw puzzles together by testing every possible way they might align. We check if the pattern can match the text by exploring all potential matches. If at any point a mismatch is found, we backtrack and try a different alignment.

Here's how the algorithm would work step-by-step:

  1. Start comparing the pattern to the text from the beginning.
  2. If the current pattern character matches the current text character, move on to the next characters in both.
  3. If the pattern character is a '?', it can match any single text character, so move on to the next characters in both.
  4. If the pattern character is a '*', it can match zero or more text characters. Try matching zero characters first (skip the '*' and move to the next pattern character).
  5. If that doesn't work, try matching one text character with the '*' (move to the next text character and the next pattern character).
  6. Continue trying to match more and more text characters with the '*' until you reach the end of the text.
  7. If you reach the end of the pattern and the end of the text at the same time, the pattern matches the text.
  8. If at any point there's a mismatch and the pattern character isn't a wildcard, go back to the last '*' you saw and try matching more characters with it (if possible).
  9. If you've tried all possibilities for all '*' characters, and you still haven't found a match, the pattern doesn't match the text.

Code Implementation

def is_wildcard_matching(text, pattern):
    text_index = 0
    pattern_index = 0
    star_index = -1
    match_index = 0

    while text_index < len(text):
        if pattern_index < len(pattern) and \
           (pattern[pattern_index] == text[text_index] or \
            pattern[pattern_index] == '?'):
            text_index += 1
            pattern_index += 1
        elif pattern_index < len(pattern) and pattern[pattern_index] == '*':
            # Store the index of the '*' and current text index
            star_index = pattern_index
            match_index = text_index
            pattern_index += 1

        elif star_index != -1:
            # Backtrack: match current text char with '*' 
            pattern_index = star_index + 1
            match_index += 1
            text_index = match_index
        else:
            return False

    # Check for remaining characters in pattern
    while pattern_index < len(pattern) and pattern[pattern_index] == '*':
        pattern_index += 1

    # If all characters in pattern are processed, it's a match
    return pattern_index == len(pattern)

Big(O) Analysis

Time Complexity
O(n^m)The brute force approach explores all possible alignments between the text and the pattern. In the worst case, the pattern contains multiple '*' characters. Each '*' can potentially match zero to n characters in the text, where n is the length of the text. If there are m '*' characters in the pattern, the algorithm might explore up to n possibilities for each of them. This leads to a time complexity of O(n^m), where n is the length of the text and m is the number of '*' in the pattern, reflecting the exponential growth in the number of explored possibilities. Because the algorithm can recursively check different combinations of characters matched by '*' it leads to O(n^m) total operations.
Space Complexity
O(N*M)The brute force approach, as described, can be implemented using recursion. In the worst-case scenario, the algorithm might explore a large number of possible matches and mismatches, leading to a significant number of recursive calls being placed on the call stack. The maximum depth of the recursion can be proportional to the product of the lengths of the text and pattern strings, where N is the length of the text string and M is the length of the pattern string. Each recursive call adds a new frame to the call stack, consuming memory. Thus, the space complexity is O(N*M).

Optimal Solution

Approach

The best way to solve this is by using a strategy that remembers whether we've seen a particular combination of the main text and wildcard pattern before. This helps us avoid repeating work and makes the matching process much faster.

Here's how the algorithm would work step-by-step:

  1. Imagine you're comparing two strings: one is the text you want to match, and the other is the pattern that might have wildcards.
  2. Create a way to store whether a certain point in the text matches a certain point in the pattern. This storage keeps track of the process so it doesn't repeat the work.
  3. Start comparing the text and pattern from the beginning. If the characters match, or if the pattern has a '?' (which matches any single character), move forward in both the text and the pattern.
  4. If you see a '*' in the pattern (which can match zero or more characters), you have a choice: either the '*' matches nothing, or it matches one or more characters in the text.
  5. If the '*' matches nothing, move to the next character in the pattern and keep the same spot in the text.
  6. If the '*' matches something, move to the next character in the text, but keep the '*' in the pattern. This is where the smart storage we created helps.
  7. Every time you reach a point in the text and pattern, check if you've been at that exact spot before. If you have, you already know if that part matches, so you don't need to recalculate it.
  8. Keep going until you've reached the end of both the text and the pattern, or you find that the pattern cannot match the text.
  9. The text matches the pattern only if you reach the end of both strings at the same time and you've followed the matching rules the entire way.

Code Implementation

def is_wildcard_matching(text, pattern):
    text_length = len(text)
    pattern_length = len(pattern)

    # Create a table to store matching results
    dp_table = [[False] * (pattern_length + 1) for _ in range(text_length + 1)]
    dp_table[0][0] = True

    # Handles the case where the text is empty.
    for pattern_index in range(1, pattern_length + 1):
        if pattern[pattern_index - 1] == '*':
            dp_table[0][pattern_index] = dp_table[0][pattern_index - 1]

    for text_index in range(1, text_length + 1):
        for pattern_index in range(1, pattern_length + 1):
            if pattern[pattern_index - 1] == '?' or text[text_index - 1] == pattern[pattern_index - 1]:
                dp_table[text_index][pattern_index] = dp_table[text_index - 1][pattern_index - 1]
            elif pattern[pattern_index - 1] == '*':
                # The '*' can represent 0 or more characters
                dp_table[text_index][pattern_index] = dp_table[text_index][pattern_index - 1] or dp_table[text_index - 1][pattern_index]
            else:
                dp_table[text_index][pattern_index] = False

    # Returns True if the whole text matches the whole pattern
    return dp_table[text_length][pattern_length]

Big(O) Analysis

Time Complexity
O(m*n)The algorithm uses dynamic programming (or memoization, as suggested by remembering seen combinations) to determine if a pattern of length m matches a string of length n. In the worst-case scenario, the algorithm explores all possible pairings of characters in the string and pattern. This creates an m x n matrix of states to explore. Hence the time complexity is O(m*n), where m is the length of the pattern and n is the length of the string.
Space Complexity
O(m*n)The solution uses a storage mechanism, likely a 2D array or matrix, to remember whether a certain point in the text matches a certain point in the pattern. If 'n' is the length of the text and 'm' is the length of the pattern, then this storage requires m*n space to keep track of previously visited text/pattern combinations. This ensures that redundant computations are avoided, which significantly speeds up the overall matching process. Therefore, the space complexity is O(m*n).

Edge Cases

CaseHow to Handle
Empty string s and empty pattern pShould return true because an empty pattern matches an empty string.
Empty string s and pattern p is only '*'Should return true because '*' matches an empty string.
String s is empty and pattern p is not empty (and not just '*')Should return false unless the entire pattern consists of '*' characters.
Pattern p is empty and string s is not emptyShould return false as there is no pattern to match the string.
String s contains a long sequence of identical characters and pattern p contains a single '*'The '*' should match the entire sequence, so algorithm must handle correctly.
Pattern p contains multiple consecutive '*' charactersMultiple consecutive '*' should be treated as a single '*'.
Pattern p ends with '*' and string s is non-emptyShould return true if the initial part of the pattern matches the start of the string.
Long string s and long pattern p with many '*' and '?' characters, potentially causing stack overflow in a recursive solutionDynammic programming approach can be used to avoid deep recursion and stack overflow.