Taro Logo

Regular Expression Matching

Medium
4 views
2 months ago

Given an input string s and a pattern p, implement regular expression matching with support for '.' and '*' where:

  • '.' Matches any single character.​​​​
  • '*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Example 1:

Input: s = "aa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input: s = "aa", p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input: s = "ab", p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Constraints:

  • 1 <= s.length <= 20
  • 1 <= p.length <= 20
  • s contains only lowercase English letters.
  • p contains only lowercase English letters, '.', and '*'.
  • It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.
Sample Answer
## Regular Expression Matching

This problem requires implementing a regular expression matching algorithm with support for `.` and `*`. The `.` character matches any single character, and the `*` character matches zero or more of the preceding element. The matching should cover the entire input string, not just a partial substring.

### 1. Naive Approach (Recursion)

A straightforward but inefficient approach is to use recursion. The base cases are when either the string or the pattern is empty. If the next character in the pattern is not `*`, we check if the first characters of the string and pattern match (or if the pattern character is `.`). If they do, we recursively call the function with the rest of the string and pattern. If the next character is `*`, we have two options: either we ignore the `*` and the preceding element, or we try to match the preceding element one or more times.

```python
def isMatch_naive(s: str, p: str) -> bool:
    if not p:
        return not s

    first_match = bool(s) and (p[0] == s[0] or p[0] == '.')

    if len(p) >= 2 and p[1] == '*':
        return (isMatch_naive(s, p[2:]) or  # Ignore * and preceding element
                (first_match and isMatch_naive(s[1:], p)))
    else:
        return first_match and isMatch_naive(s[1:], p[1:])

2. Optimal Approach (Dynamic Programming)

A more efficient approach is to use dynamic programming. We create a 2D table dp where dp[i][j] is true if the first i characters of the string s match the first j characters of the pattern p. The table is built bottom-up, and the final result is stored in dp[len(s)][len(p)].

def isMatch(s: str, p: str) -> bool:
    n = len(s)
    m = len(p)

    dp = [[False] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = True

    # Deals with patterns like a*, a*b*, a*b*c*
    for j in range(1, m + 1):
        if p[j - 1] == '*' and j > 1:
            dp[0][j] = dp[0][j - 2]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if p[j - 1] == '.' or p[j - 1] == s[i - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            elif j > 1 and p[j - 1] == '*':
                dp[i][j] = dp[i][j - 2]  # zero occurrences
                if p[j - 2] == '.' or p[j - 2] == s[i - 1]:
                    dp[i][j] = dp[i][j] or dp[i - 1][j]  # one or more occurrences
            else:
                dp[i][j] = False

    return dp[n][m]

3. Big(O) Run-time Analysis

  • Naive Approach: The time complexity of the recursive approach is exponential, specifically O(2^(m+n)), where n is the length of string s and m is the length of pattern p. This is due to the branching nature of the recursion when encountering a * character. Each * could potentially lead to two recursive calls.

  • Dynamic Programming Approach: The time complexity of the dynamic programming approach is O(m*n), where n is the length of string s and m is the length of pattern p. This is because we iterate through the dp table once, which has dimensions (n+1)x(m+1).

4. Big(O) Space Usage Analysis

  • Naive Approach: The space complexity of the recursive approach is O(m+n) in the worst case, where n is the length of string s and m is the length of pattern p. This space is used by the call stack due to the recursive calls. In worst case the depth of the recursion can be m+n.

  • Dynamic Programming Approach: The space complexity of the dynamic programming approach is O(m*n), where n is the length of string s and m is the length of pattern p. This is because we use a 2D table dp of dimensions (n+1)x(m+1) to store the intermediate results.

5. Edge Cases and How to Handle Them

  • Empty String: If both s and p are empty, they match. If s is empty but p is not, we need to check if p can match an empty string (e.g., a*b*c*). If p is empty but s is not, they don't match.
  • . Character: The . character should match any single character in s.
  • * Character:** The * character should match zero or more occurrences of the preceding character. If the preceding character doesn't match the current character in s, we can ignore the * and the preceding character.
  • Consecutive *: The problem statement guarantees that each * will have a preceding character, so we don't need to handle cases like **.
  • Long Patterns with many *: The dynamic programming approach efficiently handles long patterns by caching intermediate results.

Here are some examples to illustrate the dynamic programming approach:

Example 1: s = "aa", p = "a*"

dp table:

      ∅   a   *
∅  True False True
a  False False True
a  False False True

Example 2: s = "ab", p = ".*"

dp table:

      ∅   .   *
∅  True False True
a  False True True
b  False True True

By constructing the dp table, we can determine if s matches p.