Given an input string s
and a pattern p
, implement regular expression matching with support for .
and *
where:
.
Matches any single character.*
Matches zero or more of the preceding element.The matching should cover the entire input string (not partial).
For example:
Explain your approach and provide the code with time and space complexity analysis.
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The goal is to see if a given text matches a specific pattern. The brute force method involves trying all possible ways the pattern could match the text by considering every potential alignment and interpretation of the pattern's special characters.
Here's how the algorithm would work step-by-step:
def is_regular_expression_match(text, pattern):
text_length = len(text)
pattern_length = len(pattern)
def backtrack(text_index, pattern_index):
# If we've reached the end of the pattern, check if we're also at the end of the text.
if pattern_index == pattern_length:
return text_index == text_length
# Check if the next character in the pattern is '*'.
if pattern_index + 1 < pattern_length and pattern[pattern_index + 1] == '*':
# Try matching zero occurrences of the character before '*'.
if backtrack(text_index, pattern_index + 2):
return True
# Try matching one or more occurrences.
while text_index < text_length and (pattern[pattern_index] == '.' or text[text_index] == pattern[pattern_index]):
if backtrack(text_index + 1, pattern_index + 2):
return True
text_index += 1
return False
# If the next pattern character is not '*' then need to check the current characters.
else:
if text_index < text_length and (pattern[pattern_index] == '.' or text[text_index] == pattern[pattern_index]):
return backtrack(text_index + 1, pattern_index + 1)
else:
return False
return backtrack(0, 0)
The key to solving this efficiently is to avoid recomputing results we've already found. We use a technique called dynamic programming to break the problem down into smaller, overlapping subproblems, solving each only once and storing the results to reuse later.
Here's how the algorithm would work step-by-step:
def is_match(text, pattern):
text_length = len(text)
pattern_length = len(pattern)
# dp_table[i][j] is true if text[0..i-1] matches pattern[0..j-1]
dp_table = [[False] * (pattern_length + 1) for _ in range(text_length + 1)]
# Empty text and empty pattern match
dp_table[0][0] = True
# Deals with patterns like a* or a*b* or a*b*c*
for pattern_index in range(1, pattern_length + 1):
if pattern[pattern_index - 1] == '*':
dp_table[0][pattern_index] = dp_table[0][pattern_index - 2]
for text_index in range(1, text_length + 1):
for pattern_index in range(1, pattern_length + 1):
if pattern[pattern_index - 1] == text[text_index - 1] or pattern[pattern_index - 1] == '.':
# Characters match, so result depends on previous subproblem
dp_table[text_index][pattern_index] = dp_table[text_index - 1][pattern_index - 1]
elif pattern[pattern_index - 1] == '*':
# '*' matches zero occurrences of the previous element
dp_table[text_index][pattern_index] = dp_table[text_index][pattern_index - 2]
# '*' matches one or more occurrences of the previous element
if pattern[pattern_index - 2] == text[text_index - 1] or pattern[pattern_index - 2] == '.':
# If character before '*' matches text char or is a wildcard.
dp_table[text_index][pattern_index] = dp_table[text_index][pattern_index] or dp_table[text_index - 1][pattern_index]
else:
# No match
dp_table[text_index][pattern_index] = False
# Result is stored in the bottom right cell
return dp_table[text_length][pattern_length]
Case | How to Handle |
---|---|
Empty text string | If the text string is empty, the regular expression must also be empty or consist only of '*' characters to match. |
Empty pattern string | An empty pattern only matches an empty text string. |
Pattern starts with '*' | A pattern starting with '*' is invalid and should be handled as no match unless the text is empty and the pattern is only '*'. |
Consecutive '*'' characters in pattern | Treat consecutive '*' characters as a single '*' as multiple consecutive '*' chars have the same matching effect. |
Text string contains special regex characters | Escaping special regex characters in the text or using character class matching is necessary for correct matching. |
Pattern contains only '*' | A pattern containing only '*' should match any string, including the empty string. |
Text string is very long | The algorithm's time complexity (especially for backtracking solutions) may become an issue with very long text strings; consider dynamic programming. |
Pattern contains many '*' | Excessive '*' characters can lead to exponential backtracking in recursive solutions, thus DP is advised. |