Taro Logo

Substring Matching Pattern

Easy
Amazon logo
Amazon
8 views
Topics:
Strings

You are given a string s and a pattern string p, where p contains exactly one '*' character.

The '*' in p can be replaced with any sequence of zero or more characters.

Return true if p can be made a substring of s, and false otherwise.

Example 1:

  • s = "leetcode", p = "ee*e"
  • Output: true
  • Explanation: By replacing the '*' with "tcod", the substring "eetcode" matches the pattern.

Example 2:

  • s = "car", p = "c*v"
  • Output: false
  • Explanation: There is no substring matching the pattern.

Example 3:

  • s = "luck", p = "u*"
  • Output: true
  • Explanation: The substrings "u", "uc", and "uck" match the pattern.

Constraints:

  • 1 <= s.length <= 50
  • 1 <= p.length <= 50
  • s contains only lowercase English letters.
  • p contains only lowercase English letters and exactly one '*'

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What characters will be in the pattern and the input string? Are we limited to ASCII, or should I expect Unicode characters?
  2. Is the pattern length guaranteed to be less than or equal to the length of the input string? What should I return if the pattern is longer?
  3. If the pattern is found multiple times in the input string, should I return all occurrences, or just the first one?
  4. Is there a defined format for the pattern? For example, is it like a regular expression or a simple string literal? Are special characters allowed?
  5. Can the input string or pattern be empty or null? If so, what should the function return?

Brute Force Solution

Approach

The brute force approach to substring matching involves checking every possible alignment of the pattern within the text. We'll systematically slide the pattern across the text and compare them character by character. If a match is found at any position, we'll report it.

Here's how the algorithm would work step-by-step:

  1. Start with the beginning of the text.
  2. Place the pattern directly on top of the beginning of the text and compare each character.
  3. If all characters of the pattern match the corresponding characters in the text, you've found a match.
  4. If there's a mismatch, shift the pattern one position to the right in the text.
  5. Repeat the character-by-character comparison.
  6. Continue shifting the pattern one position at a time and comparing until the pattern reaches the end of the text.
  7. If a complete match is found at any of these positions, record that position as a match.

Code Implementation

def substring_matching_brute_force(text, pattern):
    matches = []
    text_length = len(text)
    pattern_length = len(pattern)

    for text_index in range(text_length - pattern_length + 1):
        # Iterate through possible starting positions of pattern in text
        pattern_index = 0
        while (pattern_index < pattern_length and
               text[text_index + pattern_index] == pattern[pattern_index]):

            pattern_index += 1

        # If pattern_index reaches pattern_length, all chars matched.
        if pattern_index == pattern_length:
            matches.append(text_index)

    return matches

Big(O) Analysis

Time Complexity
O(n*m)The brute force algorithm iterates through the text string (length n) nearly entirely, and for each potential starting position in the text, it compares the pattern string (length m) character by character. In the worst-case scenario, the algorithm will compare the entire pattern at almost every position in the text, resulting in n-m+1 comparisons, each of which may take m steps. Therefore, the time complexity is approximately (n-m+1)*m which simplifies to O(n*m) where n is the length of the text and m is the length of the pattern. If the pattern's length m is relatively small compared to n, it can be approximated as O(n).
Space Complexity
O(1)The brute force substring matching algorithm, as described, doesn't utilize any auxiliary data structures. It only requires a constant number of variables to track the current position in the text and the pattern during the comparison. The space used does not scale with the size of the text or pattern, denoted by N. Therefore, the space complexity is constant.

Optimal Solution

Approach

The best way to solve this problem involves transforming the pattern into a simplified representation that helps us quickly compare it against the text. Instead of directly comparing the pattern, we'll focus on finding repeated relationships within it, and apply that logic when scanning the text.

Here's how the algorithm would work step-by-step:

  1. First, create a coded version of the pattern. This means replacing each unique character in the pattern with a number, so the first unique character becomes '1', the second becomes '2', and so on.
  2. Now, do the same thing to every section of the text that is the same length as the pattern, creating its own coded version.
  3. Then, compare the coded version of the pattern to each of the coded versions of the text sections. If they match exactly, it means that the pattern exists in that part of the text.
  4. By using these coded versions, we reduce the problem to simple number matching, which is much faster than comparing more complex patterns directly.

Code Implementation

def substring_match_pattern(text, pattern):
    pattern_length = len(pattern)
    text_length = len(text)
    results = []

    def encode_string(input_string):
        encoding_map = {}
        next_code = 1
        encoded_string = ''
        for char in input_string:
            if char not in encoding_map:
                encoding_map[char] = str(next_code)
                next_code += 1
            encoded_string += encoding_map[char]
        return encoded_string

    encoded_pattern = encode_string(pattern)

    # Iterate through the text to find potential matches.
    for i in range(text_length - pattern_length + 1):
        substring = text[i:i + pattern_length]
        encoded_substring = encode_string(substring)

        # Compare the encoded pattern and substring.
        if encoded_pattern == encoded_substring:
            results.append(i)

    return results

# Example Usage
# text = "redblueredblue"
# pattern = "redblue"
# print(substring_match_pattern(text, pattern))

# text = "abcabcabc"
# pattern = "abc"
# print(substring_match_pattern(text, pattern))

# text = "ababababab"
# pattern = "aba"
# print(substring_match_pattern(text, pattern))

# text = "aabbabbaab"
# pattern = "aabb"
# print(substring_match_pattern(text, pattern))

def test_cases():
    assert substring_match_pattern("redblueredblue", "redblue") == [0, 6]
    assert substring_match_pattern("abcabcabc", "abc") == [0, 3, 6]
    assert substring_match_pattern("ababababab", "aba") == [0, 2, 4, 6]
    assert substring_match_pattern("aabbabbaab", "aabb") == [0, 4]
    assert substring_match_pattern("aabbabbaab", "abba") == [2, 5]
    assert substring_match_pattern("aabbabbaab", "bba") == [3, 6]
    assert substring_match_pattern("aabbabbaab", "b") == [2, 3, 5, 6, 8]
    assert substring_match_pattern("aabbabbaab", "a") == [0, 1, 4, 7, 9]
    assert substring_match_pattern("aabbabbaab", "") == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    assert substring_match_pattern("", "") == [0]

    # Test case with no match
    assert substring_match_pattern("abcdefg", "xyz") == []

    # Test case where pattern is longer than text
    assert substring_match_pattern("abc", "abcdef") == []

    print("All test cases passed!")

#test_cases()

Big(O) Analysis

Time Complexity
O(n*m)Let n be the length of the text and m be the length of the pattern. We iterate through the text to consider each possible substring of length m, which takes O(n-m+1) time. For each substring, we create its coded version, which involves iterating through the substring of length m, taking O(m) time. Then, we compare the coded substring with the coded pattern, also taking O(m) time. The dominant operations are iterating through the text (O(n)) and for each substring of the text we code it and perform a comparison against the coded pattern (O(m)). Therefore, the overall time complexity is O(n*m), where n is the length of the text and m is the length of the pattern. Although the substring iteration is technically n-m+1, we simplify the expression to O(n*m) since m is considered a constant factor.
Space Complexity
O(M)The auxiliary space is primarily determined by the size of the coded pattern and the coded sections of the text. Let M be the length of the pattern. The coded pattern requires space proportional to M to store the mapping of characters to numbers. Similarly, each coded section of the text, which is also of length M, needs temporary storage. Since we're processing the text one section at a time, the maximum auxiliary space used at any point is determined by the coded pattern/section size of M. Therefore, the space complexity is O(M), where M is the length of the pattern.

Edge Cases

CaseHow to Handle
Empty pattern stringReturn an empty list since no match is possible with an empty pattern.
Empty text stringReturn an empty list as the pattern cannot be found in an empty text.
Null pattern stringThrow an IllegalArgumentException or return an error code to indicate invalid input.
Null text stringThrow an IllegalArgumentException or return an error code to indicate invalid input.
Pattern string longer than text stringReturn an empty list since the pattern cannot be contained within the text string.
Pattern containing special characters (e.g., regex metacharacters)Escape special characters in the pattern string before using it in any string matching function.
Text string contains Unicode charactersEnsure the string matching algorithm correctly handles Unicode characters based on the encoding used.
Pattern and Text strings are identicalReturn a list containing index 0 since the pattern starts at the very beginning of the text.