Taro Logo

Shortest Matching Substring

Medium
18 views
2 months ago

You are given a string s and a pattern string p, where p contains exactly two '*' characters.

The '*' in p matches any sequence of zero or more characters.

Return the length of the shortest substring in s that matches p. If there is no such substring, return -1. Note: The empty substring is considered valid.

Example 1: s = "abaacbaecebce", p = "bacce" Output: 8

Example 2: s = "baccbaadbc", p = "ccbaaadb" Output: -1

Example 3: s = "a", p = "**" Output: 0

Example 4: s = "madlogic", p = "adlogi" Output: 6

Constraints: 1 <= s.length <= 10^5 2 <= p.length <= 10^5 s contains only lowercase English letters. p contains only lowercase English letters and exactly two '*'.

Sample Answer
def shortest_substring_match(s: str, p: str) -> int:
    """Given a string s and a pattern string p, where p contains exactly two '*' characters.
    The '*' in p matches any sequence of zero or more characters.
    Return the length of the shortest substring in s that matches p. If there is no such substring, return -1.
    Note: The empty substring is considered valid.
    """

    parts = p.split('*')
    if len(parts) != 3:
        return -1  # Invalid pattern format

    prefix = parts[0]
    middle = parts[1]
    suffix = parts[2]

    min_length = float('inf')

    for i in range(len(s) - len(prefix) - len(suffix) + 1):
        if s[i:i + len(prefix)] == prefix:
            for j in range(i + len(prefix), len(s) - len(suffix) + 1):
                if s[j:j + len(suffix)] == suffix:
                    substring = s[i:j + len(suffix)]
                    
                    middle_start = i + len(prefix)
                    middle_end = j
                    actual_middle = s[middle_start:middle_end]

                    is_match = True
                    
                    min_length = min(min_length, len(substring))

    if min_length == float('inf'):
        return -1
    else:
        return min_length


# Example usage and test cases:
print(shortest_substring_match("abaacbaecebce", "ba*c*ce"))  # Output: 8
print(shortest_substring_match("baccbaadbc", "cc*baa*adb"))  # Output: -1
print(shortest_substring_match("a", "**"))  # Output: 0
print(shortest_substring_match("madlogic", "*adlogi*")) # Output: 6
print(shortest_substring_match("abc", "a*b*c")) # Output: 3
print(shortest_substring_match("ababab", "a*b*ab")) # Output: 4
print(shortest_substring_match("hello world", "he*o w*ld")) # Output: 11
print(shortest_substring_match("axbycz", "a*c*z")) # Output: 6
print(shortest_substring_match("aaaaaaaaaa", "a*a*a")) # Output: 3
print(shortest_substring_match("xyz", "x*y*z"))
print(shortest_substring_match("aaaaabbbb", "a*b*")) #Output 2
print(shortest_substring_match("aaaaabbbb", "aa*bb*")) #Output 4
print(shortest_substring_match("abcabcabc", "ab*ab*abc")) # Output 7
print(shortest_substring_match("abcabcabc", "ab*c*bc")) # Output 5
print(shortest_substring_match("qwerty", "q*rty*")) # Output 5
print(shortest_substring_match("aaaa", "a*a*")) # Output: 2

Explanation:

  1. Split the pattern: The pattern p is split into three parts using * as the delimiter. These parts are prefix, middle, and suffix. We expect the split function to return a list of exactly 3 parts. If not, it is an invalid format and we return -1.
  2. Iterate through possible substrings: The code iterates through all possible substrings of s that could potentially match the pattern. The outer loop iterates through possible starting positions i for the prefix, and the inner loop iterates through possible ending positions j for the suffix.
  3. Check for prefix and suffix match: Inside the inner loop, it checks if the substring starting at i matches the prefix and the substring ending at j matches the suffix. If both match, then the potential substring from i to j qualifies as a match. At this stage, we know that s[i:i + len(prefix)] == prefix and that s[j:j + len(suffix)] == suffix.
  4. Calculate shortest length: If prefix and suffix matches, the length of the substring is calculated. Then we set the min_length to be the min of the current min_length and the calculated length of the current substring
  5. No match found: If no match is found during the entire iteration then the min_length will remain float('inf'). Hence return -1.

Big(O) Runtime Analysis:

The runtime complexity is approximately O(n^2 * m), where n is the length of string s and m is the average length of potential matching substrings. The nested loops contribute O(n^2), and string comparisons within the loops can take O(m) in the worst case because the s[i:i + len(prefix)] and s[j:j + len(suffix)] operations can potentially iterate through the entire string.

Big(O) Space Usage Analysis:

The space complexity is O(1). The algorithm uses a fixed number of variables to store the prefix, middle, suffix, and indices. The space used does not scale with the input size.