Given two strings needle
and haystack
, implement a function to find the index of the first occurrence of needle
in haystack
. If needle
is not found, return -1.
Here are some constraints:
haystack
and needle
consist of lowercase English characters.1 <= haystack.length, needle.length <= 10^4
Could you provide a solution that is efficient?
Example 1:
Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6. The first occurrence is at index 0, so we return 0.
Example 2:
Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode", so we return -1.
One straightforward approach is to use a brute-force method. We iterate through the haystack
string, and at each index, check if the needle
string matches the substring of haystack
starting from that index. If a match is found, we return the starting index. If we reach the end of the haystack
without finding a match, we return -1.
def strStr_brute_force(haystack: str, needle: str) -> int:
if not needle:
return 0
for i in range(len(haystack) - len(needle) + 1):
if haystack[i:i+len(needle)] == needle:
return i
return -1
haystack
and m is the length of needle
. In the worst case, we might need to compare needle
with almost every substring of haystack
.A more efficient approach is to use the Knuth-Morris-Pratt (KMP) algorithm. The KMP algorithm preprocesses the needle
string to create a table that helps to avoid unnecessary comparisons. This table, often called the longest proper prefix which is also a suffix (LPS) table, stores the length of the longest proper prefix of needle
that is also a suffix of the prefix ending at each index. The KMP algorithm achieves O(n) time complexity where n is the length of the haystack
.
def strStr(haystack: str, needle: str) -> int:
if not needle:
return 0
def computeLPSArray(needle: str) -> list[int]:
length = len(needle)
lps = [0] * length
length_of_prefix = 0 # Length of the previous longest prefix suffix
i = 1
while i < length:
if needle[i] == needle[length_of_prefix]:
length_of_prefix += 1
lps[i] = length_of_prefix
i += 1
else:
if length_of_prefix != 0:
length_of_prefix = lps[length_of_prefix - 1]
else:
lps[i] = 0
i += 1
return lps
lps = computeLPSArray(needle)
i = 0 # index for haystack
j = 0 # index for needle
while i < len(haystack):
if needle[j] == haystack[i]:
i += 1
j += 1
if j == len(needle):
return i - j
elif i < len(haystack) and needle[j] != haystack[i]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return -1
haystack
and m is the length of needle
. The computeLPSArray
function takes O(m) time, and the main loop takes O(n) time.needle
.needle
: If needle
is an empty string, the function should return 0, as an empty string is always found at the beginning of any string.needle
longer than haystack
: If needle
is longer than haystack
, the function should return -1, as needle
cannot be a substring of haystack
.haystack
or needle
containing non-lowercase English characters: The problem statement specifies lowercase characters only, but you could clarify this during the interview and ask about handling other characters. If required to handle all characters, the same logic applies; the character set does not affect the algorithm's time complexity.