Taro Logo

Find the Index of the First Occurrence in a String #6 Most Asked

Easy
7 views
Topics:
StringsTwo PointersSliding Windows

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.

Example 2:

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode", so we return -1.

Constraints:

  • 1 <= haystack.length, needle.length <= 104
  • haystack and needle consist of only lowercase English characters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can either `needle` or `haystack` be null or empty strings? If so, what should the return value be?
  2. Are `needle` and `haystack` case-sensitive? For example, should "hello" match "Hello"?
  3. What is the expected range for the lengths of `needle` and `haystack`?
  4. If `needle` is an empty string, what should I return?
  5. Should I assume both `needle` and `haystack` only contain standard ASCII characters, or are Unicode characters possible?

Brute Force Solution

Approach

We want to find the first spot where a smaller piece of text appears inside a larger piece of text. The brute force approach involves checking every possible starting position within the larger text to see if the smaller text matches at that location. We continue this process until we find a match, or exhaust all possible starting positions.

Here's how the algorithm would work step-by-step:

  1. Start at the very beginning of the larger text.
  2. Check if the smaller text is exactly the same as the text starting at that position in the larger text.
  3. If they match, you've found the first occurrence, so you're done.
  4. If they don't match, move one position forward in the larger text.
  5. Repeat the checking process again: see if the smaller text matches the text starting at this new position in the larger text.
  6. Keep doing this, moving one position at a time, until you either find a match or reach the end of the larger text.
  7. If you reach the end of the larger text without finding a match, it means the smaller text doesn't appear in the larger text at all.

Code Implementation

def find_first_occurrence(
    larger_text,
    smaller_text):

    larger_text_length = len(larger_text)
    smaller_text_length = len(smaller_text)

    for starting_position in range(
        larger_text_length - smaller_text_length + 1):
        # Iterate through all possible starting positions
        # within the larger text.

        match = True
        for character_index in range(smaller_text_length):
            if larger_text[starting_position + \
                    character_index] != \
                    smaller_text[character_index]:
                match = False
                break

        # If all characters match, return
        # the starting position.
        if match:
            return starting_position

    # If no match is found, return -1.
    return -1

Big(O) Analysis

Time Complexity
O(m*n)The provided brute force approach iterates through the larger string (haystack) of length n. For each position in the haystack, it compares a substring of length m (the length of the needle) with the needle itself. In the worst case, it compares the needle against every possible starting position in the haystack. Thus, the time complexity is proportional to the product of the length of the haystack and the length of the needle, resulting in O(m*n). Note that m can be considered a constant, if it is relatively small.
Space Complexity
O(1)The provided plain English explanation outlines a brute-force approach. The algorithm iterates through the larger string, comparing substrings with the smaller string. It doesn't explicitly mention any auxiliary data structures such as lists, maps, or sets being used to store intermediate results or track visited positions. The described process uses only a few index variables to track the current position in the larger string and potentially the smaller string during comparison, and these variables consume a constant amount of extra space, irrespective of the input string sizes. Thus, the space complexity is constant.

Optimal Solution

Approach

Instead of checking every single possible starting point, the optimal approach focuses on quickly skipping over sections that can't possibly contain the answer. This involves a preprocessing step that smartly identifies patterns to avoid unnecessary comparisons later on, which dramatically speeds up the search.

Here's how the algorithm would work step-by-step:

  1. First, analyze the pattern we're looking for to understand its structure and identify any repeating parts or unique characteristics.
  2. Then, create a 'map' of the main text, representing the positions where the pattern could potentially start. This map helps us quickly rule out many positions without doing a full comparison.
  3. Next, go through the main text, but instead of checking every single spot, use the 'map' to only compare the pattern at the positions we marked as possible starting points.
  4. When a match is found, return the spot's location in the main text.
  5. If no match is found after checking all valid positions based on the map, conclude that the pattern isn't there.

Code Implementation

def find_first_occurrence(text, pattern):
    text_length = len(text)
    pattern_length = len(pattern)

    if pattern_length == 0:
        return 0

    # Create potential starting point map.
    potential_starts = []
    for i in range(text_length - pattern_length + 1):
        potential_starts.append(i)

    # Only compare pattern at possible start positions.
    for start_index in potential_starts:
        if text[start_index:start_index + pattern_length] == pattern:

            return start_index

    # Pattern isn't in text.
    return -1

Big(O) Analysis

Time Complexity
O(n + m)The preprocessing step to analyze the pattern and create the 'map' based on potential starting positions takes O(m) time, where m is the length of the pattern. The subsequent search through the main text of length n, utilizing the precomputed map to selectively compare the pattern, takes O(n) time in the worst-case scenario, where the map indicates many possible starting points requiring comparisons. The overall time complexity is therefore O(n + m) because we iterate through both the main text and the pattern in a linear manner. This assumes the pattern matching at selected positions takes constant time on average due to the map efficiently pruning most locations.
Space Complexity
O(N)The algorithm creates a 'map' of the main text, representing positions where the pattern could potentially start. In the worst case, this map could store an entry for each position in the main text, if almost every position is a potential start. Therefore, the auxiliary space used by this 'map' grows linearly with the length of the main text, which we denote as N. Consequently, the space complexity is O(N).

Edge Cases

needle is an empty string
How to Handle:
Return 0, as an empty string is considered to be present at the beginning of any string.
haystack is an empty string, and needle is not
How to Handle:
Return -1, as the needle cannot be found in an empty haystack.
Both needle and haystack are empty strings
How to Handle:
Return 0, as an empty needle is at the beginning of an empty haystack.
needle is longer than haystack
How to Handle:
Return -1, as the needle cannot be found if it's longer than the haystack.
needle appears at the very beginning of haystack
How to Handle:
Standard iterative string comparison should correctly identify index 0 as the starting point.
needle appears at the very end of haystack
How to Handle:
Ensure the loop condition allows checking up to the last possible start index for the needle.
needle appears multiple times in haystack
How to Handle:
The algorithm should return the index of the *first* occurrence and terminate there.
haystack and needle contain very long strings
How to Handle:
Using a naive string comparison approach could result in O(m*n) time complexity; consider more efficient algorithms like KMP.
0/0 completed