Find All Anagrams in a String

Medium

Apple

1 view

Topics:

ArraysStringsSliding Windows

Given two strings s and p, return an array of all the start indices of p's anagrams in s. You may return the answer in any order.

Example 1:

Input: s = "cbaebabacd", p = "abc"
Output: [0,6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input: s = "abab", p = "ab"
Output: [0,1,2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

Clarifications/Edge Cases:

What should be returned if p is longer than s?
What should be returned if either s or p are empty strings?

Constraints:

1 <= s.length, p.length <= 3 * 10^4
s and p consist of lowercase English letters.

Find All Anagrams in a String

Medium

Apple

1 view

Topics:

ArraysStringsSliding Windows

Given two strings s and p, return an array of all the start indices of p's anagrams in s. You may return the answer in any order.

Example 1:

Input: s = "cbaebabacd", p = "abc"
Output: [0,6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input: s = "abab", p = "ab"
Output: [0,1,2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

Clarifications/Edge Cases:

What should be returned if p is longer than s?
What should be returned if either s or p are empty strings?

Constraints:

1 <= s.length, p.length <= 3 * 10^4
s and p consist of lowercase English letters.

Solution

Naive Solution (Brute Force)

For each possible starting index in string s, we extract a substring of the same length as p. Then, we check if this substring is an anagram of p. To check if two strings are anagrams, we can sort both strings and compare them. If they are equal after sorting, they are anagrams.

def find_anagrams_naive(s: str, p: str) -> list[int]:
    n = len(s)
    k = len(p)
    result = []

    for i in range(n - k + 1):
        substring = s[i:i+k]
        if sorted(substring) == sorted(p):
            result.append(i)

    return result

Time Complexity: O((n-k) * klog(k)), where n is the length of s and k is the length of p. For each of the (n-k) possible starting positions, we sort a substring of length k, which takes O(klog(k)) time.

Space Complexity: O(k), because sorting uses O(k) space.

Optimal Solution (Sliding Window with Character Frequency)

A more efficient approach uses a sliding window with character frequency counting. We maintain a frequency map (or array) for both the pattern p and the current window in s. As we slide the window, we update the frequency map of the window and compare it with the frequency map of p. If the frequency maps are identical, we have found an anagram.

def find_anagrams(s: str, p: str) -> list[int]:
    n = len(s)
    k = len(p)
    if k > n:  # Edge case: pattern longer than string
        return []

    result = []
    p_freq = [0] * 26  # Frequency array for pattern p
    window_freq = [0] * 26 # Frequency array for current window

    # Initialize frequency arrays for p and the first window
    for i in range(k):
        p_freq[ord(p[i]) - ord('a')] += 1
        window_freq[ord(s[i]) - ord('a')] += 1

    # Slide the window through s
    for i in range(n - k + 1):
        if window_freq == p_freq:
            result.append(i)

        # Update the window frequency for the next slide
        if i < n - k:
            window_freq[ord(s[i]) - ord('a')] -= 1      # Remove the leftmost char frequency
            window_freq[ord(s[i + k]) - ord('a')] += 1  # Add the rightmost char frequency

    return result

Time Complexity: O(n), where n is the length of s. We iterate through s once. The frequency map comparisons take O(1) time because the size of the alphabet is fixed (26).

Space Complexity: O(1), because we use a fixed-size array (of size 26) to store character frequencies.

Edge Cases:

If the length of p is greater than the length of s, there can be no anagrams, so return an empty list.
Empty string inputs for s or p (though problem constraints say that the length is at least 1, so we don't consider this case).
s and p are equal: In this case the index 0 should be returned.

Solution

Naive Solution (Brute Force)

def find_anagrams_naive(s: str, p: str) -> list[int]:
    n = len(s)
    k = len(p)
    result = []

    for i in range(n - k + 1):
        substring = s[i:i+k]
        if sorted(substring) == sorted(p):
            result.append(i)

    return result

Space Complexity: O(k), because sorting uses O(k) space.

Optimal Solution (Sliding Window with Character Frequency)

def find_anagrams(s: str, p: str) -> list[int]:
    n = len(s)
    k = len(p)
    if k > n:  # Edge case: pattern longer than string
        return []

    result = []
    p_freq = [0] * 26  # Frequency array for pattern p
    window_freq = [0] * 26 # Frequency array for current window

    # Initialize frequency arrays for p and the first window
    for i in range(k):
        p_freq[ord(p[i]) - ord('a')] += 1
        window_freq[ord(s[i]) - ord('a')] += 1

    # Slide the window through s
    for i in range(n - k + 1):
        if window_freq == p_freq:
            result.append(i)

        # Update the window frequency for the next slide
        if i < n - k:
            window_freq[ord(s[i]) - ord('a')] -= 1      # Remove the leftmost char frequency
            window_freq[ord(s[i + k]) - ord('a')] += 1  # Add the rightmost char frequency

    return result

Time Complexity: O(n), where n is the length of s. We iterate through s once. The frequency map comparisons take O(1) time because the size of the alphabet is fixed (26).

Space Complexity: O(1), because we use a fixed-size array (of size 26) to store character frequencies.

Edge Cases:

If the length of p is greater than the length of s, there can be no anagrams, so return an empty list.
Empty string inputs for s or p (though problem constraints say that the length is at least 1, so we don't consider this case).
s and p are equal: In this case the index 0 should be returned.