Taro Logo

Find All Anagrams in a String

Medium
Meta logo
Meta
4 views
Topics:
ArraysStringsSliding Windows

Given two strings s and p, return an array of all the start indices of p's anagrams in s. You may return the answer in any order.

For example:

Example 1:

Input: s = "cbaebabacd", p = "abc"
Output: [0,6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input: s = "abab", p = "ab"
Output: [0,1,2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

Constraints:

  • 1 <= s.length, p.length <= 3 * 10^4
  • s and p consist of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input strings `s` and `p` contain characters beyond the standard ASCII set, such as Unicode characters?
  2. What should I return if no anagrams of `p` are found within `s`? Should I return an empty list or null?
  3. Are `s` and `p` guaranteed to be non-empty strings?
  4. If `p` is longer than `s`, what is the expected behavior? Should I return an empty list?
  5. Does the order of the anagram indices in the output list matter?

Brute Force Solution

Approach

The brute force method for finding anagrams is all about checking every single possibility. We slide a window across the main string and, at each position, we painstakingly compare it to the target anagram.

Here's how the algorithm would work step-by-step:

  1. Take a section of the main string that's the exact same length as the anagram you're searching for.
  2. Check if this section is an anagram of the target anagram.
  3. If it is, mark down its starting location.
  4. Move the section one position to the right in the main string.
  5. Repeat the process of checking for an anagram.
  6. Continue sliding the section and checking until you reach the end of the main string.
  7. You will end up with a list of all the starting positions where anagrams were found.

Code Implementation

def find_all_anagrams_brute_force(main_string, anagram):
    anagram_indices = []
    anagram_length = len(anagram)
    main_string_length = len(main_string)

    for index in range(main_string_length - anagram_length + 1):

        substring = main_string[index:index + anagram_length]

        # Check if the substring is an anagram
        if is_anagram(substring, anagram):

            #Record the starting index if anagram
            anagram_indices.append(index)

    return anagram_indices

def is_anagram(first_string, second_string):

    if len(first_string) != len(second_string):
        return False

    first_string_characters = sorted(first_string)
    second_string_characters = sorted(second_string)

    #Compare the sorted lists to find anagrams
    if first_string_characters == second_string_characters:

        return True

    return False

Big(O) Analysis

Time Complexity
O(n*k)The brute force approach involves iterating through the main string (let's say of length n) using a sliding window. At each position, we compare the substring of length k (the length of the target anagram) with the target anagram to check if they are anagrams. The anagram check itself requires comparing the characters which takes O(k) time. Since we iterate through approximately n positions in the main string, and for each position we perform an O(k) anagram check, the overall time complexity becomes O(n*k).
Space Complexity
O(1)The brute force method described only uses a constant amount of extra space. The algorithm slides a window and performs comparisons, but it doesn't create any auxiliary data structures that scale with the input size. Therefore, the space complexity is O(1).

Optimal Solution

Approach

The key is to avoid repeatedly checking every possible substring. We only need to track how often each letter appears in the target anagram, and then slide a window across the main string, updating our counts as we go. This allows us to quickly check if an anagram exists at each position.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each letter appears in the anagram we're looking for. Think of this as a recipe.
  2. Now, slide a window (of the same length as the anagram) across the bigger string.
  3. As the window moves, keep track of the letters inside it and how many times each appears.
  4. Compare the letter counts inside the window to the original anagram recipe. If they match exactly, you've found an anagram!
  5. Instead of recalculating the letter counts for each new window position, update the counts based on the letters entering and leaving the window. This saves a lot of time.
  6. Repeat this process until you've checked all possible window positions within the main string.

Code Implementation

def find_all_anagrams_in_a_string(main_string, anagram):
    anagram_length = len(anagram)
    main_string_length = len(main_string)
    
    if anagram_length > main_string_length:
        return []

    anagram_character_counts = {}
    for character in anagram:
        anagram_character_counts[character] = anagram_character_counts.get(character, 0) + 1

    result_indices = []
    window_character_counts = {}

    # Initialize the window
    for i in range(anagram_length):
        window_character_counts[main_string[i]] = window_character_counts.get(main_string[i], 0) + 1

    #Check if first window is an anagram
    if window_character_counts == anagram_character_counts:
        result_indices.append(0)

    # Slide the window
    for i in range(1, main_string_length - anagram_length + 1):

        # Remove the leftmost character from the window
        left_character = main_string[i - 1]
        window_character_counts[left_character] -= 1
        if window_character_counts[left_character] == 0:
            del window_character_counts[left_character]

        # Add the rightmost character to the window
        right_character = main_string[i + anagram_length - 1]
        window_character_counts[right_character] = window_character_counts.get(right_character, 0) + 1

        # Compare the current window with the anagram
        if window_character_counts == anagram_character_counts:
            result_indices.append(i)

    return result_indices

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the main string 's' of length 'n' using a sliding window. The window size is fixed to the length of the anagram 'p'. Inside the loop, it compares the character counts of the current window with the character counts of the anagram. These character count operations are done on a fixed size array of 26 elements (for the alphabet), making them constant time operations. Thus, the overall time complexity is dominated by the single loop traversing the string 's', giving us O(n).
Space Complexity
O(1)The space complexity is dominated by the storage of letter counts for the anagram and the sliding window. The problem uses two frequency maps (or arrays) to store counts of characters in the pattern (anagram) and the current window. Since the character set is limited (e.g., lowercase English letters), the size of these maps (or arrays) is constant and independent of the input string's length (N). Therefore, the auxiliary space used is constant, resulting in O(1) space complexity.

Edge Cases

CaseHow to Handle
s or p is nullThrow IllegalArgumentException or return an empty list indicating invalid input.
s is empty, p is non-emptyReturn an empty list since the pattern cannot be found in an empty string.
p is empty, s is non-emptyReturn an empty list since an empty pattern can theoretically be 'found' at every index, but it is not an anagram in the traditional sense.
length of p is greater than length of sReturn an empty list since an anagram of p cannot exist within s.
s and p have the same length, but are not anagramsReturn an empty list after confirming the initial window in s is not an anagram of p.
s and p are very long strings, potentially leading to performance issuesUse a sliding window with a frequency map to achieve O(n) time complexity for efficient processing.
p contains duplicate charactersThe frequency map correctly accounts for duplicate characters within p.
s contains characters not present in pThe sliding window and comparison of frequency maps will correctly identify when the current window is not an anagram.