Taro Logo

Find All Anagrams in a String

Medium
Revolut logo
Revolut
0 views
Topics:
ArraysTwo PointersSliding WindowsStrings

Given two strings s and p, return an array of all the start indices of p's anagrams in s. You may return the answer in any order.

Example 1:

Input: s = "cbaebabacd", p = "abc"
Output: [0,6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input: s = "abab", p = "ab"
Output: [0,1,2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

Constraints:

  • 1 <= s.length, p.length <= 3 * 104
  • s and p consist of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input strings s and p contain characters other than lowercase English letters?
  2. What are the maximum lengths of the input strings s and p?
  3. If no anagrams are found, should I return an empty list or null?
  4. Is the order of the starting indices in the output list important?
  5. Can the string p be empty or null?

Brute Force Solution

Approach

The goal is to find all occurrences of anagrams of a given pattern within a larger text. A brute force approach involves checking every possible substring within the text to see if it is an anagram of the pattern. This means we'll go through the text, look at chunks of the right size, and compare them to the pattern.

Here's how the algorithm would work step-by-step:

  1. Start at the very beginning of the text.
  2. Take a chunk of the text that's exactly the same size as the pattern you're looking for anagrams of.
  3. Check if this chunk is an anagram of the pattern. One way to do this is to count how many times each letter appears in the chunk and compare it to how many times each letter appears in the pattern. If the counts are the same for every letter, it's an anagram.
  4. If it's an anagram, great! Make a note of where you found it.
  5. Move one position to the right in the text.
  6. Again, take a chunk of the text that's the same size as the pattern, starting from this new position.
  7. Check if this new chunk is an anagram of the pattern, just like before.
  8. Repeat steps 5-7 until you've reached the end of the text, making sure you've checked every possible chunk.
  9. Once you've gone through the entire text, you'll have a list of all the places where anagrams of the pattern were found.

Code Implementation

def find_all_anagrams_brute_force(text, pattern):
    anagram_indices = []
    pattern_length = len(pattern)
    text_length = len(text)

    for index in range(text_length - pattern_length + 1):
        substring = text[index:index + pattern_length]

        # We use character counts to check for anagrams
        if is_anagram(substring, pattern):

            anagram_indices.append(index)

    return anagram_indices

def is_anagram(substring, pattern):
    if len(substring) != len(pattern):
        return False

    pattern_character_counts = {}
    for char in pattern:
        pattern_character_counts[char] = pattern_character_counts.get(char, 0) + 1

    substring_character_counts = {}
    for char in substring:
        substring_character_counts[char] = substring_character_counts.get(char, 0) + 1

    # Compare the counts of each character.
    if pattern_character_counts == substring_character_counts:

        return True
    else:
        return False

Big(O) Analysis

Time Complexity
O(n*m)The algorithm iterates through the text string of length n, creating a substring of length m (the pattern length) at each position. For each substring, it compares character counts with the pattern, which takes O(m) time. Since this substring comparison happens for potentially every index in the text string, the overall time complexity is approximately n iterations each taking m operations, resulting in O(n*m) where n is the length of the text and m is the length of the pattern.
Space Complexity
O(1)The provided plain English explanation focuses on a brute-force approach involving comparing chunks of the text to the pattern. The key operation is counting letter frequencies within a chunk and comparing these counts to the letter frequencies in the pattern. If we assume a fixed character set (e.g., lowercase English letters), the space required to store these counts is constant, regardless of the input string's length. The counts themselves will be some fixed size data structure such as an array of length 26 (if checking for lower case english characters), which has constant space. Therefore, the auxiliary space used is independent of the input size N, making the space complexity O(1).

Optimal Solution

Approach

We want to find all the places in a big string where a smaller string's letters appear, but in any order. The key idea is to use a sliding window that only moves forward, avoiding redundant checks by cleverly updating the letter counts as it goes.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each letter appears in the smaller string. This is our target count.
  2. Now, imagine a window the same size as the smaller string sliding across the bigger string, one character at a time.
  3. As the window moves, keep track of how many times each letter appears within the current window.
  4. Each time the count of letters in the current window exactly matches our target count from the smaller string, it means we've found an anagram.
  5. Instead of recounting all letters every time the window moves, simply update the count: subtract the letter that's leaving the window on the left and add the letter that's entering the window on the right.
  6. Keep sliding the window and checking the counts until you reach the end of the bigger string. Every time counts match, record the starting spot of the window.

Code Implementation

def find_all_anagrams_in_string(main_string, pattern_string):
    anagram_indices = []
    pattern_length = len(pattern_string)
    main_string_length = len(main_string)

    if pattern_length > main_string_length:
        return anagram_indices

    pattern_character_counts = {}
    for character in pattern_string:
        pattern_character_counts[character] = pattern_character_counts.get(character, 0) + 1

    window_character_counts = {}

    # Initialize the sliding window's character counts.
    for i in range(pattern_length):
        current_character = main_string[i]
        window_character_counts[current_character] = window_character_counts.get(current_character, 0) + 1

    # This checks if the initial window is an anagram.
    if window_character_counts == pattern_character_counts:
        anagram_indices.append(0)

    for i in range(1, main_string_length - pattern_length + 1):
        # Remove the leftmost character from the window.
        left_character = main_string[i - 1]
        window_character_counts[left_character] -= 1
        if window_character_counts[left_character] == 0:
            del window_character_counts[left_character]

        # Add the rightmost character to the window.
        right_character = main_string[i + pattern_length - 1]
        window_character_counts[right_character] = window_character_counts.get(right_character, 0) + 1

        # Check if the current window is an anagram.
        if window_character_counts == pattern_character_counts:

            # Add the starting index of the anagram to result.
            anagram_indices.append(i)

    return anagram_indices

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the larger string s of length n using a sliding window. Creating the frequency map for the pattern p takes O(m) time where m is the length of pattern p, but this is done only once. The sliding window progresses through s from left to right, performing a constant number of operations at each step (adding and subtracting counts). Therefore, the dominant factor is the single pass through the string s, resulting in a time complexity of O(n).
Space Complexity
O(1)The algorithm uses a fixed-size hash map (or array) to store character counts for both the smaller string and the sliding window. The size of this hash map is determined by the number of distinct characters in the alphabet (e.g., 26 for lowercase English letters), which is constant and independent of the input string sizes. Therefore, the auxiliary space used is constant, resulting in O(1) space complexity.

Edge Cases

CaseHow to Handle
s or p is null or emptyReturn an empty list if either string is null or empty, as no anagram can exist.
p is longer than sReturn an empty list because p cannot be an anagram of any substring of s.
s and p are the same string (trivial anagram)Return index 0 as the only anagram start index.
s and p have the same length but are not anagramsReturn an empty list since no anagram exists.
s contains only one unique character repeated many times, p contains only one unique character repeated fewer timesSliding window will check all possible windows and return starting indices where counts match.
p contains duplicate charactersCharacter frequency maps or arrays correctly handle duplicate characters in p when comparing with substrings of s.
s is very long, potentially causing performance issues if the sliding window or frequency map comparison is inefficient.Ensure the solution uses an efficient O(n) algorithm like sliding window with constant time frequency map comparison.
s and p contains unicode charactersUse a hash map that can accommodate all unicode characters to store counts, or adjust array size if known character set is limited