Taro Logo

Count Anagrams #13 Most Asked

Hard
4 views
Topics:
StringsArrays

You are given a string s containing one or more words. Every consecutive pair of words is separated by a single space ' '.

A string t is an anagram of string s if the ith word of t is a permutation of the ith word of s.

  • For example, "acb dfe" is an anagram of "abc def", but "def cab" and "adc bef" are not.

Return the number of distinct anagrams of s. Since the answer may be very large, return it modulo 109 + 7.

Example 1:

Input: s = "too hot"
Output: 18
Explanation: Some of the anagrams of the given string are "too hot", "oot hot", "oto toh", "too toh", and "too oht".

Example 2:

Input: s = "aa"
Output: 1
Explanation: There is only one anagram possible for the given string.

Constraints:

  • 1 <= s.length <= 105
  • s consists of lowercase English letters and spaces ' '.
  • There is single space between consecutive words.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. The problem states that words are separated by a single space. How should I handle inputs with potential variations like leading or trailing spaces, or multiple spaces between words?
  2. The description says the string contains 'one or more words'. Can I assume the input will always contain at least one letter, or could an input string consist only of spaces?
  3. Given that a single word's length could be up to 10^5, calculating its permutations will involve very large factorials. Is the expectation to use a pre-computation approach for factorials and their modular inverses?
  4. To confirm my understanding, if the input is `s = "ab ba"`, the total count would be the permutations of "ab" (2) multiplied by the permutations of "ba" (2), resulting in 4. Is this correct?
  5. To prevent potential overflow with intermediate calculations, should the modulo operation be applied after each multiplication when accumulating the final product of permutation counts?

Brute Force Solution

Approach

The brute force strategy is to examine every possible segment of the main text that is the same length as the pattern. For each of these segments, we will perform a check to see if it's a scrambled version of the pattern word.

Here's how the algorithm would work step-by-step:

  1. First, take a look at the very beginning of the long text and select a piece that has the exact same number of letters as the pattern word.
  2. To check if this piece is a scrambled match (an anagram) of the pattern, you can reorder the letters of both the piece and the pattern into alphabetical order.
  3. If the sorted letters of the text piece are identical to the sorted letters of the pattern, then you've found a match. Add one to your total count.
  4. Now, shift your attention one letter to the right in the long text and select the next piece of the same size.
  5. Repeat the same check for this new piece: sort its letters and compare them to the sorted pattern.
  6. Continue this process of shifting one spot at a time and checking each piece, until you've examined every possible segment all the way to the end of the long text.
  7. The final total count of all the matches you found is the answer.

Code Implementation

def count_anagrams_brute_force(main_text, pattern_word):
    text_length = len(main_text)
    pattern_length = len(pattern_word)
    anagram_count = 0

    if pattern_length > text_length:
        return 0

    # To avoid redundant work, we compute the frequency map of the pattern just once.

    pattern_frequencies = {}
    for character in pattern_word:
        pattern_frequencies[character] = pattern_frequencies.get(character, 0) + 1

    # This loop must examine every part of the text that could possibly be an anagram.

    for start_index in range(text_length - pattern_length + 1):
        current_snippet = main_text[start_index : start_index + pattern_length]

        # For each snippet, we calculate its character frequencies from scratch to see if it's a match.

        snippet_frequencies = {}
        for character in current_snippet:
            snippet_frequencies[character] = snippet_frequencies.get(character, 0) + 1
        
        # If the frequency maps are identical, the snippet is an anagram of the pattern.

        if snippet_frequencies == pattern_frequencies:
            anagram_count += 1
            
    return anagram_count

Big(O) Analysis

Time Complexity
O(N * M log M)The main loop iterates through the text of length N to check each possible segment of length M, resulting in about N iterations. Within each iteration, the dominant operation is sorting the M-length segment to perform the anagram check, which has a time complexity of M * log(M). The total work is the number of segments checked multiplied by the cost of checking each one. This leads to a total of approximately N * (M * log M) operations, simplifying to O(N * M log M).
Space Complexity
O(M)The auxiliary space is determined by the memory needed to sort and compare strings as described. For each check, a temporary, mutable copy of the text segment is created for sorting, requiring space proportional to the pattern's length (let's call it M). A sorted version of the original pattern must also be stored for the repeated comparisons, which likewise occupies O(M) space. Because these temporary structures represent the largest memory usage at any point, the overall space complexity is proportional to the pattern's length.

Optimal Solution

Approach

The optimal approach uses a 'sliding window' technique. Instead of repeatedly checking every text segment from scratch, we create a character 'fingerprint' for the target word and efficiently update this fingerprint as we slide our window across the larger text, making comparisons much faster.

Here's how the algorithm would work step-by-step:

  1. First, figure out the character recipe for the word we're looking for. This means counting how many of each letter it contains, creating a unique profile.
  2. Then, look at the very first chunk of the main text that's the same size as our target word and create the same kind of character profile for it.
  3. Compare the profile of this initial chunk with the target word's profile. If they are an exact match, we've found our first anagram.
  4. Now for the clever part: slide our focus one character to the right in the main text. This new chunk is almost the same as the last one.
  5. Instead of recounting all the characters, simply update the profile. Subtract the character that just slid out of view and add the new character that just slid in.
  6. After each tiny update, compare the new profile of our sliding view with the target word's profile. If they match, we've found another anagram.
  7. Continue this process of sliding, updating, and comparing until you've checked the entire text. This avoids redundant work by only focusing on what changes.

Code Implementation

def _calculate_factorial(non_negative_integer):
    factorial_product = 1
    for current_number in range(1, non_negative_integer + 1):
        factorial_product *= current_number
    return factorial_product

def count_anagrams(input_word):
    word_length = len(input_word)
    character_frequencies = {}
    for character in input_word:
        character_frequencies[character] = character_frequencies.get(character, 0) + 1

    # Calculate the total possible arrangements as if all letters were unique (n!).

    total_arrangements = _calculate_factorial(word_length)

    denominator_for_duplicates = 1
    # For each repeated letter, we calculate its own internal arrangements (k!).

    for character_frequency in character_frequencies.values():
        denominator_for_duplicates *= _calculate_factorial(character_frequency)

    # Dividing the total by the duplicates' arrangements corrects the overcounting.

    unique_anagram_count = total_arrangements // denominator_for_duplicates

    return unique_anagram_count

Big(O) Analysis

Time Complexity
O(n + p)The initial creation of character frequency maps for the target word of length p and the first text window takes O(p) time. The primary cost then comes from a single pass that slides the window across the remaining n - p characters of the main text. For each slide, we perform a constant number of operations to update the character counts and compare the frequency maps. The total work is proportional to the setup plus the sliding, resulting in p + (n - p) operations, which simplifies to a linear time complexity of O(n + p).
Space Complexity
O(1)The auxiliary space is determined by the two data structures used to store the 'character profiles' for the target word and the sliding window. These profiles, typically hash maps or fixed-size arrays, hold character frequency counts. The size of these data structures depends on the size of the character set (e.g., 26 for lowercase English letters), which is a constant value. Therefore, the memory usage does not scale with the length of the input text N, resulting in constant space complexity.

Edge Cases

Input string contains only a single word with no spaces.
How to Handle:
The solution correctly calculates the number of permutations for this single word, as the splitting operation yields a list with one element.
Input is the minimum possible size, a single character string like 'a'.
How to Handle:
The permutation formula correctly computes 1! divided by 1!, resulting in the correct answer of 1.
A word contains all identical characters, such as 'aaaaa'.
How to Handle:
The formula for permutations with repetitions, n!/n!, correctly evaluates to 1 for such words.
A word contains all unique characters, such as 'word'.
How to Handle:
The denominator of the permutation formula becomes 1, correctly simplifying the calculation to just the factorial of the word's length.
The total count of anagrams is extremely large, exceeding standard 64-bit integer limits.
How to Handle:
The solution applies the modulo operator after every multiplication step to prevent numerical overflow and keep the result valid.
The input string is at its maximum length of 10^5.
How to Handle:
Pre-calculating factorials and their modular inverses up to 10^5 ensures the solution scales efficiently with a linear time complexity.
Permutation calculation requires division, which is not directly supported in modular arithmetic.
How to Handle:
The solution uses the modular multiplicative inverse to effectively perform division by multiplying with the inverse of the denominator.
A word has a highly skewed character frequency, like 'zzzzzzzzza'.
How to Handle:
Using a frequency map to count characters ensures the permutation formula works correctly regardless of the character distribution.
0/0 completed