Taro Logo

Group Anagrams #19 Most Asked

Medium
Topics:
ArraysStrings

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]

Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Explanation:

  • There is no string in strs that can be rearranged to form "bat".
  • The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.
  • The strings "ate", "eat", and "tea" are anagrams as they can be rearranged to form each other.

Example 2:

Input: strs = [""]

Output: [[""]]

Example 3:

Input: strs = ["a"]

Output: [["a"]]

Constraints:

  • 1 <= strs.length <= 104
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected range of lengths for the input array and the individual strings within it?
  2. What character set can we expect the strings to contain? For instance, are they all lowercase English letters, or can they include uppercase, numbers, or special characters?
  3. How should we handle empty strings in the input array? Should an empty string be grouped with other empty strings?
  4. Could the input array itself be empty, and if so, what should the expected output be?
  5. Does the input array contain duplicate strings, for example `["eat", "tea", "eat"]`, and how should they be handled in the output groups?

Brute Force Solution

Approach

The brute force strategy involves comparing every single word with every other word in the list to see if they are anagrams of each other. We will go through the list one by one, and for each word, we'll check all the remaining words to find its anagram partners.

Here's how the algorithm would work step-by-step:

  1. Take the first word from the list and put it into a new group.
  2. Now, look at the second word in the original list.
  3. Compare this second word with the first word you already grouped. To check if they are anagrams, you can see if they are made of the exact same letters, just in a different order.
  4. If they are anagrams, put the second word into the same group as the first. If not, create a brand new group just for the second word.
  5. Next, take the third word from the original list.
  6. Compare this third word against a representative from each group you've already created.
  7. If it's an anagram of the words in an existing group, add it to that group.
  8. If it doesn't match any existing group, create a new group for this third word.
  9. Continue this process for every single word in the original list until all words have been placed into a group.

Code Implementation

def are_anagrams(first_word, second_word):
    if len(first_word) != len(second_word):
        return False
    return sorted(first_word) == sorted(second_word)

def group_anagrams_brute_force(list_of_words):
    grouped_anagrams = []
    words_already_grouped = [False] * len(list_of_words)

    # Iterate through each word to either form a new group or join an existing one.
    for current_index, current_word in enumerate(list_of_words):
        if words_already_grouped[current_index]:
            continue

        # Create a new group for the current word as it hasn't been grouped yet.
        new_anagram_group = [current_word]
        words_already_grouped[current_index] = True

        # Compare the current word against all subsequent words to find its anagrams.
        for next_index in range(current_index + 1, len(list_of_words)):
            if not words_already_grouped[next_index]:
                next_word = list_of_words[next_index]
                if are_anagrams(current_word, next_word):
                    new_anagram_group.append(next_word)
                    # Mark this subsequent word as grouped so we don't process it again.
                    words_already_grouped[next_index] = True
        
        grouped_anagrams.append(new_anagram_group)

    return grouped_anagrams

Big(O) Analysis

Time Complexity
O(n² * k * log k)The overall complexity is driven by comparing every word with a representative from each potential group. Let n be the number of words and k be the maximum length of a word. For each of the n words, we might have to compare it against a representative from every previously formed group, which in the worst case could be up to n-1 groups. The comparison itself, checking if two words are anagrams by sorting them, takes O(k * log k) time. This leads to a nested comparison structure, performing an O(k * log k) operation roughly n² times. Therefore, the total time complexity simplifies to O(n² * k * log k).
Space Complexity
O(N * K)The algorithm constructs a new data structure to hold the final groups of anagrams, which in the worst-case scenario (no anagrams) will store a copy of every original string. Let N be the number of strings and K be the maximum length of a string; this output structure will require O(N * K) space. Additionally, to keep track of which strings have already been placed into a group, a boolean array or a similar tracking mechanism of size N is needed. The dominant factor is the output structure, leading to a total auxiliary space complexity of O(N * K).

Optimal Solution

Approach

The key insight is that words which are anagrams of each other will look identical if we rearrange their letters into a standard order. We can use this sorted version of each word as a unique signature to group all the original words that share it.

Here's how the algorithm would work step-by-step:

  1. Imagine you have a collection of empty boxes, each waiting for a specific group of anagrams.
  2. Pick up the first word from the input list.
  3. Rearrange the letters of this word into alphabetical order. This new, sorted word will be the label for one of your boxes.
  4. Find the box with that specific alphabetical label. If it doesn't exist yet, create a new box and put the label on it.
  5. Place the original, unsorted word inside that box.
  6. Repeat this process for every single word in the input list: sort its letters to find its label, then put the original word into the correctly labeled box.
  7. Once you've gone through all the words, the words inside each box will be perfect anagrams of each other. The final result is simply the collection of all these boxes.

Code Implementation

from collections import defaultdict

def group_anagrams(words):
    # This dictionary will serve as our collection of labeled boxes.
    # Using defaultdict simplifies adding the first word for a new anagram group.
    anagram_groups = defaultdict(list)

    for original_word in words:
        # Sorting the letters of a word creates a unique signature for all its anagrams.
        # This signature acts as the label for our boxes.
        sorted_word_tuple = tuple(sorted(original_word))

        # Place the original word into the box corresponding to its sorted letter signature.
        anagram_groups[sorted_word_tuple].append(original_word)

    return list(anagram_groups.values())

Big(O) Analysis

Time Complexity
O(n * k log k)The primary cost comes from processing each of the n words in the input list. For every single word, we must sort its characters to create a unique key. If the maximum length of a word is k, sorting it takes O(k log k) time. Since this sorting operation is performed for all n words, the total time complexity is the product of the number of words and the cost of sorting each one. This results in a total runtime of O(n * k log k).
Space Complexity
O(N * K)The core of this approach is creating a collection of 'boxes' to group the anagrams, which corresponds to an auxiliary hash map or dictionary. In the worst-case scenario, where no words are anagrams of each other, this map will store N unique keys, one for each of the N input words. The values associated with these keys are lists containing the original words, and the space required to store all these words is equivalent to the total number of characters in the input list. Therefore, the space complexity is O(N * K), where N is the number of words and K is the maximum length of a word.

Edge Cases

Input array is empty
How to Handle:
The solution should return an empty list as there are no strings to group.
Input array contains a single string
How to Handle:
The solution should return a list containing a single list with just that string.
Input array contains empty strings
How to Handle:
All empty strings are anagrams of each other and should be grouped together.
All strings in the array are anagrams of each other
How to Handle:
The solution should return a single list containing all the input strings.
No strings in the array are anagrams of each other
How to Handle:
The solution should return a list of lists, where each inner list contains only one string.
Strings contain non-alphabetic characters like numbers or symbols
How to Handle:
The sorting or character counting key generation handles these characters correctly, grouping strings with the same set of characters.
Input contains strings with different cases (e.g., 'Act' and 'cat')
How to Handle:
These are treated as distinct strings because of case sensitivity, so they will not be grouped together unless normalized first.
Very large input array or very long strings
How to Handle:
An efficient O(N*K) solution using a character count map as the key is preferred over an O(N*K*logK) sorting-based solution to avoid performance issues.
0/0 completed