Taro Logo

Group Anagrams

Medium
Visa logo
Visa
2 views
Topics:
ArraysStrings

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]

Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Explanation:

  • There is no string in strs that can be rearranged to form "bat".
  • The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.
  • The strings "ate", "eat", and "tea" are anagrams as they can be rearranged to form each other.

Example 2:

Input: strs = [""]

Output: [[""]]

Example 3:

Input: strs = ["a"]

Output: [["a"]]

Constraints:

  • 1 <= strs.length <= 104
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input list `strs` be empty or null? What should I return in those cases?
  2. What is the maximum length of a string within the `strs` array? Is there a reasonable upper bound on the character set (e.g., only lowercase English letters)?
  3. Are the strings in the input array guaranteed to be non-null and valid?
  4. Is the order of the list of lists I return important, and within each inner list, is the order of the strings important?
  5. If there are strings in the input that are not anagrams of any other string, should they be in their own list within the returned list of lists?

Brute Force Solution

Approach

The goal is to group words together that are anagrams of each other (meaning they have the same letters but in a different order). The brute force method involves comparing each word to every other word in the list. We check to see if they are anagrams and group them accordingly.

Here's how the algorithm would work step-by-step:

  1. Take the first word in the list.
  2. Compare it to every other word in the list.
  3. For each comparison, check if the two words are anagrams by seeing if they have the exact same letters, even if rearranged.
  4. If they are anagrams, put both words in the same group.
  5. Repeat steps 2-4 for every other word in the list, adding them to existing groups if they are anagrams or creating a new group if they don't match any existing group.
  6. At the end, you'll have a collection of groups, where each group contains words that are anagrams of each other.

Code Implementation

def group_anagrams_brute_force(words):
    anagram_groups = []

    for current_word in words:
        added_to_group = False

        # Iterate through existing groups to find a match
        for group in anagram_groups:
            first_word_in_group = group[0]
            if are_anagrams(current_word, first_word_in_group):
                group.append(current_word)
                added_to_group = True
                break

        # Create a new group if no match found
        if not added_to_group:
            anagram_groups.append([current_word])

    return anagram_groups

def are_anagrams(first_word, second_word):
    # Anagrams must have the same length
    if len(first_word) != len(second_word):
        return False

    first_word_chars = sorted(first_word)
    second_word_chars = sorted(second_word)

    # Sort and compare to check for anagrams
    return first_word_chars == second_word_chars

Big(O) Analysis

Time Complexity
O(n² * k)We iterate through each of the n words in the input list. For each word, we compare it with every other word, resulting in a nested loop. The outer loop runs n times. For each inner loop iteration (comparing two words), we need to determine if the two words are anagrams. Let's assume the average length of each word is k. Comparing the words to check for anagrams will take O(k) time. Therefore the total time complexity is O(n * n * k), which simplifies to O(n² * k).
Space Complexity
O(N^2)The brute force method involves comparing each word to every other word. In the worst case, we may need to store each word in its own group. These groups are stored in a data structure (e.g., a list of lists) to hold the anagram groups. In the worst-case scenario where no words are anagrams, we would have N groups, each potentially storing a single word, resulting in storing approximately N words across N groups. Moreover, when comparing each word, we need to check if they are anagrams, which can involve creating a sorted version or a character count, which, in the worst case, would involve creating temporary data structures on the order of N for each comparison, in the worst case leading to N comparisons, which will drive the space to O(N^2) due to storing the intermediate results and final anagram groups.

Optimal Solution

Approach

The most efficient way to group anagrams is to recognize that anagrams have the same letters, just rearranged. We can use this to quickly identify which words belong together without doing many comparisons. The idea is to create a unique identifier for each anagram group.

Here's how the algorithm would work step-by-step:

  1. For each word, create a 'signature' or 'key' that represents its letters, regardless of their order. One way to do this is to sort the letters alphabetically.
  2. Use the 'signature' as a way to group the words. Words with the same signature are anagrams of each other.
  3. Create a collection (like a box or a bag) for each unique signature you find.
  4. As you go through the list of words, put each word into the box labeled with its signature.
  5. In the end, each box will contain a group of anagrams.

Code Implementation

def group_anagrams(list_of_strings):
    anagram_groups = {}

    for current_string in list_of_strings:
        # Create a sorted signature for the word.
        sorted_string = "".join(sorted(current_string))

        # Use the sorted signature as a key.
        if sorted_string in anagram_groups:
            anagram_groups[sorted_string].append(current_string)
        else:
            # Create a new list if the signature is not present
            anagram_groups[sorted_string] = [current_string]

    # The result should be a list of lists
    return list(anagram_groups.values())

Big(O) Analysis

Time Complexity
O(n * k log k)The algorithm iterates through each of the n words in the input list. For each word, it sorts the characters (k characters on average), which takes O(k log k) time. The sorted string is then used as a key in a hash map, which takes O(1) on average. Therefore, the dominant operation is sorting each word, resulting in a total time complexity of O(n * k log k), where n is the number of words and k is the average length of the words.
Space Complexity
O(N)The algorithm uses a hash map (or dictionary) to store the 'signature' of each word as a key, and a list of anagrams corresponding to that signature as the value. In the worst-case scenario, all words in the input list are unique (no anagrams), resulting in N unique signatures stored as keys in the hash map. Each key will have an associated list, and in this worst case, each list will contain only one word. Therefore, the space used by the hash map grows linearly with the number of words in the input list.

Edge Cases

CaseHow to Handle
Null or empty input arrayReturn an empty list to indicate no anagrams are present.
Input array with a single stringReturn a list containing a list with that single string, as it forms a group of size one.
Input array with all identical stringsThe algorithm should group all identical strings into a single list.
Input array with empty stringsEmpty strings are anagrams of each other and should be grouped together.
Input array with strings of varying lengthsThe algorithm should correctly identify and separate strings that are not anagrams due to different lengths.
Strings containing non-alphabetic characters (e.g., spaces, punctuation)The algorithm should either ignore non-alphabetic characters or handle them appropriately based on the problem's requirements, possibly by preprocessing.
Large input array causing potential memory issuesThe algorithm should be efficient in terms of memory usage to handle large inputs without exceeding memory limits.
Anagrams with Unicode charactersThe solution should handle Unicode characters correctly if the input strings can contain them; a suitable sorting or hashing approach must be used.