Taro Logo

Group Anagrams

Medium
Nvidia logo
Nvidia
2 views
Topics:
ArraysStringsDynamic Programming

You are given an array of strings strs. Your task is to group the anagrams together. Anagrams are defined as strings that contain the same characters, but potentially in a different order. The order in which you return the groups of anagrams does not matter.

Examples:

  1. Input: strs = ["eat","tea","tan","ate","nat","bat"] Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

  2. Input: strs = [""] Output: [[""]]

  3. Input: strs = ["a"] Output: [["a"]]

Constraints:

  • 1 <= strs.length <= 10^4
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Could you implement a function that takes an array of strings as input and returns a list of lists, where each inner list contains strings that are anagrams of each other?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum number of strings in the input array `strs`, and what is the maximum length of each string?
  2. Can the input array `strs` be empty or null? If so, what should I return?
  3. Are all the characters in the strings guaranteed to be lowercase English letters, or could there be other characters?
  4. If there are no anagrams present in the input, should I return an empty list of lists, or something else?
  5. Is the order of the list of anagrams within the outer list important? What about the order of the strings within each anagram group?

Brute Force Solution

Approach

The goal is to group words that are anagrams of each other. The brute force way is to compare each word to every other word to see if they are anagrams. We do this for every single word, without trying to be clever.

Here's how the algorithm would work step-by-step:

  1. Take the first word in the list.
  2. Compare it to every other word in the list to see if they are anagrams.
  3. If you find any anagrams, put them in the same group as the first word.
  4. Move to the next word in the original list that hasn't already been grouped.
  5. Again, compare this word to every other word in the list (that isn't already grouped).
  6. If you find any anagrams, put them in the same group as the current word.
  7. Keep repeating this process for each word until all the words have been placed into groups.

Code Implementation

def group_anagrams_brute_force(words):
    anagram_groups = []
    grouped_indices = set()

    for first_word_index in range(len(words)):
        # Skip if the word has already been grouped
        if first_word_index in grouped_indices:
            continue

        first_word = words[first_word_index]
        current_group = [first_word]
        grouped_indices.add(first_word_index)

        for second_word_index in range(first_word_index + 1, len(words)):
            # Avoid comparing a word to itself or already grouped words
            if second_word_index in grouped_indices:
                continue

            second_word = words[second_word_index]

            # Anagram check via sorted string comparison
            if sorted(first_word) == sorted(second_word):
                current_group.append(second_word)
                grouped_indices.add(second_word_index)

        anagram_groups.append(current_group)

    # Ensure all words have been processed
    return anagram_groups

Big(O) Analysis

Time Complexity
O(n² * k)The algorithm iterates through each of the n words in the input list. For each word, it compares it with every other word in the list to check if they are anagrams. This comparison involves checking if two strings of length k are anagrams which requires O(k) time. Since we perform this comparison for each pair of words, the total number of comparisons is approximately n * n, and each comparison takes O(k) time. Therefore, the overall time complexity is O(n² * k).
Space Complexity
O(N)The algorithm, as described, creates groups to store anagrams. In the worst-case scenario, no two words are anagrams, leading to N groups (where N is the number of words in the input list). Each group would contain a single word. Thus, the auxiliary space required to store these groups is proportional to the number of words in the input list.

Optimal Solution

Approach

The trick is to recognize that anagrams are just rearrangements of the same letters. Instead of comparing each word directly, we can use a clever trick to identify anagrams and group them together efficiently.

Here's how the algorithm would work step-by-step:

  1. For each word, create a signature that represents the unique combination of letters it contains. A simple way to do this is to sort the letters alphabetically.
  2. Use the signature to group the words. All words with the same signature are anagrams of each other.
  3. Store the anagram groups in a way that allows you to quickly retrieve all words with a specific signature.
  4. Finally, present the anagram groups as the result. Each group represents a collection of words that are anagrams of each other.

Code Implementation

def group_anagrams(words):
    anagram_groups = {}

    for word in words:
        # Using sorted string as a signature for anagrams.
        signature = "".join(sorted(word))

        if signature not in anagram_groups:
            # Create a new list if signature not present.
            anagram_groups[signature] = [word]

        else:
            anagram_groups[signature].append(word)

    # Present anagram groups as lists of words.
    return list(anagram_groups.values())

Big(O) Analysis

Time Complexity
O(n * k log k)The dominant operation is iterating through the input list of 'n' strings. For each string, we sort its characters which takes O(k log k) time, where 'k' is the length of the string. Since the sorting step is performed for each of the 'n' strings, the overall time complexity becomes O(n * k log k). The remaining operations like hashing are less significant compared to the cost of sorting.
Space Complexity
O(N)The algorithm uses a hash map to store the sorted signature of each word as a key, and a list of anagrams as the value. In the worst case, all words are distinct and have different signatures, resulting in N keys in the hash map, where N is the number of words in the input. Each key stores a string (the signature), and the associated value stores a list containing the original words, so the total space used by the hash map is proportional to the total length of the sorted word signatures and the original words, which can be considered O(N) in the average case where words are not excessively long. The sorted signatures themselves take O(N) space.

Edge Cases

CaseHow to Handle
Null or empty input arrayReturn an empty list since there are no strings to group.
Array with only one stringReturn a list containing a list with the single input string.
Array with empty stringsEmpty strings are anagrams of each other and should be grouped together.
Array with a large number of strings, many of which are anagramsThe solution should scale efficiently, ideally with O(n*klogk) time complexity where n is the number of strings and k is the maximum length of a string, using a hashmap for grouping.
Array where all strings are identicalAll identical strings are anagrams of each other and should be grouped into a single list.
Array with strings containing non-lowercase English lettersThe problem specifies lowercase English letters, so handle invalid characters gracefully (e.g., throw an exception or preprocess to remove/replace them if the prompt permits it).
Strings with very large lengths that could lead to integer overflow if sorting by character countsSort strings by character representation instead of relying on character counts for the key to avoid integer overflow issues.
Strings with same characters but different lengthsThese should not be grouped together as anagrams.