Taro Logo

Count Pairs Of Similar Strings

Easy
Google logo
Google
Topics:
ArraysStrings

You are given a 0-indexed string array words.

Two strings are similar if they consist of the same characters.

  • For example, "abca" and "cba" are similar since both consist of characters 'a', 'b', and 'c'. However, "abacba" and "bcfd" are not similar since they do not consist of the same characters.

Return the number of pairs (i, j) such that 0 <= i < j <= word.length - 1 and the two strings words[i] and words[j] are similar.

Example 1:

Input: words = ["aba","aabb","abcd","bac","aabc"]
Output: 2
Explanation: There are 2 pairs that satisfy the conditions:
- i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'.
- i = 3 and j = 4 : both words[3] and words[4] only consist of characters 'a', 'b', and 'c'.

Example 2:

Input: words = ["aabb","ab","ba"]
Output: 3
Explanation: There are 3 pairs that satisfy the conditions:
- i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'.
- i = 0 and j = 2 : both words[0] and words[2] only consist of characters 'a' and 'b'.
- i = 1 and j = 2 : both words[1] and words[2] only consist of characters 'a' and 'b'.

Example 3:

Input: words = ["nba","cba","dba"]
Output: 0
Explanation: Since there does not exist any pair that satisfies the conditions, we return 0.

Constraints:

  • 1 <= words.length <= 100
  • 1 <= words[i].length <= 100
  • words[i] consist of only lowercase English letters.

Solution


Naive Approach

A brute-force approach involves iterating through all possible pairs of words and, for each pair, checking if they are similar. To check for similarity, we can convert each word into a set of characters and then compare if the sets are equal.

Algorithm:

  1. Iterate through all possible pairs (i, j) where 0 <= i < j < words.length.
  2. For each pair, create sets of characters for words[i] and words[j].
  3. Compare the two sets. If they are equal, increment the count of similar pairs.
  4. Return the total count of similar pairs.

Code:

def similar_pairs_brute_force(words):
    count = 0
    n = len(words)
    for i in range(n):
        for j in range(i + 1, n):
            if set(words[i]) == set(words[j]):
                count += 1
    return count

Time Complexity:

  • O(N^2 * M), where N is the number of words and M is the maximum length of a word. The nested loops iterate through all pairs (O(N^2)), and creating a set from a word takes O(M) time.

Space Complexity:

  • O(M), where M is the maximum length of a word. This is the space required to store the sets of characters.

Optimal Approach

To optimize, we can precompute the set of characters for each word and store it in a canonical form (e.g., a sorted string of unique characters). Then, we can use a hash map to count the occurrences of each canonical form. Finally, we iterate through the hash map and count the number of pairs for each canonical form.

Algorithm:

  1. Create a hash map (dictionary) to store the count of each canonical form.
  2. Iterate through the words array. a. Convert each word into a set of characters. b. Convert the set into a sorted string (canonical form). c. Update the count of the canonical form in the hash map.
  3. Iterate through the hash map. a. For each canonical form, calculate the number of pairs using the formula n * (n - 1) / 2, where n is the count of the canonical form. b. Add the number of pairs to the total count.
  4. Return the total count of similar pairs.

Code:

def similar_pairs_optimal(words):
    counts = {}
    for word in words:
        canonical = "".join(sorted(set(word)))
        counts[canonical] = counts.get(canonical, 0) + 1
    
    total_pairs = 0
    for count in counts.values():
        total_pairs += count * (count - 1) // 2
    
    return total_pairs

Time Complexity:

  • O(N * M * log(M)), where N is the number of words and M is the maximum length of a word. Creating the canonical form involves creating a set (O(M)), sorting the set (O(M * log(M))), and joining the characters (O(M)). The rest of the operations are O(N) for calculating totals.

Space Complexity:

  • O(N * M), where N is the number of words and M is the maximum length of a word. This is the space required to store the hash map, which can store up to N different canonical forms, each with a length of up to M.

Edge Cases

  1. Empty Input: If the input array words is empty, return 0.
  2. Single Word: If the input array contains only one word, return 0.
  3. Words with No Common Characters: The algorithm should correctly handle cases where no two words are similar.
  4. Duplicate Words: The algorithm should correctly count pairs even if there are duplicate words.