Count Pairs Of Similar Strings Interview Question for Google

You are given a 0-indexed string array words. Two strings are similar if they consist of the same characters. For example, "abca" and "cba" are similar since both consist of characters 'a', 'b', and 'c'. However, "abacba" and "bcfd" are not similar since they do not consist of the same characters. Return the number of pairs (i, j) such that 0 <= i < j <= word.length - 1 and the two strings words[i] and words[j] are similar.

Example 1: Input: words = ["aba","aabb","abcd","bac","aabc"] Output: 2 Explanation: There are 2 pairs that satisfy the conditions:

i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'.
i = 3 and j = 4 : both words[3] and words[4] only consist of characters 'a', 'b', and 'c'.

Example 2: Input: words = ["aabb","ab","ba"] Output: 3 Explanation: There are 3 pairs that satisfy the conditions:

i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'.
i = 0 and j = 2 : both words[0] and words[2] only consist of characters 'a' and 'b'.
i = 1 and j = 2 : both words[1] and words[2] only consist of characters 'a' and 'b'.

Example 3: Input: words = ["nba","cba","dba"] Output: 0 Explanation: Since there does not exist any pair that satisfies the conditions, we return 0.

def similar_pairs(words):
    count = 0
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if set(words[i]) == set(words[j]):
                count += 1
    return count


# Example Usage
words1 = ["aba","aabb","abcd","bac","aabc"]
print(f"Number of similar pairs in {words1}: {similar_pairs(words1)}")

words2 = ["aabb","ab","ba"]
print(f"Number of similar pairs in {words2}: {similar_pairs(words2)}")

words3 = ["nba","cba","dba"]
print(f"Number of similar pairs in {words3}: {similar_pairs(words3)}")

Naive Solution

The most straightforward approach is to iterate through all possible pairs of words and, for each pair, check if they are similar. This involves comparing the unique characters in each word. The naive solution directly implements this approach.

Optimal Solution

The provided Python code implements the optimal solution. It iterates through all possible pairs of words and checks if the sets of characters in each word are equal. Using sets for comparison is efficient because set comparison has an average time complexity of O(min(len(set1), len(set2))). This avoids nested loops for character-by-character comparison.

Big(O) Run-time Analysis

Outer Loop: The outer loop iterates n times, where n is the number of words in the input list.
Inner Loop: The inner loop iterates from i + 1 to n, so it runs approximately n/2 times on average.
Set Conversion: Creating a set from a word of length m takes O(m) time.
Set Comparison: Comparing two sets takes O(k) time on average, where k is the number of unique characters in the smaller set.

Therefore, the overall time complexity is O(n^2 * m), where n is the number of words and m is the average length of the words. More precisely, each comparison is O(min(len(set(word1)), len(set(word2)))) which is upper bounded by O(m). The nested loops contribute O(n^2).

Big(O) Space Usage Analysis

Sets: For each pair of words, we create two sets to store the unique characters. The space required for each set is O(m), where m is the average length of the words.
Variables: We use a constant amount of extra space for variables like count, i, and j.

Therefore, the overall space complexity is O(m), where m is the average length of the words. This is because, in each iteration of the inner loop, we create two sets with a size proportional to the length of the input words.

Edge Cases

Empty Input List: If the input list words is empty, the code should return 0 because there are no pairs to compare.
List with One Word: If the list contains only one word, the code should return 0 because there are no pairs to form.
Words with Different Lengths: The code correctly handles words of different lengths because it compares the sets of unique characters, not the words themselves.
Words with Repeated Characters: The code correctly handles words with repeated characters because the set() function automatically removes duplicates.
Large Input List: For a very large input list, the O(n^2) time complexity might become a bottleneck. In such cases, consider optimizing the solution using hashing techniques or parallel processing.
Words with Non-ASCII Characters: The code assumes that the words consist of lowercase English letters. If the words contain non-ASCII characters, the character set might need to be adjusted.