Taro Logo

Count Prefix and Suffix Pairs II

Hard
Uber logo
Uber
3 views
Topics:
ArraysStringsTwo Pointers

You are given a 0-indexed string array words.

Let's define a boolean function isPrefixAndSuffix that takes two strings, str1 and str2:

  • isPrefixAndSuffix(str1, str2) returns true if str1 is both a prefix and a suffix of str2, and false otherwise.

For example, isPrefixAndSuffix("aba", "ababa") is true because "aba" is a prefix of "ababa" and also a suffix, but isPrefixAndSuffix("abc", "abcd") is false.

Return an integer denoting the number of index pairs (i, j) such that i < j, and isPrefixAndSuffix(words[i], words[j]) is true.

Example 1:

Input: words = ["a","aba","ababa","aa"]
Output: 4

Example 2:

Input: words = ["pa","papa","ma","mama"]
Output: 2

Example 3:

Input: words = ["abab","ab"]
Output: 0

Write a function to solve this problem.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the maximum possible lengths for individual words in the input array `words`?
  2. Can the input array `words` contain empty strings or null values?
  3. If no pairs (i, j) satisfy the condition, what should the function return?
  4. Are the indices i and j allowed to be the same (i.e., can a word be both its own prefix and suffix when i == j)?
  5. Does the order of the pairs (i, j) matter? Should I only count (i, j) if i < j, or should I also count (j, i) separately if it also satisfies the condition?

Brute Force Solution

Approach

The brute force strategy involves checking every possible pair of words to see if they fit the specified criteria. We will compare each word to every other word in the list. If a pair of words meets the criteria, we increment a counter.

Here's how the algorithm would work step-by-step:

  1. Take the first word from the list.
  2. Compare this first word with every other word in the list, including itself.
  3. For each comparison, check if the first word is a prefix of the second word AND the first word is a suffix of the second word.
  4. If both conditions are true, then this pair of words counts as a valid pair. Remember to count it.
  5. Move on to the next word in the list and repeat the process, comparing it with every other word.
  6. Continue this process until every word in the list has been compared with every other word.
  7. Finally, add up all the valid pairs you've counted. This gives the total number of prefix and suffix pairs.

Code Implementation

def count_prefix_suffix_pairs_brute_force(words):
    total_valid_pairs = 0

    for first_word_index in range(len(words)):
        for second_word_index in range(len(words)):
            first_word = words[first_word_index]
            second_word = words[second_word_index]

            # Check if the first word is a prefix of the second word
            is_prefix = second_word.startswith(first_word)

            # Check if the first word is a suffix of the second word
            is_suffix = second_word.endswith(first_word)

            # Only count pairs where first is both a prefix and suffix of the second
            if is_prefix and is_suffix:
                total_valid_pairs += 1

    return total_valid_pairs

Big(O) Analysis

Time Complexity
O(n²)The provided brute force solution involves nested loops. The outer loop iterates through each of the n words in the input list. The inner loop, for each word, compares it with every other word in the list, resulting in n comparisons. Thus, for each of the n words, we perform n prefix and suffix checks. This leads to n * n operations, hence the time complexity is O(n²).
Space Complexity
O(1)The provided brute force algorithm iterates through the input list using nested loops, comparing each word with every other word. It only uses a constant number of integer variables for loop indices and counting valid pairs. No additional data structures, such as auxiliary arrays or hash maps, are created that scale with the input size N (number of words in the list). Therefore, the auxiliary space complexity is O(1).

Optimal Solution

Approach

The goal is to efficiently count matching prefix and suffix pairs within a large set of words. Instead of directly comparing every word with every other word, we use a Trie data structure to quickly identify potential matches based on prefixes and suffixes.

Here's how the algorithm would work step-by-step:

  1. First, create a special data structure called a Trie (pronounced 'try') to store all the words. Think of it like a tree where each branch represents a character in a word.
  2. For each word in the input, add it to the Trie twice: once in the forward direction to represent its prefix, and once in the reverse direction to represent its suffix.
  3. Now, for each word, look up both its prefix and suffix within the Trie.
  4. When looking up the prefix, count how many words in the Trie have that prefix. When looking up the suffix, count how many words have that suffix.
  5. Be careful not to double-count pairs where the prefix and suffix are the same word! If you find the exact same word being counted, reduce the total number of pairs by one.
  6. Sum up all the counts you found during the prefix and suffix lookups to arrive at the final answer. This approach avoids unnecessary comparisons and leverages the Trie to quickly find potential matches.

Code Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for character in word:
            if character not in node.children:
                node.children[character] = TrieNode()
            node = node.children[character]
        node.count += 1

    def count_words_with_prefix(self, word):
        node = self.root
        for character in word:
            if character not in node.children:
                return 0
            node = node.children[character]
        return node.count

def count_prefix_suffix_pairs(words):
    trie = Trie()
    for word in words:
        trie.insert(word)
        trie.insert(word[::-1])

    total_pairs = 0
    for word in words:
        prefix_count = trie.count_words_with_prefix(word)
        suffix_count = trie.count_words_with_prefix(word[::-1])

        # This avoids overcounting pairs.
        if word == word[::-1]:
            total_pairs += (prefix_count + suffix_count - 1)
        else:
            total_pairs += (prefix_count + suffix_count)

    # Divide by two because we inserted the words twice (forward and reversed).
    return total_pairs // 2

Big(O) Analysis

Time Complexity
O(L*n)The algorithm iterates through each of the n words in the input list. For each word, it inserts the word into the Trie (for both prefix and suffix), which takes O(L) time, where L is the length of the word. Then, for each word, it searches for the prefix and suffix in the Trie, each search also taking O(L) time. Therefore, the total time complexity is O(n * (L + L + L)), which simplifies to O(L*n), assuming L is the average length of the words. While L can be at most the maximum length of a word, and in the worst case can be equal to n, we can consider it as a constant when the average word length is significantly smaller than n.
Space Complexity
O(N*L)The dominant space complexity comes from the Trie data structure. Each of the N words is inserted into the Trie, both in forward and reverse order. The Trie nodes store characters, and in the worst case, each word will create a path of length L (where L is the maximum length of a word) in the Trie, leading to a space complexity proportional to the total number of characters stored in the Trie which will be 2*N*L (since each word is stored twice). Therefore, the overall space complexity is O(N*L).

Edge Cases

CaseHow to Handle
Empty words arrayReturn 0 since no pairs are possible.
Words array with only one elementReturn 0 since a pair requires at least two elements.
All words in the array are identicalThe solution must correctly count all possible pairs (i, j) where i != j given all words are equal.
Long strings within words arrayThe solution's prefix/suffix check must handle long strings efficiently, potentially using early termination if a mismatch is found early.
Words array with a single character stringThe solution must handle the special case where a single character string could be both a prefix and suffix to other strings, and needs to avoid integer overflow.
Maximum array size constraints are hit and the solution has O(N^2) complexityConsider utilizing Tries/Prefix Trees or Hash Maps for better time complexity such as O(N*K) where K is the average length of the strings.
Words containing unicode charactersEnsure string comparison handles Unicode correctly depending on the underlying language's string implementation.
The same prefix/suffix string exists multiple timesUse hash map to count the frequencies of prefixes and suffixes, and calculate the number of valid pairs based on these frequencies.