Taro Logo

Count Prefix and Suffix Pairs I

Easy
Uber logo
Uber
0 views
Topics:
Strings

You are given a 0-indexed string array words.

Let's define a boolean function isPrefixAndSuffix that takes two strings, str1 and str2:

  • isPrefixAndSuffix(str1, str2) returns true if str1 is both a prefix and a suffix of str2, and false otherwise.

For example, isPrefixAndSuffix("aba", "ababa") is true because "aba" is a prefix of "ababa" and also a suffix, but isPrefixAndSuffix("abc", "abcd") is false.

Return an integer denoting the number of index pairs (i, j) such that i < j, and isPrefixAndSuffix(words[i], words[j]) is true.

For example:

words = ["a","aba","ababa","aa"]

Output: 4

Explanation:

The counted index pairs are:

i = 0 and j = 1 because isPrefixAndSuffix("a", "aba") is true. i = 0 and j = 2 because isPrefixAndSuffix("a", "ababa") is true. i = 0 and j = 3 because isPrefixAndSuffix("a", "aa") is true. i = 1 and j = 2 because isPrefixAndSuffix("aba", "ababa") is true.

Therefore, the answer is 4.

Write a function to solve this problem. What is the time and space complexity?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of any string in the input array, and what is the maximum number of strings in the array?
  2. Can the strings in the input array contain any special characters, or are they limited to alphanumeric characters?
  3. Is a string considered both a prefix and a suffix of itself when checking for pairs?
  4. If no pairs exist that satisfy the prefix and suffix conditions, what should the function return?
  5. Are the strings case-sensitive when comparing prefixes and suffixes?

Brute Force Solution

Approach

The brute force approach to counting prefix and suffix pairs involves checking every possible pair of words in the given list. For each pair, we'll determine if the first word is both a prefix of the second and a suffix of the second.

Here's how the algorithm would work step-by-step:

  1. Take the first word in the list.
  2. Compare it with every other word in the list, one at a time.
  3. For each comparison, check if the first word appears at the very beginning of the second word.
  4. Then check if the first word also appears at the very end of the second word.
  5. If both conditions are true, we have found a prefix and suffix pair, so we increment a counter.
  6. Repeat this process, moving on to the second word in the original list, and comparing it to every other word in the list.
  7. Continue until every word in the list has been compared to every other word.
  8. The final value of the counter will be the total number of prefix and suffix pairs.

Code Implementation

def count_prefix_suffix_pairs_brute_force(words):
    pair_count = 0
    number_of_words = len(words)

    for first_word_index in range(number_of_words):
        for second_word_index in range(number_of_words):
            first_word = words[first_word_index]
            second_word = words[second_word_index]

            # Check if the first word is a prefix of the second word
            if second_word.startswith(first_word):

                # Check if the first word is a suffix of the second word
                if second_word.endswith(first_word):

                    pair_count += 1

    return pair_count

Big(O) Analysis

Time Complexity
O(n³)The solution iterates through each of the n words in the input list. For each word, it compares it with every other word in the list, resulting in a nested loop structure of order n*n. Within the inner loop, the prefix and suffix checks involve string comparisons, each taking up to O(n) time in the worst case where we are comparing strings of similar lengths, or when we need to iterate through the entire length of a word. Therefore, the overall time complexity is O(n * n * n), which simplifies to O(n³).
Space Complexity
O(1)The provided algorithm iterates through pairs of words but does not use any auxiliary data structures that scale with the input size N (the number of words in the list). It only uses a constant amount of extra space to store a counter and potentially a few index variables. Therefore, the space complexity is constant.

Optimal Solution

Approach

The optimal approach focuses on efficiently determining pairs of words where one is both a prefix and suffix of the other. We avoid unnecessary comparisons by pre-calculating and storing the frequency of each word. This allows us to quickly identify and count matching prefix-suffix pairs.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each unique word appears in the list of words.
  2. Then, for each word in the original list, compare it against all the *shorter* words to see if the shorter word is both a prefix and a suffix of the longer one.
  3. If a shorter word is both a prefix and a suffix of the current word, add the number of times that shorter word appears to our total count. This is because each instance of the shorter word makes a valid pair with the current word.
  4. Finally, if a word is both a prefix and suffix of itself (meaning it's the same at the beginning and end), and we have more than one of them, we need to account for the combinations. We can do this by figuring out how many ways to pick two of the same word and add that to the total.
  5. The final total will be the answer to the question.

Code Implementation

def count_prefix_suffix_pairs(words):
    word_counts = {}
    for word in words:
        word_counts[word] = word_counts.get(word, 0) + 1

    total_pairs = 0
    for current_word in words:
        for shorter_word in word_counts:
            # Only compare against shorter words
            if len(shorter_word) <= len(current_word):
                if current_word.startswith(shorter_word) and current_word.endswith(shorter_word):
                    total_pairs += word_counts[shorter_word]

    # Adjust count if a word is both prefix and suffix of itself.
    for word, count in word_counts.items():
        if count > 1:
            total_pairs += count * (count - 1) // 2

    return total_pairs

Big(O) Analysis

Time Complexity
O(n*m)First, counting the frequency of each word takes O(n*k) where n is the number of words and k is the average length of a word which we will assume is much less than n and can be disregarded. The outer loop iterates through each of the n words. The inner loop iterates through all shorter words. In the worst case, we might compare each word to all the other words, meaning we iterate at most n times. For each comparison, we check if one word is both a prefix and suffix of the other which takes at most m operations, where m is the maximum length of a word in the input list. Therefore, the total time complexity approximates to O(n*m).
Space Complexity
O(N)The primary space usage comes from storing the frequency of each unique word in a hash map. In the worst case, all N words in the input list are unique, leading to a hash map storing N key-value pairs. Therefore, the auxiliary space is proportional to the number of words, N. Other variables used have constant space and do not dominate the overall space complexity.

Edge Cases

CaseHow to Handle
Empty input arrayReturn 0 since there are no prefix/suffix pairs.
Array with a single stringReturn 0 since a pair requires two strings.
Very large input array (N > 100) to test for time complexity issuesEnsure algorithm is O(N^2) or better, avoiding brute-force comparison.
Strings of very large length to check for string comparison efficiencyUtilize efficient string comparison methods to avoid performance bottlenecks.
All strings in the input array are identicalThe algorithm should correctly count all valid pairs without overcounting.
Strings are prefixes or suffixes of each other, but not equalThe prefix and suffix checks must be accurate to identify genuine pairs.
Strings containing special characters or UnicodeEnsure the comparison function correctly handles all character types.
Strings are empty stringsCarefully handle comparisons involving empty strings to avoid unexpected behavior.