Sum of Prefix Scores of Strings

Hard

Asked by:

90 views

Topics:

ArraysStrings

You are given an array words of size n consisting of non-empty strings.

We define the score of a string term as the number of strings words[i] such that term is a prefix of words[i].

For example, if words = ["a", "ab", "abc", "cab"], then the score of "ab" is 2, since "ab" is a prefix of both "ab" and "abc".

Return an array answer of size n where answer[i] is the sum of scores of every non-empty prefix of words[i].

Note that a string is considered as a prefix of itself.

Example 1:

Input: words = ["abc","ab","bc","b"]
Output: [5,4,3,2]
Explanation: The answer for each string is the following:
- "abc" has 3 prefixes: "a", "ab", and "abc".
- There are 2 strings with the prefix "a", 2 strings with the prefix "ab", and 1 string with the prefix "abc".
The total is answer[0] = 2 + 2 + 1 = 5.
- "ab" has 2 prefixes: "a" and "ab".
- There are 2 strings with the prefix "a", and 2 strings with the prefix "ab".
The total is answer[1] = 2 + 2 = 4.
- "bc" has 2 prefixes: "b" and "bc".
- There are 2 strings with the prefix "b", and 1 string with the prefix "bc".
The total is answer[2] = 2 + 1 = 3.
- "b" has 1 prefix: "b".
- There are 2 strings with the prefix "b".
The total is answer[3] = 2.

Example 2:

Input: words = ["abcd"]
Output: [4]
Explanation:
"abcd" has 4 prefixes: "a", "ab", "abc", and "abcd".
Each prefix has a score of one, so the total is answer[0] = 1 + 1 + 1 + 1 = 4.

Constraints:

1 <= words.length <= 1000
1 <= words[i].length <= 1000
words[i] consists of lowercase English letters.

Try coding it in LeetCode

Solution

Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

What is the maximum length of a single word in the input array, and what is the maximum number of words in the array?
Can the input array `words` be empty or contain null/empty strings?
Are the input strings case-sensitive? For example, should 'abc' and 'Abc' be considered different prefixes?
Should the prefix score for a given word include the word itself if it's a prefix of other words in the input?
Is there a specific character set I should expect in the input strings (e.g., ASCII, Unicode)? This could affect data structures and operations.

Brute Force Solution

Approach

To find the total 'score' of prefixes for a list of words, the brute force method calculates the score for each word one at a time. For each word, it will look at all the prefixes (beginnings) of that word and check if those prefixes appear as prefixes of any other words in the list.

Here's how the algorithm would work step-by-step:

Take the first word in the list.
Look at the very beginning of this word, just the first letter.
Now, go through every other word in the list and see if *they* start with that same letter.
If another word *does* start with that letter, it counts as one point for the prefix.
Keep track of the total points for this one-letter prefix.
Next, look at the first two letters of the first word. This is a longer prefix.
Again, go through every other word in the list and check if *they* start with those same two letters.
Add to the total points for this two-letter prefix for every match.
Continue doing this, making the prefix longer and longer, until you've looked at the entire first word and found all its prefix scores.
Add all prefix scores for the first word together.
Now, repeat this entire process for the second word, then the third word, and so on, until you have calculated the sum of prefix scores for every single word in the list.
Finally, add up the prefix score sums of all words to get the final result.

Code Implementation

def sum_prefix_scores_brute_force(words):
    total_score = 0
    for current_word in words:
        word_score = 0
        for prefix_length in range(1, len(current_word) + 1):
            prefix = current_word[:prefix_length]
            prefix_count = 0

            # We iterate through all other words to find matches
            for other_word in words:
                if other_word.startswith(prefix):

                    prefix_count += 1
            word_score += prefix_count

        total_score += word_score

    # Returning the final score
    return total_score

Big(O) Analysis

Time Complexity

O(n*m^2) – Let n be the number of words in the input list, and m be the maximum length of a word in the list. For each of the n words, we iterate through all possible prefixes, which takes m steps. For each prefix of a word, we iterate through all n words again to compare if each word starts with the considered prefix which takes m time in the worst case. Therefore, the time complexity is proportional to n * m * (n * m/2) which simplifies to O(n*m^2) assuming m is smaller or equal to n.

Space Complexity

O(1) – The brute force approach iterates through the input list of words and, for each word, iterates through its prefixes and then iterates through the list of words again to find matches. While it uses several nested loops, it does not create any auxiliary data structures like arrays, hash maps, or trees to store intermediate results. It only uses a few integer variables to track counts and indices, whose space remains constant regardless of the input size N (number of words). Therefore, the auxiliary space complexity is O(1).

Optimal Solution

Approach

The efficient solution uses a special tree-like structure to keep track of all the prefixes of the words. It smartly adds up the counts of how many words share a prefix, allowing us to quickly calculate the score for each word.

Here's how the algorithm would work step-by-step:

Create a special structure called a tree. Each part of the tree will represent a letter, and following a path from the top of the tree will spell out a prefix.
For each word, go through each of its prefixes (like the first letter, the first two letters, etc.) and follow the corresponding path in the tree.
Each time you pass a part of the tree while following a prefix, increase a counter on that part. This counter will tell you how many words share that prefix.
Now, for each word, follow its prefixes through the tree again. This time, instead of increasing the counter, add the counter's current value to a total score for that word.
The total score for each word will then be the sum of the counts of all the prefixes it has in common with other words.

Code Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0

def sum_prefix_scores(words):
    root = TrieNode()
    
    # Populate the trie and count prefix occurrences.
    for word in words:
        node = root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.count += 1

    result = []

    # Calculate the prefix score for each word.
    for word in words:
        score_for_word = 0
        node = root
        for char in word:
            node = node.children[char]
            score_for_word += node.count
        result.append(score_for_word)

    return result

Big(O) Analysis

Time Complexity

O(N * L) – Let N be the number of words and L be the maximum length of a word. Building the Trie involves iterating through each word and, for each word, iterating through its characters to insert it into the Trie, contributing O(L) per word. Therefore, building the Trie takes O(N * L). Calculating the prefix scores similarly involves traversing the Trie for each word, which also takes O(L) per word, resulting in O(N * L). Consequently, the overall time complexity is dominated by Trie construction and score calculation, both taking O(N * L).

Space Complexity

O(N*L) – The primary auxiliary space is used by the tree (Trie) data structure. In the worst case, where all words have unique prefixes, each character of each word will result in a new node in the tree. Therefore, the space required is proportional to the sum of the lengths of all words, where N is the number of words and L is the average length of the words. This leads to a space complexity of O(N*L) for storing the Trie structure. No significant additional space is used apart from the Trie itself.

Edge Cases

Case	How to Handle
Empty input array	Return an empty array if the input array is empty.
Input array contains null or empty strings	Handle null or empty strings gracefully, either by skipping them or treating them as prefixes (depending on the specific requirements).
All words in the input array are identical	The prefix score for each word should be the length of the input array.
One word is a prefix of many other words	The prefix score for the shorter word will be high, test the prefix calculation handles it efficiently.
Input array contains very long strings	Ensure the solution doesn't have excessive memory usage or string manipulation overhead for very long strings to prevent potential out-of-memory issues.
Large input array size	Use a data structure (like a Trie) that optimizes prefix counting for large input sizes to improve efficiency.
Strings with non-alphanumeric characters or special characters.	Handle these characters correctly in prefix calculations according to the requirements, such as filtering or escaping.
Case sensitivity	Clarify if prefix matching is case-sensitive or case-insensitive, and handle strings accordingly by converting to lowercase or uppercase.

Empty input array

How to Handle:

Return an empty array if the input array is empty.

Input array contains null or empty strings

How to Handle:

Handle null or empty strings gracefully, either by skipping them or treating them as prefixes (depending on the specific requirements).

All words in the input array are identical

How to Handle:

The prefix score for each word should be the length of the input array.

One word is a prefix of many other words

How to Handle:

The prefix score for the shorter word will be high, test the prefix calculation handles it efficiently.

Input array contains very long strings

How to Handle:

Ensure the solution doesn't have excessive memory usage or string manipulation overhead for very long strings to prevent potential out-of-memory issues.

Large input array size

How to Handle:

Use a data structure (like a Trie) that optimizes prefix counting for large input sizes to improve efficiency.

Strings with non-alphanumeric characters or special characters.

How to Handle:

Handle these characters correctly in prefix calculations according to the requirements, such as filtering or escaping.

Case sensitivity

How to Handle:

Clarify if prefix matching is case-sensitive or case-insensitive, and handle strings accordingly by converting to lowercase or uppercase.