Taro Logo

Longest Word With All Prefixes

Medium
Google logo
Google
1 view
Topics:
StringsTrees

Question

Given a list of strings words, write a function to find the longest word in words such that every prefix of the word is also in words. If there are multiple such words, return the lexicographically smallest one. If there is no such word, return an empty string.

Clarifications:

  • The input words will only contain lowercase English letters.
  • The length of the input list words can vary.

Examples:

  1. Input: words = ["w", "wo", "wor", "worl", "world"] Output: "world" Explanation: The word "world" can be built one character at a time by "w", "wo", "wor", and "worl", which are all in the input words.

  2. Input: words = ["a", "banana", "app", "appl", "ap", "apply", "apple"] Output: "apple" Explanation: Both "apple" and "apply" can be built from other words in the list. "apple" is lexicographically smaller than "apply".

  3. Input: words = ["m", "mo", "moc", "moch", "mocha", "l", "la", "lac", "lack"] Output: "mocha"

  4. Input: words = ["k", "ki", "kir", "kira", "kiran"] Output: "kiran"

  5. Input: words = ["a", "b", "c"] Output: "" Explanation: None of the words have prefixes that are present in the input.

Develop an algorithm that is efficient in terms of both time and space complexity. Consider using appropriate data structures to optimize your solution.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible lengths for the words in the input list? Is there a maximum length?
  2. Can the input list contain null or empty strings? If so, how should I handle them?
  3. If there are multiple words that satisfy the conditions (longest word with all prefixes present), should I return any one of them, or is there a specific word I should prioritize (e.g., lexicographically smallest)?
  4. Is the input list guaranteed to be sorted in any way, or can the words appear in any order?
  5. If no such word exists in the input list (i.e., no word has all its prefixes also present in the list), what should I return? Should I return null, an empty string, or throw an exception?

Brute Force Solution

Approach

We want to find the longest word that can be built one letter at a time from other words in our list. The brute force approach checks every single word to see if its prefixes are also words in the list. If all prefixes of a word exist in the list, we consider that word as a candidate.

Here's how the algorithm would work step-by-step:

  1. Take each word from the list, one at a time.
  2. For each word, examine its prefixes - these are the beginning portions of the word (for example, if the word is 'apple', the prefixes are 'a', 'ap', 'app', 'appl').
  3. Check if each prefix we found is also present as a separate word in the original list.
  4. If every single prefix of a word is found in the list, that word is a valid candidate.
  5. Keep track of all the valid candidate words we find.
  6. After checking every word in the original list, compare all the valid candidate words. The longest one is the answer.

Code Implementation

def longest_word_with_all_prefixes(word_list):
    longest_word = ''
    valid_candidate_words = []

    for current_word in word_list:
        is_candidate = True
        # Iterate through prefixes of the current word.

        for i in range(1, len(current_word) + 1):
            prefix = current_word[:i]
            # Check if the current prefix exists in the word list.

            if prefix not in word_list:
                is_candidate = False
                break

        # Add the current word to the candidates if all prefixes exist
        if is_candidate:
            valid_candidate_words.append(current_word)

    # Determine the longest word from the valid candidates
    for candidate_word in valid_candidate_words:
        if len(candidate_word) > len(longest_word):
            longest_word = candidate_word
        elif len(candidate_word) == len(longest_word) and candidate_word < longest_word:
            # Choose the lexicographically smaller word if lengths are the same
            longest_word = candidate_word

    return longest_word

Big(O) Analysis

Time Complexity
O(n*m^2)The algorithm iterates through each of the n words in the input list. For each word, it generates all possible prefixes. In the worst case, a word can have a length of m, leading to m prefixes. For each of these m prefixes, the algorithm searches the original list of n words to see if that prefix exists. Each prefix search takes O(m) time because we are comparing strings of length at most m. Therefore, for one word, the complexity is m * m. For all n words, the complexity becomes O(n*m^2).
Space Complexity
O(N)The algorithm keeps track of all valid candidate words. In the worst-case scenario, all N words in the input list could be valid candidates, requiring us to store each of them. Storing these candidate words consumes auxiliary space proportional to the number of words in the input list. Therefore, the auxiliary space complexity is O(N).

Optimal Solution

Approach

To find the longest word made from other words in a list, we build a special data structure that lets us quickly check if a word is valid. This helps us efficiently explore potential words without wasting time on invalid ones.

Here's how the algorithm would work step-by-step:

  1. First, put all the words into a special container designed for quickly checking prefixes.
  2. Sort all the words by length, starting with the longest.
  3. Go through each word, starting with the longest.
  4. For each word, check if all its prefixes are also in the container we created. This tells us if the word can be built from other words in the list.
  5. If we find a word where all its prefixes are also present, that's the longest word we are looking for. Return it.
  6. If you checked all the words and didn't find one with all prefixes present, then return an empty word.

Code Implementation

def find_longest_word_with_all_prefixes(word_list):
    prefix_set = set(word_list)

    # Sort the words by length in descending order.
    word_list.sort(key=len, reverse=True)

    for current_word in word_list:
        is_valid = True
        # Check if all prefixes of the current word are in the set.
        for i in range(1, len(current_word)): 
            prefix = current_word[:i]

            if prefix not in prefix_set:
                is_valid = False
                break

        # If all prefixes are present, return the word.
        if is_valid:
            return current_word

    return ""

Big(O) Analysis

Time Complexity
O(n log n)First, inserting all n words into the Trie data structure takes O(n * k) time where k is the average length of the words. However, this operation is dominated by the sorting and prefix checking. Sorting the words by length takes O(n log n) time. Then, for each word, we iterate through its prefixes, and check for their presence in the Trie. The prefix checking operation for each word takes O(k) time, where k is the length of the word, and this is done for at most n words. The maximum length of a word is limited, so we can consider this constant. Therefore the complexity is dominated by O(n log n) for sorting the input.
Space Complexity
O(NW)The algorithm uses a set or similar data structure to store all the words. In the worst case, all N words are stored, and each word has an average length of W. Therefore, the space required for the set is proportional to the total characters in all words, resulting in a space complexity of O(NW), where N is the number of words and W is the average length of a word.

Edge Cases

CaseHow to Handle
Empty input listReturn an empty string since no word can be formed.
Input list contains only one wordReturn that single word, as it's trivially the longest and its own prefix.
Input list contains words with different casing (e.g., 'a', 'A')Normalize the case of all words (e.g., lowercase) to ensure case-insensitive comparison.
No word is a prefix of another word in the listReturn an empty string if no word's prefixes are all present in the input.
Multiple words have the same maximum length and are valid solutionsReturn the lexicographically smallest longest word among the valid solutions.
Input contains non-alphabetic charactersFilter the words to only include alphabetic characters or handle non-alphabetic characters based on the specific problem context.
Very long words in the input list (potential memory issues)Use a Trie data structure for efficient prefix checking and memory usage.
Words that are prefixes of themselves (e.g., an empty string)Handle empty strings explicitly by either excluding them from the input or defining their prefix status clearly based on the problem requirements.