Taro Logo

Longest Word in Dictionary

Medium
8 views
a month ago

Given an array of strings words representing an English Dictionary, return the longest word in words that can be built one character at a time by other words in words.

If there is more than one possible answer, return the longest word with the smallest lexicographical order. If there is no answer, return the empty string.

Note that the word should be built from left to right with each additional character being added to the end of a previous word.

Example 1:

Input: words = ["w","wo","wor","worl","world"] Output: "world" Explanation: The word "world" can be built one character at a time by "w", "wo", "wor", and "worl".

Example 2:

Input: words = ["a","banana","app","appl","ap","apply","apple"] Output: "apple" Explanation: Both "apply" and "apple" can be built from other words in the dictionary. However, "apple" is lexicographically smaller than "apply".

Sample Answer
class Solution:
    def longestWord(self, words: list[str]) -> str:
        words.sort()  # Sort lexicographically and by length (shorter first)
        wordset = set([''])  # Start with an empty string as buildable
        longest = ''
        for word in words:
            if word[:-1] in wordset:
                wordset.add(word)
                if len(word) > len(longest):
                    longest = word
                elif len(word) == len(longest) and word < longest:
                    longest = word
        return longest

Explanation:

The idea is to use a set to keep track of words that can be built. We iterate through the sorted list of words. If the word without the last character is in the set, it means we can build the current word. We then check if the current word is longer than the longest word found so far. If it is, we update the longest word. If the lengths are equal, we choose the lexicographically smaller word.

Example:

For words = ["w","wo","wor","worl","world"]:

  1. words is sorted to ["w", "wo", "wor", "worl", "world"].
  2. wordset is initialized to {''}.
  3. Iterate through words:
    • word = "w". "" in wordset is true. wordset.add("w"). longest = "w".
    • word = "wo". "w" in wordset is true. wordset.add("wo"). longest = "wo".
    • word = "wor". "wo" in wordset is true. wordset.add("wor"). longest = "wor".
    • word = "worl". "wor" in wordset is true. wordset.add("worl"). longest = "worl".
    • word = "world". "worl" in wordset is true. wordset.add("world"). longest = "world".
  4. Return "world".

Naive Solution:

A naive solution would be to iterate through all the words and for each word, check if it can be built from other words in the list. This would involve checking all prefixes of each word. This solution has a time complexity of O(N^2 * L), where N is the number of words and L is the average length of the words.

def longestWord_naive(words: list[str]) -> str:
    longest = ''
    for word in words:
        buildable = True
        for i in range(1, len(word)):
            if word[:i] not in words:
                buildable = False
                break
        if buildable:
            if len(word) > len(longest):
                longest = word
            elif len(word) == len(longest) and word < longest:
                longest = word
    return longest

Optimal Solution:

The optimal solution sorts the words lexicographically and by length (shorter words first). We use a set to keep track of words that can be built. We iterate through the sorted list of words. If the word without the last character is in the set, it means we can build the current word. We then check if the current word is longer than the longest word found so far. If it is, we update the longest word. If the lengths are equal, we choose the lexicographically smaller word. This reduces the time complexity to O(N log N + N*L), where N is the number of words and L is the average length of the words.

Big(O) Run-time Analysis:

  • Sorting: words.sort() takes O(N log N) time, where N is the number of words.
  • Iteration: The loop for word in words: iterates through each word once, taking O(N) time.
  • String Slicing: word[:-1] takes O(L) time, where L is the average length of the words.
  • Set Operations: word[:-1] in wordset and wordset.add(word) take O(1) on average.

Therefore, the overall time complexity is dominated by the sorting step, which is O(N log N). The iteration and set operations contribute O(N * L) which is smaller.

Overall Time Complexity: O(N log N)

Big(O) Space Usage Analysis:

  • wordset: The wordset can contain at most all the words in the input list. Thus, it takes O(N*L) space where N is the number of words and L is the average word length.
  • longest: The longest variable stores a word, which takes O(L) space.

The sort method might use O(N) space in worst-case scenarios like merge sort or quicksort with bad pivot selection; however, Python's Timsort usually has O(1) auxiliary space.

Overall Space Complexity: O(N*L)

Edge Cases:

  1. Empty Input: If the input list is empty, the function should return an empty string. The current solution handles this case correctly because the loop will not execute.
  2. No Buildable Word: If no word can be built from other words, the function should return an empty string. The current solution handles this case because longest is initialized to '' and only updated if a buildable word is found.
  3. Words with Same Length: If there are multiple words with the same length that can be built, the function should return the lexicographically smallest one. The current solution handles this case because the elif condition len(word) == len(longest) and word < longest ensures that the lexicographically smaller word is chosen.
  4. Single-Character Words: The solution correctly handles cases with single-character words like ['a', 'b', 'c']. It will identify them as buildable since the empty string is in the wordset.
  5. Duplicate Words: If there are duplicate words in the input, the set data structure helps avoid overcounting or errors.
  6. Very Long Words: The solution handles words of up to 30 characters as per the problem constraints without issues, given sufficient memory.