Word Break II

Hard

Grammarly

0 views

Topics:

StringsArraysDynamic ProgrammingRecursion

Given a string s and a dictionary of strings wordDict, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences in any order.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

Example 1:

Input: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
Output: ["cats and dog","cat sand dog"]

Example 2:

Input: s = "pineapplepenapple", wordDict = ["apple","pen","applepen","pine","pineapple"]
Output: ["pine apple pen apple","pineapple pen apple","pine applepen apple"]
Explanation: Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output: []

Constraints:

1 <= s.length <= 20
1 <= wordDict.length <= 1000
1 <= wordDict[i].length <= 10
s and wordDict[i] consist of only lowercase English letters.
All the strings of wordDict are unique.
Input is generated in a way that the length of the answer doesn't exceed 10⁵.

Word Break II

Hard

Grammarly

0 views

Topics:

StringsArraysDynamic ProgrammingRecursion

Given a string s and a dictionary of strings wordDict, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences in any order.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

Example 1:

Input: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
Output: ["cats and dog","cat sand dog"]

Example 2:

Input: s = "pineapplepenapple", wordDict = ["apple","pen","applepen","pine","pineapple"]
Output: ["pine apple pen apple","pineapple pen apple","pine applepen apple"]
Explanation: Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output: []

Constraints:

1 <= s.length <= 20
1 <= wordDict.length <= 1000
1 <= wordDict[i].length <= 10
s and wordDict[i] consist of only lowercase English letters.
All the strings of wordDict are unique.
Input is generated in a way that the length of the answer doesn't exceed 10⁵.

Solution

Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

Can the same word from `wordDict` be used multiple times in a single segmentation?
What is the expected output if `s` cannot be segmented into words from `wordDict`? Should I return an empty list?
Are there any constraints on the length of the string `s` or the words in `wordDict`?
Does the word dictionary contain empty strings? How should I handle it?
Is the word dictionary guaranteed to have only unique words, or could there be duplicate words?

Brute Force Solution

Approach

The brute force strategy explores every possible way to break the given string into valid words from the dictionary. We will recursively try all combinations, building sentences until we either find a valid sentence or exhaust all possibilities. It's like trying every single path until we stumble upon the correct one(s).

Here's how the algorithm would work step-by-step:

Start at the beginning of the string.
Try cutting off the first few letters to see if they form a valid word from the dictionary.
If they do, remember this word and repeat the process with the remaining part of the string.
Keep doing this until you reach the end of the string. If all the words you found are valid and cover the entire string, you've found a possible sentence.
If, at any point, you can't find a valid word, backtrack and try a different way of cutting the string.
Repeat these steps by trying all possible starting points and word combinations until you've exhausted all possible sentence structures.
Collect all the valid sentences you find along the way.

Code Implementation

def word_break_brute_force(input_string, word_dictionary):
    results = []

    def backtrack(start_index, current_sentence):
        # If we've reached the end of the string, we've found a valid sentence
        if start_index == len(input_string):
            results.append(" ".join(current_sentence))
            return

        for end_index in range(start_index + 1, len(input_string) + 1):
            word = input_string[start_index:end_index]

            # Check if the substring is a valid word.
            if word in word_dictionary:
                # Add the valid word to the current sentence
                current_sentence.append(word)

                # Recursively call backtrack to explore the remaining string
                backtrack(end_index, current_sentence)

                # Backtrack: Remove the last word to explore other possibilities
                current_sentence.pop()

    backtrack(0, [])
    return results

Try it yourself on LeetCode

Big(O) Analysis

Time Complexity

O(2^n) – The brute force approach recursively explores all possible substrings of the input string of length n. In the worst-case scenario, where no word is in the dictionary, we might try every possible cut. Each character presents a potential cut or no-cut decision, leading to 2 options for each of the n characters. Therefore, the total number of possible combinations grows exponentially with the input size, resulting in a time complexity of O(2^n).

Space Complexity

O(N^2) – The primary space usage stems from the recursion depth and the storage of intermediate sentence fragments. In the worst-case scenario, where no word in the dictionary is a prefix of another and the string can be segmented into words of length 1, the recursion depth can reach N, where N is the length of the input string. Each recursive call might store a sentence fragment of up to N characters to build a valid sentence. Furthermore, due to backtracking, multiple sentence fragments may be stored at each level of the recursion. This leads to a space complexity proportional to N for the recursion stack, and N to store the sentence fragments at each level of recursion. Therefore, the space complexity can be estimated as O(N^2).

Optimal Solution

Approach

The goal is to break a long string into a sequence of valid words from a dictionary and return all possible sentences. The optimal solution uses a technique that avoids recomputing results for overlapping subproblems, significantly speeding up the process. It efficiently builds valid sentences by remembering which parts of the string can be broken down further.

Here's how the algorithm would work step-by-step:

First, create a way to remember whether a certain part of the long string can be broken into valid words. This prevents us from recalculating the same thing over and over.
Start at the beginning of the string and see if the initial part is a valid word. If it is, try to break down the rest of the string starting from that point.
If the initial part isn't a valid word, move to the next character and repeat the process.
When you find a valid word, remember all the possible ways to break down the remaining part of the string. This way, you don't have to redo that work later.
If you reach the end of the string, you've found a complete sentence. Add it to the list of possible sentences.
Because you are saving the results of previously computed solutions, you can retrieve the breakdown for the remaining part of the string rather than recalculate it. This makes the process much faster and more efficient, especially for longer strings.

Code Implementation

def word_break_ii(long_string, word_dictionary): 
    memo = {len(long_string): ['']}

    def word_break_recursive(start_index):
        if start_index in memo:
            return memo[start_index]

        sentences = []

        # Iterate through possible word endings.
        for end_index in range(start_index + 1, len(long_string) + 1): 
            word = long_string[start_index:end_index]

            # Check if the current substring is a valid word.
            if word in word_dictionary:

                # Recursively break down the remaining string.
                remaining_sentences = word_break_recursive(end_index)

                # Build sentences by combining the current word with the remaining sentences.
                for sentence in remaining_sentences:
                    if sentence:
                        sentences.append(word + ' ' + sentence)
                    else:
                        sentences.append(word)

        memo[start_index] = sentences
        return sentences

    # Initiate the recursive process starting from index 0.
    result = word_break_recursive(0)
    return result

Try it yourself on LeetCode

Big(O) Analysis

Time Complexity

O(n^3) – Let n be the length of the input string. The algorithm iterates through all possible starting positions for words within the string (O(n)). For each starting position, it iterates through all possible ending positions to form a word (O(n)). If a valid word is found, it recursively calls the function for the remaining substring. In the worst-case, where every prefix of the string is a valid word, the number of possible sentences can grow exponentially. The memoization helps, but string concatenation operations within the recursion to build sentences could add a linear cost (O(n)) for each valid sentence. Therefore, the overall time complexity, considering the string concatenation, could approach O(n^3) in the worst-case scenarios, especially when the dictionary enables numerous valid sentence formations.

Space Complexity

–

Edge Cases

Case	How to Handle
Empty string s	If s is empty and wordDict is empty return empty list, if s is empty and wordDict is not empty, return empty list, if s is empty and wordDict contains the empty string, return list containing the empty string.
Empty wordDict	If wordDict is empty, and s is non-empty, return empty list, as no segmentation is possible.
s is a single character not in wordDict	Return an empty list as no segmentation is possible.
s is a single word in wordDict	Return a list containing only s.
s is a long string and no valid segmentation exists	The algorithm should terminate gracefully and return an empty list without excessive recursion depth.
wordDict contains very long words not found in s	The algorithm should efficiently skip these words during segmentation attempts.
wordDict contains duplicate words	Duplicates in wordDict should not affect the segmentation result; the algorithm should still find all valid segmentations.
s is very long, and there are many valid segmentations (potential for stack overflow with naive recursion)	Use memoization (dynamic programming) to avoid recomputing solutions for substrings and improve efficiency, limiting recursion depth.