Taro Logo

Word Break

Medium
2 months ago

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words. Note that the same word in the dictionary may be reused multiple times in the segmentation. Consider the following examples:

  1. s = "leetcode", wordDict = ["leet","code"]. The expected output is true because "leetcode" can be segmented as "leet code".
  2. s = "applepenapple", wordDict = ["apple","pen"]. The expected output is true because "applepenapple" can be segmented as "apple pen apple". Note that you are allowed to reuse a dictionary word.
  3. s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]. The expected output is false.

Provide an efficient algorithm to solve this problem. Detail the runtime and space complexity of your approach.

Sample Answer
## Word Break Problem

This problem asks whether a given string `s` can be segmented into a space-separated sequence of one or more words from a given dictionary `wordDict`.  The same word in the dictionary may be reused multiple times.

### 1. Brute Force Solution (with Recursion)

We can solve this problem recursively. For each prefix of the string `s`, we check if it's in the dictionary. If it is, we recursively check if the remaining suffix can also be segmented.  This approach explores all possible segmentations.

```python
def word_break_recursive(s: str, word_dict: set[str]) -> bool:
    if not s:
        return True

    for i in range(1, len(s) + 1):
        prefix = s[:i]
        if prefix in word_dict and word_break_recursive(s[i:], word_dict):
            return True

    return False


def word_break_brute_force(s: str, wordDict: list[str]) -> bool:
    return word_break_recursive(s, set(wordDict))

# Example Usage
s = "leetcode"
wordDict = ["leet", "code"]
print(f"Brute Force: {word_break_brute_force(s, wordDict)}")  # Output: True

s = "applepenapple"
wordDict = ["apple", "pen"]
print(f"Brute Force: {word_break_brute_force(s, wordDict)}")  # Output: True

s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]
print(f"Brute Force: {word_break_brute_force(s, wordDict)}")  # Output: False

This recursive approach has exponential time complexity due to overlapping subproblems. For example, if s = 'aaaaab' and wordDict = ['a', 'aa', 'aaa'], the function will recalculate the possibility of breaking down 'a', 'aa', 'aaa', 'aaaa', 'aaaaa' repeatedly.

2. Optimal Solution (Dynamic Programming)

To avoid redundant calculations, we can use dynamic programming. We create a boolean array dp where dp[i] represents whether the substring s[:i] can be segmented into words from the dictionary.

  1. Initialize dp[0] to True (an empty string can be segmented).
  2. Iterate through the string s from index 1 to len(s). For each index i, iterate through all possible starting indices j (from 0 to i-1).
  3. If dp[j] is True (meaning the substring s[:j] can be segmented) and the substring s[j:i] is in the dictionary, then dp[i] is also True.
def word_break_dp(s: str, wordDict: list[str]) -> bool:
    word_set = set(wordDict)
    dp = [False] * (len(s) + 1)
    dp[0] = True

    for i in range(1, len(s) + 1):
        for j in range(i):
            if dp[j] and s[j:i] in word_set:
                dp[i] = True
                break

    return dp[len(s)]

# Example Usage
s = "leetcode"
wordDict = ["leet", "code"]
print(f"Dynamic Programming: {word_break_dp(s, wordDict)}")  # Output: True

s = "applepenapple"
wordDict = ["apple", "pen"]
print(f"Dynamic Programming: {word_break_dp(s, wordDict)}")  # Output: True

s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]
print(f"Dynamic Programming: {word_break_dp(s, wordDict)}")  # Output: False

3. Big(O) Runtime Analysis

  • Brute Force (Recursion): O(2n), where n is the length of the string s. In the worst case, each character could potentially be the start of a new word, leading to an exponential number of possible segmentations.
  • Dynamic Programming: O(n2), where n is the length of the string s. The outer loop iterates n times, and the inner loop iterates up to n times, resulting in O(n2) complexity. Checking if a substring exists in the dictionary word_set takes O(1) on average (if word_set is a hash set).

4. Big(O) Space Usage Analysis

  • Brute Force (Recursion): O(n) in the worst case, due to the recursion depth. The maximum recursion depth occurs when each character is a separate word.
  • Dynamic Programming: O(n), where n is the length of the string s. The dp array stores boolean values for each prefix of s, requiring linear space.

5. Edge Cases

  • Empty String: If the input string s is empty, the function should return True (because an empty string can be considered segmented).
  • Empty Dictionary: If the dictionary wordDict is empty, the function should return False (unless the input string s is also empty).
  • Long String, Short Words: If the string s is very long and the words in wordDict are very short, the dynamic programming approach becomes much more efficient than the brute force approach.
  • No Segmentation Possible: If the string s cannot be segmented into words from wordDict, the function should return False (e.g., s = "aaaaab", wordDict = ["a", "aa", "aaa"]).
  • Dictionary Contains Empty String: If the dictionary wordDict contains an empty string, the dynamic programming approach can lead to an infinite loop or incorrect results. It's best to handle this case by ensuring the dictionary does not contain an empty string.