Taro Logo

Word Break

Medium
Walmart logo
Walmart
0 views
Topics:
StringsDynamic Programming

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

Example 1:

Input: s = "leetcode", wordDict = ["leet","code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:

Input: s = "applepenapple", wordDict = ["apple","pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output: false

Constraints:

  • 1 <= s.length <= 300
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 20
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input string `s` and words in `wordDict` contain characters other than lowercase English letters?
  2. What are the maximum lengths of the input string `s` and individual words in `wordDict`?
  3. Can `wordDict` contain empty strings, and if so, how should I handle them?
  4. If the string `s` cannot be segmented, should I return `false` or is there a specific error code?
  5. Is the `wordDict` a fixed dictionary or could it potentially change during multiple calls to the function (if we were designing an API)? (Assume a single call for the purpose of this interview.)

Brute Force Solution

Approach

The brute force approach to the 'Word Break' problem is like trying every single way to chop up a long string into smaller pieces to see if the pieces are valid words. We're essentially exploring every possible combination of word breaks to find a solution.

Here's how the algorithm would work step-by-step:

  1. Start by checking if the entire string is a valid word.
  2. Then, see if the first one letter is a valid word. If it is, check if the rest of the string can also be broken into valid words using this same approach.
  3. Next, see if the first two letters are a valid word. If so, check if the rest of the string can be broken down too.
  4. Keep increasing the length of the first potential word, and each time check if it's valid and if the remaining part of the string can also be broken down.
  5. If at any point, you find that you can break the entire string into valid words, you've found a solution.
  6. Continue trying all possible combinations, even after finding one solution, to make sure you haven't missed any other valid ways to break the string.

Code Implementation

def word_break_brute_force(input_string, word_dictionary):
    string_length = len(input_string)

    # Check if the entire string is a word
    if input_string in word_dictionary:
        return True

    for i in range(1, string_length + 1):
        # Consider all possible prefixes of the string
        prefix = input_string[:i]

        # If current prefix is a valid word
        if prefix in word_dictionary:

            # Recursively check the remaining string
            suffix = input_string[i:]

            # See if the rest of the string can also be segmented.
            if word_break_brute_force(suffix, word_dictionary):
                return True

    # If no break combination works, return false
    return False

Big(O) Analysis

Time Complexity
O(2^n)The brute force approach explores all possible prefixes of the string. For a string of length n, there are n-1 possible break points. At each break point, we have two choices: either break the string or don't break it. This leads to 2^(n-1) possible combinations of breaks. Since 2^(n-1) is still exponential in n, the time complexity is O(2^n). In the worst case, we might explore all these combinations before finding a valid word break or determining that no such break exists.
Space Complexity
O(N)The described brute force approach uses recursion. Each recursive call creates a new stack frame, and the maximum depth of the recursion can be N, where N is the length of the input string, because in the worst case, we could be checking substrings of decreasing length from N down to 1. Therefore, the maximum space used by the call stack is proportional to N. Thus, the space complexity is O(N).

Optimal Solution

Approach

The best way to solve the Word Break problem is to figure out if we can build the target phrase piece by piece using words from our dictionary. We can do this efficiently by remembering if we've already figured out how to build a part of the phrase, preventing redundant work and significantly speeding up the process.

Here's how the algorithm would work step-by-step:

  1. Start at the beginning of the phrase.
  2. Check if the first word is in our dictionary. If it is, mark that we know we can build up to that point.
  3. Move to the next part of the phrase. Can we add another word from the dictionary to what we've already built?
  4. If we can, mark that new, longer segment as 'buildable'.
  5. Keep going, adding words from the dictionary to previously 'buildable' segments.
  6. Remember the 'buildable' spots. If we reach a segment we've already checked, don't re-check it.
  7. If, by the end, we've marked the entire phrase as 'buildable', then the phrase can be broken into words from the dictionary. Otherwise, it can't.

Code Implementation

def word_break(phrase, word_dictionary):
    phrase_length = len(phrase)
    buildable = [False] * (phrase_length + 1)
    buildable[0] = True

    for i in range(1, phrase_length + 1):
        for j in range(i):
            # See if we've found a valid word to extend a previous valid break
            if buildable[j] and phrase[j:i] in word_dictionary:

                buildable[i] = True
                break

    # If the last element is true, entire phrase is buildable
    return buildable[phrase_length]

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through all possible prefixes of the input string of length n, which takes O(n) time. For each prefix, it iterates through the dictionary to check if a word exists that can extend the prefix to a valid word sequence. In the worst case, the inner loop iterates through the entire dictionary for each prefix, and the dictionary lookup can take O(1) time (assuming a hash set). Additionally, within the outer loop, we also have an inner loop that implicitly checks all possible starting points of the words that make up the string, which will also be O(n) at the most. This results in a nested loop structure where we do work that approximates n * n/2 operations, leading to a time complexity of O(n²).
Space Complexity
O(N)The algorithm uses a data structure to remember which segments of the phrase are 'buildable'. This can be implemented using a boolean array or set, where each element corresponds to a position in the input string of length N, indicating whether the substring up to that position can be formed using words from the dictionary. Therefore, the auxiliary space used scales linearly with the length of the input string, N. In the worst case, we might need to store information for every prefix of the input string.

Edge Cases

CaseHow to Handle
Empty string sIf s is empty, return true, as an empty string can be considered segmented.
Empty dictionary wordDictIf the dictionary is empty and s is not empty, return false since no words can be used for segmentation.
s is a single character, not in wordDictReturn false, since the single character cannot be segmented if it's not a valid word.
s is a very long string, and wordDict is smallThe solution's time complexity should be considered, a dynamic programming approach can prevent exponential time in recursion.
wordDict contains very long words, longer than sLonger words can be ignored since they will never match any part of s.
wordDict contains duplicate wordsDuplicate words are treated the same as the original and do not cause incorrect output if a valid segmentation exists.
s cannot be segmented at allThe algorithm should correctly return false if no combination of words from wordDict can form s.
s can be segmented in multiple waysThe algorithm only needs to determine if *any* segmentation is possible, the existence of multiple solutions doesn't affect correctness.