Taro Logo

Word Break

Medium
Netflix logo
Netflix
1 view
Topics:
StringsDynamic Programming

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

For example:

  1. s = "leetcode", wordDict = ["leet","code"] should return true because "leetcode" can be segmented as "leet code".
  2. s = "applepenapple", wordDict = ["apple","pen"] should return true because "applepenapple" can be segmented as "apple pen apple". Note that you are allowed to reuse a dictionary word.
  3. s = "catsandog", wordDict = ["cats","dog","sand","and","cat"] should return false.

How would you approach solving this problem? What would be the time and space complexity of your solution? Can you provide code?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input string `s` be empty, and can the `wordDict` be empty?
  2. Are all words in `wordDict` guaranteed to be non-empty?
  3. Are there any constraints on the length of the string `s` or the number of words in `wordDict`?
  4. Can the same word in `wordDict` be used multiple times and in any order to segment the string `s`?
  5. Is the word matching case-sensitive?

Brute Force Solution

Approach

The brute force approach to the word break problem is like trying all the possible ways to cut a long string into smaller pieces, and then checking if those pieces are valid words. It's like trying every combination until you find one that works. We explore every option, even if it takes a very long time.

Here's how the algorithm would work step-by-step:

  1. Start by considering the very first letter of the string. Is that a valid word by itself?
  2. Then, consider the first two letters combined. Is that a valid word?
  3. Keep extending the potential word by one letter at a time and checking if it's a valid word.
  4. If you find a valid word, remember it and see what's left of the string after you remove that word.
  5. Now, repeat the same process on the remaining part of the string: try the first letter, the first two letters, and so on, checking if they're valid words.
  6. Continue doing this until either you've broken the entire original string into valid words, or you've exhausted all possible ways to break it starting from the beginning.
  7. If you break the entire string into valid words, then you've found a solution. If you run out of possibilities without finding a solution, then the string cannot be broken into valid words using the provided dictionary.

Code Implementation

def word_break(input_string, word_dictionary):
    string_length = len(input_string)

    def can_break(start_index):
        # If we've reached the end, we've successfully broken the string
        if start_index == string_length:
            return True

        for end_index in range(start_index + 1, string_length + 1):
            # Check every substring starting from start_index
            substring = input_string[start_index:end_index]

            if substring in word_dictionary:
                # If current substring is a valid word, explore the rest

                if can_break(end_index):
                    return True

        return False

    # Initiate the recursion from the beginning of the string
    return can_break(0)

Big(O) Analysis

Time Complexity
O(2^n)The brute force approach explores all possible prefixes of the input string of length n. For each prefix, it checks if the prefix is a valid word in the dictionary. If a valid prefix is found, the function recursively calls itself with the remaining suffix. In the worst case, where nearly every prefix is a valid word or near-valid word, this leads to exploring a binary tree of possibilities where each character has two choices: either be the end of the word or continue to the next character. This results in a time complexity of O(2^n) since, in the worst case, the recursion branches at each character.
Space Complexity
O(N)The brute force approach, as described, involves recursion. In the worst-case scenario, where no prefix of the input string 's' (of length N) is present in the dictionary 'wordDict' except potentially the entire string 's' itself, the algorithm will make a recursive call for each possible prefix of the string. This leads to a recursion depth of N, where each level of recursion stores the current state of the string and other related variables on the call stack. Therefore, the space complexity is proportional to the depth of the recursion, resulting in O(N) auxiliary space due to the call stack.

Optimal Solution

Approach

The optimal solution uses a 'can we build this' strategy. It figures out if we can construct a sentence by breaking the given string into valid words from the provided dictionary, building up from small chunks.

Here's how the algorithm would work step-by-step:

  1. Start at the beginning of the given string.
  2. Check if the first word (or part of it) is a valid word from the dictionary.
  3. If it is, remember that we can reach that point in the string.
  4. Now, move forward, and from each 'reachable' point, check if the next word (or part of it) from that point is also a valid word.
  5. If we find another valid word, mark the end of *that* word as reachable.
  6. Keep doing this, advancing through the string and building on the previously reachable points.
  7. Finally, if the very end of the original string is marked as 'reachable', it means we successfully broke the entire string into valid words.

Code Implementation

def word_break(input_string, word_dictionary):
    string_length = len(input_string)
    reachable = [False] * (string_length + 1)
    reachable[0] = True

    # Mark the start of the string as reachable

    for i in range(string_length):
        if reachable[i]:
            for word in word_dictionary:
                if input_string[i:i + len(word)] == word:

                    # Mark the end of the valid word as reachable.

                    reachable[i + len(word)] = True

    # Check if the entire string can be broken into words.
    return reachable[string_length]

Big(O) Analysis

Time Complexity
O(n³)The outer loop iterates through each index of the string of length n. The inner loop iterates from the current index to the end of the string, also up to n iterations. Inside the inner loop, we perform a substring operation and a dictionary lookup, where the substring operation can take O(n) time in the worst case. Therefore, the overall time complexity is O(n * n * n) which simplifies to O(n³).
Space Complexity
O(N)The algorithm uses a data structure to remember the 'reachable' points in the string. This can be implemented using a boolean array (or set) of size N+1, where N is the length of the input string. Each element in this array indicates whether the corresponding index in the string is reachable. Therefore, the auxiliary space used is proportional to the length of the input string, resulting in a space complexity of O(N).

Edge Cases

CaseHow to Handle
Empty string sReturn true if wordDict is empty, otherwise return false as an empty string can only be segmented if the dictionary is empty.
Empty wordDictReturn false as the string s cannot be segmented into words from an empty dictionary.
s is a single character not present in wordDictReturn false because the single character cannot be segmented from the dictionary.
s contains only one word which exists in wordDictReturn true as the entire string s can be segmented into this one word.
s is very long and contains no valid segmentationThe dynamic programming approach should eventually determine that no valid segmentation exists and return false, but its performance needs to be considered.
wordDict contains very long words that don't contribute to a solutionThe algorithm should still work correctly, but performance might be impacted due to iterating over these long, useless words.
wordDict contains duplicate wordsThe algorithm should work correctly as duplicates are redundant, though sets can improve efficiency by removing the duplicates.
s can be segmented in multiple waysThe algorithm should return true as long as at least one valid segmentation is possible.