Word Search II

Hard

Apple

3 views

Topics:

ArraysStringsRecursionGraphs

You are given an m x n board of characters and a list of strings words. Your task is to find and return all words on the board. Each word must be constructed from letters of sequentially adjacent cells, where adjacent cells are horizontally or vertically neighboring. The same letter cell may not be used more than once in a word.

Example 1:

Consider the following board and word list:

board = [
  ["o","a","a","n"],
  ["e","t","a","e"],
  ["i","h","k","r"],
  ["i","f","l","v"]
]
words = ["oath","pea","eat","rain"]

The expected output would be ["eat","oath"] because these words can be found on the board following the specified rules.

Example 2:

Consider a smaller board and word list:

board = [
  ["a","b"],
  ["c","d"]
]
words = ["abcb"]

In this case, the expected output is [] because the word "abcb" cannot be formed on the board without reusing the same cell.

Constraints:

m == board.length
n == board[i].length
1 <= m, n <= 12
board[i][j] is a lowercase English letter.
1 <= words.length <= 3 * 10^4
1 <= words[i].length <= 10
words[i] consists of lowercase English letters.
All the strings of words are unique.

Describe an algorithm to efficiently find all words from the given list that exist on the board, adhering to the given constraints.

Word Search II

Hard

Apple

3 views

Topics:

ArraysStringsRecursionGraphs

You are given an m x n board of characters and a list of strings words. Your task is to find and return all words on the board. Each word must be constructed from letters of sequentially adjacent cells, where adjacent cells are horizontally or vertically neighboring. The same letter cell may not be used more than once in a word.

Example 1:

Consider the following board and word list:

board = [
  ["o","a","a","n"],
  ["e","t","a","e"],
  ["i","h","k","r"],
  ["i","f","l","v"]
]
words = ["oath","pea","eat","rain"]

The expected output would be ["eat","oath"] because these words can be found on the board following the specified rules.

Example 2:

Consider a smaller board and word list:

board = [
  ["a","b"],
  ["c","d"]
]
words = ["abcb"]

In this case, the expected output is [] because the word "abcb" cannot be formed on the board without reusing the same cell.

Constraints:

m == board.length
n == board[i].length
1 <= m, n <= 12
board[i][j] is a lowercase English letter.
1 <= words.length <= 3 * 10^4
1 <= words[i].length <= 10
words[i] consists of lowercase English letters.
All the strings of words are unique.

Describe an algorithm to efficiently find all words from the given list that exist on the board, adhering to the given constraints.

Solution

Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

What are the dimensions of the board, and what's the maximum length of any word in the `words` list?
Can the board contain characters other than lowercase English letters?
Are there any duplicate words in the `words` list? If so, should they appear multiple times in the result if found?
If a word appears multiple times on the board via different paths, should it be included multiple times in the result, or only once?
If no words are found on the board, should I return an empty list, or is there any other specific return value?

Brute Force Solution

Approach

Imagine you are searching for words in a grid of letters. The brute force approach is like checking every single possible path in the grid to see if it spells out any of your target words. It's thorough but not very smart.

Here's how the algorithm would work step-by-step:

For each cell in the grid, treat it as the starting point for a potential word.
From that starting point, explore all possible paths by moving to adjacent cells (up, down, left, right).
As you explore each path, build a word from the letters you encounter along the way.
Compare the word you've built so far with your list of target words.
If the built word exactly matches a target word, add it to the list of found words.
Continue exploring paths from each starting cell until you've tried all possibilities.
Finally, return the list of target words that you successfully found in the grid.

Code Implementation

def word_search_brute_force(board, words):
    rows = len(board)
    cols = len(board[0])
    found_words = set()

    def explore_path(row, col, current_word, visited):
        # Base case: word found
        if current_word in words:
            found_words.add(current_word)

        # Check boundaries and visited cells
        if (row < 0 or row >= rows or col < 0 or col >= cols or
            (row, col) in visited):
            return

        new_word = current_word + board[row][col]

        # If no word starts with the current path, stop exploring
        if not any(word.startswith(new_word) for word in words):
            return

        new_visited = set(visited)
        new_visited.add((row, col))

        # Explore adjacent cells
        explore_path(row + 1, col, new_word, new_visited)
        explore_path(row - 1, col, new_word, new_visited)
        explore_path(row, col + 1, new_word, new_visited)
        explore_path(row, col - 1, new_word, new_visited)

    # Iterate through each cell as a starting point
    for row_index in range(rows):
        for col_index in range(cols):
            # Explore all possible paths starting from this cell
            explore_path(row_index, col_index, "", set())

    # Convert set to list before returning
    return list(found_words)

Big(O) Analysis

Time Complexity

O(M * N * 4^(L)) – Let M be the number of rows and N be the number of columns in the board. L is the maximum length of a word in the words list. We iterate through each cell in the M x N board, leading to M * N starting points. From each starting point, we explore all possible paths of length up to L. In the worst case, each cell has 4 adjacent cells (up, down, left, right) to explore, and the path can have a length of up to L. Therefore, the time complexity for exploring paths from each starting cell is 4^L. Thus, the overall time complexity is O(M * N * 4^(L)).

Space Complexity

O(W * L + M * N) – The brute force approach explores all possible paths using recursion. The maximum depth of the recursion is bounded by the dimensions of the board, M rows and N columns, so the recursion stack can grow to O(M * N) in the worst case. Additionally, to keep track of the found words, we might store them in a list that could contain up to W words, each of length up to L, contributing O(W * L) space, assuming W is the number of words in the input word list and L is the maximum length of any word. Thus the overall space complexity is O(W * L + M * N).

Optimal Solution

Approach

We're given a grid of letters and a list of words, and need to find all the words in the list that can be found in the grid. The efficient approach involves building a special data structure to store all the words, then searching the grid in a way that lets us quickly discard paths that can't possibly lead to a valid word.

Here's how the algorithm would work step-by-step:

First, organize all the words into a kind of branching tree. Think of it like a family tree where each node represents a letter, and paths from the root to other nodes spell out words.
Start at the top-left of the letter grid. Check if the letter in the grid exists as a branch in our word tree. If it does, we're potentially on the right path to finding a word.
If the letter matches, move to the next letter in a neighboring cell (up, down, left, or right). Again, check if this new letter continues a valid path in our word tree.
Keep moving from neighbor to neighbor as long as the path of letters we're creating still exists in our word tree. If we ever reach a point where the current path *doesn't* exist in the tree, we know we're on a dead end and can immediately stop searching along that path.
If, at any point, the letters we've found spell out a complete word in our word tree, add that word to our list of found words. Be sure to avoid adding duplicates.
Repeat this process for every starting position in the letter grid. By only exploring paths that are actually prefixes of words in our list, we avoid a lot of unnecessary work and find all the words efficiently.

Code Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None

class Solution:
    def findWords(self, board, words):
        root_node = self.buildTrie(words)
        found_words = set()
        rows, cols = len(board), len(board[0])

        def depthFirstSearch(row, col, node):
            if row < 0 or row >= rows or col < 0 or col >= cols or board[row][col] == '#' or board[row][col] not in node.children:
                return

            char = board[row][col]
            node = node.children[char]

            # If we find a word, add it to results
            if node.word:
                found_words.add(node.word)

            original_char = board[row][col]
            board[row][col] = '#'

            depthFirstSearch(row + 1, col, node)
            depthFirstSearch(row - 1, col, node)
            depthFirstSearch(row, col + 1, node)
            depthFirstSearch(row, col - 1, node)

            board[row][col] = original_char

        for row in range(rows):
            for col in range(cols):
                depthFirstSearch(row, col, root_node)

        return list(found_words)

    def buildTrie(self, words):
        root_node = TrieNode()
        for word in words:
            node = root_node
            for char in word:
                if char not in node.children:
                    node.children[char] = TrieNode()
                node = node.children[char]
            node.word = word
        return root_node

Big(O) Analysis

Time Complexity

O(M * N * 4^L) – Let M be the number of rows and N be the number of columns in the board, and L be the maximum length of a word in the words list. Building the Trie takes O(sum of word lengths) which is less significant. The algorithm iterates through each cell in the board (M * N). For each cell, we perform a Depth-First Search (DFS). In the worst case, the DFS might explore all possible paths up to the maximum word length L. Since each cell has up to 4 neighbors to explore, the branching factor is 4, leading to a time complexity of O(4^L) for the DFS in the worst case. Therefore, the overall time complexity is O(M * N * 4^L).

Space Complexity

O(M) – The primary space usage comes from building the Trie data structure, where M is the total number of characters across all words in the input word list. Each node in the Trie stores a character and pointers to its children, potentially storing all unique characters of all the words. During the depth-first search (DFS), a visited set could be used to prevent cycles, which, in the worst case, could contain all the cells of the board, but this is accounted for within the O(M) Trie construction, as it will be visited when a complete word is found. Therefore, the space complexity is dominated by the Trie.

Edge Cases

Case	How to Handle
Empty board or empty words list	Return an empty list immediately as there are no words to search for or no board to search on.
Board with one row and one column, and the words list contains a word of length 1 that matches the cell	The algorithm should correctly identify the one-letter word if it matches the single cell on the board.
Words list contains duplicate words	The solution should avoid adding the same word multiple times to the result list by using a set or similar de-duplication mechanism.
Words contain characters not present in the board	The algorithm should not find these words and should not throw any errors.
Word is longer than the maximum possible path on the board (e.g., a word longer than board width * board height)	The depth-first search will terminate prematurely when it runs out of board space before finding the complete word.
Words contain repeated characters leading to cycles if not carefully handled.	The backtracking algorithm must mark visited cells during a search to prevent infinite recursion due to cycles.
Large board and many words, potentially leading to stack overflow with recursion.	The recursion depth should be considered and the implementation may need to be optimized, potentially using iterative approach for DFS.
Board contains unicode characters.	Ensure that the character comparisons and lookups in the Trie and DFS work correctly with unicode characters.