Taro Logo

Number of Valid Words for Each Puzzle

Hard
Dropbox logo
Dropbox
2 views
Topics:
ArraysBit Manipulation
With respect to a given puzzle string, a word is valid if both the following conditions are satisfied:
  • word contains the first letter of puzzle.
  • For each letter in word, that letter is in puzzle.
    • For example, if the puzzle is "abcdefg", then valid words are "faced", "cabbage", and "baggage", while
    • invalid words are "beefed" (does not include 'a') and "based" (includes 's' which is not in the puzzle).
Return an array answer, where answer[i] is the number of words in the given word list words that is valid with respect to the puzzle puzzles[i].

Example 1:

Input: words = ["aaaa","asas","able","ability","actt","actor","access"], puzzles = ["aboveyz","abrodyz","abslute","absoryz","actresz","gaswxyz"]
Output: [1,1,3,2,4,0]
Explanation: 
1 valid word for "aboveyz" : "aaaa" 
1 valid word for "abrodyz" : "aaaa"
3 valid words for "abslute" : "aaaa", "asas", "able"
2 valid words for "absoryz" : "aaaa", "asas"
4 valid words for "actresz" : "aaaa", "asas", "actt", "access"
There are no valid words for "gaswxyz" cause none of the words in the list contains letter 'g'.

Example 2:

Input: words = ["apple","pleas","please"], puzzles = ["aelwxyz","aelpxyz","aelpsxy","saelpxy","xaelpsy"]
Output: [0,1,3,2,0]

Constraints:

  • 1 <= words.length <= 105
  • 4 <= words[i].length <= 50
  • 1 <= puzzles.length <= 104
  • puzzles[i].length == 7
  • words[i] and puzzles[i] consist of lowercase English letters.
  • Each puzzles[i] does not contain repeated characters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the maximum lengths for the `words` and `puzzles` lists, and the maximum length of each word or puzzle string?
  2. Can the input lists `words` and `puzzles` contain null or empty strings, or be null or empty lists themselves?
  3. If a word is valid for a puzzle multiple times (e.g., the word "apple" is valid for the puzzle "aplle"), should I count it multiple times, or only once?
  4. Are the letters in the words and puzzles guaranteed to be lowercase English letters, or could there be other characters?
  5. If no words are valid for a given puzzle, should the corresponding entry in the output list be 0?

Brute Force Solution

Approach

For each puzzle, we check every word in the word list to see if it's a valid word for that puzzle. We do this by examining each word against the rules of a given puzzle.

Here's how the algorithm would work step-by-step:

  1. Take the first puzzle.
  2. Go through each word in the word list.
  3. For each word, check if it contains the first letter of the current puzzle.
  4. If it does, check if all the letters of the word are also present in the puzzle.
  5. If both conditions are true, the word is a valid word for this puzzle, so count it.
  6. Move to the next word and repeat steps 3-5 until all words have been checked.
  7. Record the number of valid words for the current puzzle.
  8. Move to the next puzzle and repeat steps 2-7 until all puzzles have been checked.

Code Implementation

def number_of_valid_words(words, puzzles):
    result = []
    for puzzle in puzzles:
        valid_word_count = 0
        # Iterate through each word to check against current puzzle.
        for word in words:
            first_letter_of_puzzle = puzzle[0]
            # Optimization: Word must contain first letter of puzzle.
            if first_letter_of_puzzle in word:

                is_valid = True
                # Ensure all letters in the word are present in puzzle.
                for letter in word:
                    if letter not in puzzle:
                        is_valid = False
                        break

                if is_valid:
                    valid_word_count += 1

        result.append(valid_word_count)
    return result

Big(O) Analysis

Time Complexity
O(p * w * (l_w + l_p))The provided approach iterates through each of the 'p' puzzles. For each puzzle, it iterates through each of the 'w' words in the word list. For each word, it checks if the first letter of the puzzle is in the word, which takes O(l_w) time, where l_w is the length of the word. If the first letter is present, it then checks if all letters of the word are present in the puzzle, which can take up to O(l_w * l_p) if we check each letter of the word against each letter of the puzzle, where l_p is the length of the puzzle. The combined cost inside the nested loops becomes O(l_w + l_w * l_p) which is effectively O(l_w * l_p). Thus, the overall time complexity is O(p * w * (l_w * l_p)). We can simplify this to O(p * w * (l_w + l_p)) since checking the first letter can be done within the word validation loop.
Space Complexity
O(1)The provided plain English explanation does not mention any auxiliary data structures such as arrays, hashmaps, or sets being created. The algorithm iterates through the puzzles and words, using a constant amount of extra space for variables to store counts and track progress. The memory usage remains constant irrespective of the number of puzzles or words in the input list. Therefore, the space complexity is O(1).

Optimal Solution

Approach

The key idea is to represent each word and puzzle as a set of letters. Then, for each puzzle, we efficiently check which words are valid by ensuring the word's letters are a subset of the puzzle's letters and that the word contains the puzzle's first letter.

Here's how the algorithm would work step-by-step:

  1. Convert each word into a simplified representation. Instead of storing the word as a string, store it as a number that represents which letters are present. For example, if the word has the letters 'a' and 'c', the number will represent that the first and third bits are set.
  2. Do the same to convert each puzzle into a number that represents its set of letters.
  3. Go through each puzzle one by one.
  4. For each puzzle, go through all of the simplified word representations.
  5. Check if the simplified word is a subset of the puzzle's simplified representation. This means all the letters in the word are also in the puzzle.
  6. Also, check if the first letter of the puzzle exists in the simplified word representation. If this check fails, the word cannot be valid.
  7. If both checks pass, the word is valid, and we increment the count for the current puzzle.
  8. After checking all words for a given puzzle, store the total count of valid words for that puzzle.
  9. Return the list of counts for each puzzle.

Code Implementation

def find_num_valid_words(words, puzzles):
    puzzle_results = []
    for puzzle in puzzles:
        puzzle_mask = 0
        for char in puzzle:
            puzzle_mask |= (1 << (ord(char) - ord('a')))

        first_letter_mask = (1 << (ord(puzzle[0]) - ord('a')))
        valid_word_count = 0

        for word in words:
            word_mask = 0
            for char in word:
                word_mask |= (1 << (ord(char) - ord('a')))

            # Check if word is subset of puzzle and contains first letter
            if (word_mask & puzzle_mask) == word_mask:

                # Verifies the word contains the puzzle's first letter
                if (word_mask & first_letter_mask) != 0:
                    valid_word_count += 1

        puzzle_results.append(valid_word_count)

    # Returns the count of valid words for each puzzle
    return puzzle_results

Big(O) Analysis

Time Complexity
O(p * w)The algorithm iterates through each of the 'p' puzzles. For each puzzle, it iterates through all 'w' words. Inside the inner loop, subset checking and first letter checking are done, which take constant time, O(1). Therefore, the dominant operations consist of these nested loops, resulting in a time complexity of O(p * w), where 'p' is the number of puzzles and 'w' is the number of words.
Space Complexity
O(W + P)The algorithm converts each word and puzzle into a simplified representation, storing them. Let W be the number of words and P be the number of puzzles. The space to store the simplified word representations is O(W), and the space to store the puzzle representations and their corresponding counts is O(P). Therefore, the auxiliary space complexity is O(W + P).

Edge Cases

CaseHow to Handle
Null or empty words arrayReturn a list of zeros with the same length as puzzles array.
Null or empty puzzles arrayReturn an empty list since no puzzles to process exist.
Empty string in words arrayTreat empty string as no matching letters so it never contributes to a count.
Empty string in puzzles arrayNo word will ever be valid for the empty puzzle so the count is zero.
Words or puzzles with very long strings (close to maximum string length)Ensure the bitmask approach used to represent the letters in words and puzzles handles potentially long string inputs without memory issues.
All words contain only the first letter of a given puzzle.The solution must correctly count all such words as valid for that puzzle if all other letters in the word are in the puzzle.
Words or puzzles contain duplicate letters.The set or bitmask representation will naturally handle duplicate letters without error.
Maximum number of words and puzzles are provided to test for performance bottlenecks.Evaluate the time complexity, ensuring bit manipulation is efficient and the solution scales linearly or better with the number of words and puzzles.