Taro Logo

Count Words Obtained After Adding a Letter

Medium
Amazon logo
Amazon
1 view
Topics:
ArraysStringsBit Manipulation

You are given two 0-indexed arrays of strings startWords and targetWords. Each string consists of lowercase English letters only.

For each string in targetWords, check if it is possible to choose a string from startWords and perform a conversion operation on it to be equal to that from targetWords.

The conversion operation is described in the following two steps:

  1. Append any lowercase letter that is not present in the string to its end.
    • For example, if the string is "abc", the letters 'd', 'e', or 'y' can be added to it, but not 'a'. If 'd' is added, the resulting string will be "abcd".
  2. Rearrange the letters of the new string in any arbitrary order.
    • For example, "abcd" can be rearranged to "acbd", "bacd", "cbda", and so on. Note that it can also be rearranged to "abcd" itself.

Return the number of strings in targetWords that can be obtained by performing the operations on any string of startWords.

Note that you will only be verifying if the string in targetWords can be obtained from a string in startWords by performing the operations. The strings in startWords do not actually change during this process.

For example:

Input: startWords = ["ant","act","tack"], targetWords = ["tack","act","acti"]

Output: 2

Explanation:

  • In order to form targetWords[0] = "tack", we use startWords[1] = "act", append 'k' to it, and rearrange "actk" to "tack".
  • There is no string in startWords that can be used to obtain targetWords[1] = "act". Note that "act" does exist in startWords, but we must append one letter to the string before rearranging it.
  • In order to form targetWords[2] = "acti", we use startWords[1] = "act", append 'i' to it, and rearrange "acti" to "acti" itself.

Could you provide an algorithm to efficiently solve this problem?

Solution


Naive Solution

The brute force solution would involve iterating through each word in targetWords and, for each word, iterating through each word in startWords. For each pair of words, we would check if we can convert a start word to a target word according to the given rules.

  1. Append: Try appending every possible letter not present in the start word.
  2. Rearrange: Check if the rearranged start word is equal to the target word. This can be done by comparing the sorted versions of both words.

This solution has a high time complexity.

def can_convert(start, target):
    for char in 'abcdefghijklmnopqrstuvwxyz':
        if char not in start:
            new_start = start + char
            if sorted(new_start) == sorted(target):
                return True
    return False

def word_count(startWords, targetWords):
    count = 0
    for target in targetWords:
        for start in startWords:
            if can_convert(start, target):
                count += 1
                break
    return count
  • Time Complexity: O(N * M * 26 * (K1 + K2)log(K1 + K2)), where N is the length of targetWords, M is the length of startWords, K1 is the maximum length of a word in startWords and K2 is the maximum length of a word in targetWords.
  • Space Complexity: O(K1 + K2), due to sorting.

Optimal Solution

A more efficient solution involves using bit manipulation to represent each word as an integer. Each bit in the integer corresponds to a letter in the alphabet. If a letter is present in the word, the corresponding bit is set to 1. For example, "abc" would be represented as 111 in binary, which is 7 in decimal.

  1. Convert to Bitmasks: Convert each word in both startWords and targetWords to its bitmask representation.
  2. Store Start Word Bitmasks: Store the bitmasks of startWords in a set.
  3. Check Target Words: For each word in targetWords, check if it can be formed from any of the startWords. To do this, generate all possible bitmasks that can be obtained by removing one bit from the target word's bitmask. If any of these resulting bitmasks are present in the set of startWords bitmasks, then the target word can be formed from a start word.
def word_count(startWords, targetWords):
    start_masks = set()
    for word in startWords:
        mask = 0
        for char in word:
            mask |= (1 << (ord(char) - ord('a')))
        start_masks.add(mask)

    count = 0
    for word in targetWords:
        mask = 0
        for char in word:
            mask |= (1 << (ord(char) - ord('a')))

        for i in range(26):
            if (mask >> i) & 1:
                new_mask = mask ^ (1 << i)
                if new_mask in start_masks:
                    count += 1
                    break
    return count
  • Time Complexity: O(N + M * L), where N is the length of targetWords, M is the length of startWords, and L is the maximum length of a word. The N comes from creating the set of startmasks. Then for each target word, we iterate through its letters which is bounded by 26.
  • Space Complexity: O(M), where M is the number of start words (to store the start masks).

Edge Cases

  • Empty input arrays.
  • Duplicate words in input arrays.
  • Words with non-lowercase English letters.
  • Very long input arrays that may cause memory issues (although the problem constraints limit array sizes).