Taro Logo

Find and Replace Pattern

Medium
Zomato logo
Zomato
2 views
Topics:
ArraysStrings

Given a list of strings words and a string pattern, return a list of words[i] that match pattern. You may return the answer in any order.

A word matches the pattern if there exists a permutation of letters p so that after replacing every letter x in the pattern with p(x), we get the desired word.

Recall that a permutation of letters is a bijection from letters to letters: every letter maps to another letter, and no two letters map to the same letter.

Example 1:

Input: words = ["abc","deq","mee","aqq","dkd","ccc"], pattern = "abb"
Output: ["mee","aqq"]
Explanation: "mee" matches the pattern because there is a permutation {a -> m, b -> e, ...}. 
"ccc" does not match the pattern because {a -> c, b -> c, ...} is not a permutation, since a and b map to the same letter.

Example 2:

Input: words = ["a","b","c"], pattern = "a"
Output: ["a","b","c"]

Constraints:

  • 1 <= pattern.length <= 20
  • 1 <= words.length <= 50
  • words[i].length == pattern.length
  • pattern and words[i] are lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the `pattern` and words in the `words` list contain only lowercase English letters, or can they contain other characters?
  2. If a word does not match the pattern, should it simply be excluded from the output list, or should an error be thrown?
  3. Is the pattern guaranteed to be non-empty? What should I return if the `pattern` is an empty string?
  4. Are the words in `words` guaranteed to be of the same length as the `pattern`?
  5. If multiple words match the pattern, is there any specific order required in the output list?

Brute Force Solution

Approach

The core idea of the brute force method here is to examine each word in a given list and see if it matches a given pattern. We systematically check every word against the pattern to find matches. If a word matches the pattern, we keep it.

Here's how the algorithm would work step-by-step:

  1. Take the first word from the list of words.
  2. Compare the structure of the word to the structure of the pattern, character by character.
  3. To compare, imagine creating a mapping where each unique character in the word corresponds to a character in the pattern, and vice-versa.
  4. If, at any point, a character in the word tries to map to two different characters in the pattern, or a character in the pattern tries to map to two different characters in the word, it's not a match.
  5. If you can go through the entire word and the pattern without any conflicting mappings, that word matches the pattern.
  6. If the word matches, save it in a separate collection of matching words.
  7. Repeat this process for every word in the original list.
  8. At the end, you will have a list of all the words that match the pattern.

Code Implementation

def find_and_replace_pattern(words, pattern):
    matching_words = []

    for word in words:
        if matches_pattern(word, pattern):
            matching_words.append(word)

    return matching_words

def matches_pattern(word, pattern):
    if len(word) != len(pattern):
        return False

    word_to_pattern = {}
    pattern_to_word = {}

    for i in range(len(word)):
        word_char = word[i]
        pattern_char = pattern[i]

        # Need to check existing mappings before creating new ones
        if word_char in word_to_pattern:
            if word_to_pattern[word_char] != pattern_char:
                return False
        
        if pattern_char in pattern_to_word:
            if pattern_to_word[pattern_char] != word_char:
                return False

        # Ensure consistency: chars map only to one other char
        if word_char not in word_to_pattern and pattern_char not in pattern_to_word:
            word_to_pattern[word_char] = pattern_char
            pattern_to_word[pattern_char] = word_char

    return True

Big(O) Analysis

Time Complexity
O(N * M)The algorithm iterates through each of the N words in the input list. For each word, it compares its structure against the pattern of length M. The comparison involves creating and checking mappings between characters in the word and the pattern. This mapping and comparison process takes O(M) time in the worst case. Therefore, the overall time complexity is O(N * M), where N is the number of words and M is the length of the pattern and each word.
Space Complexity
O(1)The algorithm uses a fixed number of hashmaps (or dictionaries) to store the mappings between characters in the word and the pattern. In the worst case, the size of these hashmaps will be limited by the number of unique characters in either the word or the pattern, which is bounded by the size of the alphabet (a constant). The algorithm also uses a list to store matching words, but the space complexity analysis is only concerned with auxiliary space, meaning the space *beyond* the space needed to store the output. Hence, the extra space used is constant, independent of the number of words (N) in the input list.

Optimal Solution

Approach

The core idea is to transform both the input word and the pattern into a normalized form. This normalized form reflects the relative order of characters rather than the specific characters themselves, allowing for direct comparison.

Here's how the algorithm would work step-by-step:

  1. Create a way to represent each word as a pattern of numbers instead of letters. We want this representation to show the order that the letters appear.
  2. For example, if a letter appears for the first time, give it a '1'. If a letter appears again that you've already seen, give it the same number. If it's a new letter, give it the next available number.
  3. Do this same numbering process for the pattern.
  4. Now, compare the number pattern of each word to the number pattern of the given pattern.
  5. If the two number patterns are the same, that means the word matches the pattern. If they are different, it doesn't.
  6. Keep track of all the words whose number patterns matched the given pattern.

Code Implementation

def find_and_replace_pattern(words, pattern):
    matching_words = []

    def normalize_word(word):
        mapping = {}
        normalized_pattern = []
        counter = 1

        # Use a map to keep track of the order that each letter appears
        for letter in word:
            if letter not in mapping:
                mapping[letter] = counter
                counter += 1
            normalized_pattern.append(str(mapping[letter]))
        return "".join(normalized_pattern)

    normalized_pattern = normalize_word(pattern)

    # Normalize the pattern for consistent comparisons
    for word in words:
        normalized_word = normalize_word(word)

        # Compare normalized forms to see if they match.
        if normalized_word == normalized_pattern:
            matching_words.append(word)

    # Only return the words that matched the pattern.
    return matching_words

Big(O) Analysis

Time Complexity
O(N * M)The algorithm iterates through each word in the input list of words, which we'll define as having a length of N. For each word, it transforms the word and the given pattern into a numerical representation of length M, where M is the length of the word (and the pattern). The transformation process involves iterating through each character in the word/pattern. Then we compare the transformed word with the transformed pattern, taking O(M) time. Therefore, the overall time complexity is O(N * M).
Space Complexity
O(N)The algorithm transforms both each word and the given pattern into numerical patterns. These numerical patterns are represented using auxiliary data structures such as arrays or strings of length N, where N is the length of the word or pattern. We store one numerical pattern representation for the input pattern and at most one numerical pattern for each word as we compare them. Therefore, the space complexity is directly proportional to the length of the input words and the pattern, resulting in O(N) auxiliary space.

Edge Cases

CaseHow to Handle
Empty words arrayReturn an empty list immediately as there are no words to process.
Empty pattern stringReturn an empty list immediately as there is no pattern to match against.
Null words array or pattern stringThrow an IllegalArgumentException or return an empty list to handle null input.
Words array contains an empty stringThe solution should handle this correctly by considering the empty string as a valid word to compare with the pattern.
Words array contains very long stringsThe solution's performance might degrade with very long strings, potentially requiring optimization if length becomes a constraint.
Pattern string with repeating characters at the beginning or end (e.g., 'aabb')The solution should correctly map the repeating pattern characters to the corresponding characters in the word.
Word and pattern have different lengthsThe solution should immediately determine that the word does not match the pattern and move on.
Large number of words in the words arrayThe solution's efficiency depends on the number of words; consider optimizing for large datasets if performance becomes critical.