Taro Logo

Vowel Spellchecker

Medium
Grammarly logo
Grammarly
0 views
Topics:
ArraysStrings

Given a wordlist, we want to implement a spellchecker that converts a query word into a correct word.

For a given query word, the spell checker handles two categories of spelling mistakes:

  • Capitalization: If the query matches a word in the wordlist (case-insensitive), then the query word is returned with the same case as the case in the wordlist.
    • Example: wordlist = ["yellow"], query = "YellOw": correct = "yellow"
    • Example: wordlist = ["Yellow"], query = "yellow": correct = "Yellow"
    • Example: wordlist = ["yellow"], query = "yellow": correct = "yellow"
  • Vowel Errors: If after replacing the vowels ('a', 'e', 'i', 'o', 'u') of the query word with any vowel individually, it matches a word in the wordlist (case-insensitive), then the query word is returned with the same case as the match in the wordlist.
    • Example: wordlist = ["YellOw"], query = "yollow": correct = "YellOw"
    • Example: wordlist = ["YellOw"], query = "yeellow": correct = "" (no match)
    • Example: wordlist = ["YellOw"], query = "yllw": correct = "" (no match)

In addition, the spell checker operates under the following precedence rules:

  • When the query exactly matches a word in the wordlist (case-sensitive), you should return the same word back.
  • When the query matches a word up to capitlization, you should return the first such match in the wordlist.
  • When the query matches a word up to vowel errors, you should return the first such match in the wordlist.
  • If the query has no matches in the wordlist, you should return the empty string.

Given some queries, return a list of words answer, where answer[i] is the correct word for query = queries[i].

Example 1:

Input: wordlist = ["KiTe","kite","hare","Hare"], queries = ["kite","Kite","KiTe","Hare","HARE","Hear","hear","keti","keet","keto"]
Output: ["kite","KiTe","KiTe","Hare","hare","","","KiTe","","KiTe"]

Example 2:

Input: wordlist = ["yellow"], queries = ["YellOw"]
Output: ["yellow"]

Constraints:

  • 1 <= wordlist.length, queries.length <= 5000
  • 1 <= wordlist[i].length, queries[i].length <= 7
  • wordlist[i] and queries[i] consist only of only English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected character set for the words in both the `wordlist` and `queries`? Are they strictly alphabetical, or can they include numbers and symbols?
  2. To clarify the third matching rule, when ignoring both case and vowels, should I treat uppercase vowels ('A', 'E', 'I', 'O', 'U') the same as their lowercase counterparts?
  3. What are the constraints on the size of the `wordlist` and `queries`, and the maximum length of a word? This will help me choose the right data structures for an optimal solution.
  4. How should the spellchecker handle an empty string if it appears in the `wordlist` or as a `query`?
  5. The problem states to return the 'first' word from the `wordlist` that matches. If the `wordlist` itself contains duplicates (e.g., `['Yellow', 'YellOw']`), and a query matches both via a non-exact rule, should I return the one that appears first, in this case 'Yellow'?

Brute Force Solution

Approach

To find the correct spelling for each query word, we methodically check it against every single word in our dictionary. We do this multiple times, applying one spelling rule at a time in order of priority, and we stop as soon as we find any valid match.

Here's how the algorithm would work step-by-step:

  1. First, we'll scan the entire dictionary, looking for a word that is an identical, case-sensitive match to our query word. If we find one, that's our answer.
  2. If no identical match is found after checking the whole dictionary, we start a new search from the beginning of the dictionary.
  3. On this second pass, we check every word to see if it matches our query if we ignore whether letters are uppercase or lowercase. The very first word that works is our answer.
  4. If we still don't have an answer, we perform one last search through the entire dictionary from the beginning.
  5. This time, we create a 'vowel-blind' version of our query and each dictionary word by replacing all vowels with a placeholder. The first dictionary word whose 'vowel-blind' version matches our query's version is the answer.
  6. If none of these three complete searches through the dictionary yield a match, we conclude that there is no correct spelling for our word.

Code Implementation

def vowel_spellchecker_brute_force(wordlist, queries):
    corrected_queries_list = []
    vowels = {'a', 'e', 'i', 'o', 'u'}

    for query_word in queries:
        match_found_for_query = False

        # We check for a perfect, case-sensitive match first because it's the highest priority rule.
        for dictionary_word in wordlist:
            if dictionary_word == query_word:
                corrected_queries_list.append(dictionary_word)
                match_found_for_query = True
                break
        if match_found_for_query:
            continue

        # If no exact match exists, we try a case-insensitive match as the next priority.
        for dictionary_word in wordlist:
            if dictionary_word.lower() == query_word.lower():
                corrected_queries_list.append(dictionary_word)
                match_found_for_query = True
                break
        if match_found_for_query:
            continue

        # Finally, we handle vowel errors by comparing a 'devoweled' version of the words.
        query_chars_devoweled = []
        for character in query_word.lower():
            if character in vowels:
                query_chars_devoweled.append('*')
            else:
                query_chars_devoweled.append(character)
        devoweled_query = "".join(query_chars_devoweled)

        for dictionary_word in wordlist:
            dict_word_chars_devoweled = []
            for character in dictionary_word.lower():
                if character in vowels:
                    dict_word_chars_devoweled.append('*')
                else:
                    dict_word_chars_devoweled.append(character)
            devoweled_dictionary_word = "".join(dict_word_chars_devoweled)

            if devoweled_dictionary_word == devoweled_query:
                corrected_queries_list.append(dictionary_word)
                match_found_for_query = True
                break
        if match_found_for_query:
            continue

        corrected_queries_list.append("")

    return corrected_queries_list

Big(O) Analysis

Time Complexity
O(Q * W * L) – The time complexity is determined by processing each query against the entire dictionary. Let Q be the number of queries, W be the number of words in the dictionary, and L be the maximum word length. For each of the Q queries, the described approach performs up to three sequential scans through all W words in the dictionary. During each scan, comparing the query to a dictionary word or creating its vowel-blind version takes time proportional to L. This leads to a worst-case total number of operations proportional to Q * W * L, which simplifies to O(Q * W * L).
Space Complexity
O(L) where L is the maximum word length – The algorithm's space usage is determined by the temporary strings created during the case-insensitive and vowel-error checks for each query. For a given comparison, the process generates a temporary 'vowel-blind' or lowercase version of both the query and a dictionary word. The size of these temporary strings is proportional to the maximum length of a word, which we denote as L. Since this memory is reused for each word comparison and does not grow with the total number of words in the dictionary, N, the auxiliary space complexity is O(L).

Optimal Solution

Approach

The best approach is to avoid repeatedly searching the entire word list for each query. Instead, we do a one-time setup by creating a few specialized lookup guides from the word list, which lets us find the correct spelling for any query almost instantly by checking these guides in a specific order.

Here's how the algorithm would work step-by-step:

  1. First, we'll prepare for all the queries by pre-processing the original word list. We'll create three different kinds of organized lists, like 'address books', to make finding matches very fast.
  2. The first address book is for perfect matches. It simply contains every word from the word list exactly as it is, allowing for a quick check.
  3. The second address book is for fixing capitalization mistakes. For each word, we'll create a lowercase version to look up. We'll store the original word alongside it. If different words become the same when made lowercase, we only keep the one that appeared first in the original word list.
  4. The third address book is for fixing vowel mistakes. We'll create another lookup key by taking a word, making it lowercase, and replacing all its vowels with a generic symbol like a star. We'll store the original word with this 'devoweled' key. Again, we only keep the first one we see for any given devoweled form.
  5. With our three address books ready, we now process each query one by one following a strict order of priority.
  6. First, check for the query in the 'perfect match' book. If it's there, that's our answer, and we're done with that query.
  7. If not found, we check the 'capitalization' book using the lowercase version of the query. If we find a match, we use the corresponding original word as the answer.
  8. If we still don't have a match, we check the 'vowel mistake' book. We create a 'devoweled' version of the query and look for that. If it's in our book, we use the original word we stored with it.
  9. Finally, if a query doesn't match in any of our three books, it means there's no valid correction, and the answer for that query is simply blank.

Code Implementation

def vowel_spellchecker(word_list, queries_list):
    exact_match_words = set(word_list)
    capitalization_map = {}
    vowel_error_map = {}

    # Pre-processing builds lookup maps to make query time near-instant, avoiding N*M complexity.

    for original_word in word_list:
        lowercase_version = original_word.lower()
        # A 'devoweled' key generalizes words to handle any vowel mistake, like 'kite' and 'kote'.

        devoweled_key = "".join('*' if char in 'aeiou' else char for char in lowercase_version)

        if lowercase_version not in capitalization_map:
            capitalization_map[lowercase_version] = original_word
        if devoweled_key not in vowel_error_map:
            vowel_error_map[devoweled_key] = original_word
            
    result_list = []
    # Queries are checked in a strict priority order: exact, case-insensitive, then vowel error.

    for current_query in queries_list:
        if current_query in exact_match_words:
            result_list.append(current_query)
            continue
            
        lowercase_query = current_query.lower()
        if lowercase_query in capitalization_map:
            result_list.append(capitalization_map[lowercase_query])
            continue
            
        devoweled_query_key = "".join('*' if char in 'aeiou' else char for char in lowercase_query)
        if devoweled_query_key in vowel_error_map:
            result_list.append(vowel_error_map[devoweled_query_key])
            continue

        # If a query fails all matching rules, it has no valid correction in the word list.

        result_list.append("")
        
    return result_list

Big(O) Analysis

Time Complexity
O(N*C + M*C) – Let N be the number of words, M be the number of queries, and C be the maximum word length. The complexity is driven by two main parts: the initial setup and the query processing. The setup phase iterates through all N words, and for each, it performs string manipulations (like lowercasing or devoweling) that take O(C) time, resulting in an O(N*C) cost to build the lookup tables. Then, for each of the M queries, we perform similar O(C) string operations and lookups in the pre-built tables. The total time complexity is the sum of these two independent stages, which approximates to (N * C) + (M * C) and simplifies to O(N*C + M*C).
Space Complexity
O(N * C) – The algorithm's space usage is determined by the three 'address books' created during the one-time setup phase from the word list. These are a set for perfect matches, a hash map for capitalization errors, and a second hash map for vowel errors. In the worst case, each of the N words in the input wordlist requires a new entry in all three data structures. Since storing each word or its variation takes space proportional to its length (let's denote the maximum word length as C), the total auxiliary space is O(N * C).

Edge Cases

CaseHow to Handle
The input `wordlist` or `queries` array is empty.The solution should return an empty array, correctly handling the base case of no work to be done.
The `wordlist` and `queries` arrays are very large, potentially causing a timeout.Using hash maps to pre-process the `wordlist` allows for constant-time lookups for each query, ensuring an efficient O(N+M) solution.
A query has multiple potential matches with different priorities (e.g., one exact, one case-insensitive).The solution must perform lookups in the strict order of priority: exact match first, then case-insensitive, then vowel-error.
A query has multiple case-insensitive or vowel-error matches in the `wordlist`.When building lookup maps, the solution only stores the first occurrence for any given key, correctly implementing the 'first match' rule.
A query does not match any word in the `wordlist` by any of the three rules.If all lookup attempts fail, the solution correctly returns an empty string for that query as specified.
The `wordlist` contains the same word with different casings, like 'Apple' and 'apple'.Separate data structures for exact and case-insensitive matches ensure that an exact query ('apple') is correctly found over a case-insensitive one ('APPLE').
Words in the inputs contain non-alphabetic characters such as punctuation or numbers.The vowel-replacement logic only transforms the five English vowels, treating all other characters as immutable consonants.
The `wordlist` or `queries` contain empty strings.An empty query correctly finds an exact match if an empty string exists in the wordlist, otherwise it correctly results in no match.