Taro Logo

Unique Word Abbreviation

Medium
Meta logo
Meta
2 views
Topics:
StringsArrays

Design a class that can determine if the abbreviation of a word is unique within a given dictionary of words. The abbreviation is formed by the first letter, the number of characters between the first and last letter, and the last letter. If the word's length is less than or equal to 2, then the word itself is the abbreviation.

The class should have two methods:

  1. ValidWordAbbr(dictionary): Constructor that initializes the class with a dictionary of words.
  2. isUnique(word): Returns true if the abbreviation of word is unique in the dictionary, otherwise returns false.

For example:

dictionary = ["deer", "door", "cake", "card"]

ValidWordAbbr(dictionary)

isUnique("dear") -> false  // "dear"'s abbreviation is "d2r" which is the same as "door"
isUnique("cart") -> true  // "cart"'s abbreviation is "c2t" and there are no other words with the same abbreviation
isUnique("cane") -> false  // "cane"'s abbreviation is "c2e" which is the same as "cake"
isUnique("make") -> true  // "make"'s abbreviation is "m2e" and there are no other words with the same abbreviation

How would you implement this class, and what would be the time and space complexity of your solution?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible lengths for the words in the dictionary and the input string?
  2. Can the dictionary contain empty strings or null values?
  3. If the abbreviation matches multiple words in the dictionary, is it still considered valid?
  4. Is the comparison case-sensitive?
  5. If no valid abbreviation exists, what should the function return?

Brute Force Solution

Approach

We're given a word and a dictionary of words. The goal is to see if the word's abbreviation is unique among the words in the dictionary. The brute force approach is to check the abbreviation against every word in the dictionary.

Here's how the algorithm would work step-by-step:

  1. First, create the abbreviation for the given word.
  2. Then, go through each word in the dictionary, one at a time.
  3. For each dictionary word, create its abbreviation.
  4. Compare the abbreviation of the given word to the abbreviation of the current dictionary word.
  5. If the abbreviations are the same and the words are different, then the given word's abbreviation is not unique. Return 'not unique'.
  6. If you go through the entire dictionary and don't find any matching abbreviations for different words, then the given word's abbreviation is unique. Return 'unique'.

Code Implementation

def is_unique_word_abbreviation_brute_force(word, dictionary):
    word_abbreviation = create_abbreviation(word)

    for dictionary_word in dictionary:
        # Iterate through the dictionary

        dictionary_word_abbreviation = create_abbreviation(dictionary_word)

        # Crucial step: check if abbr same but words different
        if word_abbreviation == dictionary_word_abbreviation and word != dictionary_word:
            return False

    # If we went through the whole dictionary and didn't find a match
    return True

def create_abbreviation(word):
    word_length = len(word)
    if word_length <= 2:
        return word
    # Abbreviate only if length > 2, as per problem

    abbreviation = word[0] + str(word_length - 2) + word[-1]

    return abbreviation

Big(O) Analysis

Time Complexity
O(n*m)Let n be the number of words in the dictionary and m be the average length of the words. The algorithm iterates through each of the n words in the dictionary. For each word, it calculates the abbreviation, which takes O(1) time, assuming the average length is bounded by a constant. The abbreviation comparison is also O(1). Thus the time complexity is dominated by the loop iterating through the n words in the dictionary, therefore resulting in O(n*m).
Space Complexity
O(1)The algorithm's space complexity is dominated by the temporary variables used to store the abbreviations. In the steps, a temporary string is created to store the abbreviation of both the given word and the dictionary word. These strings' sizes are at most a function of a single word length, and since we are using a constant number of such variables regardless of the number of words in the dictionary, the auxiliary space used remains constant. Thus, the space complexity is O(1).

Optimal Solution

Approach

The goal is to determine if a given abbreviation is unique compared to a dictionary of words. The efficient approach involves pre-processing the dictionary to store possible abbreviations and quickly check if the given abbreviation exists for other words.

Here's how the algorithm would work step-by-step:

  1. First, create a way to turn words into their abbreviations. An abbreviation usually consists of the first letter, the number of letters in between, and the last letter.
  2. Next, go through the given dictionary of words. For each word, create its abbreviation.
  3. As you create abbreviations, keep track of which words have the same abbreviation. If multiple words have the same abbreviation, mark that abbreviation as ambiguous, so that later, we know we can't use it.
  4. When we need to check if a new word's abbreviation is unique, create the abbreviation for that word.
  5. Then, check if that abbreviation exists in our recorded list of abbreviations.
  6. If the abbreviation doesn't exist, or it exists but is marked as ambiguous (meaning multiple words share it), then the abbreviation is not unique. Otherwise, it is unique.

Code Implementation

class ValidWordAbbr:

    def __init__(self, dictionary: list[str]):
        self.abbreviation_map = {}

        for word in dictionary:
            abbreviation = self.to_abbreviation(word)

            # Track words with the same abbreviation.
            if abbreviation in self.abbreviation_map:
                self.abbreviation_map[abbreviation].add(word)
            else:
                self.abbreviation_map[abbreviation] = {word}

    def isUnique(self, word: str) -> bool:
        abbreviation = self.to_abbreviation(word)

        # Check if the abbreviation exists.
        if abbreviation not in self.abbreviation_map:
            return True
        
        # Ensure the abbreviation is only used by the current word.
        return self.abbreviation_map[abbreviation] == {word}

    def to_abbreviation(self, word: str) -> str:
        word_length = len(word)

        # No abbreviation needed for short words.
        if word_length <= 2:
            return word

        first_letter = word[0]
        last_letter = word[-1]
        middle_length = word_length - 2
        return f"{first_letter}{middle_length}{last_letter}"

Big(O) Analysis

Time Complexity
O(N + K)The constructor iterates through the dictionary of N words, calculating the abbreviation for each word. The abbreviation calculation itself takes O(1) time (first letter, last letter, and length calculation). During the constructor, we also build a hashmap which has a cost of O(N) in the worst case. The isUnique method calculates the abbreviation in O(1) time and then performs a lookup in the hashmap, which takes O(1) time. Therefore, the constructor takes O(N) time, and the isUnique method takes O(1) time. Overall, pre-processing the dictionary takes O(N), and checking for uniqueness across K calls will be O(K).
Space Complexity
O(N)The solution uses a hash map (or similar data structure) to store abbreviations and the words that generate them. In the worst case, each word in the dictionary has a unique abbreviation, leading to the storage of N abbreviations, where N is the number of words in the dictionary. The hash map also stores the words associated with each abbreviation, which, in the worst-case scenario where all words have the same abbreviation, would store all N words. Therefore, the auxiliary space used is proportional to the number of words in the dictionary, resulting in O(N) space complexity.

Edge Cases

CaseHow to Handle
Empty dictionaryReturn true, because no words conflict with the abbreviation.
Word is longer than abbreviation lengthReturn false because any abbreviation is always smaller or equal to the actual word.
Word is exactly the same as the abbreviation.Return true if and only if the word is in the dictionary and dictionary size is 1.
Abbreviation is empty stringReturn false if any word in dictionary is non-empty, otherwise true.
Dictionary contains multiple words which can be abbreviated to the same abbreviation as the word.Return false because there is a conflict in abbreviations within the dictionary.
Word in dictionary is the same word.If dictionary size is 1, then the word should be valid, else there are conflicting words.
Word is nullHandle null word case by throwing IllegalArgumentException or considering it as an empty string depending on the context.
Dictionary word is nullHandle null word in dictionary by skipping that word or throwing an error depending on the desired behavior.