Taro Logo

Word Abbreviation

Hard
Asked by:
Profile picture
Profile picture
Profile picture
16 views
Topics:
StringsArrays

Given a list of words, generate all possible abbreviations for each word. A word should be abbreviated such that the first letter and the last letter remains unchanged.

An abbreviation of a word will be in the form of <first letter><number><last letter> where <number> is the count of consecutive characters being omitted. Otherwise, a word is not abbreviated if it is shorter than or equal to length 2 because there is no room for abbreviation.

Example 1:

Input: ["word", "word", "word"]
Output: ["word", "word", "word"]
Explanation:
There are three identical words so no abbreviation is done.

Example 2:

Input: ["like", "god", "internal", "me", "internet", "interval", "intension", "face", "intrusion"]
Output: ["l2e","god","internal","me","i6t","interval","i7n","face","intrusion"]

Constraints:

  • 1 <= words.length <= 400
  • 1 <= words[i].length <= 400
  • words[i] consists of lower-case English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of lengths for the words in the input list? Can a word be an empty string?
  2. Are there any duplicate words in the input list? If so, how should they be handled when generating abbreviations?
  3. If multiple abbreviations of the same minimum length are possible for a word, is any one acceptable or is there a specific abbreviation to prioritize?
  4. How should I handle cases where an abbreviation is as long as the original word (i.e., no abbreviation is actually shorter)? Should I return the original word in those cases?
  5. Are all the words in the input list guaranteed to be in lowercase letters, or do I need to handle uppercase letters or other characters?

Brute Force Solution

Approach

The brute force approach to word abbreviation involves generating every possible abbreviation for each word and then checking if the resulting set of abbreviations is valid. This means trying all combinations. We check if any two words have the same abbreviation.

Here's how the algorithm would work step-by-step:

  1. For each word, create all possible abbreviations. This means trying every possible length of prefix and suffix, abbreviating the middle portion.
  2. Store all these abbreviations for each word.
  3. Now, compare the abbreviations of all pairs of words.
  4. If any two words have the same abbreviation, the set of abbreviations is invalid, so we move on to the next possibility.
  5. If all pairs of words have different abbreviations, then we have found a valid solution.
  6. Repeat this process, trying different combinations of abbreviations for each word until a valid set of abbreviations is found.

Code Implementation

def generate_all_abbreviations(words):
    all_abbreviations = []
    for word in words:
        word_abbreviations = set()
        word_length = len(word)
        for prefix_length in range(word_length):
            for suffix_length in range(word_length - prefix_length):
                # Generate abbreviations based on prefix and suffix lengths
                middle_length = word_length - prefix_length - suffix_length
                if middle_length > 0:
                    abbreviation = word[:prefix_length] + str(middle_length) + word[word_length - suffix_length:]
                else:
                    abbreviation = word[:prefix_length] + word[word_length - suffix_length:]
                word_abbreviations.add(abbreviation)

        all_abbreviations.append(list(word_abbreviations))
    return all_abbreviations

def is_valid_abbreviation_set(abbreviations):
    for i in range(len(abbreviations)):
        for j in range(i + 1, len(abbreviations)):
            for abbreviation1 in abbreviations[i]:
                for abbreviation2 in abbreviations[j]:
                    # Check for conflicts to invalidate candidate solution
                    if abbreviation1 == abbreviation2:
                        return False

    return True

def word_abbreviation_brute_force(words):
    all_possible_abbreviations = generate_all_abbreviations(words)

    def backtrack(index, current_abbreviations):
        if index == len(words):
            # Base case: check if current abbreviation set is valid
            if is_valid_abbreviation_set(current_abbreviations):
                return current_abbreviations
            else:
                return None

        # Iterate through all possible abbreviations for the current word
        for abbreviation in all_possible_abbreviations[index]:
            result = backtrack(index + 1, current_abbreviations + [[abbreviation]])

            # If we found a valid solution, return it immediately
            if result:
                return result

        return None

    # Start the backtracking process from the first word
    result = backtrack(0, [])
    if result:
        final_abbreviations = []
        for i in range(len(words)):
          final_abbreviations.append(result[i][0])
        return final_abbreviations
    else:
        return None

Big(O) Analysis

Time Complexity
O(2^L * N^2)Generating all possible abbreviations for a word of length L takes O(2^L) time since each character can either be abbreviated or not. Given N words, we perform this abbreviation generation for each word. After generating abbreviations, we need to compare all pairs of words to check for duplicates, which takes O(N^2) time in the worst case. Thus, the overall time complexity is O(2^L * N) for generating the abbreviations and O(N^2) for comparing them, resulting in O(2^L * N + N^2). If L is significantly large, we can approximate this as O(2^L * N^2). If L is relatively small, N^2 is the driving factor for the overall complexity. The overall complexity of the algorithm is dependent on L as the total number of abbrevations grows exponential to the length of the word.
Space Complexity
O(N*2^L)The algorithm generates all possible abbreviations for each word, where N is the number of words and L is the maximum length of a word. For each word, it creates up to 2^L abbreviations as each character can either be abbreviated or not. These abbreviations are stored, leading to a space complexity of O(N*2^L). Furthermore, comparing all pairs of words requires storing these abbreviations, and the number of pairs grows with N. Thus, overall, the auxiliary space complexity is O(N*2^L).

Optimal Solution

Approach

The problem requires us to abbreviate words in a list while ensuring each abbreviation is unique. The key idea is to find the shortest possible abbreviations for each word, starting with the shortest prefix and increasing the length until it's unique across the entire word list. We start with small abbreviations and only make them longer if necessary to avoid duplicates.

Here's how the algorithm would work step-by-step:

  1. Start by making the shortest possible abbreviation for each word. This usually means using only the first letter, followed by the length and the last letter of the word.
  2. Check if any of the abbreviations you just made are exactly the same. If you find identical abbreviations, you'll need to make them longer.
  3. For any abbreviations that are the same, gradually increase the length of the part you keep from the beginning of the word. Keep expanding the abbreviation until all the abbreviations are different from each other.
  4. Make sure your abbreviation doesn't get so long that it includes the entire word, in which case you just use the whole word.
  5. Double-check all the abbreviations one last time to confirm that every word has a unique abbreviation.

Code Implementation

def abbreviate_words(words):
    abbreviations = []
    for word in words:
        if len(word) <= 3:
            abbreviations.append(word)
        else:
            abbreviations.append(word[0] + str(len(word) - 2) + word[-1])

    prefix_length = [1] * len(words)

    while True:
        word_to_abbreviation = {}
        duplicate_found = False

        for index, word in enumerate(words):
            if len(word) <= 3:
                continue

            prefix = word[:prefix_length[index]]
            if len(prefix) < len(word) - 2:
                abbreviation = prefix + str(len(word) - prefix_length[index] - 1) + word[-1]
            else:
                abbreviation = word

            if abbreviation in word_to_abbreviation:
                duplicate_found = True
                break
            else:
                word_to_abbreviation[abbreviation] = index

        if not duplicate_found:
            break

        #Increase prefix length for duplicate
        for abbreviation, index in word_to_abbreviation.items():
            indices = [index for abbreviation, index in word_to_abbreviation.items() if abbreviation == abbreviation]
            
        for index, word in enumerate(words):
            if len(word) <= 3:
                continue
            if abbreviations.count(abbreviations[index]) > 1:
                prefix_length[index] += 1

    final_abbreviations = []
    for index, word in enumerate(words):
        if len(word) <= 3:
            final_abbreviations.append(word)
        else:
            prefix = word[:prefix_length[index]]
            if len(prefix) < len(word) - 2:
                abbreviation = prefix + str(len(word) - prefix_length[index] - 1) + word[-1]
            else:
                abbreviation = word
            final_abbreviations.append(abbreviation)

    # Ensure abbreviations are still unique
    while len(set(final_abbreviations)) != len(final_abbreviations):
        for i in range(len(words)):
            if final_abbreviations.count(final_abbreviations[i]) > 1:
                if len(words[i]) > 3 and prefix_length[i] < len(words[i]):
                    prefix_length[i] += 1

        final_abbreviations = []
        for index, word in enumerate(words):
            if len(word) <= 3:
                final_abbreviations.append(word)
            else:
                prefix = word[:prefix_length[index]]
                if len(prefix) < len(word) - 2:
                    abbreviation = prefix + str(len(word) - prefix_length[index] - 1) + word[-1]
                else:
                    abbreviation = word
                final_abbreviations.append(abbreviation)

    return final_abbreviations

Big(O) Analysis

Time Complexity
O(n*m)The initial abbreviation of each word takes O(1) time, where n is the number of words in the input list. The core logic involves iteratively checking for duplicate abbreviations and increasing the prefix length until uniqueness is achieved. In the worst-case scenario, for each word, we might need to compare its abbreviation with every other abbreviation in the list, which takes O(n) time per word. Furthermore, the prefix length can increase up to the length of the word (m). Therefore, it will involve m * O(n) operation. Hence the overall time complexity can be approximated as O(n*m) where n is the number of words in the input list and m is the maximum length of any word in the input list.
Space Complexity
O(N)The algorithm creates a list of abbreviations that, in the worst case, can have the same number of elements as the input word list (N), where N is the number of words in the input list. A hash map or set might be used to check for duplicate abbreviations, potentially storing all N abbreviations, leading to O(N) space. Additional variables used are independent of the input size, so the auxiliary space is dominated by the storage of abbreviations.

Edge Cases

Empty input word list
How to Handle:
Return an empty list immediately as there are no words to abbreviate.
Word list with a single word
How to Handle:
Return the word itself as its shortest unique abbreviation.
All words in the list are identical
How to Handle:
Iteratively increase the prefix length until each word's abbreviation becomes unique, handling cases where prefix length equals word length.
Words with common prefixes, leading to many collisions
How to Handle:
Iteratively increase the prefix length until all collisions are resolved, potentially requiring long prefixes.
Very long words, potentially leading to integer overflow when calculating abbreviation length
How to Handle:
Use appropriate data types (e.g., long) or check for potential overflow during length calculation.
Words with same length but differing characters only at the end
How to Handle:
The algorithm needs to handle these collisions by increasing prefix length till collisions are resolved.
Word length less than 3
How to Handle:
Return the word itself, as abbreviation is not applicable.
Input contains null or empty strings
How to Handle:
Treat null or empty strings as valid but unique words or filter them out before processing to prevent null pointer exceptions.