Taro Logo

Valid Word Abbreviation

Easy
Meta logo
Meta
19 views
Topics:
StringsTwo Pointers

You are given a non-empty string word and an abbreviation abbr, return whether the string matches with the given abbreviation.

A string such as "word" contains only the following types of characters:

  • lowercase English letters

and abbreviation of it may be:

  • lowercase English letters
  • integers (representing the number of lowercase English letters bridged over in the original string)

Can you write a function that validates if a word can be abbreviated by the given abbreviation?

Example 1:

Input: word = "internationalization", abbr = "i12iz4n"
Output: true

Example 2:

Input: word = "apple", abbr = "a2e"
Output: false

Example 3:

Input: word = "a", abbr = "01"
Output: false

Explain your solution's time and space complexity. Make sure to handle edge cases such as leading zeros in the abbreviation.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the `abbr` string contain leading zeros in the numerical parts of the abbreviation?
  2. Can the `word` or `abbr` strings be empty or null?
  3. If the abbreviation is invalid, should I return false immediately, or are there specific criteria for determining validity beyond a simple mismatch?
  4. Are the numerical parts of the abbreviation guaranteed to be within the integer range, or should I handle potential overflow?
  5. Is the input case-sensitive, meaning 'Word' is different from 'word'?

Brute Force Solution

Approach

The brute force approach to checking if an abbreviation is valid involves systematically comparing the abbreviation to the full word. We expand the abbreviation and character-by-character compare it to the word. If they match, the abbreviation is deemed valid, otherwise its not.

Here's how the algorithm would work step-by-step:

  1. Think of the abbreviation as a set of instructions on how to build a shorter form of the original word.
  2. Carefully go through the abbreviation, piece by piece.
  3. If a piece is a number, that means to skip that many characters in the original word.
  4. If a piece is a character, check if that character matches the character at the current position in the original word.
  5. Keep going through the abbreviation and the original word, following the skip instructions and character-matching instructions.
  6. If you reach the end of both the abbreviation and the original word at the same time, without finding any mismatches, then the abbreviation is valid.
  7. If at any point a character doesn't match, or you reach the end of one before the other, then the abbreviation is not valid.

Code Implementation

def is_valid_word_abbreviation(word, abbreviation):
    word_index = 0
    abbreviation_index = 0

    while abbreviation_index < len(abbreviation) and word_index < len(word):
        if abbreviation[abbreviation_index].isdigit():
            # Handle consecutive digits by accumulating the number
            number_of_skips = 0
            while abbreviation_index < len(abbreviation) and abbreviation[abbreviation_index].isdigit():
                if abbreviation[abbreviation_index] == '0' and number_of_skips == 0:
                    return False
                number_of_skips = number_of_skips * 10 + int(abbreviation[abbreviation_index])
                abbreviation_index += 1

            # Skip characters in the word based on the parsed number
            word_index += number_of_skips

        else:
            # Compare characters if the abbreviation contains a character
            if word[word_index] != abbreviation[abbreviation_index]:
                return False

            word_index += 1
            abbreviation_index += 1

    # Check if we've reached the end of both strings
    return word_index == len(word) and abbreviation_index == len(abbreviation)

Big(O) Analysis

Time Complexity
O(m + n)Let m be the length of the abbreviation string and n be the length of the word. The algorithm iterates through the abbreviation string once. Inside the loop, it either processes a character (O(1)) or a number representing the number of characters to skip in the word. The maximum number of skips cannot exceed the word's length n. Each character in the abbreviation is processed once, and we might iterate at most the length of the word based on the number in the abbreviation. Therefore, the time complexity is proportional to the sum of the lengths of the two strings, yielding O(m + n).
Space Complexity
O(1)The algorithm described operates directly on the input strings (the word and the abbreviation) without creating any auxiliary data structures like arrays, lists, or hash maps to store intermediate results. It uses a few integer variables to keep track of the current positions in the word and the abbreviation during the comparison process. Since the number of these variables remains constant regardless of the size of the input word or abbreviation, the space complexity is O(1).

Optimal Solution

Approach

The problem requires checking if an abbreviation correctly matches a given word. The efficient strategy involves comparing the word and the abbreviation sequentially, consuming characters from both as we go. Numerical segments in the abbreviation indicate a skip in the word.

Here's how the algorithm would work step-by-step:

  1. Start comparing the abbreviation and the word from the beginning.
  2. If you encounter a number in the abbreviation, treat it as the number of characters to skip in the word.
  3. Advance the position in the word by the amount specified by the number.
  4. If you encounter a character in the abbreviation, it must match the current character in the word.
  5. If there is a mismatch, the abbreviation is invalid.
  6. If you reach the end of both the abbreviation and the word simultaneously, the abbreviation is valid.
  7. If you reach the end of either the abbreviation or the word before the other, and there are still characters left in the other, the abbreviation is invalid.

Code Implementation

def is_valid_word_abbreviation(word, abbreviation):
    word_index = 0
    abbreviation_index = 0

    while abbreviation_index < len(abbreviation) and word_index < len(word):
        if abbreviation[abbreviation_index].isdigit():
            if abbreviation[abbreviation_index] == '0':
                return False

            number_of_skips = 0

            # Build the number from consecutive digits
            while abbreviation_index < len(abbreviation) and abbreviation[abbreviation_index].isdigit():
                number_of_skips = number_of_skips * 10 + int(abbreviation[abbreviation_index])
                abbreviation_index += 1

            # Advance word index by number. Necessary to 'skip' characters.
            word_index += number_of_skips

        else:
            # Characters must match.
            if word[word_index] != abbreviation[abbreviation_index]:
                return False

            word_index += 1
            abbreviation_index += 1

    # Both strings must be exhausted simultaneously.
    return word_index == len(word) and abbreviation_index == len(abbreviation)

Big(O) Analysis

Time Complexity
O(n)Let n be the length of the word and m be the length of the abbreviation. The algorithm iterates through both the word and the abbreviation at most once. Numerical values in the abbreviation dictate skips in the word, but each character or skip is processed only once. The dominant factor is proportional to the lengths of the word and abbreviation. Thus, the time complexity is O(max(n,m)), which can be simplified to O(n) since the abbreviation cannot be longer than a representation of the number of chars in word.
Space Complexity
O(1)The algorithm iterates through the input string and abbreviation using a constant number of variables to track the current positions. It does not create any auxiliary data structures like arrays, lists, or hash maps whose size depends on the input length. Therefore, the space complexity is constant, independent of the size of the input word or abbreviation.

Edge Cases

CaseHow to Handle
Null or empty wordReturn false immediately if the word is null or empty, since there's nothing to abbreviate.
Null or empty abbreviationReturn false immediately if the abbreviation is null or empty, unless the word is also empty, then return true.
Word is shorter than abbreviationReturn false since the abbreviation cannot match a shorter word.
Abbreviation contains invalid characters (non-numeric/non-alphabetic)Return false if invalid character in abbreviation or throw an exception for robustness.
Abbreviation starts with zero followed by other digits (e.g., '02')Return false since numbers with leading zeros are typically considered invalid.
Abbreviation ends with a numberCarefully handle string indices to avoid out-of-bounds exception when processing the last numeric sequence.
Abbreviation contains a number larger than the remaining word lengthReturn false, since that number of characters will go out of bounds.
Very long strings to avoid potential integer overflow when parsing numbers in the abbreviationUse long data type to store the parsed number if concerned about int overflow.