Taro Logo

Valid Word Abbreviation

Easy
Google logo
Google
4 views
Topics:
StringsTwo Pointers

You are given a string word and an abbreviation abbr. Your task is to determine whether the given abbreviation is a valid abbreviation for the word. A valid abbreviation replaces consecutive characters in the word with a number representing the count of those characters. The abbreviation can contain lowercase English letters and digits (1-9). Note that leading zeros are not valid in the numeric counts.

For example, the word "substitution" can be abbreviated as:

  • "substitution" (no abbreviation)
  • "s10n" (replaces "ubstitutio" with 10 characters)
  • "sub4tution" (replaces "stit" with 4 characters)
  • "12" (replaces the entire word with its length)
  • "su3i7n" (replaces "bst" with 3 and "utio" with 7)

Here are some examples:

  • word = "internationalization", abbr = "i12iz4n" should return true because 'i' + 12 characters + 'i' + 'z' + 4 characters + 'n' matches the word.
  • word = "apple", abbr = "a2e" should return false because 'a' + 2 characters + 'e' does not match the word "apple".
  • word = "substitution", abbr = "12" should return true.
  • word = "substitution", abbr = "sub4u4" should return true.
  • word = "substitution", abbr = "sub0tion" should return false because "0" is not a valid number (leading zero).
  • word = "a", abbr = "01" should return false because "01" is not a valid number (leading zero).

Write a function that takes the word and the abbreviation as input and returns true if the abbreviation is valid, and false otherwise. Be sure to consider edge cases and invalid abbreviations.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the `abbr` string contain leading zeros in the numerical parts of the abbreviation?
  2. Can the `word` or `abbr` strings be empty or null?
  3. If the abbreviation is invalid, should I return false immediately, or are there specific criteria for determining validity beyond a simple mismatch?
  4. Are the numerical parts of the abbreviation guaranteed to be within the integer range, or should I handle potential overflow?
  5. Is the input case-sensitive, meaning 'Word' is different from 'word'?

Brute Force Solution

Approach

The brute force approach to checking if an abbreviation is valid involves systematically comparing the abbreviation to the full word. We expand the abbreviation and character-by-character compare it to the word. If they match, the abbreviation is deemed valid, otherwise its not.

Here's how the algorithm would work step-by-step:

  1. Think of the abbreviation as a set of instructions on how to build a shorter form of the original word.
  2. Carefully go through the abbreviation, piece by piece.
  3. If a piece is a number, that means to skip that many characters in the original word.
  4. If a piece is a character, check if that character matches the character at the current position in the original word.
  5. Keep going through the abbreviation and the original word, following the skip instructions and character-matching instructions.
  6. If you reach the end of both the abbreviation and the original word at the same time, without finding any mismatches, then the abbreviation is valid.
  7. If at any point a character doesn't match, or you reach the end of one before the other, then the abbreviation is not valid.

Code Implementation

def is_valid_word_abbreviation(word, abbreviation):
    word_index = 0
    abbreviation_index = 0

    while abbreviation_index < len(abbreviation) and word_index < len(word):
        if abbreviation[abbreviation_index].isdigit():
            # Handle consecutive digits by accumulating the number
            number_of_skips = 0
            while abbreviation_index < len(abbreviation) and abbreviation[abbreviation_index].isdigit():
                if abbreviation[abbreviation_index] == '0' and number_of_skips == 0:
                    return False
                number_of_skips = number_of_skips * 10 + int(abbreviation[abbreviation_index])
                abbreviation_index += 1

            # Skip characters in the word based on the parsed number
            word_index += number_of_skips

        else:
            # Compare characters if the abbreviation contains a character
            if word[word_index] != abbreviation[abbreviation_index]:
                return False

            word_index += 1
            abbreviation_index += 1

    # Check if we've reached the end of both strings
    return word_index == len(word) and abbreviation_index == len(abbreviation)

Big(O) Analysis

Time Complexity
O(m + n)Let m be the length of the abbreviation string and n be the length of the word. The algorithm iterates through the abbreviation string once. Inside the loop, it either processes a character (O(1)) or a number representing the number of characters to skip in the word. The maximum number of skips cannot exceed the word's length n. Each character in the abbreviation is processed once, and we might iterate at most the length of the word based on the number in the abbreviation. Therefore, the time complexity is proportional to the sum of the lengths of the two strings, yielding O(m + n).
Space Complexity
O(1)The algorithm described operates directly on the input strings (the word and the abbreviation) without creating any auxiliary data structures like arrays, lists, or hash maps to store intermediate results. It uses a few integer variables to keep track of the current positions in the word and the abbreviation during the comparison process. Since the number of these variables remains constant regardless of the size of the input word or abbreviation, the space complexity is O(1).

Optimal Solution

Approach

The problem requires checking if an abbreviation correctly matches a given word. The efficient strategy involves comparing the word and the abbreviation sequentially, consuming characters from both as we go. Numerical segments in the abbreviation indicate a skip in the word.

Here's how the algorithm would work step-by-step:

  1. Start comparing the abbreviation and the word from the beginning.
  2. If you encounter a number in the abbreviation, treat it as the number of characters to skip in the word.
  3. Advance the position in the word by the amount specified by the number.
  4. If you encounter a character in the abbreviation, it must match the current character in the word.
  5. If there is a mismatch, the abbreviation is invalid.
  6. If you reach the end of both the abbreviation and the word simultaneously, the abbreviation is valid.
  7. If you reach the end of either the abbreviation or the word before the other, and there are still characters left in the other, the abbreviation is invalid.

Code Implementation

def is_valid_word_abbreviation(word, abbreviation):
    word_index = 0
    abbreviation_index = 0

    while abbreviation_index < len(abbreviation) and word_index < len(word):
        if abbreviation[abbreviation_index].isdigit():
            if abbreviation[abbreviation_index] == '0':
                return False

            number_of_skips = 0

            # Build the number from consecutive digits
            while abbreviation_index < len(abbreviation) and abbreviation[abbreviation_index].isdigit():
                number_of_skips = number_of_skips * 10 + int(abbreviation[abbreviation_index])
                abbreviation_index += 1

            # Advance word index by number. Necessary to 'skip' characters.
            word_index += number_of_skips

        else:
            # Characters must match.
            if word[word_index] != abbreviation[abbreviation_index]:
                return False

            word_index += 1
            abbreviation_index += 1

    # Both strings must be exhausted simultaneously.
    return word_index == len(word) and abbreviation_index == len(abbreviation)

Big(O) Analysis

Time Complexity
O(n)Let n be the length of the word and m be the length of the abbreviation. The algorithm iterates through both the word and the abbreviation at most once. Numerical values in the abbreviation dictate skips in the word, but each character or skip is processed only once. The dominant factor is proportional to the lengths of the word and abbreviation. Thus, the time complexity is O(max(n,m)), which can be simplified to O(n) since the abbreviation cannot be longer than a representation of the number of chars in word.
Space Complexity
O(1)The algorithm iterates through the input string and abbreviation using a constant number of variables to track the current positions. It does not create any auxiliary data structures like arrays, lists, or hash maps whose size depends on the input length. Therefore, the space complexity is constant, independent of the size of the input word or abbreviation.

Edge Cases

CaseHow to Handle
Null or empty wordReturn false immediately if the word is null or empty, since there's nothing to abbreviate.
Null or empty abbreviationReturn false immediately if the abbreviation is null or empty, unless the word is also empty, then return true.
Word is shorter than abbreviationReturn false since the abbreviation cannot match a shorter word.
Abbreviation contains invalid characters (non-numeric/non-alphabetic)Return false if invalid character in abbreviation or throw an exception for robustness.
Abbreviation starts with zero followed by other digits (e.g., '02')Return false since numbers with leading zeros are typically considered invalid.
Abbreviation ends with a numberCarefully handle string indices to avoid out-of-bounds exception when processing the last numeric sequence.
Abbreviation contains a number larger than the remaining word lengthReturn false, since that number of characters will go out of bounds.
Very long strings to avoid potential integer overflow when parsing numbers in the abbreviationUse long data type to store the parsed number if concerned about int overflow.