Taro Logo

Expressive Words

Medium
Asked by:
Profile picture
Profile picture
11 views
Topics:
ArraysStringsTwo Pointers

Sometimes people repeat letters to represent extra feeling. For example:

  • "hello" -> "heeellooo"
  • "hi" -> "hiiii"

In these strings like "heeellooo", we have groups of adjacent letters that are all the same: "h", "eee", "ll", "ooo".

You are given a string s and an array of query strings words. A query word is stretchy if it can be made to be equal to s by any number of applications of the following extension operation: choose a group consisting of characters c, and add some number of characters c to the group so that the size of the group is three or more.

  • For example, starting with "hello", we could do an extension on the group "o" to get "hellooo", but we cannot get "helloo" since the group "oo" has a size less than three. Also, we could do another extension like "ll" -> "lllll" to get "helllllooo". If s = "helllllooo", then the query word "hello" would be stretchy because of these two extension operations: query = "hello" -> "hellooo" -> "helllllooo" = s.

Return the number of query strings that are stretchy.

Example 1:

Input: s = "heeellooo", words = ["hello", "hi", "helo"]
Output: 1
Explanation: 
We can extend "e" and "o" in the word "hello" to get "heeellooo".
We can't extend "helo" to get "heeellooo" because the group "ll" is not size 3 or more.

Example 2:

Input: s = "zzzzzyyyyy", words = ["zzyy","zy","zyy"]
Output: 3

Constraints:

  • 1 <= s.length, words.length <= 100
  • 1 <= words[i].length <= 100
  • s and words[i] consist of lowercase letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the target string and the words in the array contain characters other than lowercase English letters?
  2. Can the input array 'words' be empty, or can any of the strings within 'words' be empty strings? What about the target string 's'?
  3. What is the maximum length of the target string 's' and the strings in the 'words' array? This will help me understand potential performance implications.
  4. By 'stretched', do you mean that a character can only be stretched if it appears at least three times consecutively in the original word, or can shorter sequences also be stretched? For example, is 'aaab' a stretched version of 'ab'?
  5. If a word matches the target string, should I return a count of all matches, even if some words are duplicates?

Brute Force Solution

Approach

The goal is to find how many words from a given list are considered 'expressive' versions of a target string. We'll do this by checking each word individually against the target string to see if it qualifies as an expressive word using brute force.

Here's how the algorithm would work step-by-step:

  1. Pick the first word from the list of words.
  2. Compare this word to the target string character by character.
  3. When comparing, account for the fact that the word can stretch repeating characters in the target string.
  4. If the current word can be stretched to match the target string, mark this word as an expressive word.
  5. Repeat this process for every word in the list.
  6. Once all words are checked, count the total number of expressive words.
  7. Return the final count.

Code Implementation

def expressive_words_brute_force(target_string, words):
    expressive_count = 0

    for word in words:
        target_index = 0
        word_index = 0
        is_expressive = True

        while word_index < len(word) and target_index < len(target_string):
            if word[word_index] != target_string[target_index]:
                is_expressive = False
                break

            word_char = word[word_index]
            target_char = target_string[target_index]
            word_char_count = 0
            target_char_count = 0

            # Count consecutive occurrences of the character
            while word_index < len(word) and word[word_index] == word_char:
                word_char_count += 1
                word_index += 1

            while target_index < len(target_string) and target_string[target_index] == target_char:
                target_char_count += 1
                target_index += 1

            # Check for expressive conditions
            if word_char_count > target_char_count:
                is_expressive = False
                break

            # Target needs to have at least 3 if stretched
            if target_char_count < 3 and word_char_count != target_char_count:
                is_expressive = False
                break

        # Ensure both strings are exhausted
        if word_index != len(word) or target_index != len(target_string):
            is_expressive = False

        # Increment the expressive count if expressive is True
        if is_expressive:
            expressive_count += 1

    return expressive_count

Big(O) Analysis

Time Complexity
O(S * W)The outer loop iterates through each of the W words in the input array words. Inside this loop, we compare each word to the target string S. The comparison involves iterating through the characters of both the target string and the current word. In the worst case, this comparison might take O(S) time. Therefore, the total time complexity is O(S * W), where S is the length of the target string and W is the number of words in the input array.
Space Complexity
O(1)The provided algorithm iterates through the words and compares them against the target string character by character. It doesn't explicitly mention creating any auxiliary data structures like lists, dictionaries, or sets to store intermediate results. The comparison likely utilizes a few integer variables for indexing or counters. Therefore, the extra space used remains constant irrespective of the input size (number of words or the length of the target string), making the auxiliary space complexity O(1).

Optimal Solution

Approach

The best way to solve this problem is to compare the target string and each word, character by character. The trick is to focus on the compressed form of each string, representing consecutive repeating characters as a count.

Here's how the algorithm would work step-by-step:

  1. First, transform the target string into a simplified version where each character is followed by the number of times it repeats consecutively. For example, 'heeellooo' becomes 'h1e2l2o3'.
  2. Then, for each word, do the same transformation: convert it into a simplified, count-based representation.
  3. Now, compare the transformed target string and the transformed word. For the word to be 'expressive', two things must be true for each character group:
  4. a) The character in the target and word must be the same.
  5. b) The count of the character in the target must be at least as large as the count in the word. If the target's count is less than 3, it must be exactly equal to the word's count.
  6. If any of these conditions are not met for any character group, then the word is not expressive. Otherwise, it is.
  7. Count the number of words that are expressive according to these rules. This gives you the final result.

Code Implementation

def expressive_words(target_string, words):
    def get_compressed_string(input_string):
        compressed = []
        index = 0
        while index < len(input_string):
            character = input_string[index]
            count = 0
            while index < len(input_string) and input_string[index] == character:
                count += 1
                index += 1
            compressed.append((character, count))
        return compressed

    compressed_target = get_compressed_string(target_string)
    count_expressive_words = 0

    for word in words:
        compressed_word = get_compressed_string(word)
        is_expressive = True
        
        # Must compare compressed target and word
        if len(compressed_target) != len(compressed_word):
            continue
            
        for i in range(len(compressed_target)):
            target_character, target_count = compressed_target[i]
            word_character, word_count = compressed_word[i]

            if target_character != word_character:
                is_expressive = False
                break

            # Target count must be >= word count
            if target_count < word_count:
                is_expressive = False
                break
            
            # Target count < 3 must be equal to word count
            if target_count < 3 and target_count != word_count:
                is_expressive = False
                break

            # Target count >= 3 must be >= word count
            if target_count >= 3 and target_count < word_count:
                is_expressive = False
                break
                
        if is_expressive:
            count_expressive_words += 1
            
    return count_expressive_words

Big(O) Analysis

Time Complexity
O(S + W*max(S, L))The primary operations involve transforming the target string S of length s and each word in the words array of length W. Transforming the target string takes O(s) time. For each of the W words, transforming a word of length L takes O(L) time, and comparing the transformed word to the transformed target string also takes at most O(max(s, L)) where s is the length of the target string after compression and L is the length of the compressed word. Therefore the total time complexity is O(s + W * max(s, L)), where s is the length of the target string S, W is the number of words, and L is the length of the longest word.
Space Complexity
O(N + M)The algorithm transforms both the target string (length N) and each word (average length M) into a simplified, count-based representation. This transformation involves creating new strings or lists (depending on implementation) to store character-count pairs for both the target and the word being compared. In the worst case, the transformed representations could have a length proportional to the original lengths of the target string and the word. Therefore, the space complexity is O(N + M), where N is the length of the target string and M is the maximum length of any word in the input list of words.

Edge Cases

Null or empty string S
How to Handle:
Return 0, as there are no expressive words if the base string is empty.
Null or empty words array
How to Handle:
Return 0, as there are no words to check against S if the words array is empty.
Empty string S and empty words array
How to Handle:
Return 0, as neither the source nor the target words exist.
String S with only one character
How to Handle:
Handle words with a single character and lengths 1, 2, or 3 correctly according to the stretching rule.
Word in words longer than S
How to Handle:
Such words can never be expressive, so they should be immediately skipped/counted as not expressive.
Large input strings S and words (performance)
How to Handle:
Ensure that the character run length encoding and subsequent comparisons are done efficiently to avoid time limit exceeded errors; consider pre-processing the words.
String S with all the same character (e.g., 'aaaa')
How to Handle:
Handle the stretching of single-character words to match S's length correctly, validating if count(word) <= count(S) and count(S) >= 3.
Word in words has characters not in S
How to Handle:
Such words are automatically not expressive and should be immediately marked as such.