Taro Logo

Number of Valid Words in a Sentence

Easy
Cisco logo
Cisco
2 views
Topics:
Strings

A sentence consists of lowercase letters ('a' to 'z'), digits ('0' to '9'), hyphens ('-'), punctuation marks ('!', '.', and ','), and spaces (' ') only. Each sentence can be broken down into one or more tokens separated by one or more spaces ' '.

A token is a valid word if all three of the following are true:

  • It only contains lowercase letters, hyphens, and/or punctuation (no digits).
  • There is at most one hyphen '-'. If present, it must be surrounded by lowercase characters ("a-b" is valid, but "-ab" and "ab-" are not valid).
  • There is at most one punctuation mark. If present, it must be at the end of the token ("ab,", "cd!", and "." are valid, but "a!b" and "c.," are not valid).

Examples of valid words include "a-b.", "afad", "ba-c", "a!", and "!".

Given a string sentence, return the number of valid words in sentence.

Example 1:

Input: sentence = "cat and  dog"
Output: 3
Explanation: The valid words in the sentence are "cat", "and", and "dog".

Example 2:

Input: sentence = "!this  1-s b8d!"
Output: 0
Explanation: There are no valid words in the sentence.
"!this" is invalid because it starts with a punctuation mark.
"1-s" and "b8d" are invalid because they contain digits.

Example 3:

Input: sentence = "alice and  bob are playing stone-game10"
Output: 5
Explanation: The valid words in the sentence are "alice", "and", "bob", "are", and "playing".
"stone-game10" is invalid because it contains digits.

Constraints:

  • 1 <= sentence.length <= 1000
  • sentence only contains lowercase English letters, digits, ' ', '-', '!', '.', and ','.
  • There will be at least 1 token.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What characters are considered valid in a word, beyond lowercase English letters (e.g., hyphens, punctuation)?
  2. What defines a valid sentence? Can it start or end with punctuation?
  3. Can the input sentence be null or empty? What should I return in those cases?
  4. Are there any constraints on the length of a word or the entire sentence?
  5. Can a word consist solely of punctuation, and is that considered valid?

Brute Force Solution

Approach

We want to figure out how many words in a sentence are valid according to some rules. A brute force approach means we're going to check each word individually and thoroughly to see if it follows those rules.

Here's how the algorithm would work step-by-step:

  1. First, take the sentence and break it down into individual words.
  2. Now, for each word, we need to check if it's valid.
  3. To check a word, we will look at each character one by one to make sure it meets all the rules, like if it only contains letters, hyphens, and punctuation.
  4. If we find anything that violates the rules, like a number or a symbol that is not allowed, the word is invalid and we can stop checking it.
  5. If the word makes it through all the checks without violating any rules, then it is a valid word and we add it to our count.
  6. After we have checked every word in the sentence, the final count is the total number of valid words.

Code Implementation

def count_valid_words(sentence):
    words = sentence.split()
    valid_word_count = 0

    for word in words:
        is_valid = True
        for index, character in enumerate(word):
            if not (
                'a' <= character <= 'z' or
                character == '-' or
                character in ['!', '.', ',']
            ):
                is_valid = False
                break

            # Hyphens must be surrounded by letters
            if character == '-':
                if index == 0 or index == len(word) - 1:
                    is_valid = False
                    break
                if not ('a' <= word[index - 1] <= 'z' and 'a' <= word[index + 1] <= 'z'):
                    is_valid = False
                    break
            
            # Punctuation can only be at the end
            if character in ['!', '.', ',']:
                if index != len(word) - 1:
                    is_valid = False
                    break

        # Increment count if word is valid.
        if is_valid:
            valid_word_count += 1

    return valid_word_count

Big(O) Analysis

Time Complexity
O(n*m)The input is a sentence of n words, where m is the average length of a word. First, the sentence is split into n words. Then, for each word, we iterate through its characters to validate it. Thus, we have an outer loop running n times (for each word) and an inner loop running at most m times (for each character in a word). Therefore, the time complexity is O(n*m).
Space Complexity
O(W)The space complexity depends primarily on the size of the word array created in step 1. Let W be the number of words in the sentence. Although the plain English explanation does not explicitly define a data structure besides the word array, the explanation implies creation of single word strings during validation, which can be a substring of the initial word. In the worst case, if the sentence is extremely long, storing the words after splitting could take O(W) space, where W is the number of words.

Optimal Solution

Approach

The most efficient way to solve this is to check each word independently against the rules. We'll then put it all together by only incrementing the total count if all the rules apply to the word.

Here's how the algorithm would work step-by-step:

  1. Split the sentence into individual words.
  2. For each word, check if it meets the given criteria: does it only contain lowercase letters, hyphens, and punctuation?
  3. Check for any forbidden patterns like multiple hyphens in a row, or hyphens at the beginning or end of the word.
  4. Also verify there is at most one punctuation mark, and if there is one, it must be at the end of the word.
  5. If a word passes all these checks, count it as a valid word.
  6. Return the total count of valid words.

Code Implementation

def count_valid_words(sentence):
    words = sentence.split()
    valid_word_count = 0

    for word in words:
        is_valid = True
        hyphen_count = 0
        punctuation_count = 0

        for i, char in enumerate(word):
            if not ('a' <= char <= 'z' or char == '-' or char in ['!', '.', ',']):
                is_valid = False
                break

            if char == '-':
                hyphen_count += 1

                #Hyphens must have letters on both sides.
                if i == 0 or i == len(word) - 1 or not ('a' <= word[i - 1] <= 'z' and 'a' <= word[i + 1] <= 'z'):
                    is_valid = False
                    break
            elif char in ['!', '.', ',']:
                punctuation_count += 1

                #Punctuation can only be at the end.
                if i != len(word) - 1:
                    is_valid = False
                    break
        
        #More than one hyphen or punctuation makes word invalid
        if hyphen_count > 1 or punctuation_count > 1:
            is_valid = False

        #Empty words are invalid
        if not word:
            is_valid = False

        if is_valid:
            valid_word_count += 1

    return valid_word_count

Big(O) Analysis

Time Complexity
O(n)The algorithm's time complexity is determined by the number of words in the input sentence, which we'll denote as n. We iterate through each of these n words once. Within each iteration, we perform a series of checks on the characters of the word, each taking a time proportional to the length of the word. Since the total number of characters across all words is proportional to the length of the original sentence (and assuming the average word length is relatively constant), the inner checks can be considered to take constant time per word on average. Therefore, the dominant operation is iterating through the n words, resulting in an overall time complexity of O(n).
Space Complexity
O(N)The space complexity is determined by the splitting of the input sentence into individual words. In the worst-case scenario, the sentence consists of N words separated by spaces, leading to an auxiliary list of size N to store these words. The other operations (checks and validations) are performed in-place on each word, requiring only constant space. Therefore, the dominant space usage is O(N), where N is the length of the input sentence.

Edge Cases

CaseHow to Handle
Null or empty input stringReturn 0 if input is null or empty string since there are no words.
Sentence with only punctuation or special charactersReturn 0 if the sentence only contains punctuation, as no valid words exist.
Single character input string that's a valid wordCheck if a single-character word is valid according to problem constraints and return 1 if so.
Very long sentence exceeding reasonable limitsConsider performance implications; if splitting the sentence is inefficient, explore streaming or other techniques.
Words containing consecutive hyphens or hyphens at the beginning/endReject words containing consecutive hyphens or starting/ending with hyphens.
Words with multiple punctuations or invalid placementsReject words with multiple punctuations or incorrect punctuation placements as invalid.
Words containing digitsReject any words containing digits as they are defined as invalid.
Sentence with extremely long words to avoid potential performance issues.Implement checks to limit word length, or handle larger words with optimized data structures.