A sentence consists of lowercase letters ('a'
to 'z'
), digits ('0'
to '9'
), hyphens ('-'
), punctuation marks ('!'
, '.'
, and ','
), and spaces (' '
) only. Each sentence can be broken down into one or more tokens separated by one or more spaces ' '
.
A token is a valid word if all three of the following are true:
'-'
. If present, it must be surrounded by lowercase characters ("a-b"
is valid, but "-ab"
and "ab-"
are not valid)."ab,"
, "cd!"
, and "."
are valid, but "a!b"
and "c.,"
are not valid).Examples of valid words include "a-b."
, "afad"
, "ba-c"
, "a!"
, and "!"
.
Given a string sentence
, return the number of valid words in sentence
.
Example 1:
Input: sentence = "cat and dog" Output: 3 Explanation: The valid words in the sentence are "cat", "and", and "dog".
Example 2:
Input: sentence = "!this 1-s b8d!" Output: 0 Explanation: There are no valid words in the sentence. "!this" is invalid because it starts with a punctuation mark. "1-s" and "b8d" are invalid because they contain digits.
Example 3:
Input: sentence = "alice and bob are playing stone-game10" Output: 5 Explanation: The valid words in the sentence are "alice", "and", "bob", "are", and "playing". "stone-game10" is invalid because it contains digits.
Constraints:
1 <= sentence.length <= 1000
sentence
only contains lowercase English letters, digits, ' '
, '-'
, '!'
, '.'
, and ','
.1
token.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
We want to figure out how many words in a sentence are valid according to some rules. A brute force approach means we're going to check each word individually and thoroughly to see if it follows those rules.
Here's how the algorithm would work step-by-step:
def count_valid_words(sentence):
words = sentence.split()
valid_word_count = 0
for word in words:
is_valid = True
for index, character in enumerate(word):
if not (
'a' <= character <= 'z' or
character == '-' or
character in ['!', '.', ',']
):
is_valid = False
break
# Hyphens must be surrounded by letters
if character == '-':
if index == 0 or index == len(word) - 1:
is_valid = False
break
if not ('a' <= word[index - 1] <= 'z' and 'a' <= word[index + 1] <= 'z'):
is_valid = False
break
# Punctuation can only be at the end
if character in ['!', '.', ',']:
if index != len(word) - 1:
is_valid = False
break
# Increment count if word is valid.
if is_valid:
valid_word_count += 1
return valid_word_count
The most efficient way to solve this is to check each word independently against the rules. We'll then put it all together by only incrementing the total count if all the rules apply to the word.
Here's how the algorithm would work step-by-step:
def count_valid_words(sentence):
words = sentence.split()
valid_word_count = 0
for word in words:
is_valid = True
hyphen_count = 0
punctuation_count = 0
for i, char in enumerate(word):
if not ('a' <= char <= 'z' or char == '-' or char in ['!', '.', ',']):
is_valid = False
break
if char == '-':
hyphen_count += 1
#Hyphens must have letters on both sides.
if i == 0 or i == len(word) - 1 or not ('a' <= word[i - 1] <= 'z' and 'a' <= word[i + 1] <= 'z'):
is_valid = False
break
elif char in ['!', '.', ',']:
punctuation_count += 1
#Punctuation can only be at the end.
if i != len(word) - 1:
is_valid = False
break
#More than one hyphen or punctuation makes word invalid
if hyphen_count > 1 or punctuation_count > 1:
is_valid = False
#Empty words are invalid
if not word:
is_valid = False
if is_valid:
valid_word_count += 1
return valid_word_count
Case | How to Handle |
---|---|
Null or empty input string | Return 0 if input is null or empty string since there are no words. |
Sentence with only punctuation or special characters | Return 0 if the sentence only contains punctuation, as no valid words exist. |
Single character input string that's a valid word | Check if a single-character word is valid according to problem constraints and return 1 if so. |
Very long sentence exceeding reasonable limits | Consider performance implications; if splitting the sentence is inefficient, explore streaming or other techniques. |
Words containing consecutive hyphens or hyphens at the beginning/end | Reject words containing consecutive hyphens or starting/ending with hyphens. |
Words with multiple punctuations or invalid placements | Reject words with multiple punctuations or incorrect punctuation placements as invalid. |
Words containing digits | Reject any words containing digits as they are defined as invalid. |
Sentence with extremely long words to avoid potential performance issues. | Implement checks to limit word length, or handle larger words with optimized data structures. |