Given a string paragraph
and a string array of the banned words banned
, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.
The words in paragraph
are case-insensitive and the answer should be returned in lowercase.
Note that words can not contain punctuation symbols.
Example 1:
Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"] Output: "ball" Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.
Example 2:
Input: paragraph = "a.", banned = [] Output: "a"
Constraints:
1 <= paragraph.length <= 1000
' '
, or one of the symbols: "!?',;."
.0 <= banned.length <= 100
1 <= banned[i].length <= 10
banned[i]
consists of only lowercase English letters.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force approach to finding the most common word is like meticulously counting each word one by one. We clean up the input text, then compare each word against every other word to see how many times it appears. Finally, we return the word that shows up the most, ignoring any words on a forbidden list.
Here's how the algorithm would work step-by-step:
def most_common_word_brute_force(paragraph, banned_words):
normalized_paragraph = ''.join([character.lower() if character.isalnum() else ' ' for character in paragraph])
words = normalized_paragraph.split()
most_common_word = ''
highest_frequency = 0
for word in words:
# We must ignore any banned words as they are invalid for the final result
if word in banned_words:
continue
frequency = 0
for other_word in words:
if word == other_word:
frequency += 1
# Only update the most common word if this word has a higher frequency
if frequency > highest_frequency:
most_common_word = word
highest_frequency = frequency
return most_common_word
To find the most common word, we first need to clean up the input text and then count how often each word appears. We can efficiently keep track of these counts and ignore any words that are on a given ban list.
Here's how the algorithm would work step-by-step:
import re
def most_common_word(paragraph, banned_words):
normalized_paragraph = re.sub(r'[!?,.;]', ' ', paragraph).lower()
words = normalized_paragraph.split()
word_counts = {}
for word in words:
# We need to ignore banned words from the counts.
if word not in banned_words:
if word in word_counts:
word_counts[word] += 1
else:
word_counts[word] = 1
most_common_word_value = ''
max_count = 0
# Find the word with the highest count.
for word, count in word_counts.items():
if count > max_count:
# We must track both the word and its count to determine the most common.
most_common_word_value = word
max_count = count
return most_common_word_value
Case | How to Handle |
---|---|
Null or empty paragraph string | Return an empty string immediately, indicating no most common word exists. |
Null or empty banned word list | Treat as if no words are banned and proceed with the normal word counting process. |
Paragraph containing only banned words | Return an empty string since no non-banned word exists. |
Paragraph containing only punctuation or special characters | After cleaning, this will result in an empty string, so return an empty string. |
Extremely long paragraph string (memory concerns) | Ensure the word counting hashmap can scale efficiently to handle a large number of unique words, potentially using a more memory-efficient data structure if necessary. |
Paragraph with mixed-case words that are otherwise identical | Convert all words to lowercase to ensure accurate counting of words regardless of case. |
Banned words containing punctuation or special characters. | Remove the punctuation from the banned words as well, to ensure accurate comparison after the paragraph cleaning. |
Integer overflow in word count | Use a data type large enough to hold the word counts, such as a long or double, or be aware that the results may be invalid if counts exceed maximum integer size. |