Taro Logo

Count Common Words With One Occurrence

Easy
Jane Street logo
Jane Street
7 views
Topics:
ArraysStrings

Given two string arrays words1 and words2, return the number of strings that appear exactly once in each of the two arrays.

Example 1:

Input: words1 = ["leetcode","is","amazing","as","is"], words2 = ["amazing","leetcode","is"]
Output: 2
Explanation:
- "leetcode" appears exactly once in each of the two arrays. We count this string.
- "amazing" appears exactly once in each of the two arrays. We count this string.
- "is" appears in each of the two arrays, but there are 2 occurrences of it in words1. We do not count this string.
- "as" appears once in words1, but does not appear in words2. We do not count this string.
Thus, there are 2 strings that appear exactly once in each of the two arrays.

Example 2:

Input: words1 = ["b","bb","bbb"], words2 = ["a","aa","aaa"]
Output: 0
Explanation: There are no strings that appear in each of the two arrays.

Example 3:

Input: words1 = ["a","ab"], words2 = ["a","a","a","ab"]
Output: 1
Explanation: The only string that appears exactly once in each of the two arrays is "ab".

Constraints:

  • 1 <= words1.length, words2.length <= 1000
  • 1 <= words1[i].length, words2[j].length <= 30
  • words1[i] and words2[j] consists only of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the maximum lengths of `words1` and `words2`? Are there any size limitations I should be aware of?
  2. Can the strings in `words1` and `words2` contain any special characters, or are they limited to alphanumeric characters?
  3. Is case significant? Should I treat 'apple' and 'Apple' as the same word or as distinct words?
  4. If there are no common words appearing exactly once in both arrays, what should the function return?
  5. Are `words1` and `words2` guaranteed to be non-null and contain at least one element?

Brute Force Solution

Approach

The brute force method for counting common words with one occurrence is quite straightforward. We essentially check every single word in the first list against every single word in the second list to see if they match. We then count how many times each matching word appears across both lists to determine if it occurred only once.

Here's how the algorithm would work step-by-step:

  1. Take the first word from the first list.
  2. Compare this word to every word in the second list.
  3. If you find a match, remember that word.
  4. Do this for every word in the first list.
  5. Now, for each of the words you remembered, count how many times they appear in the first list and the second list combined.
  6. If a word appears exactly once in the first list and once in the second list and no other times, increase your count of common words with one occurrence.
  7. The final count is your answer.

Code Implementation

def count_common_words(first_word_list, second_word_list):
    common_words = []
    # Iterate through the first word list.
    for first_word in first_word_list:
        for second_word in second_word_list:
            if first_word == second_word:
                common_words.append(first_word)

    count_of_common_words = 0
    # Check each common word's total occurences.
    for word in common_words:
        total_occurrences = 0
        for first_word in first_word_list:
            if first_word == word:
                total_occurrences += 1

        for second_word in second_word_list:
            if second_word == word:
                total_occurrences += 1

        # Ensure word appears only once in each list
        first_list_occurrence = 0
        for first_word in first_word_list:
            if first_word == word:
                first_list_occurrence += 1

        second_list_occurrence = 0
        for second_word in second_word_list:
            if second_word == word:
                second_list_occurrence += 1

        if first_list_occurrence == 1 and second_list_occurrence == 1:
            count_of_common_words += 1

    return count_of_common_words

Big(O) Analysis

Time Complexity
O(n^2)Let n be the number of words in each of the input lists, words1 and words2. The algorithm iterates through each word in words1 (n iterations). For each word in words1, it compares it against every word in words2 (another n iterations in the worst case to find potential matches). Afterwards, the algorithm iterates through the matched words and counts their occurrences across both lists, which takes O(n) time in the worst case for each matched word. Therefore, the nested loops result in approximately n * n operations for matching and counting, simplifying to O(n^2).
Space Complexity
O(N)The brute force algorithm needs to remember the common words found between the two lists. In the worst-case scenario, all words in the first list are present in the second list, meaning we need to store N words, where N is the number of words in the first list. The algorithm then counts occurrences of each remembered word, requiring space to potentially store the counts. Thus, the auxiliary space grows linearly with the input size, giving a space complexity of O(N).

Optimal Solution

Approach

To efficiently count common words appearing exactly once in two lists, we use a counting method. We first count the occurrences of each word in both lists separately. Then, we compare the counts to find words that appear only once in each list and are common to both.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each word appears in the first list.
  2. Next, count how many times each word appears in the second list.
  3. Then, check which words are present in both lists.
  4. For each common word, verify that it only appears once in the first list and once in the second list.
  5. If a word meets both conditions (common and appearing only once in each list), increase the counter.
  6. Finally, return the total count of common words appearing once in both lists.

Code Implementation

def count_common_words(words1, words2):
    word_counts1 = {}
    word_counts2 = {}

    for word in words1:
        word_counts1[word] = word_counts1.get(word, 0) + 1

    for word in words2:
        word_counts2[word] = word_counts2.get(word, 0) + 1

    common_word_count = 0

    # Only check words present in both lists
    for word, count in word_counts1.items():
        if word in word_counts2:

            # Verify that the word appears only once in each list.
            if word_counts1[word] == 1 and word_counts2[word] == 1:
                common_word_count += 1

    return common_word_count

Big(O) Analysis

Time Complexity
O(m + n)Let m be the length of words1 and n be the length of words2. The first step involves counting word occurrences in words1, which takes O(m) time. Similarly, counting word occurrences in words2 takes O(n) time. Comparing the counts and checking the conditions (common and appearing once) involves iterating through the words in the hashmaps which takes at most O(m + n) time. Therefore, the overall time complexity is O(m + n).
Space Complexity
O(N)The algorithm uses two hash maps (or dictionaries) to count word occurrences in each list. In the worst case, all words in both lists are unique, leading to each hash map storing N key-value pairs where N is the total number of words across both lists. Therefore, the auxiliary space used is proportional to N, resulting in a space complexity of O(N).

Edge Cases

CaseHow to Handle
Both input arrays are null or emptyReturn 0, as there are no words to compare.
One array is null/empty, the other is notReturn 0, as there cannot be any common words.
Arrays contain only one word each, which are the sameReturn 1, as this single word appears exactly once in both.
Arrays contain only one word each, which are differentReturn 0, as there are no common words.
Arrays contain the same word multiple times (e.g., ['a','a','a'], ['a','a'])The counting mechanism (e.g., using HashMaps) should ensure only words appearing exactly once in both arrays are counted.
Arrays contain very large number of words (scalability)Using HashMaps provides an average time complexity of O(n+m) for counting words, where n and m are the lengths of the arrays.
Arrays contain words with different casing (e.g., 'hello' and 'Hello')Convert all words to lowercase before processing to ensure case-insensitive comparison.
Arrays contain words with leading or trailing spacesTrim leading/trailing whitespace from words before processing to avoid incorrect matching.