
Shortest Word Distance II

Medium
LinkedIn
Topics: Arrays, Strings, Two Pointers, Dynamic Programming

Design a data structure that will be initialized with a list of words. The words provided can be preprocessed, allowing for repeated queries to find the shortest distance between two distinct words.

Implement the WordDistance class:

  • WordDistance(String[] wordsDict) initializes the object with the list of words wordsDict.
  • int shortest(String word1, String word2) returns the shortest distance between word1 and word2 in the list.

Example 1:

Input
["WordDistance", "shortest", "shortest"]
[[["practice", "makes", "perfect", "coding", "makes"]], ["coding", "practice"], ["makes", "coding"]]
Output
[null, 3, 1]

Explanation
WordDistance wordDistance = new WordDistance(["practice", "makes", "perfect", "coding", "makes"]);
wordDistance.shortest("coding", "practice"); // return 3
wordDistance.shortest("makes", "coding");    // return 1

Constraints:

  • 1 <= wordsDict.length <= 3 * 10^4
  • 1 <= wordsDict[i].length <= 10
  • wordsDict[i] consists of lowercase English letters.
  • word1 and word2 are in wordsDict.
  • word1 != word2
  • At most 5000 calls will be made to shortest.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Will the list of words provided during initialization ever be modified after the class is created?
  2. Can the same word appear multiple times in the input word list, and if so, should all occurrences be considered?
  3. If either word1 or word2 is not found in the word list, what should the `shortest` method return? Should I return -1, positive infinity, or throw an exception?
  4. Are word1 and word2 guaranteed to be different words, or could they be the same?
  5. What is the expected size range for the word list passed to the constructor? This will help me choose the most appropriate data structure to store the word indices.

Brute Force Solution

Approach

We are given a list of words and we need to find the shortest distance between two specific words that appear in the list. The brute force method involves checking every possible pair of occurrences of the two words.

Here's how the algorithm would work step-by-step:

  1. First, find all the places where the first word appears in the list.
  2. Then, find all the places where the second word appears in the list.
  3. Next, for every place where the first word appears, calculate how far away each appearance of the second word is.
  4. Keep track of the shortest distance you find between any pair of these locations.
  5. After checking every possible pairing of the first and second words, the smallest distance you recorded is the answer.

Code Implementation

def shortest_word_distance_brute_force(words, word1, word2):
    word1_indices = []
    word2_indices = []

    # Find indices of word1
    for index, word in enumerate(words):
        if word == word1:
            word1_indices.append(index)

    # Find indices of word2
    for index, word in enumerate(words):
        if word == word2:
            word2_indices.append(index)

    minimum_distance = float('inf')

    # Iterate through all pairs of word1 and word2 indices
    for word1_index in word1_indices:
        for word2_index in word2_indices:

            # Calculate the distance between the two occurrences
            distance = abs(word1_index - word2_index)

            # Track the shortest distance found
            minimum_distance = min(minimum_distance, distance)

    return minimum_distance
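To make the pairwise comparison concrete, here is the nested-loop core isolated as a standalone helper (the function name `pairwise_min_distance` is illustrative, not part of the problem's API), applied to the index lists from the prompt's example:

```python
def pairwise_min_distance(word1_indices, word2_indices):
    # The nested-loop core of the brute force: compare every pair of
    # occurrence indices and keep the smallest absolute difference.
    minimum_distance = float('inf')
    for word1_index in word1_indices:
        for word2_index in word2_indices:
            minimum_distance = min(minimum_distance,
                                   abs(word1_index - word2_index))
    return minimum_distance

# In ["practice", "makes", "perfect", "coding", "makes"]:
# "coding" appears at index 3 and "practice" at index 0.
print(pairwise_min_distance([3], [0]))     # 3
# "makes" appears at indices 1 and 4, "coding" at index 3.
print(pairwise_min_distance([1, 4], [3]))  # 1
```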

Big(O) Analysis

Time Complexity
O(k * m). The algorithm first scans the word list of size n to collect every index of word1, then scans again for word2; each pass takes O(n). Let k and m be the number of occurrences of word1 and word2, respectively. The nested loop then compares every pair of occurrences, which costs O(k * m). Since k and m can each be proportional to n in the worst case (for example, when the list alternates between the two words), the pairwise comparison degrades to O(n^2).
Space Complexity
O(N). The algorithm stores the indices of all occurrences of the first and second words in two separate lists. In the worst case, both words appear throughout the input list of N words, so the two lists can hold up to N indices in total. The auxiliary space therefore scales linearly with the input size, giving O(N).

Optimal Solution

Approach

The trick is to store where each word appears in the text beforehand. Then, when we need to find the shortest distance between two words, we can quickly go through their locations and efficiently find the minimum distance, instead of searching the entire text again and again.

Here's how the algorithm would work step-by-step:

  1. First, go through the entire text and record the positions (the index numbers) where each word appears, creating a list of positions for each unique word.
  2. When asked to find the shortest distance between two words, use the lists of positions you created earlier.
  3. Walk through the two position lists together, one pointer per list, comparing the current pair of positions to see how far apart they are.
  4. Keep track of the smallest distance found so far, and at each step advance the pointer sitting at the smaller position, since moving the other pointer could only widen the gap.
  5. The smallest distance you tracked is the shortest distance between the two words.

Code Implementation

class WordDistance:

    def __init__(self, words_list):
        self.word_index_map = {}
        for index, word in enumerate(words_list):
            if word not in self.word_index_map:
                self.word_index_map[word] = []
            self.word_index_map[word].append(index)

    def shortest(self, word1, word2):
        # Obtain indices for both words.
        word1_indices = self.word_index_map[word1]
        word2_indices = self.word_index_map[word2]
        
        minimum_distance = float('inf')
        word1_index_pointer = 0
        word2_index_pointer = 0

        # Iterate to find the shortest distance
        while word1_index_pointer < len(word1_indices) and word2_index_pointer < len(word2_indices):
            index1 = word1_indices[word1_index_pointer]
            index2 = word2_indices[word2_index_pointer]
            
            current_distance = abs(index1 - index2)
            minimum_distance = min(minimum_distance, current_distance)

            # Move the pointer of the word with the smaller index.
            if index1 < index2:
                word1_index_pointer += 1
            else:
                word2_index_pointer += 1

        return minimum_distance
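The two-pointer merge inside shortest can be illustrated in isolation. This standalone sketch (the function name is illustrative) operates directly on two sorted index lists, like the ones the constructor builds:

```python
def two_pointer_min_distance(indices_a, indices_b):
    # Both lists are assumed sorted ascending, as produced by enumerating
    # the word list in order during initialization.
    minimum_distance = float('inf')
    pointer_a, pointer_b = 0, 0
    while pointer_a < len(indices_a) and pointer_b < len(indices_b):
        index_a = indices_a[pointer_a]
        index_b = indices_b[pointer_b]
        minimum_distance = min(minimum_distance, abs(index_a - index_b))
        # Advance the pointer behind the smaller index; moving the other
        # pointer could only widen the gap.
        if index_a < index_b:
            pointer_a += 1
        else:
            pointer_b += 1
    return minimum_distance

# "makes" occurs at indices [1, 4] and "coding" at [3] in the example list.
print(two_pointer_min_distance([1, 4], [3]))  # 1
```

Each iteration advances exactly one pointer, so the loop runs at most len(indices_a) + len(indices_b) times, which is what makes this faster than the pairwise brute force.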

Big(O) Analysis

Time Complexity
O(m + n). The constructor iterates through the input word array of size N to build a hash map from each word to its sorted list of indices, which takes O(N) time. The shortest method retrieves the index lists for word1 and word2, of sizes m and n respectively, and walks them with two pointers. Each loop iteration advances exactly one pointer, and each pointer can move at most the length of its list, so the loop runs at most m + n times. Each query therefore costs O(m + n), which is at most O(N) even when the two words dominate the array — a substantial improvement over the O(m * n) pairwise comparison of the brute force.
Space Complexity
O(N). The space is dominated by the hash map that maps each word to the list of indices where it occurs. Every position 0 through N-1 of the input appears in exactly one of these lists, so the map stores exactly N indices in total regardless of how the words are distributed. The auxiliary space therefore grows linearly with the number of words in the input.

Edge Cases

  • wordsDict is null or empty: throw an IllegalArgumentException or return -1 to indicate invalid input, as no word distances can be calculated.
  • word1 or word2 is null or empty: throw an IllegalArgumentException or return -1, as a valid distance cannot be computed without valid words.
  • word1 and word2 are the same word: find all occurrences of the word and compute the minimum distance between adjacent occurrences; return Integer.MAX_VALUE if there are fewer than 2 occurrences.
  • wordsDict contains only one unique word: if word1 and word2 are both this unique word, find all occurrences and compute the minimum distance; otherwise return Integer.MAX_VALUE since the missing word won't be found.
  • word1 or word2 is not present in wordsDict: return Integer.MAX_VALUE, indicating that no valid distance exists since one or both words are missing.
  • wordsDict contains a very large number of words, causing memory issues when storing indices: consider using an external data structure or database to store and retrieve indices if wordsDict is too large to fit in memory.
  • Integer overflow when calculating the difference between large indices: use long to store the distances between indices to prevent potential integer overflow.
  • Very long words in wordsDict, potentially affecting string comparison performance: string comparison using .equals() is efficient, but can be improved by hashing long strings if performance is critical.
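For the word1 == word2 edge case above (which this problem's constraints rule out, but which a follow-up often asks about), a single pass comparing adjacent occurrences suffices. This helper is a hedged sketch of that idea, not part of the stated problem:

```python
def shortest_same_word(words_list, word):
    # Minimum gap between two occurrences of the same word; returns
    # float('inf') if the word appears fewer than two times.
    minimum_distance = float('inf')
    previous_index = None
    for index, current_word in enumerate(words_list):
        if current_word == word:
            if previous_index is not None:
                # Adjacent occurrences always give the smallest gap,
                # so only the previous index needs to be remembered.
                minimum_distance = min(minimum_distance,
                                       index - previous_index)
            previous_index = index
    return minimum_distance

# "makes" occurs at indices 1 and 4 in the example list.
print(shortest_same_word(
    ["practice", "makes", "perfect", "coding", "makes"], "makes"))  # 3
```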