Taro Logo

Kth Distinct String in an Array

#817 Most AskedEasy
Topics:
ArraysStrings

A distinct string is a string that is present only once in an array.

Given an array of strings arr, and an integer k, return the kth distinct string present in arr. If there are fewer than k distinct strings, return an empty string "".

Note that the strings are considered in the order in which they appear in the array.

Example 1:

Input: arr = ["d","b","c","b","c","a"], k = 2
Output: "a"
Explanation:
The only distinct strings in arr are "d" and "a".
"d" appears 1st, so it is the 1st distinct string.
"a" appears 2nd, so it is the 2nd distinct string.
Since k == 2, "a" is returned. 

Example 2:

Input: arr = ["aaa","aa","a"], k = 1
Output: "aaa"
Explanation:
All strings in arr are distinct, so the 1st string "aaa" is returned.

Example 3:

Input: arr = ["a","b","a"], k = 3
Output: ""
Explanation:
The only distinct string is "b". Since there are fewer than 3 distinct strings, we return an empty string "".

Constraints:

  • 1 <= k <= arr.length <= 1000
  • 1 <= arr[i].length <= 5
  • arr[i] consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected range for the length of the input array `arr` and the magnitude of `k`?
  2. Can the strings in the input array `arr` be empty or null? If so, how should I handle them?
  3. If `k` is larger than the number of distinct strings in `arr`, what should I return?
  4. Are we concerned about the case sensitivity of the strings when determining distinctness (e.g., should "apple" and "Apple" be considered distinct)?
  5. Can the input array `arr` contain null values? If so, how should I handle that?

Brute Force Solution

Approach

The goal is to find the string that appears only once in a list, and is the Kth such unique string. The brute force way to do this is to check each string in the list one by one and see how many times it shows up.

Here's how the algorithm would work step-by-step:

  1. Go through the entire list of strings, starting from the very first one.
  2. For each string, compare it with every other string in the list, including itself.
  3. Count how many times the string is identical to other strings on the list.
  4. If the string appears exactly once, add it to a new list of unique strings.
  5. Repeat this process for every string in the original list.
  6. Once you've gone through all the strings, you'll have a new list containing only the unique strings, in the order they appeared in the original list.
  7. Find the Kth item in the list of unique strings. If the list doesn't have K items, then there is no Kth unique string.

Code Implementation

def find_kth_distinct_string(string_array, k_value):
    unique_strings = []

    for current_string in string_array:
        string_frequency = 0

        for other_string in string_array:
            if current_string == other_string:
                string_frequency += 1

        # Filter out strings that appear more than once.
        if string_frequency == 1:

            unique_strings.append(current_string)

    # Check if k_value is within the bounds of unique_strings
    if k_value > 0 and k_value <= len(unique_strings):

        return unique_strings[k_value - 1]
    
    #Return empty string if not found
    return ""

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through each of the n strings in the input array. For each string, it compares it to every other string in the array to determine its frequency. This inner comparison also takes O(n) time. Therefore, the nested iterations result in a time complexity of O(n * n). Thus, the overall time complexity of the algorithm is O(n²).
Space Complexity
O(N)The auxiliary space complexity is O(N) because a new list of unique strings is created. In the worst-case scenario, all N strings in the input array are distinct, meaning the new list could potentially store all N strings. Therefore, the space required for the new list grows linearly with the number of strings in the input array. No other significant auxiliary data structures are used.

Optimal Solution

Approach

To efficiently find the Kth unique word, we use a counting trick. The key idea is to count how many times each word appears, and then go through the list to find the Kth word that appears only once.

Here's how the algorithm would work step-by-step:

  1. First, count how many times each word appears in the list.
  2. Then, go through the list of words in the order they were given.
  3. Keep track of the number of unique words we've seen so far.
  4. If we find a word that appears only once, increase our unique word count.
  5. If the unique word count matches K, then that word is the answer.
  6. If we get to the end of the list and haven't found the Kth unique word, then there isn't one.

Code Implementation

def find_kth_distinct_string(words, k_value):
    word_counts = {}
    for word in words:
        word_counts[word] = word_counts.get(word, 0) + 1

    distinct_word_count = 0
    for word in words:
        # Only consider words that appear exactly once
        if word_counts[word] == 1:

            distinct_word_count += 1

            # Check if current word is the Kth distinct one
            if distinct_word_count == k_value:
                return word

    # Kth distinct word not found
    return ""

Big(O) Analysis

Time Complexity
O(n)The algorithm first iterates through the array of n strings to count the frequency of each string using a hash map, which takes O(n) time on average assuming the hash function distributes keys evenly. Next, it iterates through the array again, checking the frequency count of each string which is an O(1) lookup in the hash map. Therefore, the second loop is also O(n). The dominant factor is O(n) + O(n), which simplifies to O(n).
Space Complexity
O(N)The algorithm uses a hash map (or dictionary) to store the frequency of each word in the input array. In the worst-case scenario, all N words in the array are distinct, requiring the hash map to store N key-value pairs. Therefore, the auxiliary space used by the hash map grows linearly with the input size N. No other significant auxiliary space is used.

Edge Cases

Null or empty input array
How to Handle:
Return an empty string or null to signify no kth distinct string exists, or throw an IllegalArgumentException if required.
k is zero or negative
How to Handle:
Throw an IllegalArgumentException as k must be a positive integer.
k is larger than the number of distinct strings in the array
How to Handle:
Return an empty string or null to indicate no kth distinct string exists.
Array contains only duplicate strings
How to Handle:
The algorithm should return null or an empty string since there are no distinct strings, based on specification.
Very large array size that could lead to memory issues with hash map
How to Handle:
Consider using a more memory-efficient data structure or limiting the array size if memory becomes a constraint.
Array contains very long strings, affecting string comparison performance
How to Handle:
String comparison performance is inherent to the language; consider that extremely long strings will affect runtime.
k equals the number of distinct strings
How to Handle:
The kth distinct string will be the last distinct string found, which is handled correctly by iterating through the array once.
Integer overflow potential when calculating the count or index
How to Handle:
Use appropriate data types (e.g., long) to prevent integer overflow when calculating counts or indices.
0/1037 completed