Number of Distinct Substrings in a String

Medium
Intuit
Topics:
Strings

Given a string s, return the number of distinct substrings of s. A substring is a contiguous, non-empty sequence of characters within s.

Example 1:

Input: s = "abc"
Output: 6
Explanation: The distinct substrings are "a", "b", "c", "ab", "bc", and "abc".

Example 2:

Input: s = "aaa"
Output: 3
Explanation: The distinct substrings are "a", "aa", and "aaa". Each is counted once, no matter how many times it occurs.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input string `s` be empty or null? If so, what should I return?
  2. What is the maximum length of the input string `s`? This will help me consider potential memory or performance constraints.
  3. Is the string `s` case-sensitive? For example, should 'abc' and 'Abc' be considered distinct substrings?
  4. By 'distinct substrings', do you mean that I should count the same substring appearing at different positions in the string only once?
  5. Are there any specific character sets that I should expect in the string `s` (e.g., only lowercase letters, ASCII characters, Unicode characters)?

Brute Force Solution

Approach

The brute force method for finding distinct substrings means we'll look at absolutely everything. We will generate every possible substring and then see which ones are unique.

Here's how the algorithm would work step-by-step:

  1. First, we need to think about all the different substrings we can make from the original string.
  2. Start by taking each single letter from the string, one at a time. Each letter is a substring.
  3. Next, consider all pairs of consecutive letters. These are also substrings.
  4. Continue this process, considering substrings of length three, four, and so on, all the way up to the entire string itself.
  5. Now that we have a complete list of all possible substrings, we need to identify the unique ones.
  6. Compare each substring in our list against the others. In practice, adding every substring to a set performs this comparison for us.
  7. If two substrings are exactly the same, they count as a single distinct substring.
  8. Once every substring has been checked, the number of entries that remain is the count of distinct substrings.

Code Implementation

def number_of_distinct_substrings_brute_force(input_string):
    substrings = []
    string_length = len(input_string)

    # Generate all possible substrings
    for substring_length in range(1, string_length + 1):
        for starting_index in range(string_length - substring_length + 1):
            substring = input_string[starting_index:starting_index + substring_length]
            substrings.append(substring)

    # A set keeps exactly one copy of each distinct substring
    distinct_substrings = set(substrings)

    return len(distinct_substrings)

Big(O) Analysis

Time Complexity
O(n^3)
A string of length n has n(n + 1)/2 substrings, so the two generation loops run O(n^2) times. Each iteration slices out a substring, which copies up to n characters, and adding a substring to the set hashes it in time proportional to its length. The total is therefore O(n^2) * O(n) = O(n^3). Literally comparing every substring with every other one, as the step-by-step description suggests, would be far slower (O(n^4) pairs, each taking up to O(n) to compare), which is why the code deduplicates with a set.
Space Complexity
O(n^3)
The brute force approach stores every substring before deduplicating. There are n(n + 1)/2 substrings, which is O(n^2) entries, but each entry is a separate copy of up to n characters. Summed over all lengths, the stored characters total n(n + 1)(n + 2)/6, so the auxiliary space is O(n^3) when measured in characters. The set of distinct substrings is never larger than that list.
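
To make these counts concrete, the number of substrings and the total number of characters they contain can be derived by grouping substrings by length. This is a short derivation rather than part of the original analysis; here n is the string length, and a string of length n has n - l + 1 substrings of length l:

```latex
\begin{align*}
\#\text{substrings} &= \sum_{\ell=1}^{n} (n - \ell + 1) = \frac{n(n+1)}{2}, \\
\text{total characters} &= \sum_{\ell=1}^{n} \ell\,(n - \ell + 1) = \frac{n(n+1)(n+2)}{6}.
\end{align*}
```

For n = 4 this gives 10 substrings holding 20 characters in total. A list that keeps a separate copy of every substring therefore grows cubically in characters, even though its number of entries grows quadratically.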

Optimal Solution

Approach

To count all the different substrings efficiently, we'll use a special structure called a Trie. A Trie helps us avoid checking substrings we've already seen, making the process much faster than checking everything individually.

Here's how the algorithm would work step-by-step:

  1. Start with an empty Trie. Think of a Trie like a tree where each branch represents a character in a string.
  2. Go through the original string, character by character.
  3. For each character, consider all the substrings that start from that character.
  4. For each substring, try to add it to the Trie. If the substring is already in the Trie, don't add it again.
  5. To add a substring, follow the existing branches of the Trie. If a branch for the next character doesn't exist, create a new branch.
  6. Every time you create a new branch, it means you've found a new, distinct substring.
  7. At the end, the total number of branches you've created is the number of distinct substrings in the original string.

Code Implementation

class TrieNode:
    def __init__(self):
        self.children = {}

class Trie:
    def __init__(self):
        self.root = TrieNode()
        self.count = 0

    def insert(self, string):
        node = self.root
        for char in string:
            if char not in node.children:
                node.children[char] = TrieNode()
                self.count += 1

            node = node.children[char]

def number_of_distinct_substrings(input_string):
    # Treat None or an empty string as having no substrings
    if not input_string:
        return 0

    trie = Trie()
    string_length = len(input_string)

    for i in range(string_length):
        # Every substring starting at index i is a prefix of the suffix
        # input_string[i:], so one insert of that suffix covers them all.
        trie.insert(input_string[i:])

    # Each Trie node corresponds to exactly one distinct substring
    return trie.count

Big(O) Analysis

Time Complexity
O(n^2)
This bound holds provided each starting index triggers a single walk from the root of the Trie. There are n starting indices, and the walk for one index follows or creates one node per remaining character, which is at most n steps, so the total is about n(n + 1)/2 node visits. Inserting every substring separately from the root repeats that work: there are about n^2/2 substrings with an average length of about n/3, which costs O(n^3).
Space Complexity
O(n^2)
The Trie is the main driver of space complexity. Every node corresponds to exactly one distinct substring (the path from the root to that node), so the number of nodes equals the number of distinct substrings. In the worst case, when all characters are distinct, every one of the n(n + 1)/2 = (n^2 + n)/2 substrings is unique, giving O(n^2) nodes. Sharing prefixes is what keeps this below the O(n^3) characters needed to store each substring as a separate copy.
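
One way to reach the O(n^2) bound is to restart at the root once per starting index and extend a single path one character at a time. The following is a minimal sketch of that idea using nested dictionaries as Trie nodes; the function name `count_distinct_substrings` and the dict-based node layout are illustrative assumptions, not part of the solution above.

```python
def count_distinct_substrings(text: str) -> int:
    """Count distinct non-empty substrings by growing a trie of nested dicts."""
    root: dict = {}       # hypothetical node layout: char -> child dict
    created_nodes = 0     # each new node is one newly seen substring

    for start in range(len(text)):
        node = root       # restart at the root for this starting index
        for end in range(start, len(text)):
            char = text[end]
            if char not in node:
                # text[start:end + 1] has never been seen before
                node[char] = {}
                created_nodes += 1
            node = node[char]

    return created_nodes


if __name__ == "__main__":
    print(count_distinct_substrings("abc"))
    print(count_distinct_substrings("aaa"))
```

No substring is ever sliced out here, so the work per starting index is one step per remaining character.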

Edge Cases

Case: How to Handle

  • Null or empty input string: Return 0. In Python, calling len() on None would otherwise raise a TypeError.
  • Input string with only one character: Return 1, since the string itself is the only distinct substring.
  • Input string with all identical characters (e.g., 'aaaa'): The solution should count 'a', 'aa', 'aaa', and 'aaaa', for a total of 4.
  • Very long input string (approaching memory limits): Both approaches here need at least O(n^2) memory in the worst case. Avoid materializing every substring as a separate copy, and prefer the Trie over a list of substrings.
  • String containing special characters or Unicode: Ensure that substring comparison and hashing (if used) operate on the same unit, such as code points, throughout.
  • String with repeating patterns (e.g., 'ababab'): Repeated occurrences must be counted once. Both the set and the Trie deduplicate automatically; for 'ababab' the answer is 11.
  • Case sensitivity (if applicable): Clarify with the interviewer whether the comparison is case-sensitive. If it is not, normalize the string first, for example with lower() in Python.
  • Integer overflow in the count or in hash code generation: The count can reach n(n + 1)/2, which exceeds a signed 32-bit integer once n reaches 65,536, so use a 64-bit type in fixed-width languages (Python integers do not overflow). If a rolling hash is used, keep intermediate values in range with modular arithmetic.
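
Several of these cases can be turned into quick checks. The sketch below uses a small set-based reference counter for that purpose; the helper name `count_distinct_substrings_reference` and its `case_sensitive` flag are hypothetical and only illustrate one possible way to handle null input and case normalization.

```python
from typing import Optional


def count_distinct_substrings_reference(
    text: Optional[str], case_sensitive: bool = True
) -> int:
    """Set-based reference counter used only to exercise the edge cases."""
    if not text:
        # None and "" both have no substrings
        return 0
    if not case_sensitive:
        # Assumption: case-insensitive means "compare after lowercasing"
        text = text.lower()
    seen = {
        text[start:end]
        for start in range(len(text))
        for end in range(start + 1, len(text) + 1)
    }
    return len(seen)


if __name__ == "__main__":
    for sample in (None, "", "a", "aaaa", "ababab"):
        print(repr(sample), count_distinct_substrings_reference(sample))
```

Because it builds every substring, this helper is only suitable for short test inputs, not for the very long strings mentioned above.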