Taro Logo

Longest Substring with At Most K Distinct Characters

Medium
Apple logo
Apple
2 views
Topics:
StringsSliding WindowsTwo Pointers

Given a string s and an integer k, find the length of the longest substring of s that contains at most k distinct characters.

For example:

  1. s = "eceba", k = 2. The longest substring is "ece" with length 3.
  2. s = "aa", k = 1. The longest substring is "aa" with length 2.
  3. s = "abcadef", k = 3. The longest substring is "abc" or "bca" or "cad" or "ade" or "def" with length 3.
  4. s = "", k = 2. The answer is 0 since the string is empty.

Explain your approach, analyze the time and space complexity, and handle edge cases.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the range of possible values for `k` (the maximum number of distinct characters allowed)? Can `k` be zero or negative?
  2. What is the data type of the input string? Is it guaranteed to be non-null and contain only ASCII characters?
  3. If the input string is empty, or if `k` is zero and the string is not empty, what should I return?
  4. What defines a 'substring' in this context? Should it be contiguous?
  5. If multiple substrings of the same maximum length satisfy the condition, is it acceptable to return any one of them?

Brute Force Solution

Approach

The brute force approach to finding the longest substring with at most K distinct characters involves checking every possible substring within the given string. We examine each substring to see if it meets the distinct character criteria, and keep track of the longest valid substring found. It's like trying every possible window size and position to see which one fits the rules and is the biggest.

Here's how the algorithm would work step-by-step:

  1. Start with a small piece of the string, just the first character.
  2. See if this small piece has 'K' or fewer different characters.
  3. If it does, remember its length.
  4. Now, look at the first two characters. Does this piece have 'K' or fewer different characters?
  5. If it does, and it's longer than what you remembered before, remember this new length instead.
  6. Keep adding characters one at a time to the beginning piece and checking the number of different characters and length.
  7. Once you've checked all possible lengths starting from the first character, move on and do the same thing starting from the second character, then the third, and so on.
  8. After checking every single possible piece of the string in this way, you'll know the length of the longest piece that meets the requirement of having at most 'K' different characters.

Code Implementation

def longest_substring_with_at_most_k_distinct_characters_brute_force(input_string, max_distinct_characters):
    longest_substring_length = 0

    for starting_index in range(len(input_string)):
        for ending_index in range(starting_index, len(input_string)):
            substring = input_string[starting_index:ending_index + 1]

            # Count distinct chars
            distinct_characters = set(substring)

            # Check if distinct characters are less or equal to max
            if len(distinct_characters) <= max_distinct_characters:

                # Update if it's the longest substring
                longest_substring_length = max(longest_substring_length, len(substring))

    return longest_substring_length

Big(O) Analysis

Time Complexity
O(n²)The brute force approach iterates through all possible substrings of the input string of length n. For each starting index i, it considers all substrings starting at i, which requires iterating up to n-i times. This leads to a nested loop structure where the outer loop runs n times and the inner loop, on average, runs n/2 times. Therefore, the total number of operations is proportional to n * (n/2), which simplifies to O(n²).
Space Complexity
O(K)The brute force approach, as described, requires storing the distinct characters within each substring. In the worst-case scenario, a substring could contain up to K distinct characters. Therefore, a data structure (implicitly a set or hash map) of size K might be needed to keep track of these distinct characters for each substring being evaluated. Consequently, the auxiliary space complexity is O(K), where K is the maximum number of distinct characters allowed.

Optimal Solution

Approach

To efficiently find the longest section of text with at most K unique characters, we use a 'window' that expands and contracts. This window represents a possible section of the text, and we adjust its size based on the number of unique characters inside it.

Here's how the algorithm would work step-by-step:

  1. Start with a window at the beginning of the text.
  2. Expand the window to the right, one character at a time.
  3. Keep track of how many different characters are inside the window.
  4. If the number of different characters goes above the limit (K), shrink the window from the left until you're back within the limit.
  5. While adjusting the window, always remember the length of the longest section you've seen so far.
  6. Move the window along the entire text, expanding and shrinking as needed, and updating the longest section's length.
  7. In the end, the longest length recorded is the answer.

Code Implementation

def longest_substring_with_at_most_k_distinct_characters(text_string, max_distinct_characters):
    window_start = 0
    max_length = 0
    character_frequency = {}

    for window_end in range(len(text_string)):
        end_character = text_string[window_end]
        if end_character not in character_frequency:
            character_frequency[end_character] = 0
        character_frequency[end_character] += 1

        # Shrink the window if num distinct chars > k
        while len(character_frequency) > max_distinct_characters:
            start_character = text_string[window_start]
            character_frequency[start_character] -= 1

            # Remove the character if its frequency becomes zero
            if character_frequency[start_character] == 0:
                del character_frequency[start_character]
            window_start += 1

        # Update the max length
        max_length = max(max_length, window_end - window_start + 1)

    return max_length

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input string of length n using a sliding window. The outer loop (expanding the window) runs at most n times. The inner loop (shrinking the window) might seem like it adds complexity, but in the worst case, each character is added and removed from the window only once. Therefore, the shrinking process, in aggregate, also runs at most n times. The number of distinct characters within the window is tracked using a hash map (or similar data structure) which provides O(1) average time complexity for insertion, deletion, and lookup. Thus, the overall time complexity is dominated by the single pass through the string, resulting in O(n).
Space Complexity
O(K)The algorithm utilizes a data structure (likely a hash map or an array) to track the count of distinct characters within the sliding window. In the worst-case scenario, where the longest substring contains K distinct characters, this data structure will store K character counts. Therefore, the space required is proportional to K, the maximum number of distinct characters allowed. The input size N (the length of the text) does not directly influence this auxiliary space.

Edge Cases

CaseHow to Handle
Null or empty input stringReturn 0 immediately as there are no substrings to consider.
k is 0 and input string is not emptyReturn 0 immediately since no substring can have 0 distinct characters.
String with length 1Return 1 if k >= 1, otherwise return 0.
String with all identical charactersReturn the length of the string if k >= 1, otherwise 0.
k is larger than the number of distinct characters in the stringReturn the length of the string as the entire string is a valid substring.
Long input string to test for time complexitySliding window approach should scale linearly with input string length.
String contains only distinct charactersThe longest substring will be of length k or length of string which ever is smaller.
Large k value that might cause integer overflowK is defined as an integer; however, we don't use k in multiplication, so overflow not an issue. But validation for negative numbers should be included.