Longest Substring with At Most Two Distinct Characters

Medium

Google

3 views

Topics:

StringsTwo PointersSliding Windows

Given a string s, find the length of the longest substring that contains at most two distinct characters.

Example 1:

Input: s = "eceba" Output: 3 Explanation: The longest substring with at most two distinct characters is "ece", which has a length of 3.

Example 2:

Input: s = "ccaabbb" Output: 5 Explanation: The longest substring with at most two distinct characters is "ccaab", which has a length of 5.

Clarifications:

What should I return if the input string is empty?
Are we only considering ASCII characters, or can the string contain Unicode characters?
Can the input string contain null or whitespace characters?
What is the expected time and space complexity?

Write a function that takes a string s as input and returns the length of the longest substring with at most two distinct characters. The function should be optimized for performance. Consider edge cases such as empty strings or strings with only one or two distinct characters. Discuss time and space complexity.

Longest Substring with At Most Two Distinct Characters

Medium

Google

3 views

Topics:

StringsTwo PointersSliding Windows

Given a string s, find the length of the longest substring that contains at most two distinct characters.

Example 1:

Input: s = "eceba" Output: 3 Explanation: The longest substring with at most two distinct characters is "ece", which has a length of 3.

Example 2:

Input: s = "ccaabbb" Output: 5 Explanation: The longest substring with at most two distinct characters is "ccaab", which has a length of 5.

Clarifications:

What should I return if the input string is empty?
Are we only considering ASCII characters, or can the string contain Unicode characters?
Can the input string contain null or whitespace characters?
What is the expected time and space complexity?

Write a function that takes a string s as input and returns the length of the longest substring with at most two distinct characters. The function should be optimized for performance. Consider edge cases such as empty strings or strings with only one or two distinct characters. Discuss time and space complexity.

Solution

Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

What is the expected data type of the input string and can it be null or empty?
What characters can the input string contain (e.g., only ASCII, Unicode)?
If no substring exists with at most two distinct characters, what should I return (e.g., null, empty string, 0)?
In the context of this problem, is a substring required to be contiguous?
If multiple substrings of the same maximum length exist, is any one of them acceptable as a valid output?

Brute Force Solution

Approach

The brute force method for this problem means we will look at every single possible section (substring) of the given text. We will check each one to see if it follows the rule of having at most two different letters and keep track of the longest section we find that follows the rule.

Here's how the algorithm would work step-by-step:

Start by considering the very first letter of the text as the beginning of a potential section.
Then, look at the first two letters as a section. Then, the first three, and so on, each time extending the section by one letter until we reach the end of the text.
For each of these sections, check how many different letters it contains. If it has more than two, it doesn't meet our rule, so we ignore it and move on.
If a section has two or fewer different letters, we compare its length to the length of the longest valid section we've found so far. If it's longer, we remember this new section as the longest one.
After checking all sections starting with the first letter, we repeat the process starting with the second letter of the text, then the third, and so on, until we've considered every possible starting position.
Once we've checked all possible sections of the text, the longest valid section we remembered is the answer.

Code Implementation

def longest_substring_with_at_most_two_distinct_characters_brute_force(input_string):
    longest_substring_length = 0

    for starting_index in range(len(input_string)): 
        for ending_index in range(starting_index, len(input_string)): 
            substring = input_string[starting_index:ending_index + 1]

            # This ensures that we're only looking at distinct letters within the current substring
            distinct_characters = set(substring)

            # We only care about substrings that satisfy the two distinct characters rule
            if len(distinct_characters) <= 2:
                current_substring_length = len(substring)

                # Find out if our current substring is bigger than the currently known max
                if current_substring_length > longest_substring_length:
                    longest_substring_length = current_substring_length

    return longest_substring_length

Big(O) Analysis

Time Complexity

O(n³) – The algorithm iterates through all possible substrings of the input string of length n. The outer loop iterates n times, determining the starting position of the substring. The inner loop iterates up to n times, determining the ending position of the substring. Inside the inner loop, we need to check if the substring contains at most two distinct characters, which requires iterating through the substring of up to n length in the worst case to count character frequencies. Therefore, the time complexity is O(n * n * n), which simplifies to O(n³).

Space Complexity

O(1) – The brute force approach described involves iterating through all possible substrings and checking each for validity. The plain English explanation only describes comparing the length of the current substring to the longest valid substring found so far. This requires storing a constant number of variables, such as the length of the longest substring or the substring itself. The amount of extra memory used does not scale with the input string's length N, and therefore the auxiliary space complexity is O(1).

Optimal Solution

Approach

We want to find the longest chunk of text that contains no more than two different letters. Instead of checking every possible chunk, we'll use a 'sliding window' approach, expanding and shrinking our view to efficiently find the best one.

Here's how the algorithm would work step-by-step:

Imagine you have a window that you can move across the text. Start with a small window at the beginning.
Expand the window to the right, one letter at a time.
As you expand, keep track of how many different letters are inside the window.
If the window contains more than two different letters, shrink the window from the left until you have only two different letters again.
Keep track of the size of the longest window you've seen so far.
Repeat expanding and shrinking the window until you reach the end of the text.
The size of the longest window you tracked is the answer.

Code Implementation

def longest_substring_with_at_most_two_distinct_characters(input_string):
    window_start = 0
    max_length = 0
    character_frequency = {}

    for window_end in range(len(input_string)): 
        right_character = input_string[window_end]
        if right_character not in character_frequency:
            character_frequency[right_character] = 0
        character_frequency[right_character] += 1

        # Shrink the window until we have at most two distinct characters

        while len(character_frequency) > 2:
            left_character = input_string[window_start]
            character_frequency[left_character] -= 1

            # Clean up characters with 0 frequency to maintain accuracy
            if character_frequency[left_character] == 0:
                del character_frequency[left_character]
            window_start += 1

        # Update the max length if the current window is longer
        current_window_length = window_end - window_start + 1
        max_length = max(max_length, current_window_length)

    return max_length

Big(O) Analysis

Time Complexity

O(n) – The algorithm iterates through the input string of size 'n' using a sliding window. The outer 'expansion' of the window happens at most 'n' times, once for each character. The inner 'shrinking' of the window, while it appears as a loop, only advances the left pointer linearly with the right pointer. Because the left and right pointers each advance at most 'n' times, the overall complexity is O(n + n), which simplifies to O(n).

Space Complexity

O(1) – The algorithm uses a sliding window approach, keeping track of the count of distinct characters within the window. It requires a hash map (or similar data structure) to store the character counts. Since the problem specifies at most two distinct characters, the hash map will store at most two key-value pairs, making the space used constant, irrespective of the input string length N. Therefore, the space complexity is O(1).

Edge Cases

Case	How to Handle
Null or empty input string	Return 0, indicating an empty substring.
String with only one distinct character	Return the length of the string, as the entire string is a valid substring.
String with exactly two distinct characters repeated many times	Ensure the algorithm correctly identifies the entire string as the longest substring.
Long string to check for performance issues	Sliding window approach is O(n) and handles large strings efficiently without excessive memory usage.
String with more than two distinct characters, but a long substring with only two	The algorithm should correctly identify and return the longest substring adhering to the two distinct character constraint.
Input string contains non-ASCII characters	The sliding window and character counting mechanism should work correctly with Unicode characters.
String starting or ending with the same character	The sliding window correctly handles the start/end by expanding and contracting as required.
String with all same characters	The entire string is a valid substring with at most two distinct characters and should be returned.