Taro Logo

Longest Substring Without Repeating Characters

#3 Most AskedMedium
52 views
Topics:
StringsTwo PointersSliding Windows

Given a string s, find the length of the longest substring without duplicate characters.

Example 1:

Input: s = "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3.

Example 2:

Input: s = "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Example 3:

Input: s = "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.

Constraints:

  • 0 <= s.length <= 5 * 104
  • s consists of English letters, digits, symbols and spaces.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the character set of the input string 's'? For example, can it contain only ASCII characters, Unicode characters, or is it limited to alphanumeric characters?
  2. What are the constraints on the length of the input string 's'? For instance, can it be empty, and what is the maximum possible length?
  3. Are we looking for the length of the longest substring, or the substring itself?
  4. What should be returned if the input string 's' is empty?
  5. Does the definition of 'repeating characters' consider case sensitivity (e.g., 'a' and 'A' are different) or case insensitivity (e.g., 'a' and 'A' are the same)?

Brute Force Solution

Approach

This problem asks for the longest sequence of characters within a given text where no character repeats. The brute force approach is like meticulously examining every single possible sequence of characters to see if it's valid and then finding the longest one.

Here's how the algorithm would work step-by-step:

  1. Start at the very beginning of the text and consider a sequence of just one character.
  2. Check if this sequence is valid (meaning no repeating characters). If it is, remember its length.
  3. Now, try a sequence of two characters starting from the beginning.
  4. Check if this two-character sequence is valid. If it is, and it's longer than any valid sequence found so far, remember its length.
  5. Continue this process, extending the sequence by one character at a time, always starting from the original beginning.
  6. For each extended sequence, check if it's valid. If it is, and it's the longest valid one seen yet, update your record of the longest length.
  7. Once you've tried all possible sequences starting from the very first character, move to the second character in the text.
  8. Repeat the entire process: try sequences of one character, then two, then three, and so on, starting from this new position.
  9. Keep doing this for every possible starting point in the text.
  10. Throughout this entire exhaustive search, always keep track of the absolute longest valid sequence of characters you've encountered.

Code Implementation

def length_of_longest_substring(input_string):
    maximum_length_found = 0

    # Iterate through every possible starting point for a substring
    for starting_index in range(len(input_string)):
        current_substring_characters = set()
        current_length = 0

        # For each starting point, try extending the substring one character at a time
        for ending_index in range(starting_index, len(input_string)):
            current_character = input_string[ending_index]

            # If the current character is already in our set, this substring is no longer valid
            if current_character in current_substring_characters:
                break
            
            # Add the unique character to our set and increment the current substring length
            current_substring_characters.add(current_character)
            current_length += 1

            # Update the overall maximum length if the current valid substring is longer
            if current_length > maximum_length_found:
                maximum_length_found = current_length

    return maximum_length_found

Big(O) Analysis

Time Complexity
O(n³)The brute force approach described iterates through all possible start points of a substring (n possibilities). For each start point, it then iterates through all possible end points to form a substring (up to n possibilities). Within each substring generation, it checks for repeating characters, which can take up to O(n) time for each substring. This nesting of three operations, each potentially dependent on the input size n, results in a cubic time complexity, approximated by n * n * n operations, simplifying to O(n³).
Space Complexity
O(1)The provided brute-force approach iterates through all possible substrings. It might implicitly use a set or hash map to check for repeating characters within a substring, but the maximum size of such a structure would be bounded by the size of the character set (e.g., 26 for lowercase English letters, or 128 for ASCII), not by the input string length N. Therefore, the auxiliary space used is constant, regardless of the input size.

Optimal Solution

Approach

The core idea is to keep track of the characters we've seen within the current sequence. As we explore longer sequences, if we encounter a character already in our current sequence, we need to adjust the start of our sequence to exclude the previous occurrence of that character and everything before it. This way, we efficiently expand and shrink our view of valid sequences.

Here's how the algorithm would work step-by-step:

  1. Imagine you have a window that slides over the text.
  2. This window represents the current sequence of characters you are examining to see if it has any repeating characters.
  3. As you slide the window to the right, you add the new character to the window.
  4. If the new character is already inside your current window, it means you've found a repeat.
  5. When a repeat is found, you need to shrink the window from the left side until the repeating character is no longer inside.
  6. After each adjustment of the window (either expanding or shrinking), you check the size of the current window.
  7. You keep a record of the largest window size you've encountered so far.
  8. Continue this process until the window has slid across the entire text.
  9. The largest window size recorded is the length of the longest sequence without repeating characters.

Code Implementation

def length_of_longest_substring(input_string):
    longest_substring_length = 0
    window_start_index = 0
    characters_in_window = {}

    for window_end_index, current_character in enumerate(input_string):
        if current_character in characters_in_window and characters_in_window[current_character] >= window_start_index:
            # When a repeating character is found, we need to move the window's start.
            # This ensures the window only contains unique characters.
            window_start_index = characters_in_window[current_character] + 1

        # Update the last seen index of the current character.
        characters_in_window[current_character] = window_end_index

        # The current window size represents a valid substring without repeating characters.
        current_window_size = window_end_index - window_start_index + 1

        # We maintain the maximum length found so far.
        longest_substring_length = max(longest_substring_length, current_window_size)

    # After iterating through the entire string, we have the overall longest length.
    return longest_substring_length

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input string of length n once. Each character is visited by the right pointer of the sliding window at most once, and by the left pointer of the sliding window at most once. Operations within the loop, such as checking for character existence in a set or map and updating the window boundaries, take amortized constant time. Therefore, the total time complexity is linear with respect to the input size n.
Space Complexity
O(min(n, m))The algorithm uses a data structure, typically a hash map or a set, to keep track of characters within the current sliding window. In the worst case, this structure will store all unique characters present in the input string. If the alphabet size (m) is smaller than the string length (n), the space used is bounded by O(m). Otherwise, it's bounded by O(n). Therefore, the auxiliary space complexity is O(min(n, m)), where n is the length of the input string and m is the size of the character set.

Edge Cases

Input string is empty
How to Handle:
An empty string has no substrings, so the length of the longest substring without repeating characters is 0.
Input string has only one character
How to Handle:
A single character string is itself the longest substring without repeating characters, so the length is 1.
Input string contains all unique characters
How to Handle:
The entire string will be the longest substring, and its length will be returned.
Input string contains all identical characters
How to Handle:
The longest substring without repeating characters will have a length of 1, considering only a single instance of the repeated character.
Input string is very long
How to Handle:
A sliding window approach with a hash map (or set) provides an efficient O(n) time complexity solution, scaling well with large inputs.
Input string contains a mix of repeating and non-repeating characters
How to Handle:
The sliding window will dynamically expand and shrink, correctly identifying the longest valid substring.
Input string contains special characters, numbers, or spaces
How to Handle:
The chosen data structure (e.g., hash map or set) should handle any valid character type present in the string.
Multiple longest substrings of the same length exist
How to Handle:
The algorithm will find and return the length of one of these longest substrings, not necessarily all of them.
0/1037 completed