Taro Logo

Longest Substring Without Repeating Characters

Medium
Meta logo
Meta
2 views
Topics:
StringsSliding Windows

Given a string s, find the length of the longest substring without duplicate characters.

For example:

  • s = "abcabcbb". The answer is "abc", with a length of 3.
  • s = "bbbbb". The answer is "b", with a length of 1.
  • s = "pwwkew". The answer is "wke", with a length of 3. Note that the answer must be a substring; "pwke" is a subsequence and not a substring.

How would you approach this problem? What is the time and space complexity of your solution? Are there any edge cases to consider?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the expected character set of the input string `s`? Is it ASCII, extended ASCII, or Unicode?
  2. Can the input string `s` be null or empty?
  3. If the input string is empty, what should the function return?
  4. Are we only concerned with the *length* of the longest substring, or do we also need to return the substring itself?
  5. If multiple substrings of the same maximum length exist, is it sufficient to return the length of any one of them?

Brute Force Solution

Approach

The brute force method finds the longest substring without repeating characters by examining every possible substring. It involves checking each substring to see if it meets the criteria. We continue until the longest valid substring is found.

Here's how the algorithm would work step-by-step:

  1. Start by considering the first character of the string as a potential substring.
  2. Then, add the next character and check if the new, longer substring contains any repeating characters.
  3. If there are repeating characters, discard that substring and move on.
  4. If there are no repeating characters, record the length of the valid substring.
  5. Repeat this process by adding one character at a time until you reach the end of the original string, always checking for repeating characters.
  6. Next, start from the second character of the original string and repeat the same process: checking every possible substring that begins with the second character.
  7. Continue this process, each time starting from the next character in the original string, until you have checked all possible substrings.
  8. After checking all substrings, compare the lengths of all the valid substrings you recorded.
  9. The longest valid substring found is your answer.

Code Implementation

def longest_substring_without_repeating_characters_brute_force(input_string):
    longest_substring_length = 0
    
    for starting_index in range(len(input_string)): # Iterate through possible starting positions

        for ending_index in range(starting_index, len(input_string)): # Iterate through possible ending positions
            substring = input_string[starting_index:ending_index + 1]

            # Check if the current substring has repeating characters
            if len(set(substring)) == len(substring):

                # Update the length if this substring is the longest so far
                longest_substring_length = max(longest_substring_length, len(substring))

    return longest_substring_length

Big(O) Analysis

Time Complexity
O(n³)The algorithm iterates through all possible substrings of the input string. The outer loop iterates `n` times (where n is the length of the string), selecting the starting position of the substring. The inner loop iterates up to `n` times to extend the substring. Inside the inner loop, checking for repeating characters within the substring can take up to `n` operations in the worst case. Therefore, the total time complexity is proportional to n * n * n, which simplifies to O(n³).
Space Complexity
O(1)The provided plain English explanation describes a brute-force approach that iterates through all possible substrings. It does not explicitly mention any auxiliary data structures used to store intermediate results or track visited characters. The algorithm primarily relies on comparing characters within the original string, suggesting minimal additional memory. Therefore, the space complexity is constant, as it uses a fixed amount of extra memory regardless of the input string's length N.

Optimal Solution

Approach

The optimal strategy avoids unnecessary work by remembering the last time we saw each letter. This lets us quickly move the start of our substring when we encounter a duplicate, skipping over many invalid substrings all at once.

Here's how the algorithm would work step-by-step:

  1. Imagine you're walking through the string, one letter at a time.
  2. Keep track of the longest substring you've seen so far that doesn't have any repeating letters.
  3. As you walk, remember the last spot where you saw each letter. Think of it like having a little notepad for each letter.
  4. If you come across a letter you've seen before, check where you saw it last.
  5. If that last spot is inside your current substring, it means you have a repeating letter!
  6. Instead of starting over from the beginning, you can jump ahead. Move the starting point of your substring to just after that last spot. This removes the repetition and avoids rechecking a lot of letters.
  7. Keep going, updating your longest substring as you find bigger ones and jumping ahead when needed.
  8. When you reach the end, you'll have found the longest substring without repeating letters!

Code Implementation

def length_of_longest_substring(input_string):
    start_index = 0
    max_length = 0
    character_index_map = {}

    for end_index, character in enumerate(input_string):
        #If char is present and within current window
        if character in character_index_map and start_index <= character_index_map[character]:

            # Move the start of window
            start_index = character_index_map[character] + 1

        else:
            # Update max length if needed
            max_length = max(max_length, end_index - start_index + 1)

        character_index_map[character] = end_index

    return max_length

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the string once using a single loop of n iterations, where n is the length of the string. Inside the loop, the operations performed (checking the last seen index of a character and updating the start of the substring) take constant time, O(1), due to the efficient lookup provided by the hash map/dictionary. Therefore, the overall time complexity is directly proportional to the length of the string, resulting in O(n).
Space Complexity
O(min(m, N))The algorithm utilizes a data structure, essentially a "notepad", to store the last seen position of each character. In the worst-case scenario, this notepad would store the last seen position for all possible characters in the input string, where m is the size of the character set. However, the notepad cannot store more entries than the length of the string itself, denoted by N. Thus, the space complexity is proportional to the minimum of m and N, representing the number of distinct characters in the input or the length of the input string, whichever is smaller. This space is used to keep track of the last seen index for each character encountered within the string. Consequently, the auxiliary space complexity is O(min(m, N)).

Edge Cases

CaseHow to Handle
Null or Empty StringReturn 0, as an empty string has no substring.
String with a single characterReturn 1, as a single character string has a substring of length 1.
String with all identical characters (e.g., 'aaaa')The sliding window will increment its right pointer and update the starting index of the window when a duplicate is found, ultimately returning 1.
String with many repeating characters (e.g., 'abababab')The algorithm will dynamically adjust the window based on the locations of duplicates, ensuring correct substring length tracking.
String with no repeating characters (e.g., 'abcdefg')The sliding window will expand to the entire string and return the length of the string.
String with ASCII characters or Unicode characters.The sliding window/hashmap must be able to handle the full range of characters provided in the string.
Long string to check for performance and potential memory overflowThe sliding window approach provides O(n) time complexity and uses constant space for character storage.
String contains only whitespace charactersThe algorithm should correctly handle whitespace characters the same as any other character, either returning 1 if all are identical, or the length of the substring if unique spaces are present.