Longest Substring Without Repeating Characters

Medium
7 years ago

Alright, let's dive into a coding problem. Your task is to write a function that finds the length of the longest substring without repeating characters within a given string.

For example, if the input string is "abcabcbb", the longest substring without repeating characters is "abc", which has a length of 3. Your function should return 3 in this case.

Here are a few more test cases to illustrate the requirements:

  • Input: "bbbbb" Output: 1 (The longest substring is "b")

  • Input: "pwwkew" Output: 3 (The longest substring is "wke". Note that the answer must be a substring, "pwke" is a subsequence and not a substring.)

  • Input: "" Output: 0

  • Input: "dvdf" Output: 3 (The longest substring is "vdf")

Can you describe your approach to solving this problem, and then provide the code for your solution?

Sample Answer

Longest Substring Without Repeating Characters

This is a classic string problem often encountered in technical interviews. I'll start with a straightforward, albeit less efficient, approach and then move on to the optimal solution.

1. Naive (Brute Force) Solution

The most basic approach is to generate all possible substrings of the given string and check each one to see if it contains repeating characters. If a substring has no repeating characters, we compare its length to the current longest substring found. If it's longer, we update the longest substring.

Here's how it works:

  1. Iterate through all possible starting positions of substrings (from index 0 to n-1, where n is the string length).
  2. For each starting position, iterate through all possible ending positions (from the starting position to n-1).
  3. Extract the substring defined by the start and end positions.
  4. Check if the substring contains any repeating characters. This can be done by iterating through the substring and using a Set (or a similar data structure) to track the characters seen so far. If we encounter a character already in the Set, the substring contains repeating characters.
  5. If the substring has no repeating characters, compare its length with the length of the current longest substring without repeating characters. Update the longest substring if necessary.

Example:

Let's say the input string is "abcabcbb".

The algorithm would generate substrings like "a", "ab", "abc", "abca", "abcab", "abcabc", "abcabcb", "abcabcbb", "b", "bc", "bca", ..., "bb". It would then check each of these for repeating characters and keep track of the longest one found so far without repetition.

Code (Python):

python def longest_substring_without_repeating_characters_naive(s): n = len(s) longest = ""

for i in range(n):
    for j in range(i, n):
        substring = s[i:j+1]
        if all_unique(substring):
            if len(substring) > len(longest):
                longest = substring
return longest

def all_unique(s): char_set = set() for char in s: if char in char_set: return False char_set.add(char) return True

2. Optimal Solution: Sliding Window

A more efficient approach is to use a sliding window. We maintain a window (a contiguous portion of the string) and expand it to the right until we encounter a repeating character. When we find a repeating character, we shrink the window from the left until the repeating character is no longer in the window.

Here's how the sliding window approach works:

  1. Use two pointers, left and right, to define the sliding window s[left:right+1]. Initially, both pointers start at 0.
  2. Use a Set (or a similar data structure like a dictionary or hash map) to keep track of the characters currently present in the window.
  3. Iterate through the string, moving the right pointer one step at a time.
  4. If the character at s[right] is not in the Set, add it to the Set and update the maximum length of the substring encountered so far.
  5. If the character at s[right] is already in the Set, it means we have a repeating character. To resolve this, we move the left pointer one step at a time, removing characters from the Set as we go, until the repeating character s[right] is no longer in the Set. Then, we add s[right] to the Set.
  6. Continue iterating until the right pointer reaches the end of the string.

Example:

Let's trace the example "abcabcbb" again:

  • left = 0, right = 0, window = "a", set = {"a"}, max_length = 1
  • left = 0, right = 1, window = "ab", set = {"a", "b"}, max_length = 2
  • left = 0, right = 2, window = "abc", set = {"a", "b", "c"}, max_length = 3
  • left = 0, right = 3, window = "abca", 'a' is in set, so move left. Remove 'a', set = {"b", "c"}, left = 1. Add 'a', set = {"b", "c", "a"}, max_length = 3
  • left = 1, right = 4, window = "bca", set = {"b", "c", "a"}, max_length = 3
  • left = 1, right = 5, window = "bcab", 'b' is in set, so move left. Remove 'b', set = {"c", "a"}, left = 2. Add 'b', set = {"c", "a", "b"}, max_length = 3
  • left = 2, right = 6, window = "cba", set = {"c", "a", "b"}, max_length = 3
  • left = 2, right = 7, window = "cbab", 'b' is in set, so move left. Remove 'c', set = {"a", "b"}, left = 3. Remove 'b', set = {"a"}, left = 4. Add 'b', set = {"a", "b"}, max_length = 3

The final result is 3.

Code (Python):

python def longest_substring_without_repeating_characters(s): n = len(s) left = 0 right = 0 max_length = 0 char_set = set()

while right < n:
    if s[right] not in char_set:
        char_set.add(s[right])
        max_length = max(max_length, right - left + 1)
        right += 1
    else:
        char_set.remove(s[left])
        left += 1

return max_length

3. Big(O) Run-time Complexity

  • Naive Solution: The naive solution has a time complexity of O(n^3) in the worst case, where n is the length of the string. This is because we iterate through all possible substrings (O(n^2)) and then check each substring for repeating characters (O(n)). The all_unique function takes O(n) in the worst case.
  • Optimal Solution (Sliding Window): The sliding window approach has a time complexity of O(n), where n is the length of the string. Although we have nested loops (the while loop and the while loop inside the else block), the left pointer only moves forward. In the worst-case scenario, both left and right pointers traverse the string at most once. The set.add() and set.remove() operations take O(1) time on average.

4. Big(O) Space Complexity

  • Naive Solution: The naive solution uses O(n) space in the worst case to store the longest substring found so far and the char_set.
  • Optimal Solution (Sliding Window): The sliding window approach uses O(min(m, n)) space, where n is the length of the string and m is the size of the character set (e.g., 26 for lowercase English letters, 128 for ASCII, or larger for Unicode). The space is used to store the characters in the Set. In the worst case, all characters in the string are unique, and the Set will store all of them. However, the size of the set is limited by the size of the character set used in the string.

5. Edge Cases

  • Empty String: If the input string is empty (""), the algorithm should return 0 because there is no substring.
  • String with Only Repeating Characters: If the input string consists of only repeating characters (e.g., "bbbbbb"), the algorithm should return 1 because the longest substring without repeating characters will be a single character.
  • String with a Single Character: If the string contains only one character, the algorithm should return 1.
  • String with all unique characters: If the string has all unique characters (e.g. "abcdefg") then the algorithm should return the length of the string.

6. Code Considerations

  • The code provided here is in Python, but the same logic can be applied to other programming languages (e.g., Java, C++, JavaScript).
  • The choice of data structure for the Set (or equivalent) is important. Hash-based sets or dictionaries provide O(1) average time complexity for insertion and deletion, which contributes to the overall O(n) time complexity of the sliding window algorithm.
  • While Python makes string slicing easy (e.g., s[i:j+1]), it's important to be mindful of its potential impact on performance, especially in time-critical applications.