Alright, let's dive into a coding problem. Your task is to write a function that finds the length of the longest substring without repeating characters within a given string.
For example, if the input string is "abcabcbb", the longest substring without repeating characters is "abc", which has a length of 3. Your function should return 3 in this case.
Here are a few more test cases to illustrate the requirements:
Input: "bbbbb" Output: 1 (The longest substring is "b")
Input: "pwwkew" Output: 3 (The longest substring is "wke". Note that the answer must be a substring, "pwke" is a subsequence and not a substring.)
Input: "" Output: 0
Input: "dvdf" Output: 3 (The longest substring is "vdf")
Can you describe your approach to solving this problem, and then provide the code for your solution?
This is a classic string problem often encountered in technical interviews. I'll start with a straightforward, albeit less efficient, approach and then move on to the optimal solution.
The most basic approach is to generate all possible substrings of the given string and check each one to see if it contains repeating characters. If a substring has no repeating characters, we compare its length to the current longest substring found. If it's longer, we update the longest substring.
Here's how it works:
Set
(or a similar data structure) to track the characters seen so far. If we encounter a character already in the Set
, the substring contains repeating characters.Example:
Let's say the input string is "abcabcbb".
The algorithm would generate substrings like "a", "ab", "abc", "abca", "abcab", "abcabc", "abcabcb", "abcabcbb", "b", "bc", "bca", ..., "bb". It would then check each of these for repeating characters and keep track of the longest one found so far without repetition.
Code (Python):
python def longest_substring_without_repeating_characters_naive(s): n = len(s) longest = ""
for i in range(n):
for j in range(i, n):
substring = s[i:j+1]
if all_unique(substring):
if len(substring) > len(longest):
longest = substring
return longest
def all_unique(s): char_set = set() for char in s: if char in char_set: return False char_set.add(char) return True
A more efficient approach is to use a sliding window. We maintain a window (a contiguous portion of the string) and expand it to the right until we encounter a repeating character. When we find a repeating character, we shrink the window from the left until the repeating character is no longer in the window.
Here's how the sliding window approach works:
left
and right
, to define the sliding window s[left:right+1]
. Initially, both pointers start at 0.Set
(or a similar data structure like a dictionary or hash map) to keep track of the characters currently present in the window.right
pointer one step at a time.s[right]
is not in the Set
, add it to the Set
and update the maximum length of the substring encountered so far.s[right]
is already in the Set
, it means we have a repeating character. To resolve this, we move the left
pointer one step at a time, removing characters from the Set
as we go, until the repeating character s[right]
is no longer in the Set
. Then, we add s[right]
to the Set
.right
pointer reaches the end of the string.Example:
Let's trace the example "abcabcbb" again:
left
= 0, right
= 0, window = "a", set = {"a"}, max_length = 1left
= 0, right
= 1, window = "ab", set = {"a", "b"}, max_length = 2left
= 0, right
= 2, window = "abc", set = {"a", "b", "c"}, max_length = 3left
= 0, right
= 3, window = "abca", 'a' is in set, so move left. Remove 'a', set = {"b", "c"}, left = 1. Add 'a', set = {"b", "c", "a"}, max_length = 3left
= 1, right
= 4, window = "bca", set = {"b", "c", "a"}, max_length = 3left
= 1, right
= 5, window = "bcab", 'b' is in set, so move left. Remove 'b', set = {"c", "a"}, left = 2. Add 'b', set = {"c", "a", "b"}, max_length = 3left
= 2, right
= 6, window = "cba", set = {"c", "a", "b"}, max_length = 3left
= 2, right
= 7, window = "cbab", 'b' is in set, so move left. Remove 'c', set = {"a", "b"}, left = 3. Remove 'b', set = {"a"}, left = 4. Add 'b', set = {"a", "b"}, max_length = 3The final result is 3.
Code (Python):
python def longest_substring_without_repeating_characters(s): n = len(s) left = 0 right = 0 max_length = 0 char_set = set()
while right < n:
if s[right] not in char_set:
char_set.add(s[right])
max_length = max(max_length, right - left + 1)
right += 1
else:
char_set.remove(s[left])
left += 1
return max_length
all_unique
function takes O(n) in the worst case.while
loop and the while
loop inside the else
block), the left
pointer only moves forward. In the worst-case scenario, both left
and right
pointers traverse the string at most once. The set.add()
and set.remove()
operations take O(1) time on average.longest
substring found so far and the char_set
.Set
. In the worst case, all characters in the string are unique, and the Set
will store all of them. However, the size of the set is limited by the size of the character set used in the string.Set
(or equivalent) is important. Hash-based sets or dictionaries provide O(1) average time complexity for insertion and deletion, which contributes to the overall O(n) time complexity of the sliding window algorithm.s[i:j+1]
), it's important to be mindful of its potential impact on performance, especially in time-critical applications.