Longest Substring Without Repeating Characters

04 September 2024 — approx 4 min read.

Given a string s, find the length of the longest substring without repeating characters.

Problem Statement

LeetCode-3: Given a string s, find the length of the longest substring without repeating characters.

Input: s = "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3.


Input: s = "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Input: s = "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.

Approach

The brute-force approach is to consider all possible substrings and check if they have repeating characters. This approach has a time complexity of $O(n^3)$.

We can use a sliding window approach to solve this problem in $O(2n) \approx O(n)$ time complexity. We will use a HashSet to store the characters in the current window. We will keep track of the start and end of the window. If the character at the end of the window is not in the HashSet, we will add it to the HashSet and update the maximum length of the substring.

If the character is already in the HashSet, we will remove the character at the start of the window and update the start of the window.

Implementation

Here is a Java implementation of the above approach:

public int lengthOfLongestSubstring(String s) {
    Set<Character> uniq = new HashSet<>();
    int start = 0;
    int end = 0;
    int max = 0;

    while(end < s.length()){
        while(uniq.contains(s.charAt(end))){
            uniq.remove(s.charAt(start));
            start++;
        }
        uniq.add(s.charAt(end));
        max = Math.max(max, end-start + 1);
        end++;
    }
    
    return max;
}

Complexity Analysis

The time complexity of this approach is $O(2n) \approx O(n)$ as we may iterate over all the characters twice. The space complexity is $O(min(n, m))$ where $n$ is the length of the string and $m$ is the size of the character set. The space complexity is $O(n)$ in the worst case when all characters are unique.

Follow-up

Can we solve this problem in $O(n)$ time complexity?

If we look closely, once we have a repeating character, we will keep removing characters from the start of the window until we remove the repeating character.

Which means, we are not using the information that we have already checked the characters between the start and end of the window.

We can use a HashMap to store the index of the characters. If we find a repeating character, we will update the start of the window to the index of the repeating character + 1.

Here is the updated implementation:

public int lengthOfLongestSubstring(String s) {
    Map<Character, Integer> characterIndexMapping = new HashMap<>();
    int max = 0;
    int start = 0;
    int end = 0;
    
    while(end < s.length()){
        if(characterIndexMapping.containsKey(s.charAt(end))){
            start = characterIndexMapping.get(s.charAt(end)) + 1;
        }
        max = Math.max(max, end - start + 1);
        characterIndexMapping.put(s.charAt(end), end);
        end++;
    }
    
    return max;
}

But this implementation has a major flaw. Consider the input abba.

# the first row for each iteration represents
# the start of loop, and next row represents
# the end of loop.

#   char start end   max mapping
1   a    0     0     0   {}
1   a    0     0     1   {a=0}

2   b    0     1     1   {a=0}
2   b    0     1     2   {a=0, b=1}

3   b    0     2     2   {a=0, b=1}
3   b    2     2     2   {a=0, b=2}

4   a    2     3     2   {a=0, b=2}
4   a    1     3     3   {a=3, b=2}

When we reach the second b, we will update the start of the window to 2. Now when we reach the second a, we will update the start of the window to 1 (previous index of a + 1) which is incorrect as we have already checked the characters between 1 and 2.

To fix this, we need to update the start of the window to max(start, characterIndexMapping.get(s.charAt(end)) + 1) i.e. we need to update the start of the window to the maximum of the current start and the index of the repeating character + 1.