Taro Logo

Remove All Occurrences of a Substring

Medium
Asked by:
Profile picture
Profile picture
Profile picture
Profile picture
+4
11 views
Topics:
StringsTwo Pointers

Given two strings s and part, perform the following operation on s until all occurrences of the substring part are removed:

  • Find the leftmost occurrence of the substring part and remove it from s.

Return s after removing all occurrences of part.

A substring is a contiguous sequence of characters in a string.

Example 1:

Input: s = "daabcbaabcbc", part = "abc"
Output: "dab"
Explanation: The following operations are done:
- s = "daabcbaabcbc", remove "abc" starting at index 2, so s = "dabaabcbc".
- s = "dabaabcbc", remove "abc" starting at index 4, so s = "dababc".
- s = "dababc", remove "abc" starting at index 3, so s = "dab".
Now s has no occurrences of "abc".

Example 2:

Input: s = "axxxxyyyyb", part = "xy"
Output: "ab"
Explanation: The following operations are done:
- s = "axxxxyyyyb", remove "xy" starting at index 4 so s = "axxxyyyb".
- s = "axxxyyyb", remove "xy" starting at index 3 so s = "axxyyb".
- s = "axxyyb", remove "xy" starting at index 2 so s = "axyb".
- s = "axyb", remove "xy" starting at index 1 so s = "ab".
Now s has no occurrences of "xy".

Constraints:

  • 1 <= s.length <= 1000
  • 1 <= part.length <= 1000
  • s​​​​​​ and part consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input string or the substring be empty or null?
  2. Are the input strings case-sensitive? Should the removal be case-sensitive as well?
  3. If the substring appears multiple times consecutively (e.g., "abab" with substring "ab"), should all occurrences be removed?
  4. What should be returned if the substring is not found in the input string?
  5. What is the maximum length of the input string and the substring? Are there any memory constraints?

Brute Force Solution

Approach

The brute force approach for removing all occurrences of a smaller string from a larger string involves repeatedly searching for the smaller string within the larger string. If we find the smaller string, we remove it. We keep doing this until the smaller string can't be found anymore.

Here's how the algorithm would work step-by-step:

  1. Look through the big string to see if the small string appears inside it.
  2. If the small string is found, cut it out of the big string.
  3. Now, go back to the beginning and repeat the process by checking if the small string is still inside the modified big string.
  4. Keep doing this cutting-out-and-checking process over and over.
  5. When you get to a point where you can't find the small string in the big string anymore, you're done.

Code Implementation

def remove_all_occurrences_brute_force(main_string, substring):
    while True:
        # Find the index of the substring
        index = main_string.find(substring)

        # If substring is not found, exit the loop
        if index == -1:
            break

        # Remove the substring from the main string
        main_string = main_string[:index] + main_string[index + len(substring):]

    return main_string

Big(O) Analysis

Time Complexity
O(n*m)Let n be the length of the big string and m be the length of the small string (substring). In the worst-case scenario, the small string might appear many times in the big string. The while loop continues as long as the small string is found. Inside the loop, the indexOf operation takes O(n) time to search for the small string. If the small string is found, we remove it, which could also take O(n) time in some implementations due to string concatenation or copying. In the worst case, each indexOf call finds an occurrence. The maximum number of times we can find the small string is approximately n/m. Thus the overall time complexity is approximately (n/m) * n = n²/m. However, if the replace operation also costs O(n) then the overall time complexity will tend towards O(n * (n/m)) which simplifies to O(n²/m), or can be approximated by O(n*m) as the indexOf is using an algorithm like KMP.
Space Complexity
O(1)The brute force approach, as described, modifies the original string in place. No additional data structures like arrays, lists, or hashmaps are used to store intermediate results or visited locations. While string slicing might create temporary string objects, these are short-lived and their memory is quickly reclaimed. Therefore, the algorithm uses a constant amount of extra space, independent of the input string's size.

Optimal Solution

Approach

We want to efficiently remove all instances of a specific smaller text (the substring) from a larger text (the main string). The trick is to carefully search for the substring and remove it whenever we find it, updating the main string as we go.

Here's how the algorithm would work step-by-step:

  1. Repeatedly search the main string for the substring.
  2. If the substring is found, remove it from the main string.
  3. After removing, update the main string to reflect the change.
  4. Keep doing this until the substring no longer exists within the main string.
  5. The final updated main string is the result.

Code Implementation

def remove_all_occurrences(main_string, substring_to_remove):
    while True:
        index_of_substring = main_string.find(substring_to_remove)

        # Exit loop if substring is not found.
        if index_of_substring == -1:
            break

        # Remove the substring by slicing the main string.
        main_string = main_string[:index_of_substring] + main_string[index_of_substring + len(substring_to_remove):]

        # Update main_string after removing a substring instance.

    return main_string

Big(O) Analysis

Time Complexity
O(n*m)Let n be the length of the main string and m be the length of the substring. The while loop iterates as long as the substring is found in the main string. In each iteration, the find operation takes O(n) time in the worst case to search for the substring. If the substring is found, the erase operation, which removes the substring, also takes O(n) time. In the worst-case scenario, the substring appears multiple times, and each time we remove it, it could potentially shift the remaining part of the main string. Consider a simplified example where the main string and substring are almost identical except the substring is one character shorter than the string. In this case the find and erase operations could happen on the order of n/m times. Hence the time complexity is O(n*m).
Space Complexity
O(1)The algorithm described in the plain English explanation repeatedly modifies the main string in-place and searches for the substring within it. It doesn't mention creation of any auxiliary data structures like arrays, lists, or hash maps to store intermediate results or substring indices. Therefore, the auxiliary space used is constant, independent of the input string's length (N).

Edge Cases

CaseHow to Handle
Empty string or substringIf either string or substring is empty, return the original string to avoid infinite loops or unexpected behavior.
Substring longer than the main stringReturn the original string immediately as the substring cannot be found.
Substring not found in the main stringReturn the original string as no replacements need to be made.
Substring is the same as the main stringReturn an empty string, indicating that the entire string was removed.
Overlapping occurrences of the substringThe algorithm should correctly remove all overlapping occurrences by iterating and removing sequentially.
Very long strings to check for performanceConsider using efficient string searching algorithms (e.g., KMP) to avoid quadratic time complexity.
Main string consisting only of the substringThe result should be an empty string, which signifies that everything was removed.
Null string or substring inputThrow an IllegalArgumentException or return null (depending on requirements) if either input is null.