Taro Logo

Add Bold Tag in String

Medium
Gusto logo
Gusto
2 views
Topics:
ArraysStringsTwo Pointers

You are given a string s and an array of strings words. You should add a bold tag <b> and </b> to all the substrings of s that are present in words.

Return s after adding the bold tags.

You can return any valid answer. It is guaranteed that adding the tags won't change the length of the string.

Example 1:

Input: s = "abcxyz123", words = ["abc","xyz"]
Output: "<b>abc</b><b>xyz</b>123"
Explanation: The strings "abc", and "xyz" exist in the string s, so they are surrounded by appropriate bold tags.

Example 2:

Input: s = "aaabbcc", words = ["aaa","aab","bc"]
Output: "<b>aaabbc</b>c"

Constraints:

  • 1 <= s.length <= 500
  • 0 <= words.length <= 100
  • 1 <= words[i].length <= 50
  • s and words[i] consist of lowercase English letters and digits.
  • All the values of words are unique.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input string `s` or the `words` array be empty or null?
  2. What is the maximum length of the input string `s` and the strings within the `words` array?
  3. If a word in `words` appears multiple times in `s`, should all occurrences be bolded?
  4. Are the words in the `words` array case-sensitive, or should the matching be case-insensitive?
  5. If the bolded segments overlap, should they be merged into a single, larger bolded segment?

Brute Force Solution

Approach

The brute force method for adding bold tags involves checking every possible position in the input string to see if any of the words we're looking for start at that position. For each position, we'll need to determine if we should add a bold tag.

Here's how the algorithm would work step-by-step:

  1. Look at the very first character in the main string.
  2. Check if any of the words in the list of words we're looking for start at that first character.
  3. If a word starts there, remember that position needs a bold tag.
  4. Move to the second character in the main string.
  5. Again, check if any of the words start at this second character.
  6. If they do, remember this position needs a bold tag too.
  7. Keep moving one character at a time through the entire main string, repeating the word-checking process.
  8. Once you've gone through the entire string, you'll know exactly which positions need bold tags.
  9. Finally, go through the string again and add the actual bold tags at those positions, creating the final output.

Code Implementation

def add_bold_tag_brute_force(string, words):
    string_length = len(string)
    bold_indices = [False] * string_length

    # Mark indices that need to be bolded
    for index in range(string_length):
        for word in words:
            if string[index:].startswith(word):

                # Mark indices corresponding to the matched word as True
                for word_index in range(len(word)): 
                    bold_indices[index + word_index] = True

    merged_string = ""
    bold_tag_open = False

    for index in range(string_length):
        if bold_indices[index] and not bold_tag_open:

            # Add open tag if current index is bold and tag is closed
            merged_string += "<b>"
            bold_tag_open = True

        elif not bold_indices[index] and bold_tag_open:

            # Close tag if current index is not bold and tag is open
            merged_string += "</b>"
            bold_tag_open = False

        merged_string += string[index]

    # Close tag if it's still open at the end of the string
    if bold_tag_open:
        merged_string += "</b>"

    return merged_string

Big(O) Analysis

Time Complexity
O(n*m*k)The algorithm iterates through the input string 's' of length 'n'. For each character in 's', it checks if any word in the 'words' array (of length 'm') starts at that position. Checking if a word starts at a given position requires comparing the substring of 's' starting at that position with each word in 'words', which takes O(k) time, where 'k' is the average length of the words in 'words'. Therefore, the overall time complexity is O(n*m*k).
Space Complexity
O(N)The algorithm creates a boolean array, let's call it 'bold', of size N, where N is the length of the input string. This array is used to remember which positions in the input string need to be bolded. The size of the 'bold' array scales linearly with the length of the input string. Therefore, the auxiliary space complexity is O(N).

Optimal Solution

Approach

The core idea is to identify the portions of the string that need to be bolded by tracking which characters match the bold word list. Then, we merge overlapping or adjacent bold sections to minimize the number of bold tags we need to insert. Finally, we walk through the string and insert the necessary bold tags.

Here's how the algorithm would work step-by-step:

  1. First, we need to know which letters of the string should be bolded. We go through each word in the list of words to bold and see if it appears in the main string. If a word does appear, we mark all the letters in that word's appearance as needing to be bolded.
  2. Next, we might have overlapping or adjacent sections that are marked as bold. To simplify things, we merge any bold sections that are touching or overlapping into single, larger bold sections.
  3. Now, we go through the string one letter at a time. When we reach the start of a bold section, we insert an opening bold tag. When we reach the end of a bold section, we insert a closing bold tag.
  4. The final string now contains the original text with the appropriate bold tags inserted to match the required words.

Code Implementation

def add_bold_tag(string, dictionary):
    string_length = len(string)
    bold_flags = [False] * string_length

    for word in dictionary:
        for i in range(string_length - len(word) + 1):
            if string[i:i+len(word)] == word:
                # Mark characters as bold if word is found
                for j in range(i, i + len(word)):
                    bold_flags[j] = True

    merged_intervals = []
    i = 0
    while i < string_length:
        if not bold_flags[i]:
            i += 1
            continue

        start = i
        while i < string_length and bold_flags[i]:
            i += 1
        end = i

        # Merge overlapping intervals into a single interval
        if merged_intervals and start <= merged_intervals[-1][1]:
            merged_intervals[-1] = (merged_intervals[-1][0], max(end, merged_intervals[-1][1]))
        else:
            merged_intervals.append((start, end))

    result = ""
    current_index = 0
    for start, end in merged_intervals:
        result += string[current_index:start]
        result += "<b>"
        result += string[start:end]
        result += "</b>"
        current_index = end

    #Append the remainder of the string
    result += string[current_index:]

    return result

Big(O) Analysis

Time Complexity
O(n*m*k)Let n be the length of the input string, m be the number of words in the `words` array, and k be the maximum length of a word in the `words` array. Step 1 involves iterating through each word in the `words` array (m words). For each word, we search for its occurrences within the input string. String searching can take O(n*k) time in the worst case for each word. Therefore the overall complexity is dominated by O(m*n*k) for step 1, as finding all occurrences of the word in the string can take time proportional to the string length times the length of the word. Step 2 iterates through the boolean array of length n at most once, so it will be O(n). Step 3 will also take O(n). Therefore the total time complexity is O(n*m*k) + O(n) + O(n) which simplifies to O(n*m*k).
Space Complexity
O(N)The algorithm uses a boolean array, of size N (where N is the length of the input string), to store which characters need to be bolded. While merging the intervals and inserting the tags, we are operating on a stringbuilder which in worst case will store the input string and the tags which depend on the length of the input string. Therefore, the auxiliary space used is proportional to the length of the input string, N, resulting in O(N) space complexity.

Edge Cases

CaseHow to Handle
Null string or null array of wordsReturn the original string if either the string or the words array are null to avoid NullPointerException.
Empty string or empty array of wordsReturn the original string if the input string is empty or if the array of words is empty as there is nothing to bold.
Words array contains empty stringsIgnore empty strings in the words array to prevent unexpected behavior in the matching logic.
Words array contains duplicate stringsThe algorithm should correctly handle duplicate words, potentially bolding the same region multiple times (which is acceptable).
Overlapping words in the stringThe algorithm needs to merge overlapping bold tags to produce a minimal and correct result.
Very long input string and a large number of wordsEnsure the time complexity of the solution is efficient (e.g., avoid brute-force search) to handle large inputs without timing out.
Words contain special characters or regular expression metacharactersEscape special characters in the words before using them in regular expression matching to avoid incorrect results.
Input string or words contain Unicode charactersEnsure the string comparison and manipulation are Unicode-aware to correctly handle non-ASCII characters.