Taro Logo

Remove Comments

Medium
Hudson River Trading logo
Hudson River Trading
0 views
Topics:
ArraysStrings

Given a C++ program, remove comments from it. The program source is an array of strings source where source[i] is the ith line of the source code. This represents the result of splitting the original source code string by the newline character '\n'.

In C++, there are two types of comments, line comments, and block comments.

  • The string "//" denotes a line comment, which represents that it and the rest of the characters to the right of it in the same line should be ignored.
  • The string "/*" denotes a block comment, which represents that all characters until the next (non-overlapping) occurrence of "*/" should be ignored. (Here, occurrences happen in reading order: line by line from left to right.) To be clear, the string "/*/" does not yet end the block comment, as the ending would be overlapping the beginning.

The first effective comment takes precedence over others.

  • For example, if the string "//" occurs in a block comment, it is ignored.
  • Similarly, if the string "/*" occurs in a line or block comment, it is also ignored.

If a certain line of code is empty after removing comments, you must not output that line: each string in the answer list will be non-empty.

There will be no control characters, single quote, or double quote characters.

  • For example, source = "string s = "/* Not a comment. */";" will not be a test case.

Also, nothing else such as defines or macros will interfere with the comments.

It is guaranteed that every open block comment will eventually be closed, so "/*" outside of a line or block comment always starts a new comment.

Finally, implicit newline characters can be deleted by block comments. Please see the examples below for details.

After removing the comments from the source code, return the source code in the same format.

Example 1:

Input: source = ["/*Test program */", "int main()", "{ ", "  // variable declaration ", "int a, b, c;", "/* This is a test", "   multiline  ", "   comment for ", "   testing */", "a = b + c;", "}"]
Output: ["int main()","{ ","  ","int a, b, c;","a = b + c;","}"]
Explanation: The line by line code is visualized as below:
/*Test program */
int main()
{ 
  // variable declaration 
int a, b, c;
/* This is a test
   multiline  
   comment for 
   testing */
a = b + c;
}
The string /* denotes a block comment, including line 1 and lines 6-9. The string // denotes line 4 as comments.
The line by line output code is visualized as below:
int main()
{ 
  
int a, b, c;
a = b + c;
}

Example 2:

Input: source = ["a/*comment", "line", "more_comment*/b"]
Output: ["ab"]
Explanation: The original source string is "a/*comment\nline\nmore_comment*/b", where we have bolded the newline characters.  After deletion, the implicit newline characters are deleted, leaving the string "ab", which when delimited by newline characters becomes ["ab"].

Constraints:

  • 1 <= source.length <= 100
  • 0 <= source[i].length <= 80
  • source[i] consists of printable ASCII characters.
  • Every open block comment is eventually closed.
  • There are no single-quote or double-quote in the input.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of each string in the source array?
  2. Can a line contain both a block comment (/* ... */) and a line comment (// ...)? If so, what's the order of precedence if they overlap?
  3. What should the output be if the input source array is null or empty?
  4. Are block comments allowed to span multiple lines?
  5. If a '/*' sequence is encountered without a corresponding '*/' on the same or subsequent lines, should the remainder of the code be considered a comment until the end of the source, or is that considered an error condition?

Brute Force Solution

Approach

The brute force approach to removing comments involves meticulously examining the code line by line and character by character. We systematically check for the start and end markers of both block and line comments. This exhaustive search guarantees that all comments, according to the language's rules, are identified and removed.

Here's how the algorithm would work step-by-step:

  1. Go through the entire code, one piece at a time.
  2. Look for anything that signals the start of a comment. This could be a specific symbol or a combination of symbols.
  3. Once you find a comment start, determine what kind of comment it is (single-line or multi-line).
  4. If it's a single-line comment, ignore everything on that line after the comment start signal.
  5. If it's a multi-line comment, ignore everything from the comment start signal until you find the comment end signal.
  6. Continue going through the rest of the code, repeating these steps until you've examined every piece.
  7. Anything that wasn't inside a comment gets put together to form the final result.

Code Implementation

def remove_comments_brute_force(program_code):
    in_block_comment = False
    result = []
    index = 0

    while index < len(program_code):
        # Check for start of block comment
        if program_code[index:index + 2] == '/*':
            in_block_comment = True
            index += 2

        # If inside a block comment, look for the end
        elif in_block_comment:
            if program_code[index:index + 2] == '*/':
                # Found end of block comment, resume normal parsing
                in_block_comment = False
                index += 2
            else:
                index += 1

        # Check for start of line comment
        elif program_code[index:index + 2] == '//':
            # Skip the rest of the line
            index = program_code.find('
', index) 
            if index == -1:
                break
            index += 1

        else:
            # Append character if not in comment
            result.append(program_code[index])
            index += 1

    return "".join(result)

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the code character by character, which we can define as having a size of n. In the worst-case scenario, where there are very long single-line comments or multi-line comments that span a significant portion of the input, each character still only needs to be visited and checked a constant number of times to determine if it's the start or end of a comment. Thus, the time complexity grows linearly with the input size n, resulting in O(n).
Space Complexity
O(N)The algorithm iterates through the code, character by character, building the result. In the worst-case scenario where there are no comments to remove, or only very small comments, the algorithm effectively copies the entire code into a new string or list. This new string or list stores the comment-free code, requiring auxiliary space proportional to the input size N, where N is the total length of the input code. Therefore, the space complexity is O(N).

Optimal Solution

Approach

The goal is to process text and strip out comments. We need to carefully go through the text character by character, keeping track of whether we're inside a comment or not, and only keeping characters outside of comments.

Here's how the algorithm would work step-by-step:

  1. Go through the text one character at a time.
  2. Keep track of whether you're currently inside a single-line comment or a multi-line comment.
  3. If you find the start of a single-line comment, ignore everything until the end of that line.
  4. If you find the start of a multi-line comment, ignore everything until you find the end of the multi-line comment.
  5. If you're not inside any comment, add the current character to the result.
  6. Be careful to handle edge cases like nested comments or improperly formatted comments correctly.

Code Implementation

def remove_comments(program_text):
    result = ''
    in_single_line_comment = False
    in_multi_line_comment = False
    index = 0

    while index < len(program_text):
        if in_single_line_comment:
            if program_text[index] == '
':
                in_single_line_comment = False
                result += '
'
            index += 1
        elif in_multi_line_comment:
            if program_text[index:index + 2] == '*/':
                in_multi_line_comment = False
                index += 2
            else:
                index += 1
        else:
            if program_text[index:index + 2] == '//':
                # Begin single-line comment
                in_single_line_comment = True
                index += 2

            elif program_text[index:index + 2] == '/*':
                # Begin multi-line comment
                in_multi_line_comment = True
                index += 2

            else:
                # Append current character to result.
                result += program_text[index]
                index += 1

    return result

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the input text character by character once. For each character, it performs a constant amount of work: checking for comment start/end sequences and appending to the result if not inside a comment. Therefore, the runtime is directly proportional to the length of the input text, denoted as n, making it O(n).
Space Complexity
O(N)The algorithm constructs a result string by adding characters outside of comments. In the worst-case scenario, there are no comments, and the entire input text is added to the result. Therefore, the space required for the result string can grow linearly with the input size N, where N is the length of the input text. No other significant auxiliary space is used besides this result string. Thus, the space complexity is O(N).

Edge Cases

CaseHow to Handle
Null or empty input source arrayReturn an empty list immediately to handle the invalid input gracefully.
Empty source code line (empty string)Treat it as a line with no comments and add it directly to the result if no multiline comment is active.
Single line comment at the very beginning of the lineEntire line should be ignored after encountering the //.
Multiline comment at the very beginning of a lineStart ignoring content until the closing tag '*/' is found.
Multiline comment that spans multiple lines.Maintain a flag to indicate if inside a multiline comment and process each line accordingly.
Single line comment inside a multiline comment.It is ignored, multiline comment takes precedence until the closing '*/' is found.
Multiline comment followed immediately by a single-line comment on the same line.After closing the multiline comment, process the remaining part of the line considering the single-line comment if present.
Overlapping comments like '/* ... // ... */'Multiline comments have precedence, so '//' should be ignored inside a multiline comment.