Given a string representing a code snippet, implement a tag validator to parse the code and return whether it is valid.
A code snippet is valid if all the following rules hold:
<TAG_NAME>TAG_CONTENT</TAG_NAME>. Among them, <TAG_NAME> is the start tag, and </TAG_NAME> is the end tag. The TAG_NAME in start and end tags should be the same. A closed tag is valid if and only if the TAG_NAME and TAG_CONTENT are valid.TAG_NAME only contain upper-case letters, and has length in range [1,9]. Otherwise, the TAG_NAME is invalid.TAG_CONTENT may contain other valid closed tags, cdata and any characters (see note1) EXCEPT unmatched <, unmatched start and end tag, and unmatched or closed tags with invalid TAG_NAME. Otherwise, the TAG_CONTENT is invalid.< is unmatched if you cannot find a subsequent >. And when you find a < or </, all the subsequent characters until the next > should be parsed as TAG_NAME (not necessarily valid).<![CDATA[CDATA_CONTENT]]>. The range of CDATA_CONTENT is defined as the characters between <![CDATA[ and the first subsequent ]]>.CDATA_CONTENT may contain any characters. The function of cdata is to forbid the validator to parse CDATA_CONTENT, so even it has some characters that can be parsed as tag (no matter valid or invalid), you should treat it as regular characters.Example 1:
Input: code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>" Output: true Explanation: The code is wrapped in a closed tag : <DIV> and </DIV>. The TAG_NAME is valid, the TAG_CONTENT consists of some characters and cdata. Although CDATA_CONTENT has an unmatched start tag with invalid TAG_NAME, it should be considered as plain text, not parsed as a tag. So TAG_CONTENT is valid, and then the code is valid. Thus return true.
Example 2:
Input: code = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>" Output: true Explanation: We first separate the code into : start_tag|tag_content|end_tag. start_tag -> "<DIV>" end_tag -> "</DIV>" tag_content could also be separated into : text1|cdata|text2. text1 -> ">> ![cdata[]] " cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>" text2 -> "]]>>]" The reason why start_tag is NOT "<DIV>>>" is because of the rule 6. The reason why cdata is NOT "<![CDATA[<div>]>]]>]]>" is because of the rule 7.
Example 3:
Input: code = "<A> <B> </A> </B>" Output: false Explanation: Unbalanced. If "<A>" is closed, then "<B>" must be unmatched, and vice versa.
Constraints:
1 <= code.length <= 500code consists of English letters, digits, '<', '>', '/', '!', '[', ']', '.', and ' '.When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
We want to check if a piece of text that uses special tags is valid. The brute force way is to meticulously go through the text and try to match every opening tag with a closing tag.
Here's how the algorithm would work step-by-step:
def is_valid_tag(code):
code_length = len(code)
index = 0
tags = []
if code_length == 0:
return False
while index < code_length:
if code[index] == '<':
if index + 1 < code_length and code[index + 1] == '!':
#Handle CDATA section
if code[index + 2:index + 9] == '[CDATA[':
close_cdata_index = code.find(']]>', index + 9)
if close_cdata_index == -1:
return False
index = close_cdata_index + 3
else:
return False
elif index + 1 < code_length and code[index + 1] == '/':
#Closing tag
end_index = code.find('>', index + 2)
if end_index == -1:
return False
tag_name = code[index + 2:end_index]
if not tags:
return False
if tags[-1] == tag_name:
tags.pop()
else:
return False
index = end_index + 1
else:
#Opening tag
end_index = code.find('>', index + 1)
if end_index == -1:
return False
tag_name = code[index + 1:end_index]
# Tag name validation
if not (1 <= len(tag_name) <= 9 and tag_name.isupper()):
return False
tags.append(tag_name)
index = end_index + 1
else:
#This represents text content.
index += 1
#Ensure no tags remain open
if tags:
return False
#Need to have at least one tag
first_tag_end = code.find('>')
if first_tag_end == -1:
return False
# Must have at least one valid tag
last_tag_start = code.rfind('<')
if last_tag_start == -1:
return False
return TrueThe challenge is to check if a piece of text, marked up with tags, is valid HTML-like code. The efficient way is to go through the text piece by piece, keeping track of what tags are currently open, and making sure everything closes in the right order and follows the rules.
Here's how the algorithm would work step-by-step:
def tag_validator(code):
code = code.replace('<![CDATA[', ' ').replace(']]>', ' ')
if not code:
return True
if not code.startswith('<') or not code.endswith('>'):
return False
stack_of_tags = []
index = 0
while index < len(code):
if code[index:index + 2] == '</':
end_index = code.find('>', index + 2)
if end_index == -1:
return False
tag_name = code[index + 2:end_index]
if not stack_of_tags or stack_of_tags[-1] != tag_name:
return False
stack_of_tags.pop()
index = end_index + 1
elif code[index] == '<':
if code[index + 1] == '!':
return False
end_index = code.find('>', index + 1)
if end_index == -1:
return False
tag_name = code[index + 1:end_index]
if not (1 <= len(tag_name) <= 9 and tag_name.isupper()):
return False
stack_of_tags.append(tag_name)
index = end_index + 1
else:
# Text content; move to next tag.
index += 1
# Stack must be empty for valid tag structure.
return not stack_of_tags| Case | How to Handle |
|---|---|
| Empty input string | Return false immediately as an empty string cannot represent a valid XML document with a root tag. |
| String with only whitespace | Return false as whitespace alone doesn't form a valid XML structure. |
| Unclosed root tag (e.g., <ABC>) | The stack should not be empty at the end, indicating an unclosed root tag, and should return false. |
| Unmatched closing tag (e.g., </ABC>) | The stack should not encounter a closing tag that doesn't match the top of the stack, indicating a tag mismatch, and should return false. |
| Invalid tag name (e.g., <123>) | Tag name validation should check for invalid characters or leading digits and return false if found. |
| Nested CDATA sections | The CDATA parsing logic should ensure nested CDATA sections are not allowed and return false if detected. |
| Code containing only CDATA | The code should be invalidated because a root tag is required, even if CDATA is present. |
| Tag with characters after the closing tag | If any characters exist after the closing root tag, the solution should return false. |