Your task is to parse the attached pseudo-HTML file into a pandas DataFrame.
The file contains a list of SMARTS rules and their descriptions. The HTML
tagging is not well-formed, so test the parser and iterate until the
DataFrame looks right.
Rules are organised hierarchically:
H2 tags starting with a number are main topics
H2 tags without a number introduce subtopics (ignore badly formatted HTML)
H3 tags are sub-sub-topics
DT holds the rule name
The first DD after a DT holds the rule contents (SMARTS)
Any further DDs for the same rule are comments
If several patterns appear in a DD separated by BR, split them into separate patterns
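
The rules above could be sketched roughly as follows, using Python's standard html.parser module, which does not require closing tags. This is only a starting point under stated assumptions, not a finished solution: the names RuleParser, parse_rules and SAMPLE, the column names, and the choice to emit one row per SMARTS pattern are all hypothetical, and SAMPLE is a made-up fragment rather than content from the attached file.

```python
from html.parser import HTMLParser

BR = "\x00"  # sentinel marking a <br> inside a DD


class RuleParser(HTMLParser):
    """Lenient parser: tolerates missing closing tags on H2/H3/DT/DD."""

    BLOCKS = ("h2", "h3", "dt", "dd")

    def __init__(self):
        super().__init__()
        self.rows = []
        self.topic = self.subtopic = self.subsubtopic = None
        self.rule = None     # rule currently being assembled
        self.current = None  # block tag whose text is being captured
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKS:
            self._flush()            # the previous block may lack a closing tag
            if tag != "dd":
                self._finish_rule()  # a heading or a new DT ends the current rule
            self.current, self.buf = tag, []
        elif tag == "br" and self.current == "dd":
            self.buf.append(BR)

    def handle_endtag(self, tag):
        if tag == self.current:
            self._flush()

    def handle_data(self, data):
        if self.current:
            self.buf.append(data)

    def close(self):
        super().close()
        self._flush()
        self._finish_rule()

    def _flush(self):
        tag, raw = self.current, "".join(self.buf)
        self.current, self.buf = None, []
        if tag is None:
            return
        text = " ".join(raw.replace(BR, " ").split())
        if tag == "h2" and text[:1].isdigit():   # numbered H2: main topic
            self.topic, self.subtopic, self.subsubtopic = text, None, None
        elif tag == "h2":                        # unnumbered H2: subtopic
            self.subtopic, self.subsubtopic = text, None
        elif tag == "h3":
            self.subsubtopic = text
        elif tag == "dt":
            self.rule = {"name": text, "patterns": None, "comments": []}
        elif tag == "dd" and self.rule is not None:
            if self.rule["patterns"] is None:    # first DD: SMARTS, split on <br>
                parts = (" ".join(p.split()) for p in raw.split(BR))
                self.rule["patterns"] = [p for p in parts if p]
            elif text:                           # later DDs: comments
                self.rule["comments"].append(text)

    def _finish_rule(self):
        if self.rule is None:
            return
        # Assumption: one output row per pattern (None if the rule has no DD).
        for smarts in self.rule["patterns"] or [None]:
            self.rows.append({
                "topic": self.topic,
                "subtopic": self.subtopic,
                "subsubtopic": self.subsubtopic,
                "rule": self.rule["name"],
                "smarts": smarts,
                "comment": " ".join(self.rule["comments"]),
            })
        self.rule = None


def parse_rules(html_text):
    """Return a list of row dicts, ready to pass to a DataFrame constructor."""
    parser = RuleParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Made-up fragment with deliberately unclosed tags, for testing only.
SAMPLE = """
<h2>1. Alkanes</h2>
<h2>Primary carbon
<h3>Simple</h3>
<dl>
<dt>Methyl
<dd>[CH3X4]<br>[CX4H3]
<dd>Two equivalent
   forms
<dt>Ethyl</dt>
<dd>[CH2][CH3]</dd>
</dl>
"""

rows = parse_rules(SAMPLE)
```

Passing the result to pandas.DataFrame(rows) should give one row per pattern with the six columns above. Checking row counts and a few known rules against the raw file is one way to judge whether the frame "looks right".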