Spaces: tonigi/moliety2

tonigi committed on Feb 20, 2025

Commit a12e7b4

1 Parent(s): 6d7b849

move to data dir

Files changed (7)

app.py +1 -1
SMARTS_InteLigand.txt → data/SMARTS_InteLigand.txt +0 -0
data/daylight_smarts.yml +622 -0
data/smarts_examples.html +1353 -0
data/smarts_examples.txt +1272 -0
daylight-smarts.csv +0 -254
rawgroups.txt +0 -1145

app.py CHANGED

@@ -28,7 +28,7 @@ compiled_patterns = {name: Chem.MolFromSmarts(smart)
 def load_interligand_moieties():
     moieties = {}
     try:
-        with open("SMARTS_InteLigand.txt", "r") as f:
+        with open("data/SMARTS_InteLigand.txt", "r") as f:
             for line in f:
                 line = line.strip()
                 if not line or line.startswith("#"):
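The new path is relative, so it resolves against whatever directory the app is started from. A minimal sketch of one way to build the path explicitly and reuse the line filter visible in the hunk is given here. Both `data_path` and `iter_payload_lines` are hypothetical helpers, not part of this commit, and skipping blank and `#` lines is an assumption, since the hunk is cut off before the body of the `if`.

```python
from pathlib import Path


def data_path(filename, base_dir=None):
    """Hypothetical helper: return <base_dir>/data/<filename>.

    base_dir defaults to the current working directory, which matches
    how the relative path in the commit is resolved.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / "data" / filename


def iter_payload_lines(lines):
    """Yield stripped lines, skipping blanks and '#' comments.

    Assumption: the truncated `if` in the diff skips such lines.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield line
```

Passing the directory of the application module as `base_dir` would make the lookup independent of the working directory; whether the Space needs that is not stated in the commit.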

SMARTS_InteLigand.txt → data/SMARTS_InteLigand.txt RENAMED

File without changes
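The `data/daylight_smarts.yml` file added in this commit nests its rules at two depths: some groups use `subgroups` → `subsubgroups` → `rules`, while others attach `rules` directly to a subgroup. The commit does not show how the app reads the file, so the following is only a sketch of one way to flatten both shapes. It assumes the YAML has already been parsed into plain dicts and lists (for example with a YAML library); `iter_rules` and the hand-written `sample` excerpt are illustrative, not code from the repository.

```python
def iter_rules(node, path=()):
    """Yield (path, rule_name, smarts) for every rule at or below node.

    path is the tuple of enclosing group names, so rules can be found
    at any depth without hard-coding the nesting level.
    """
    for rule in node.get("rules", []):
        yield path, rule["name"], rule["smarts"]
    for key in ("groups", "subgroups", "subsubgroups"):
        for child in node.get(key, []):
            yield from iter_rules(child, path + (child["name"],))


# Hand-written excerpt mirroring the two nesting shapes in the file.
sample = {
    "groups": [
        {"name": "2. Functional Groups by Element",
         "subgroups": [
             {"name": "C",
              "subsubgroups": [
                  {"name": "alkane",
                   "rules": [{"name": "Alkyl Carbon", "smarts": "[CX4]"}]}]}]},
        {"name": "3. Gross Structural Features",
         "subgroups": [
             {"name": "Connectivity",
              "rules": [{"name": "Divalent Oxygen", "smarts": "[$([OX2])]"}]}]},
    ]
}

flat = list(iter_rules(sample))
for group_path, name, smarts in flat:
    print(" / ".join(group_path), "|", name, "|", smarts)
```

Keeping the group path alongside each pattern makes it possible to tell apart rules that share a name, such as the two "Thioamide" entries in the file.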

data/daylight_smarts.yml ADDED

@@ -0,0 +1,622 @@

+groups:
+  - name: "2. Functional Groups by Element"
+    subgroups:
+      - name: "C"
+        subsubgroups:
+          - name: "alkane"
+            rules:
+              - name: "Alkyl Carbon"
+                smarts: "[CX4]"
+          - name: "alkene (-ene)"
+            rules:
+              - name: "Allenic Carbon"
+                smarts: "[$([CX2](=C)=C)]"
+              - name: "Vinylic Carbon"
+                smarts: "[$([CX3]=[CX3])]"
+                comment: "Ethenyl carbon"
+          - name: "alkyne (-yne)"
+            rules:
+              - name: "Acetylenic Carbon"
+                smarts: "[$([CX2]#C)]"
+          - name: "arene (Ar , aryl-, aromatic hydrocarbons)"
+            rules:
+              - name: "Arene"
+                smarts: "c"
+      - name: "C & O"
+        subsubgroups:
+          - name: "carbonyl"
+            rules:
+              - name: "Carbonyl group. Low specificity"
+                smarts: "[CX3]=[OX1]"
+                comment: "Hits carboxylic acid, ester, ketone, aldehyde, carbonic acid/ester, anhydride, carbamic acid/ester, acyl halide, amide."
+              - name: "Carbonyl group"
+                smarts: "[$([CX3]=[OX1]),$([CX3+]-[OX1-])]"
+                comment: "Hits either resonance structure"
+              - name: "Carbonyl with Carbon"
+                smarts: "[CX3](=[OX1])C"
+                comment: "Hits aldehyde, ketone, carboxylic acid (except formic), anhydride (except formic), acyl halides (acid halides). Won't hit carbamic acid/ester, carbonic acid/ester."
+              - name: "Carbonyl with Nitrogen."
+                smarts: "[OX1]=CN"
+                comment: "Hits amide, carbamic acid/ester, polypeptide"
+              - name: "Carbonyl with Oxygen."
+                smarts: "[CX3](=[OX1])O"
+                comment: "Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid or ester, anhydride. Won't hit aldehyde or ketone."
+              - name: "Acyl Halide"
+                smarts: "[CX3](=[OX1])[F,Cl,Br,I]"
+                comment: "acid halide, -oyl halide"
+              - name: "Aldehyde"
+                smarts: "[CX3H1](=O)[#6]"
+                comment: "-al"
+              - name: "Anhydride"
+                smarts: "[CX3](=[OX1])[OX2][CX3](=[OX1])"
+              - name: "Amide"
+                smarts: "[NX3][CX3](=[OX1])[#6]"
+                comment: "-amide"
+              - name: "Amidinium"
+                smarts: "[NX3][CX3]=[NX3+]"
+              - name: "Carbamate."
+                smarts: "[NX3,NX4+][CX3](=[OX1])[OX2,OX1-]"
+                comment: "Hits carbamic esters, acids, and zwitterions"
+              - name: "Carbamic ester"
+                smarts: "[NX3][CX3](=[OX1])[OX2H0]"
+              - name: "Carbamic acid."
+                smarts: "[NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]"
+                comment: "Hits carbamic acids and zwitterions."
+              - name: "Carboxylate Ion."
+                smarts: "[CX3](=O)[O-]"
+                comment: "Hits conjugate bases of carboxylic, carbamic, and carbonic acids."
+              - name: "Carbonic Acid or Carbonic Ester"
+                smarts: "[CX3](=[OX1])(O)O"
+                comment: "Carbonic Acid, Carbonic Ester, or combination"
+              - name: "Carbonic Acid or Carbonic Acid-Ester"
+                smarts: "[CX3](=[OX1])([OX2])[OX2H,OX1H0-1]"
+                comment: "Hits acid and conjugate base. Won't hit carbonic acid diester"
+              - name: "Carbonic Ester (carbonic acid diester)"
+                smarts: "C[OX2][CX3](=[OX1])[OX2]C"
+                comment: "Won't hit carbonic acid or combination carbonic acid/ester"
+              - name: "Carboxylic acid"
+                smarts: "[CX3](=O)[OX2H1]"
+                comment: "-oic acid, COOH"
+              - name: "Carboxylic acid or conjugate base."
+                smarts: "[CX3](=O)[OX1H0-,OX2H1]"
+              - name: "Cyanamide"
+                smarts: "[NX3][CX2]#[NX1]"
+              - name: "Ester Also hits anhydrides"
+                smarts: "[#6][CX3](=O)[OX2H0][#6]"
+                comment: "won't hit formic anhydride."
+              - name: "Ketone"
+                smarts: "[#6][CX3](=O)[#6]"
+                comment: "-one"
+          - name: "ether"
+            rules:
+              - name: "Ether"
+                smarts: "[OD2]([#6])[#6]"
+      - name: "H"
+        subsubgroups:
+          - name: "hydrogen atoms"
+            rules:
+              - name: "Hydrogen Atom"
+                smarts: "[H]"
+                comment: "Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]"
+              - name: "Not a Hydrogen Atom"
+                smarts: "[!#1]"
+                comment: "Hits SMILES that are not hydrogen atoms."
+              - name: "Proton"
+                smarts: "[H+]"
+                comment: "Hits positively charged hydrogen atoms: [H+]"
+          - name: "hydrogen count"
+            rules:
+              - name: "Mono-Hydrogenated Cation"
+                smarts: "[+H]"
+                comment: "Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]"
+              - name: "Not Mono-Hydrogenated"
+                smarts: "[!H1]"
+                comment: "Hits atoms that don't have exactly one attached hydrogen."
+      - name: "N"
+        subsubgroups:
+          - name: "amine (-amino)"
+            rules:
+              - name: "Primary or secondary amine, not amide."
+                smarts: "[NX3;H2,H1;!$(NC=O)]"
+                comment: "Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 & H1 respectively). Also note that '&' is the default operator and is higher precedence than ',' which is higher precedence than ';'. Will hit cyanamides and thioamides"
+              - name: "Enamine"
+                smarts: "[NX3][CX3]=[CX3]"
+              - name: "Primary amine, not amide."
+                smarts: "[NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6]"
+                comment: "Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)"
+              - name: "Two primary or secondary amines"
+                smarts: "[NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]"
+                comment: "Here we use the disconnection symbol ('.') to match two separate unbonded identical patterns."
+              - name: "Enamine or Aniline Nitrogen"
+                smarts: "[NX3][$(C=C),$(cc)]"
+          - name: "amino acids"
+            rules:
+              - name: "Generic amino acid: low specificity."
+                smarts: "[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]"
+                comment: "For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal)."
+              - name: "Dipeptide group. generic amino acid: low specificity."
+                smarts: "[NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]"
+                comment: "Won't hit pro or gly. Hits acids and conjugate bases."
+              - name: "Amino Acid"
+                smarts: "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
+                comment: "Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal). {e.g. usage: Alanine side chain is [CH3X4] . Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}"
+          - name: "amino acid side chains"
+            rules:
+              - name: "Alanine side chain"
+                smarts: "[CH3X4]"
+              - name: "Arginine side chain."
+                smarts: "[CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]"
+                comment: "Hits acid and conjugate base."
+              - name: "Asparagine side chain."
+                smarts: "[CH2X4][CX3](=[OX1])[NX3H2]"
+                comment: "Also hits Gln side chain when used alone."
+              - name: "Aspartate (or Aspartic acid) side chain."
+                smarts: "[CH2X4][CX3](=[OX1])[OH0-,OH]"
+                comment: "Hits acid and conjugate base. Also hits Glu side chain when used alone."
+              - name: "Cysteine side chain."
+                smarts: "[CH2X4][SX2H,SX1H0-]"
+                comment: "Hits acid and conjugate base"
+              - name: "Glutamate (or Glutamic acid) side chain."
+                smarts: "[CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]"
+                comment: "Hits acid and conjugate base."
+              - name: "Glycine"
+                smarts: "[$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]"
+              - name: "Histidine side chain."
+                smarts: "[CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1"
+                comment: "Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H])."
+              - name: "Isoleucine side chain"
+                smarts: "[CHX4]([CH3X4])[CH2X4][CH3X4]"
+              - name: "Leucine side chain"
+                smarts: "[CH2X4][CHX4]([CH3X4])[CH3X4]"
+              - name: "Lysine side chain."
+                smarts: "[CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]"
+                comment: "Acid and conjugate base"
+              - name: "Methionine side chain"
+                smarts: "[CH2X4][CH2X4][SX2][CH3X4]"
+              - name: "Phenylalanine side chain"
+                smarts: "[CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1"
+              - name: "Proline"
+                smarts: "[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]"
+              - name: "Serine side chain"
+                smarts: "[CH2X4][OX2H]"
+              - name: "Thioamide"
+                smarts: "[NX3][CX3]=[SX1]"
+              - name: "Threonine side chain"
+                smarts: "[CHX4]([CH3X4])[OX2H]"
+              - name: "Tryptophan side chain"
+                smarts: "[CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12"
+              - name: "Tyrosine side chain."
+                smarts: "[CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1"
+                comment: "Acid and conjugate base"
+              - name: "Valine side chain"
+                smarts: "[CHX4]([CH3X4])[CH3X4]"
+          - name: "azide (-azido)"
+            rules:
+              - name: "Azide group."
+                smarts: "[$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]"
+                comment: "Hits any atom with an attached azide."
+              - name: "Azide ion."
+                smarts: "[$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]"
+                comment: "Hits N in azide ion"
+          - name: "azo"
+            rules:
+              - name: "Nitrogen."
+                smarts: "[#7]"
+                comment: "Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of 'azo'"
+              - name: "Azo Nitrogen. Low specificity."
+                smarts: "[NX2]=N"
+                comment: "Hits diazene, azoxy and some diazo structures"
+              - name: "Azo Nitrogen.diazene"
+                smarts: "[NX2]=[NX2]"
+                comment: "(diaza alkene)"
+              - name: "Azoxy Nitrogen."
+                smarts: "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]"
+              - name: "Diazo Nitrogen"
+                smarts: "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]"
+              - name: "Azole."
+                smarts: "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]"
+                comment: "5-membered aromatic heterocycle w/ 2 double bonds. Contains N & another non-C atom (N, O, S). Subclasses are furo-, thio-, pyrro- (replace CH of furan, thiophene, pyrrole w/ N)"
+          - name: "hydrazine"
+            rules:
+              - name: "Hydrazine H2NNH2"
+                smarts: "[NX3][NX3]"
+          - name: "hydrazone"
+            rules:
+              - name: "Hydrazone C=NNH2"
+                smarts: "[NX3][NX2]=[*]"
+          - name: "imine"
+            rules:
+              - name: "Substituted imine"
+                smarts: "[CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]"
+                comment: "Schiff base"
+              - name: "Substituted or un-substituted imine"
+                smarts: "[$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]"
+              - name: "Iminium"
+                smarts: "[NX3+]=[CX3]"
+          - name: "imide"
+            rules:
+              - name: "Unsubstituted dicarboximide"
+                smarts: "[CX3](=[OX1])[NX3H][CX3](=[OX1])"
+              - name: "Substituted dicarboximide"
+                smarts: "[CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])"
+              - name: "Dicarboxdiimide"
+                smarts: "[CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])"
+          - name: "nitrate"
+            rules:
+              - name: "Nitrate group"
+                smarts: "[$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]"
+                comment: "Also hits nitrate anion"
+              - name: "Nitrate Anion"
+                smarts: "[$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]"
+          - name: "nitrile"
+            rules:
+              - name: "Nitrile"
+                smarts: "[NX1]#[CX2]"
+              - name: "Isonitrile"
+                smarts: "[CX1-]#[NX2+]"
+          - name: "nitro"
+            rules:
+              - name: "Nitro group."
+                smarts: "[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]"
+                comment: "Hits both forms."
+              - name: "Two Nitro groups"
+                smarts: "[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]"
+          - name: "nitroso"
+            rules:
+              - name: "Nitroso-group"
+                smarts: "[NX2]=[OX1]"
+          - name: "n-oxide"
+            rules:
+              - name: "N-Oxide"
+                smarts: "[$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]"
+                comment: "Hits both forms. Won't hit azoxy, nitro, nitroso, or nitrate."
+      - name: "O"
+        subsubgroups:
+          - name: "hydroxyl (includes alcohol, phenol)"
+            rules:
+              - name: "Hydroxyl"
+                smarts: "[OX2H]"
+              - name: "Hydroxyl in Alcohol"
+                smarts: "[#6][OX2H]"
+              - name: "Hydroxyl in Carboxylic Acid"
+                smarts: "[OX2H][CX3]=[OX1]"
+              - name: "Hydroxyl in H-O-P-"
+                smarts: "[OX2H]P"
+              - name: "Enol"
+                smarts: "[OX2H][#6X3]=[#6]"
+              - name: "Phenol"
+                smarts: "[OX2H][cX3]:[c]"
+              - name: "Enol or Phenol"
+                smarts: "[OX2H][$(C=C),$(cc)]"
+              - name: "Hydroxyl_acidic"
+                smarts: "[$([OH]-*=[!#6])]"
+                comment: "An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids."
+          - name: "peroxide"
+            rules:
+              - name: "Peroxide groups."
+                smarts: "[OX2,OX1-][OX2,OX1-]"
+                comment: "Also hits anions."
+      - name: "P"
+        subsubgroups:
+          - name: "phosphoric compounds"
+            rules:
+              - name: "Phosphoric_acid groups."
+                smarts: "[$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]"
+                comment: "Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di-  esters on linear triphosphoric acid and longer)."
+              - name: "Phosphoric_ester groups."
+                smarts: "[$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]"
+                comment: "Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups."
+      - name: "S"
+        subsubgroups:
+          - name: "thio groups ( thio-, thi-, sulpho-, mercapto- )"
+            rules:
+              - name: "Carbo-Thiocarboxylate"
+                smarts: "[S-][CX3](=S)[#6]"
+              - name: "Carbo-Thioester"
+                smarts: "S([#6])[CX3](=O)[#6]"
+              - name: "Thio analog of carbonyl"
+                smarts: "[#6X3](=[SX1])([!N])[!N]"
+                comment: "Where S replaces O.  Not a thioamide."
+              - name: "Thiol, Sulfide or Disulfide Sulfur"
+                smarts: "[SX2]"
+              - name: "Thiol"
+                smarts: "[#16X2H]"
+              - name: "Sulfur with at-least one hydrogen."
+                smarts: "[#16!H0]"
+              - name: "Thioamide"
+                smarts: "[NX3][CX3]=[SX1]"
+          - name: "sulfide"
+            rules:
+              - name: "Sulfide"
+                smarts: "[#16X2H0]"
+                comment: "-alkylthio  Won't hit thiols. Hits disulfides."
+              - name: "Mono-sulfide"
+                smarts: "[#16X2H0][!#16]"
+                comment: "alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides."
+              - name: "Di-sulfide"
+                smarts: "[#16X2H0][#16X2H0]"
+                comment: "Won't hit thiols. Won't hit mono-sulfides."
+              - name: "Two Sulfides"
+                smarts: "[#16X2H0][!#16].[#16X2H0][!#16]"
+                comment: "Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides."
+          - name: "sulfinate"
+            rules:
+              - name: "Sulfinate"
+                smarts: "[$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]"
+                comment: "Won't hit Sulfinic Acid.  Hits Both Depiction Forms."
+              - name: "Sulfinic Acid"
+                smarts: "[$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]"
+                comment: "Won't hit substituted Sulfinates.  Hits Both Depiction Forms. Hits acid and conjugate base (sulfinate)."
+          - name: "sulfone"
+            rules:
+              - name: "Sulfone.  Low specificity."
+                smarts: "[$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]"
+                comment: "Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- & di- esters, sulfamic acid, sulfamate, sulfonamide... Hits Both Depiction Forms."
+              - name: "Sulfone. High specificity."
+                smarts: "[$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]"
+                comment: "Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules).  Hits Both Depiction Forms."
+              - name: "Sulfonic acid.  High specificity."
+                smarts: "[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]"
+                comment: "Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate base.  Hits Both Depiction Forms. Hits Arene sulfonic acids."
+              - name: "Sulfonate"
+                smarts: "[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]"
+                comment: "(sulfonic ester) Only hits carbon-substituted sulfur (Oxygen may be heteroatom-substituted).  Hits Both Depiction Forms."
+              - name: "Sulfonamide."
+                smarts: "[$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]"
+                comment: "Only hits carbo- sulfonamide. Hits Both Depiction Forms."
+              - name: "Carbo-azosulfone"
+                smarts: "[SX4](C)(C)(=O)=N"
+                comment: "Partial N-Analog of Sulfone"
+              - name: "Sulfonamide"
+                smarts: "[$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]"
+                comment: "(sulfa drugs)  Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms."
+          - name: "sulfoxide"
+            rules:
+              - name: "Sulfoxide Low specificity."
+                smarts: "[$([#16X3]=[OX1]),$([#16X3+][OX1-])]"
+                comment: "(sulfinyl, thionyl)  Analog of carbonyl where S replaces C. Hits all sulfoxides, including heteroatom-substituted sulfoxides, dialkylsulfoxides, carbo-sulfoxides, sulfinate, sulfinic acids... Hits Both Depiction Forms. Won't hit sulfones."
+              - name: "Sulfoxide High specificity"
+                smarts: "[$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]"
+                comment: "(sulfinyl, thionyl)  Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides (Won't hit heteroatom-substituted molecules).  Hits Both Depiction Forms. Won't hit sulfones."
+          - name: "sulfate"
+            rules:
+              - name: "Sulfate"
+                smarts: "[$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]"
+                comment: "(sulfuric acid monoester)  Only hits when oxygen is carbon-substituted. Hits acid and conjugate base. Hits Both Depiction Forms."
+              - name: "Sulfuric acid ester (sulfate ester)  Low specificity."
+                smarts: "[$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]"
+                comment: "Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates). Hits acid and conjugate base. Hits Both Depiction Forms."
+              - name: "Sulfuric Acid Diester."
+                smarts: "[$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]"
+                comment: "Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms."
+          - name: "sulfamate"
+            rules:
+              - name: "Sulfamate."
+                smarts: "[$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]"
+                comment: "Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms."
+              - name: "Sulfamic Acid."
+                smarts: "[$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])]"
+                comment: "Hits acid and conjugate base. Hits Both Depiction Forms."
+          - name: "sulfene"
+            rules:
+              - name: "Sulfenic acid."
+                smarts: "[#16X2][OX2H,OX1H0-]"
+                comment: "Hits acid and conjugate base."
+              - name: "Sulfenate."
+                smarts: "[#16X2][OX2H0]"
+      - name: "X"
+        subsubgroups:
+          - name: "halide (-halo -fluoro -chloro -bromo -iodo)"
+            rules:
+              - name: "Any carbon attached to any halogen"
+                smarts: "[#6][F,Cl,Br,I]"
+              - name: "Halogen"
+                smarts: "[F,Cl,Br,I]"
+              - name: "Three_halides groups"
+                smarts: "[F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]"
+                comment: "Hits SMILES that have three halides."
+          - name: "acyl halide"
+            rules:
+              - name: "Acyl Halide"
+                smarts: "[CX3](=[OX1])[F,Cl,Br,I]"
+                comment: "(acid halide, -oyl halide)"
+  - name: "3. Gross Structural Features"
+    subgroups:
+      - name: "Chirality"
+        rules:
+          - name: "Specified chiral carbon."
+            smarts: "[$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]"
+            comment: "Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified but that could otherwise be considered chiral. Therefore it also won't match molecules that would be chiral due to an implicit connection (i.e. implicit H)."
+          - name: "\"No-conflict\" chiral match"
+            smarts: "C[C@?](F)(Cl)Br"
+            comment: "Will match molecules with chiralities as specified or unspecified."
+          - name: "\"No-conflict\" chiral match where an H is present"
+            smarts: "C[C@?H](Cl)Br"
+            comment: "Will match molecules with chiralities as specified or unspecified."
+      - name: "Orbital Configuration"
+        rules:
+          - name: "sp2 cationic carbon"
+            smarts: "[$([cX2+](:*):*)]"
+            comment: "Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital"
+          - name: "Aromatic sp2 carbon."
+            smarts: "[$([cX3](:*):*),$([cX2+](:*):*)]"
+            comment: "The first recursive SMARTS matches carbons that are three-connected; the second case matches two-connected carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital)."
+          - name: "Any sp2 carbon."
+            smarts: "[$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]"
+            comment: "The first recursive SMARTS matches carbons that are three-connected and aromatic.  The second case matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital).  The third case matches three-connected non-aromatic carbons (alkenes). The fourth case matches non-aromatic cationic alkene carbons."
+          - name: "Any sp2 nitrogen."
+            smarts: "[$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]"
+            comment: "Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free pair of electrons in a nonbonding orbital, e.g. pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair of electrons in a nonbonding orbital, e.g. C=N), non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist in reality, but SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group; here the individual charge-separated resonance structures are specified), either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen), respectively."
+          - name: "Explicit Hydrogen on sp2-Nitrogen"
+            smarts: "[$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]"
+            comment: "(H must be an isotope or ion)"
+          - name: "sp3 nitrogen"
+            smarts: "[$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)]"
+            comment: "One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded."
+          - name: "Explicit Hydrogen on an sp3 N."
+            smarts: "[$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)])]"
+            comment: "One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)"
+          - name: "sp2 N in N-Oxide"
+            smarts: "[$([$([NX3]=O),$([NX3+][O-])])]"
+          - name: "sp3 N in N-Oxide   Exclusive:"
+            smarts: "[$([$([NX4]=O),$([NX4+][O-])])]"
+            comment: "Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O."
+          - name: "sp3 N in N-Oxide Inclusive:"
+            smarts: "[$([$([NX4]=O),$([NX4+][O-,#0])])]"
+            comment: "Hits if O could be present. Hits if * is used in place of O in SMILES."
+      - name: "Connectivity"
+        rules:
+          - name: "Quaternary Nitrogen"
+            smarts: "[$([NX4+]),$([NX4]=*)]"
+            comment: "Hits non-aromatic Ns."
+          - name: "Tricoordinate S double bonded to N."
+            smarts: "[$([SX3]=N)]"
+          - name: "S double-bonded to Carbon"
+            smarts: "[$([SX1]=[#6])]"
+            comment: "Hits terminal (1-connected S)"
+          - name: "Triply bonded N"
+            smarts: "[$([NX1]#*)]"
+          - name: "Divalent Oxygen"
+            smarts: "[$([OX2])]"
+      - name: "Chains & Branching"
+        rules:
+          - name: "Unbranched_alkane groups."
+            smarts: "[R0;D2][R0;D2][R0;D2][R0;D2]"
+            comment: "Only hits alkanes (single-bond chains).  Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched)."
+          - name: "Unbranched_chain groups."
+            smarts: "[R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]"
+            comment: "Hits any bond (single, double, triple).  Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched)."
+          - name: "Long_chain groups."
+            smarts: "[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]"
+            comment: "Aliphatic chains at-least 8 members long."
+          - name: "Atom_fragment"
+            smarts: "[!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]"
+            comment: "(CLOGP definition) A fragment atom is not an isolating carbon"
+          - name: "Carbon_isolating"
+            smarts: "[$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]"
+            comment: "This definition is based on that in CLOGP: a charge-neutral carbon which is not a CF3 carbon, not an aromatic C between two aromatic hetero atoms (e.g. in tetrazole), and not multiply bonded to a hetero atom."
+          - name: "Terminal S bonded to P"
+            smarts: "[$([SX1]~P)]"
+          - name: "Nitrogen on -N-C=N-"
+            smarts: "[$([NX3]C=N)]"
+          - name: "Nitrogen on -N-N=C-"
+            smarts: "[$([NX3]N=C)]"
+          - name: "Nitrogen on -N-N=N-"
+            smarts: "[$([NX3]N=N)]"
+          - name: "Oxygen in -O-C=N-"
+            smarts: "[$([OX2]C=N)]"
+      - name: "Rotation"
+        rules:
+          - name: "Rotatable bond"
+            smarts: "[!$(*#*)&!D1]-!@[!$(*#*)&!D1]"
+            comment: "An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note that logical operators can be applied to bonds (\"-&!@\"). Here, the overall SMARTS consists of two atoms and one bond. The bond is \"single and not ring\". *#* matches any atom triple bonded to any atom. Enclosing this SMARTS in parentheses and preceding it with $ lets us use $(*#*) as an atom primitive in a recursive SMARTS. The purpose is to avoid bonds such as c1ccccc1-C#C, which would be considered rotatable without this specification."
+      - name: "Cyclic Features"
+        rules:
+          - name: "Bicyclic"
+            smarts: "[$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]"
+            comment: "Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms."
+          - name: "Ortho"
+            smarts: "*-!:aa-!:*"
+            comment: "Ortho-substituted ring"
+          - name: "Meta"
+            smarts: "*-!:aaa-!:*"
+            comment: "Meta-substituted ring"
+          - name: "Para"
+            smarts: "*-!:aaaa-!:*"
+            comment: "Para-substituted ring"
+          - name: "Acyclic-bonds"
+            smarts: "*!@*"
+          - name: "Single bond and not in a ring"
+            smarts: "*-!@*"
+          - name: "Non-ring atom"
+            smarts: "[R0]"
+            comment: "Equivalent to [!R]."
+          - name: "Macrocycle groups."
+            smarts: "[r;!r3;!r4;!r5;!r6;!r7]"
+            comment: "Macrocycle groups."
+          - name: "S in aromatic 5-ring with lone pair"
+            smarts: "[sX2r5]"
+          - name: "Aromatic 5-Ring O with Lone Pair"
+            smarts: "[oX2r5]"
+          - name: "N in 5-sided aromatic ring"
+            smarts: "[nX2r5]"
+          - name: "Spiro-ring center"
+            smarts: "[X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6]"
+            comment: "Rings of size 4-6"
+          - name: "N in 5-ring arom"
+            smarts: "[$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])]"
+            comment: "anion"
+          - name: "CIS or TRANS double bond in a ring"
+            smarts: "*/,\\[R]=;@[R]/,\\*"
+            comment: "An isomeric SMARTS consisting of four atoms and three bonds."
+          - name: "CIS or TRANS double or aromatic bond in a ring"
+            smarts: "*/,\\[R]=,:;@[R]/,\\*"
+          - name: "Unfused benzene ring"
+            smarts: "[cR1]1[cR1][cR1][cR1][cR1][cR1]1"
+            comment: "To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring:"
+          - name: "Multiple non-fused benzene rings"
+            smarts: "[cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1"
+            comment: "To find multiple non-fused benzene rings"
+          - name: "Fused benzene rings"
+            smarts: "c12ccccc1cccc2"
+  - name: "4. Meta-SMARTS"
+    subgroups:
+      - name: "Amino Acids"
+        rules:
+          - name: "Generic amino acid: low specificity."
+            smarts: "[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]"
+            comment: "For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases.  Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal)."
+          - name: "A.A. Template for 20 standard a.a.s"
+            smarts: "[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]"
+            comment: "Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get 'any standard a.a.' Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal)."
+          - name: "Proline"
+            smarts: "[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]"
+          - name: "Glycine"
+            smarts: "[$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]"
+          - name: "Other a.a."
+            smarts: "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
+            comment: "Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal)."
+      - name: "Recursive or Multiple"
+        rules:
+          - name: "Ortho"
+            smarts: "[SMARTS_expression]-!:aa-!:[SMARTS_expression]"
+          - name: "Meta"
+            smarts: "[SMARTS_expression]-!:aaa-!:[SMARTS_expression]"
+          - name: "Para"
+            smarts: "[SMARTS_expression]-!:aaaa-!:[SMARTS_expression]"
+          - name: "Hydrogen"
+            smarts: "[$([#1][SMARTS_expression])]"
+            comment: "Hydrogen must be explicit i.e. an isotope or charged"
+          - name: "Nitrogen"
+            smarts: "[$([#7][SMARTS_expression])]"
+          - name: "Oxygen"
+            smarts: "[$([#8][SMARTS_expression])]"
+          - name: "Fluorine"
+            smarts: "[$([#9][SMARTS_expression])]"
+          - name: "Two possible groups"
+            smarts: "[$(SMARTS_expression_A),$(SMARTS_expression_B)]"
+            comment: "Hits atoms in either environment or group of interest, A or B."
+      - name: "Tools & Tricks"
+        rules:
+          - name: "Any carbon aromatic or non-aromatic"
+            smarts: "[#6]"
+            comment: "Equivalent to [c,C]."
+          - name: "SMILES wildcard"
+            smarts: "[#0]"
+            comment: "This SMARTS hits the SMILES *"
+          - name: "Factoring"
+            smarts: "[O;X2,X1-][O;X2,X1-]"
+            comment: "Factored form of [OX2,OX1-][OX2,OX1-]. Factoring out common atomic expressions may improve human readability."
+          - name: "High-precedence 'and'"
+            smarts: "[N&X4&+,N&X3&+0]"
+            comment: "Equivalent to [NX4+,NX3+0]. High-precedence 'and' (&) is the default operator. 'Or' (,) is lower precedence than &, and low-precedence 'and' (;) is lower precedence than ','."
+  - name: "5. Electron & Proton Features"
+    subgroups:
+      - name: "Acids & Bases"
+        rules:
+          - name: "Acid"
+            smarts: "[!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]"
+            comment: "Proton donor"
+          - name: "Carboxylic acid"
+            smarts: "[CX3](=O)[OX2H1]"
+            comment: "(-oic acid, COOH)"
+          - name: "Carboxylic acid or conjugate base."
+            smarts: "[CX3](=O)[OX..."
+            comment: "The file is truncated beyond this point."
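
A note on consuming `data/daylight_smarts.yml`: the file nests `rules` under `groups`, `subgroups`, and sometimes `subsubgroups`, and some entries are documentation templates (for example, the `[SMARTS_expression]` placeholders) rather than matchable patterns. The sketch below shows one possible way to flatten the hierarchy and set such entries aside. It assumes the file has already been parsed into plain dicts and lists (for instance with a YAML loader); `iter_rules`, `looks_like_template`, and the `sample` excerpt are hypothetical names introduced here for illustration and are not part of `app.py`.

```python
from typing import Any, Dict, Iterator, Tuple

# Keys under which the YAML nests further levels (as seen in the file above).
NESTED_KEYS = ("groups", "subgroups", "subsubgroups")


def iter_rules(node: Dict[str, Any], path: Tuple[str, ...] = ()) -> Iterator[Dict[str, Any]]:
    """Yield every rule below `node`, annotated with the names of its parent levels."""
    here = path + ((node["name"],) if "name" in node else ())
    for rule in node.get("rules", []):
        yield {**rule, "path": here}
    for key in NESTED_KEYS:
        for child in node.get(key, []):
            yield from iter_rules(child, here)


def looks_like_template(smarts: str) -> bool:
    """Heuristic (an assumption): flag placeholder or truncated entries."""
    return "SMARTS_expression" in smarts or smarts.endswith("...")


# Hand-written stand-in for the parsed YAML; only a small excerpt of the file.
sample = {
    "groups": [
        {
            "name": "4. Meta-SMARTS",
            "subgroups": [
                {
                    "name": "Recursive or Multiple",
                    "rules": [
                        {"name": "Ortho",
                         "smarts": "[SMARTS_expression]-!:aa-!:[SMARTS_expression]"},
                    ],
                },
                {
                    "name": "Tools & Tricks",
                    "rules": [
                        {"name": "SMILES wildcard", "smarts": "[#0]"},
                    ],
                },
            ],
        }
    ]
}

rules = list(iter_rules(sample))
usable = [r for r in rules if not looks_like_template(r["smarts"])]
for r in usable:
    print(" / ".join(r["path"]), "->", r["name"], r["smarts"])
```

Each entry in `usable` could then be handed to `Chem.MolFromSmarts`, in the same way `app.py` builds its `compiled_patterns`; checking the result for `None` would catch any remaining strings that are not valid SMARTS.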

data/smarts_examples.html ADDED Viewed

	@@ -0,0 +1,1353 @@

+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
+   "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+    <title>Daylight&gt;SMARTS Examples</title>
+    <link rel="stylesheet" href="/b.css" type="text/css">
+    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+</head>
+<body>
+<table width=750 cellpadding=0 cellspacing=0 border=0>
+    <tr>
+    <td align=center> <iframe src="/iframes/header2.html" name="iframe4" width="745" height="170"
+         scrolling="no" frameborder="0"></iframe></td>
+    </tr>
+</table>
+<table width=750 cellpadding=15>
+    <tr><td class="border-bot">
+<center><h1>SMARTS Examples
+</h1></center>
+<a name="TOP"></a><h2>Table of Contents</h2>
+  <a href="#INTRO">1. Introduction</a><br>
+  <a href="#GROUP">2. Functional Groups by Element</a><br>
+  <a href="#STRUCTUAL">3. Gross Structural Features</a><br>
+  <a href="#META">4. Meta-SMARTS</a><br>
+    <a href="#E-">5. Electron &amp; Proton Features</a><br>
+<a href="#BREAK">6. Breakdown of Complex SMARTS</a><br>
+  <a href="#EXMPL">7. Interesting Example SMARTS</a><br>
+<br>
+<a NAME="INTRO"></a>
+<H2>
+1. Introduction
+</H2>
+When using SMARTS to do searches, it is often helpful to have
+   example queries from which to start.  This document contains
+   many potentially useful example SMARTS which may be used to
+   perform searches or serve as templates, examples, and ideas.
+<br><br>
+These SMARTS have been tested, but they may still contain errors.
+   Please send corrections, improvements, additions, and questions to
+   <A HREF="mailto:support@daylight.com">support@daylight.com.</A>
+<br><br>
+<a NAME="GROUP"></a>
+<H2>
+2. Functional Groups by Element
+</H2>
+<table border=1 COLS=8 WIDTH="750"><tr>
+   <td align=center><a href="#C">C</a></td>
+   <td align=center><a href="#CO">C&amp;O</a></td>
+   <td align=center><a href="#H">H</a></td>
+   <td align=center><a href="#N">N</a></td>
+   <td align=center><a href="#O">O</a></td>
+   <td align=center><a href="#P">P</a></td>
+   <td align=center><a href="#S">S</a></td>
+   <td align=center><a href="#X">X</a></td></tr>
+</table><br>
+<a NAME="C"></a><h2>C</h2>
+<h3> alkane </h3><dl>
+<p><dt> Alkyl Carbon
+   <dd> [CX4]</p></dl><br>
+<h3> alkene (-ene) </h3><dl>
+<p><dt> Allenic Carbon
+   <dd> [$([CX2](=C)=C)]
+<p><dt> Vinylic Carbon
+   <dd> [$([CX3]=[CX3])]
+   <dd> Ethenyl carbon </p></dl><br>
+<h3> alkyne (-yne) </h3><dl>
+<p><dt> Acetylenic Carbon
+   <dd> [$([CX2]#C)]</p></dl><br>
+<h3> arene (Ar , aryl-, aromatic hydrocarbons) </h3><dl>
+<p><dt> Arene
+   <dd> c </p></dl><br>
+<a NAME="CO"></a><h2>C &amp; O</h2>
+<h3>carbonyl</h3><dl>
+<p><dt> Carbonyl group. Low specificity
+   <dd> [CX3]=[OX1]
+   <dd> Hits carboxylic acid, ester, ketone, aldehyde, carbonic
+        acid/ester, anhydride, carbamic acid/ester, acyl halide, amide.
+<p><dt> Carbonyl group
+   <dd> [$([CX3]=[OX1]),$([CX3+]-[OX1-])]
+   <dd> Hits either resonance structure
+<p><dt> Carbonyl with Carbon
+   <dd> [CX3](=[OX1])C
+   <dd> Hits aldehyde, ketone, carboxylic acid (except formic), anhydride
+        (except formic), acyl halides (acid halides). Won't hit carbamic
+        acid/ester, carbonic acid/ester.
+<p><dt> Carbonyl with Nitrogen.
+   <dd> [OX1]=CN
+   <dd> Hits amide, carbamic acid/ester, poly peptide
+<p><dt> Carbonyl with Oxygen.
+   <dd> [CX3](=[OX1])O
+   <dd> Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid
+        or ester, anhydride. Won't hit aldehyde or ketone.
+<p><dt> Acyl Halide
+   <dd> [CX3](=[OX1])[F,Cl,Br,I]
+   <dd> acid halide, -oyl halide
+<p><dt> Aldehyde
+   <dd> [CX3H1](=O)[#6]
+   <dd> -al
+<p><dt> Anhydride
+   <dd> [CX3](=[OX1])[OX2][CX3](=[OX1])
+<p><dt> Amide
+   <dd> [NX3][CX3](=[OX1])[#6]
+   <dd> -amide
+<p><dt> Amidinium
+   <dd> [NX3][CX3]=[NX3+]
+<p><dt> Carbamate.
+   <dd> [NX3,NX4+][CX3](=[OX1])[OX2,OX1-]
+   <dd> Hits carbamic esters, acids, and zwitterions
+<p><dt> Carbamic ester
+   <dd> [NX3][CX3](=[OX1])[OX2H0]
+<p><dt> Carbamic acid.
+   <dd> [NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]
+   <dd> Hits carbamic acids and zwitterions.
+<p><dt> Carboxylate Ion.
+   <dd> [CX3](=O)[O-]
+   <dd> Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
+<p><dt> Carbonic Acid or Carbonic Ester
+   <dd> [CX3](=[OX1])(O)O
+   <dd> Carbonic Acid, Carbonic Ester, or combination
+<p><dt> Carbonic Acid or Carbonic Acid-Ester
+   <dd> [CX3](=[OX1])([OX2])[OX2H,OX1H0-1]
+   <dd> Hits acid and conjugate base. Won't hit carbonic acid diester
+<p><dt> Carbonic Ester (carbonic acid diester)
+   <dd> C[OX2][CX3](=[OX1])[OX2]C
+   <dd> Won't hit carbonic acid or combination carbonic acid/ester
+<p><dt> Carboxylic acid
+   <dd> [CX3](=O)[OX2H1]
+   <dd> -oic acid, COOH
+<p><dt> Carboxylic acid or conjugate base.
+   <dd> [CX3](=O)[OX1H0-,OX2H1]
+<p><dt> Cyanamide
+   <dd> [NX3][CX2]#[NX1]
+<p><dt> Ester. Also hits anhydrides
+   <dd> [#6][CX3](=O)[OX2H0][#6]
+   <dd> won't hit formic anhydride.
+<p><dt> Ketone
+   <dd> [#6][CX3](=O)[#6]
+   <dd> -one </p></dl><br>
+<h3> ether</h3><dl>
+<p><dt> Ether
+   <dd> [OD2]([#6])[#6]</p></dl><br>
+<a NAME="H"></a><h2>H</h2>
+<h3> hydrogen atoms</h3><dl>
+<p><dt> Hydrogen Atom
+   <dd> [H]
+   <dd> Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
+<p><dt> Not a Hydrogen Atom
+   <dd> [!#1]
+   <dd> Hits SMILES that are not hydrogen atoms.
+<p><dt> Proton
+   <dd> [H+]
+   <dd> Hits positively charged hydrogen atoms: [H+]</p></dl><br>
+<h3> hydrogen count</h3><dl>
+<p><dt> Mono-Hydrogenated Cation
+   <dd> [+H]
+   <dd> Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
+<p><dt> Not Mono-Hydrogenated
+   <dd> [!H] or [!H1]
+   <dd> Hits atoms that don't have exactly one attached hydrogen.</p></dl><br>
+<a NAME="N"></a><h2>N</h2>
+<h3> amide </h3> see carbonyl<br>
+<h3> amine (-amino) </h3><dl>
+<p><dt> Primary or secondary amine, not amide.
+   <dd> [NX3;H2,H1;!$(NC=O)]
+   <dd> Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 &amp; H1 respectively).  Also note that "&amp;" (and) is the default operator and is higher precedence than "," (or), which is higher precedence than ";" (and). Will hit cyanamides and thioamides.
+<p><dt> Enamine
+   <dd> [NX3][CX3]=[CX3]
+<p><dt> Primary amine, not amide.
+   <dd> [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6]
+   <dd> Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)
+<p><dt> Two primary or secondary amines
+   <dd> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
+   <dd> Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
+<p><dt> Enamine or Aniline Nitrogen
+   <dd> [NX3][$(C=C),$(cc)]</p></dl><br>
+<h3> amino acids</h3><dl>
+<p><dt> Generic amino acid: low specificity.
+   <dd> [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
+   <dd> For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases.  Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
+<p><dt> Dipeptide group. generic amino acid: low specificity.
+   <dd> [NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]
+   <dd> Won't hit pro or gly. Hits acids and conjugate bases.
+<p><dt> Amino Acid
+   <dd> [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
+   <dd> Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases.  Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal). {e.g. usage:  Alanine side chain is  [CH3X4] . Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}</p></dl><br>
+<h3> amino acid side chains</h3><dl>
+<p><dt> Alanine side chain
+   <dd> [CH3X4]
+<p><dt> Arginine side chain.
+   <dd> [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
+   <dd> Hits acid and conjugate base.
+<p><dt> Asparagine side chain.
+   <dd> [CH2X4][CX3](=[OX1])[NX3H2]
+   <dd> Also hits Gln side chain when used alone.
+<p><dt> Aspartate (or Aspartic acid) side chain.
+   <dd> [CH2X4][CX3](=[OX1])[OH0-,OH]
+   <dd> Hits acid and conjugate base. Also hits Glu side chain when used alone.
+<p><dt> Cysteine side chain.
+   <dd> [CH2X4][SX2H,SX1H0-]
+   <dd> Hits acid and conjugate base
+<p><dt> Glutamate (or Glutamic acid) side chain.
+   <dd> [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
+   <dd> Hits acid and conjugate base.
+<p><dt> Glycine
+   <dd> [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
+<p><dt> Histidine side chain.
+   <dd> [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:<br>[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
+   <dd> Hits acid &amp; conjugate base for either Nitrogen.  Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
+<p><dt> Isoleucine side chain
+   <dd> [CHX4]([CH3X4])[CH2X4][CH3X4]
+<p><dt> Leucine side chain
+   <dd> [CH2X4][CHX4]([CH3X4])[CH3X4]
+<p><dt> Lysine side chain.
+   <dd> [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
+   <dd> Acid and conjugate base
+<p><dt> Methionine side chain
+   <dd> [CH2X4][CH2X4][SX2][CH3X4]
+<p><dt> Phenylalanine side chain
+   <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
+<p><dt> Proline
+   <dd> [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
+<p><dt> Serine side chain
+   <dd> [CH2X4][OX2H]
+<p><dt> Thioamide
+   <dd> [NX3][CX3]=[SX1]
+<p><dt> Threonine side chain
+   <dd> [CHX4]([CH3X4])[OX2H]
+<p><dt> Tryptophan side chain
+   <dd> [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
+<p><dt> Tyrosine side chain.
+   <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
+   <dd> Acid and conjugate base
+<p><dt> Valine side chain
+   <dd> [CHX4]([CH3X4])[CH3X4]
+<p><dt> Alanine side chain
+   <dd> [CH3X4]
+<p><dt> Arginine side chain.
+   <dd> [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
+   <dd> Hits acid and conjugate base.
+<p><dt> Asparagine side chain.
+   <dd> [CH2X4][CX3](=[OX1])[NX3H2]
+   <dd> Also hits Gln side chain when used alone.
+<p><dt> Aspartate (or Aspartic acid) side chain.
+   <dd> [CH2X4][CX3](=[OX1])[OH0-,OH]
+   <dd> Hits acid and conjugate base. Also hits Glu side chain when used alone.
+<p><dt> Cysteine side chain.
+   <dd> [CH2X4][SX2H,SX1H0-]
+   <dd> Hits acid and conjugate base
+<p><dt> Glutamate (or Glutamic acid) side chain.
+   <dd> [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
+   <dd> Hits acid and conjugate base.
+<p><dt> Glycine
+   <dd> N[CX4H2][CX3](=[OX1])[O,N]
+<p><dt> Histidine side chain.
+   <dd> [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:<br>[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
+   <dd> Hits acid &amp; conjugate base for either Nitrogen.  Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
+<p><dt> Isoleucine side chain
+   <dd> [CHX4]([CH3X4])[CH2X4][CH3X4]
+<p><dt> Leucine side chain
+   <dd> [CH2X4][CHX4]([CH3X4])[CH3X4]
+<p><dt> Lysine side chain.
+   <dd> [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
+   <dd> Acid and conjugate base
+<p><dt> Methionine side chain
+   <dd> [CH2X4][CH2X4][SX2][CH3X4]
+<p><dt> Phenylalanine side chain
+   <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
+<p><dt> Proline
+   <dd> N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N]
+<p><dt> Serine side chain
+   <dd> [CH2X4][OX2H]
+<p><dt> Threonine side chain
+   <dd> [CHX4]([CH3X4])[OX2H]
+<p><dt> Tryptophan side chain
+   <dd> [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
+<p><dt> Tyrosine side chain.
+   <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
+   <dd> Acid and conjugate base
+<p><dt> Valine side chain
+   <dd> [CHX4]([CH3X4])[CH3X4]</p></dl><br>
+<h3> azide (-azido) </h3><dl>
+<p><dt> Azide group.
+   <dd> [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
+   <dd> Hits any atom with an attached azide.
+<p><dt> Azide ion.
+   <dd> [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
+   <dd> Hits N in azide ion</p></dl><br>
+<h3> azo </h3><dl>
+<p><dt> Nitrogen.
+   <dd> [#7]
+   <dd> Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of "azo"
+<p><dt> Azo Nitrogen. Low specificity.
+   <dd> [NX2]=N
+   <dd> Hits diazene, azoxy and some diazo structures
+<p><dt> Azo Nitrogen.diazene
+   <dd> [NX2]=[NX2]
+   <dd> (diaza alkene)
+<p><dt> Azoxy Nitrogen.
+   <dd> [$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]
+<p><dt> Diazo Nitrogen
+   <dd> [$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]
+<p><dt> Azole.
+   <dd> [$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]
+   <dd> 5-membered aromatic heterocycle w/ 2 double bonds. Contains N &amp; another non-C atom (N, O, S). Subclasses are furo-, thio-, pyrro- (replace CH of furan, thiophene, pyrrole w/ N).</p></dl><br>
+<h3> hydrazine</h3><dl>
+<p><dt> Hydrazine H2NNH2
+   <dd> [NX3][NX3]</p></dl><br>
+<h3> hydrazone </h3><dl>
+<p><dt> Hydrazone C=NNH2
+   <dd> [NX3][NX2]=[*]</p></dl><br>
+<h3> imine </h3><dl>
+<p><dt> Substituted imine
+   <dd> [CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]
+   <dd> Schiff base
+<p><dt> Substituted or un-substituted imine
+   <dd> [$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]
+<p><dt> Iminium
+   <dd> [NX3+]=[CX3]</p></dl><br>
+<h3> imide </h3><dl>
+<p><dt> Unsubstituted dicarboximide
+   <dd> [CX3](=[OX1])[NX3H][CX3](=[OX1])
+<p><dt> Substituted dicarboximide
+   <dd> [CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])
+<p><dt> Dicarboxdiimide
+   <dd> [CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])</p></dl><br>
+<h3> nitrate </h3><dl>
+<p><dt> Nitrate group
+   <dd> [$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]
+   <dd> Also hits nitrate anion
+<p><dt> Nitrate Anion
+   <dd> [$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]</p></dl><br>
+<h3> nitrile </h3><dl>
+<p><dt> Nitrile
+   <dd> [NX1]#[CX2]
+<p><dt> Isonitrile
+   <dd> [CX1-]#[NX2+]</p></dl><br>
+<h3> nitro </h3><dl>
+<p><dt> Nitro group.
+   <dd> [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]
+   <dd> Hits both forms.
+<p><dt> Two Nitro groups
+   <dd> [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]</p></dl><br>
+<h3> nitroso </h3><dl>
+<p><dt> Nitroso-group
+   <dd> [NX2]=[OX1]</p></dl><br>
+<h3> n-oxide </h3><dl>
+<p><dt> N-Oxide
+   <dd> [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]
+   <dd> Hits both forms. Won't hit azoxy, nitro, nitroso, or nitrate.</p></dl><br>
+<a NAME="O"></a><h2>O</h2>
+<h3> hydroxyl (includes alcohol, phenol) </h3><dl>
+<p><dt> Hydroxyl
+   <dd> [OX2H]
+<p><dt> Hydroxyl in Alcohol
+   <dd> [#6][OX2H]
+<p><dt> Hydroxyl in Carboxylic Acid
+   <dd> [OX2H][CX3]=[OX1]
+<p><dt> Hydroxyl in H-O-P-
+   <dd> [OX2H]P
+<p><dt> Enol
+   <dd> [OX2H][#6X3]=[#6]
+<p><dt> Phenol
+   <dd> [OX2H][cX3]:[c]
+<p><dt> Enol or Phenol
+   <dd> [OX2H][$(C=C),$(cc)]
+<p><dt>  Hydroxyl_acidic
+   <dd> [$([OH]-*=[!#6])]
+   <dd> An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids.</p></dl><br>
+<h3> peroxide </h3><dl>
+<p><dt> Peroxide groups.
+   <dd> [OX2,OX1-][OX2,OX1-]
+   <dd> Also hits anions.</p></dl><br>
+<a NAME="P"></a><h2>P</h2>
+<h3> phosphoric compounds </h3><dl>
+<p><dt> Phosphoric_acid groups.
+   <dd> [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
+   <dd> Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides.  Doesn't hit monophosphoric acid anhydride esters (including acidic mono- &amp; di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer).
+<p><dt> Phosphoric_ester groups.
+   <dd> [$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]
+   <dd> Hits both depiction forms.  Doesn't hit non-ester phosphoric_acid groups.</p></dl><br>
+<a NAME="S"></a><h2>S</h2>
+<h3>thio groups ( thio-, thi-, sulpho-, mercapto- )</h3><dl>
+<p><dt> Carbo-Thiocarboxylate
+   <dd> [S-][CX3](=S)[#6]
+<p><dt> Carbo-Thioester
+   <dd> S([#6])[CX3](=O)[#6]
+<p><dt> Thio analog of carbonyl
+   <dd> [#6X3](=[SX1])([!N])[!N]
+   <dd> Where S replaces O.  Not a thioamide.
+<p><dt> Thiol, Sulfide or Disulfide Sulfur
+   <dd> [SX2]
+<p><dt> Thiol
+   <dd> [#16X2H]
+<p><dt> Sulfur with at-least one hydrogen.
+   <dd> [#16!H0]
+<p><dt> Thioamide
+   <dd> [NX3][CX3]=[SX1]</p></dl><br>
+<h3>sulfide</h3><dl>
+<p><dt> Sulfide
+   <dd> [#16X2H0]
+   <dd> -alkylthio  Won't hit thiols. Hits disulfides.
+<p><dt> Mono-sulfide
+   <dd> [#16X2H0][!#16]
+   <dd> alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides.
+<p><dt> Di-sulfide
+   <dd> [#16X2H0][#16X2H0]
+   <dd> Won't hit thiols. Won't hit mono-sulfides.
+<p><dt> Two Sulfides
+   <dd> [#16X2H0][!#16].[#16X2H0][!#16]
+   <dd> Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.</p></dl><br>
+<h3>sulfinate</h3><dl>
+<p><dt> Sulfinate
+   <dd> [$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]
+   <dd> Won't hit Sulfinic Acid.  Hits Both Depiction Forms.
+<p><dt> Sulfinic Acid
+   <dd> [$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]
+   <dd> Won't hit substituted Sulfinates.  Hits Both Depiction Forms.
+        Hits acid and conjugate base (sulfinate).</p></dl><br>
+<h3>sulfone</h3><dl>
+<p><dt> Sulfone.  Low specificity.
+   <dd> [$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]
+   <dd> Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- &amp; di- esters, sulfamic acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
+<p><dt> Sulfone. High specificity.
+   <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]
+   <dd> Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules).  Hits Both Depiction Forms.
+<p><dt> Sulfonic acid.  High specificity.
+   <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
+   <dd> Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules).
+        Hits acid and conjugate base.  Hits Both Depiction Forms. Hits Arene sulfonic acids.
+<p><dt> Sulfonate
+   <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]
+   <dd> (sulfonic ester) Only hits carbon-substituted sulfur
+        (Oxygen may be heteroatom-substituted).  Hits Both Depiction Forms.
+<p><dt> Sulfonamide.
+   <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]
+   <dd> Only hits carbo- sulfonamide. Hits Both Depiction Forms.
+<p><dt> Carbo-azosulfone
+   <dd> [SX4](C)(C)(=O)=N
+   <dd> Partial N-Analog of Sulfone
+<p><dt> Sulfonamide
+   <dd> [$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]
+   <dd> (sulfa drugs)  Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.</p></dl><br>
+<h3>sulfoxide</h3><dl>
+<p><dt> Sulfoxide Low specificity.
+   <dd> [$([#16X3]=[OX1]),$([#16X3+][OX1-])]
+   <dd> ( sulfinyl, thionyl )   Analog of carbonyl where S replaces C.
+        Hits all sulfoxides, including heteroatom-substituted sulfoxides,
+        dialkylsulfoxides, carbo-sulfoxides, sulfinate, sulfinic acids...
+        Hits Both Depiction Forms. Won't hit sulfones.
+<p><dt> Sulfoxide High specificity
+   <dd> [$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]
+   <dd> (sulfinyl , thionyl)  Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides
+        (Won't hit heteroatom-substituted molecules).  Hits Both Depiction Forms. Won't hit sulfones.</p></dl><br>
+<h3>sulfate</h3><dl>
+<p><dt> Sulfate
+   <dd> [$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]
+   <dd> (sulfuric acid monoester)  Only hits when oxygen is carbon-substituted.
+        Hits acid and conjugate base. Hits Both Depiction Forms.
+<p><dt> Sulfuric acid ester (sulfate ester)  Low specificity.
+   <dd> [$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]
+   <dd> Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates).
+        Hits acid and conjugate base. Hits Both Depiction Forms.
+<p><dt> Sulfuric Acid Diester.
+   <dd> [$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]
+   <dd> Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.</p></dl><br>
+<h3>sulfamate</h3><dl>
+<p><dt> Sulfamate.
+   <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]
+   <dd> Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
+<p><dt> Sulfamic Acid.
+   <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])]
+   <dd> Hits acid and conjugate base. Hits Both Depiction Forms.</p></dl><br>
+<h3>sulfene</h3><dl>
+<p><dt> Sulfenic acid.
+   <dd> [#16X2][OX2H,OX1H0-]
+   <dd> Hits acid and conjugate base.
+<p><dt> Sulfenate.
+   <dd> [#16X2][OX2H0]</p></dl><br>
+<a NAME="X"></a><h2>X</h2>
+<h3> halide (-halo -fluoro -chloro -bromo -iodo) </h3><dl>
+<p><dt> Any carbon attached to any halogen
+   <dd> [#6][F,Cl,Br,I]
+<p><dt> Halogen
+   <dd> [F,Cl,Br,I]
+<p><dt> Three_halides groups
+   <dd> [F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]
+   <dd> Hits SMILES that have three halides.</p></dl><br>
+<h3> acyl halide </h3><dl>
+<p><dt> Acyl Halide
+   <dd> [CX3](=[OX1])[F,Cl,Br,I]
+   <dd> (acid halide, -oyl halide)</p></dl><br>
+<a NAME="STRUCTUAL"></a>
+<H2>
+  3. Gross Structural Features
+</H2><br><br>
+<table BORDER COLS=6 WIDTH="750" NOSAVE ><tr>
+   <td align=center><a href="#CHIRALITY">Chirality</a></td>
+   <td align=center><a href="#ORBITAL">Orbital Configuration</a></td>
+   <td align=center><a href="#CONNECT">Connectivity</a></td>
+   <td align=center><a href="#CHAIN"> Chains &amp; Branching</a></td>
+   <td align=center><a href="#ROTATE">Rotation</a></td>
+   <td align=center><a href="#CYCLE">Cyclic Features</a></td>
+</table><br><br>
+<a NAME="CHIRALITY"></a><h2>Chirality</h2>
+<dl>
+<p><dt> Specified chiral carbon.
+   <dd> [$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]
+   <dd> Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified but that could otherwise be considered chiral. Therefore it also won't match molecules that would be chiral due to an implicit connection (i.e. implicit H).
+<p><dt> "No-conflict" chiral match
+   <dd> C[C@?](F)(Cl)Br
+   <dd> Will match molecules with chiralities as specified or unspecified.
+<p><dt> "No-conflict" chiral match where an H is present
+   <dd> C[C@?H](Cl)Br
+   <dd> Will match molecules with chiralities as specified or unspecified.</p></dl><br>
+<a NAME="ORBITAL"></a><h2>Orbital Configuration</h2>
+<dl>
+<p><dt> sp2 cationic carbon
+   <dd> [$([cX2+](:*):*)]
+   <dd> Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
+<p><dt> Aromatic sp2 carbon.
+   <dd> [$([cX3](:*):*),$([cX2+](:*):*)]
+   <dd> The first recursive SMARTS matches carbons that are three-connected; the second case matches two-connected carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital).
+<p><dt> Any sp2 carbon.
+   <dd> [$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]
+   <dd> The first recursive SMARTS matches carbons that are three-connected and aromatic.  The second case matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital).  The third case matches three-connected non-aromatic carbons (alkenes). The fourth case matches non-aromatic cationic alkene carbons.
+<p><dt> Any sp2 nitrogen.
+   <dd> [$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]
+   <dd> Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine-N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free pair of electrons in a nonbonding orbital, e.g. pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair of electrons in a nonbonding orbital, e.g. C=N), non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist in reality, but SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group; here the individual charge-separated resonance structures are specified), either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen), respectively.
+<p><dt> Explicit Hydrogen on sp2-Nitrogen
+   <dd> [$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]
+   <dd> (H must be an isotope or ion)
+<p><dt> sp3 nitrogen
+   <dd> [$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)]
+   <dd> One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded.
+<p><dt> Explicit Hydrogen on an sp3 N.
+   <dd> [$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)])]
+   <dd> One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
+<p><dt> sp2 N in N-Oxide
+   <dd> [$([$([NX3]=O),$([NX3+][O-])])]
+<p><dt> sp3 N in N-Oxide   Exclusive:
+   <dd> [$([$([NX4]=O),$([NX4+][O-])])]
+   <dd> Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
+<p><dt> sp3 N in N-Oxide Inclusive:
+   <dd> [$([$([NX4]=O),$([NX4+][O-,#0])])]
+   <dd> Hits if O could be present. Hits if * is used in place of O in SMILES.</p></dl><br>
+<a NAME="CONNECT"></a><h2>Connectivity</h2>
+<dl>
+<p><dt> Quaternary Nitrogen
+   <dd> [$([NX4+]),$([NX4]=*)]
+   <dd> Hits non-aromatic Ns.
+<p><dt> Tricoordinate S double bonded to N.
+   <dd> [$([SX3]=N)]
+<p><dt> S double-bonded to Carbon
+   <dd> [$([SX1]=[#6])]
+   <dd> Hits terminal (1-connected S)
+<p><dt> Triply bonded N
+   <dd> [$([NX1]#*)]
+<p><dt> Divalent Oxygen
+   <dd> [$([OX2])]</p></dl><br>
+<a NAME="CHAIN"></a><h2>Chains &amp; Branching </h2>
+<dl>
+<p><dt> Unbranched_alkane groups.
+   <dd> [R0;D2][R0;D2][R0;D2][R0;D2]
+   <dd> Only hits alkanes (single-bond chains).  Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches
+ (e.g. halide substituted chains count as branched).
+<p><dt> Unbranched_chain groups.
+   <dd> [R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]
+   <dd> Hits any bond (single, double, triple).  Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches
+ (e.g. halide substituted chains count as branched).
+<p><dt> Long_chain groups.
+   <dd> [AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]
+   <dd> Aliphatic chains at-least 8 members long.
+<p><dt> Atom_fragment
+   <dd> [!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
+   <dd> (CLOGP definition) A fragment atom is not an isolating carbon.
+<p><dt> Carbon_isolating
+   <dd> [$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
+   <dd> This definition is based on that in CLOGP: a charge-neutral carbon that is not a CF3 carbon, not an aromatic C between two aromatic
+hetero atoms (e.g. in tetrazole), and not multiply bonded to a hetero atom.
+<p><dt> Terminal S bonded to P
+   <dd> [$([SX1]~P)]
+<p><dt> Nitrogen on -N-C=N-
+   <dd> [$([NX3]C=N)]
+<p><dt> Nitrogen on -N-N=C-
+   <dd> [$([NX3]N=C)]
+<p><dt> Nitrogen on -N-N=N-
+   <dd> [$([NX3]N=N)]
+<p><dt> Oxygen in -O-C=N-
+   <dd> [$([OX2]C=N)] </p></dl><br>
+<a NAME="ROTATE"></a><h2>Rotation</h2>
+<dl>
+<p><dt> Rotatable bond
+   <dd> [!$(*#*)&amp;!D1]-!@[!$(*#*)&amp;!D1]
+   <dd> An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note
+that logical operators can be applied to bonds ("-&amp;!@"). Here, the overall SMARTS consists of two atoms and one bond. The bond is "single
+and not ring". *#* is any atom triple bonded to any atom.  Enclosing this SMARTS in parentheses and preceding it with $ lets us
+use $(*#*) in a recursive SMARTS, with that string acting as an atom primitive. The purpose is to avoid bonds such as c1ccccc1-C#C, which would
+be considered rotatable without this specification.</p></dl><br>
+<a NAME="CYCLE"></a><h2>Cyclic Features</h2>
+<dl>
+<p><dt> Bicyclic
+   <dd> [$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]
+   <dd> Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
+<p><dt> Ortho
+   <dd> *-!:aa-!:*
+   <dd> Ortho-substituted ring
+<p><dt> Meta
+   <dd> *-!:aaa-!:*
+   <dd> Meta-substituted ring
+<p><dt> Para
+   <dd> *-!:aaaa-!:*
+   <dd> Para-substituted ring
+<p><dt> Acyclic bonds
+   <dd> *!@*
+<p><dt> Single bond and not in a ring
+   <dd> *-!@*
+<p><dt> Non-ring atom
+   <dd> [R0] or [!R]
+<p><dt> Macrocycle groups.
+   <dd> [r;!r3;!r4;!r5;!r6;!r7]
+<p><dt> S in aromatic 5-ring with lone pair
+   <dd> [sX2r5]
+<p><dt> Aromatic 5-Ring O with Lone Pair
+   <dd> [oX2r5]
+<p><dt> N in 5-sided aromatic ring
+   <dd> [nX2r5]
+<p><dt> Spiro-ring center
+   <dd> [X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6]
+   <dd> Ring sizes 4-6.
+<p><dt> N in 5-ring arom
+   <dd> [$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])]
+   <dd> Anion.
+<p><dt> CIS or TRANS double bond in a ring
+   <dd> */,\[R]=;@[R]/,\*
+   <dd> An isomeric SMARTS consisting of four atoms and three bonds.
+<p><dt> CIS or TRANS double or aromatic bond in a ring
+   <dd> */,\[R]=,:;@[R]/,\*
+<p><dt> Unfused benzene ring
+   <dd> [cR1]1[cR1][cR1][cR1][cR1][cR1]1
+   <dd> To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring.
+<p><dt> Multiple non-fused benzene rings
+   <dd> [cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1
+<p><dt> Fused benzene rings
+   <dd> c12ccccc1cccc2</p></dl><br>
+<a NAME="META"></a>
+<H2>
+   4. Meta-SMARTS
+</H2><br><br>
+<table BORDER COLS=3 WIDTH="750" NOSAVE ><tr>
+   <td align=center><a href="#AA">Amino Acids </a></td>
+   <td align=center><a href="#RECUR"> Recursive or Multiple </a></td>
+   <td align=center><a href="#TOOL">Tools &amp;Tricks </a></td>
+</table><br><br>
+<a NAME="AA"></a><h2>Amino Acids</h2>
+<dl>
+<p><dt> Generic amino acid: low specificity.
+   <dd> [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
+   <dd> For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases.  Hits single a.a.s and specific residues
+w/in polypeptides (internal, or terminal).
+<p><dt>  A.A. Template for 20 standard a.a.s
+   <dd> [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),<br>$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]
+   <dd> Pro, Gly, Other.  Replace * w/  the entire 18_standard_side_chains list to get "any standard a.a." Hits acids and conjugate bases.
+Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
+<p><dt> Proline
+   <dd> [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
+<p><dt> Glycine
+   <dd> [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
+<p><dt> Other a.a.
+   <dd> [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
+   <dd> Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline
+or Glycine; they have their own SMARTS (see side chain list). Hits acids and conjugate bases.  Hits single a.a.s and specific residues w/in
+polypeptides (internal, or terminal).<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Example usage:<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Alanine side chain is  [CH3X4] <br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Alanine Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
+<p><dt> 18_standard_aa_side_chains.
+      <dd> ([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),<br>
+$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
+[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),<br>
+$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
+$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),<br>
+$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])
+<dd>Can be any of the standard 18 (Pro &amp; Gly are treated separately) Hits acids and conjugate bases.
+<p><dt> N in Any_standard_amino_acid.
+      <dd> [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3]<br>
+(=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3]<br>
+(=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),<br>
+$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$<br>
+([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
+[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),<br>
+$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
+$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),<br>
+$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),<br>
+$([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])]
+<dd> Format is the A.A. Template for 20 standard a.a.s, where * is replaced by the entire 18_standard_side_chains list (or'd together).  A
+generic amino acid with any of the 18 side chains, or proline or glycine. Hits "standard" amino acids that have terminally appended groups
+(i.e. "standard" refers to the side chains).  (Pro, Gly, or 18 normal a.a.s.)  Hits single a.a.s and specific residues w/in polypeptides
+(internal, or terminal).
+<p><dt> Non-standard amino acid.
+   <dd> [$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),<br>
+$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),<br>
+$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),<br>
+$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3]<br>
+(=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:<br>
+[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
+[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),<br>
+$([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
+$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),<br>
+$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),<br>
+$([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])]
+   <dd> Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains).  Won't hit amino acids that are
+non-standard solely because groups are terminally appended to the polypeptide chain (N or C term).
+Format is [$(generic a.a.);!$(not a standard one)]. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).</p></dl><br>
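Review note: the "replace * w/ a side chain" recipe described above is plain string substitution, so it can be sketched without a cheminformatics toolkit. The helper name `amino_acid_smarts` is hypothetical and not part of this repository; the template and the alanine side chain are copied from the "Other a.a." entry. This only builds the SMARTS string; it does not validate it.

```python
# Sketch only: fill the side-chain slot of the "Other a.a." template.
# `amino_acid_smarts` is a hypothetical helper, not an existing API.

OTHER_AA_TEMPLATE = (
    "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))]"
    "[CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
)

# Side chain named in the source text.
ALANINE_SIDE_CHAIN = "[CH3X4]"


def amino_acid_smarts(side_chain: str, template: str = OTHER_AA_TEMPLATE) -> str:
    """Replace the single ([*]) side-chain slot with the given SMARTS."""
    slot = "([*])"
    if template.count(slot) != 1:
        raise ValueError("template must contain exactly one ([*]) slot")
    if not side_chain:
        raise ValueError("side_chain must be a non-empty SMARTS string")
    return template.replace(slot, "(" + side_chain + ")")


alanine_search = amino_acid_smarts(ALANINE_SIDE_CHAIN)
print(alanine_search)
```

The result for alanine is the same string as the "Alanine Search" example in the entry above. Proline and glycine are not covered by this template, as the source notes.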
+<a NAME="RECUR"></a><h2>Recursive or Multiple </h2>
+<h3> Recursive SMARTS: Atoms connected to particular SMARTS</h3><dl>
+<p><dt> Ortho
+   <dd>[SMARTS_expression]-!:aa-!:[SMARTS_expression]
+<p><dt> Meta
+   <dd> [SMARTS_expression]-!:aaa-!:[SMARTS_expression]
+<p><dt> Para
+   <dd> [SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
+<p><dt> Hydrogen
+   <dd> [$([#1][SMARTS_expression])]
+   <dd> Hydrogen must be explicit i.e. an isotope or charged
+<p><dt> Nitrogen
+   <dd> [$([#7][SMARTS_expression])]
+<p><dt> Oxygen
+   <dd> [$([#8][SMARTS_expression])]
+<p><dt> Fluorine
+   <dd> [$([#9][SMARTS_expression])]</p></dl><br>
+<h3> Recursive SMARTS: Multiple groups</h3><dl>
+<p><dt> Two possible groups
+   <dd> [$(SMARTS_expression_A),$(SMARTS_expression_B)]
+   <dd> Hits atoms in either environment or group of interest, A or B.<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Example usages:<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Azide group is : [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Azide or azide ion is: [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),$([$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])])]
+<p><dt> Recursive SMARTS
+   <dd> [$([atom_that_gets_hit][other_atom][other_atom])]
+   <dd> Hits the first atom within the parentheses.<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Example usages:<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;[$([CX3]=[OX1])] hits Carbonyl Carbon<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;[$([OX1]=[CX3])] hits Carbonyl Oxygen </p></dl><br>
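Review note: the `[$(A),$(B)]` form shown above composes mechanically, including when the alternatives are themselves bracketed lists. A minimal sketch follows, assuming a hypothetical helper `any_of` (not part of this repository); the azide fragments are copied from the examples in this section.

```python
# Sketch only: wrap each environment in $(...) and join with "," (or).
# `any_of` is a hypothetical name used for illustration.

def any_of(*environments: str) -> str:
    """Build [$(A),$(B),...] from one or more SMARTS environments."""
    if not environments:
        raise ValueError("need at least one SMARTS environment")
    return "[" + ",".join("$(" + env + ")" for env in environments) + "]"


# The two resonance forms of a covalently bound azide group.
AZIDE_GROUP = any_of("*-[NX2-]-[NX2+]#[NX1]", "*-[NX2]=[NX2+]=[NX1-]")
# The two resonance forms of the free azide ion.
AZIDE_ION = any_of("[NX1-]=[NX2+]=[NX1-]", "[NX1]#[NX2+]-[NX1-2]")
# Nesting: each alternative is itself a bracketed list of alternatives.
AZIDE_ANY = any_of(AZIDE_GROUP, AZIDE_ION)

print(AZIDE_ANY)
```

The three constants reproduce the "Azide group", "Azide ion", and "Azide or azide ion" strings listed above, which is a quick way to check that a hand-written nested pattern has balanced brackets.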
+<h3>   Single only, Double only, Single or Double</h3><dl>
+<p><dt> Sulfide
+   <dd> [#16X2H0]
+   <dd> (-alkylthio)  Won't hit thiols. Hits disulfides too.
+<p><dt> Mono-sulfide
+   <dd> [#16X2H0][!#16]
+   <dd> (alkylthio- or alkoxy-) R-S-R  Won't hit thiols. Won't hit disulfides.
+<p><dt> Di-sulfide
+   <dd> [#16X2H0][#16X2H0]
+   <dd> Won't hit thiols. Won't hit mono-sulfides.
+<p><dt> Two sulfides
+   <dd> [#16X2H0][!#16].[#16X2H0][!#16]
+   <dd> Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
+<p><dt> Acid/conj-base
+   <dd> [OX2H,OX1H0-]
+   <dd> Hits acid and conjugate base. acid/base
+<p><dt> Non-acid Oxygen
+   <dd> [OX2H0]
+<p><dt> Acid/base
+   <dd> [H1,H0-]
+   <dd> Works for any atom if base form has no Hs &amp; acid has only one.</p></dl><br>
+<h3> Multiple Disconnected Groups</h3><dl>
+<p><dt> Two disconnected SMARTS fragments
+   <dd> ([Cl!$(Cl~c)].[c!$(c~Cl)])
+   <dd> A molecule that contains a chlorine and an aromatic carbon that are not connected to each other. Uses component-level SMARTS.
+Both SMARTS fragments must be in the same SMILES target fragment.
+<p><dt> Two disconnected SMARTS fragments
+   <dd> ([Cl]).([c])
+   <dd> Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES fragments.
+<p><dt> Two not-necessarily connected SMARTS fragments
+   <dd> ([Cl].[c])
+   <dd> Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
+<p><dt> Two not-necessarily connected fragments
+   <dd> ([SMARTS_expression]).([SMARTS_expression])
+   <dd> Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
+<p><dt> Two primary or secondary amines
+   <dd> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
+   <dd> Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical patterns.</p></dl><br>
+<a NAME="TOOL"></a><h2>Tools &amp;Tricks</h2>
+<h3> Alternative/Equivalent Representations </h3><dl>
+<p><dt> Any carbon aromatic or non-aromatic
+   <dd> [#6] or [c,C]
+<p><dt> SMILES wildcard
+   <dd> [#0]
+   <dd> This SMARTS hits the SMILES *
+<p><dt> Factoring
+   <dd> [OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-]
+   <dd> Factor out common atomic expressions in the recursive SMARTS.  May improve human readability.
+<p><dt> High-precedence "and"
+   <dd> [N&amp;X4&amp;+,N&amp;X3&amp;+0] or [NX4+,NX3+0]
+   <dd> High-precedence "and" (&amp;) is the default logical operator. "Or" (,) is lower precedence than &amp;, and low-precedence "and" (;)
+is lower precedence than ",". </p></dl><br>
+<h3> Hydrogens </h3><dl>
+<p><dt> Any atom w/ at-least 1 H
+   <dd> [*!H0,#1]
+   <dd> In SMILES and SMARTS, Hydrogen is not considered an atom (unless it is specified as an isotope). The hydrogen count is instead
+considered a property of an atom.  This SMARTS provides a way to effectively hit Hs themselves.
+<p><dt> Hs on Carbons
+   <dd> [#6!H0,#1]
+<p><dt> Atoms w/ 1 H
+   <dd> [H,#1] </p></dl><br>
+<a NAME="E-"></a>
+<H2>
+ 5. Electron &amp; Proton Features
+</H2><br><br>
+<table BORDER COLS=3 WIDTH="750" NOSAVE ><tr>
+   <td align=center><a href="#ACID">Acids &amp; Bases </a></td>
+   <td align=center><a href="#CHARGE">Charge</a></td>
+   <td align=center><a href="#H_BOND"> H-bond Donors &amp; Acceptors</a></td>
+   <td align=center><a href="#RAD"> Radicals </a></td>
+</table><br><br>
+<a NAME="ACID"></a><h2> Acids &amp; Bases </h2>
+<dl>
+<p><dt> Acid
+   <dd> [!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]
+   <dd> Proton donor
+<p><dt> Carboxylic acid
+   <dd> [CX3](=O)[OX2H1]
+   <dd> (-oic acid, COOH)
+<p><dt> Carboxylic acid or conjugate base.
+   <dd> [CX3](=O)[OX1H0-,OX2H1]
+<p><dt> Hydroxyl_acidic
+   <dd> [$([OH]-*=[!#6])]
+   <dd> An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur,
+phosphorus, halogen and nitrogen oxyacids.
+<p><dt> Phosphoric_Acid
+   <dd> [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
+   <dd> Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides.  Doesn't hit monophosphoric acid anhydride esters
+(including acidic mono- &amp; di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and
+longer, di- esters on linear triphosphoric acid and longer). Hits acid and conjugate base.
+<p><dt> Sulfonic Acid.  High specificity.
+   <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
+   <dd> Only hits carbo- sulfonic acids (won't hit heteroatom-substituted molecules). Hits acid and conjugate base.  Hits both depiction
+forms. Hits arene sulfonic acids.
+<p><dt> Acyl Halide
+   <dd> [CX3](=[OX1])[F,Cl,Br,I]
+   <dd> (acid halide, -oyl halide)</p></dl><br>
+<a NAME="CHARGE"></a><h2>Charge </h2>
+<dl>
+<p><dt> Anionic divalent Nitrogen
+   <dd> [NX2-]
+<p><dt> Oxenium Oxygen
+   <dd> [OX2H+]=*
+<p><dt> Oxonium Oxygen
+   <dd> [OX3H2+]
+<p><dt> Carbocation
+   <dd> [#6+]
+<p><dt> sp2 cationic carbon.
+   <dd> [$([cX2+](:*):*)]
+   <dd> Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
+<p><dt> Azide ion.
+   <dd> [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
+   <dd> Hits N in azide ion
+<p><dt> Zwitterion High Specificity
+   <dd> [+1]~*~*~[-1]
+   <dd> +1 charged atom separated by any 3 bonds from a -1 charged atom.
+<p><dt> Zwitterion Low Specificity, Crude
+   <dd>[$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])]
+   <dd> Variously charged moieties separated by up to ten bonds.
+<p><dt> Zwitterion Low Specificity
+   <dd> ([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4])
+   <dd> Variously charged moieties that are within the same molecule but not-necessarily connected. Uses component-level grouping.</p></dl>
+<br>
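Review note: the "Zwitterion Low Specificity, Crude" pattern above is nine near-identical alternatives that differ only in the number of wildcard atoms between the two charged ends. A sketch of generating it instead of hand-typing it is below. The function name `crude_zwitterion` and the constant names are hypothetical; the two end-atom expressions are copied from the entry.

```python
# Sketch only: regenerate the crude zwitterion pattern from its two end atoms.
# Names below are hypothetical, chosen for illustration.

# Excludes charges 0, -1, -2, -3, -4 (copied from the source pattern).
FIRST_END = "[!-0!-1!-2!-3!-4]"
# Excludes charges 0, +1, +2, +3, +4 (copied from the source pattern).
SECOND_END = "[!+0!+1!+2!+3!+4]"


def crude_zwitterion(max_wildcards: int = 9) -> str:
    """Join alternatives with 1..max_wildcards wildcard atoms between the ends.

    n wildcard atoms means the ends are n + 1 bonds apart, so the default
    of 9 covers separations of 2 to 10 bonds.
    """
    if max_wildcards < 1:
        raise ValueError("max_wildcards must be at least 1")
    terms = []
    for n in range(1, max_wildcards + 1):
        linker = "~" + "*~" * n  # e.g. n=2 gives "~*~*~"
        terms.append("$(" + FIRST_END + linker + SECOND_END + ")")
    return "[" + ",".join(terms) + "]"


pattern = crude_zwitterion()
print(pattern.count("$("), "alternatives")
```

Generating the string keeps the alternatives consistent if the end-atom expressions are ever changed, and makes the "up to ten bonds" limit an explicit parameter.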
+<a NAME="H_BOND"></a><h2> H-bond Donors &amp; Acceptors</h2>
+<dl>
+<p><dt> Hydrogen-bond acceptor
+   <dd> [#6,#7;R0]=[#8]
+   <dd> Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring, double bonded to an
+oxygen.
+<p><dt> Hydrogen-bond acceptor
+   <dd> [!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]
+   <dd> A H-bond acceptor is a heteroatom with no positive charge; note that negatively charged oxygen or sulphur are included. Excluded are
+halogens (including F), heteroaromatic oxygen, sulphur and pyrrole N. Higher oxidation levels of N, P, S are excluded. Note P(III) is
+currently included. Zeneca's work would imply that (O=S=O) should also be excluded.
+<p><dt> Hydrogen-bond donor.
+   <dd> [!$([#6,H0,-,-2,-3])]
+   <dd> A H-bond donor is a non-negatively charged heteroatom with at least one H
+<p><dt> Hydrogen-bond donor.
+   <dd> [!H0;#7,#8,#9]
+   <dd> Must have an N-H bond, an O-H bond, or a F-H bond
+<p><dt> Possible intramolecular H-bond
+   <dd> [O,N;!H0]-*~*-*=[$([C,N;R0]=O)]
+   <dd> Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive SMARTS", where "$()" encloses a valid
+ nested SMARTS and acts syntactically like an atom-primitive in the overall SMARTS. Multiple nesting is allowed.</p></dl><br>
+<a NAME="RAD"></a><h2>Radicals </h2>
+<dl>
+<p><dt> Carbon Free-Radical
+   <dd> [#6;X3v3+0]
+   <dd> Hits a neutral carbon with three single bonds.
+<p><dt> Nitrogen Free-Radical
+   <dd> [#7;X2v4+0]
+   <dd> Hits a neutral nitrogen with two single bonds or with a single and a triple bond.  </p></dl><br>
+<a NAME="BREAK"></a>
+<H2>
+   6. Breakdown of Complex SMARTS
+</H2></center><br><br>
+<table BORDER COLS=2 WIDTH="750" NOSAVE ><tr>
+   <td align=center><a href="#AM_AC"> Amino Acid </a></td>
+   <td align=center><a href="#ES_AM"> Ester or Amide </a></td>
+   <!--th><!--a href="#">  <!--/a></td>
+</table><br><br>
+<a NAME="AM_AC"><h2>Amino Acid </h2></a>
+<b>[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]</b>
+<pre>
+[$(                         Proline
+[                               N:
+$([                               terminal
+NX3H                                  neutral
+,                                     or
+NX4H2+])                              + charged
+,                                 or
+$([NX3](C)(C)(C))]1               internal
+[CX4H]                          C: alpha
+([CH2][CH2][CH2]1)                 pro side chain
+[CX3]                           C: of COOH
+(=[OX1])                        O: =O of COOH
+[OX2H,OX1-,N]                   O: term COOH (neutral or -) or intern
+),                          OR
+$(                          Glycine
+[                               N:
+$([                               terminal
+NX3H2                                neutral
+,                                    or
+NX4H3+])                             + charged
+,                                 or
+$([NX3H](C)(C))                   internal
+[CX4H2]                         C: alpha (w/ H side chain)
+[CX3]                           C: of COOH
+(=[OX1])                        O: =O of COOH
+[OX2H,OX1-,N]                   O: term COOH (neutral or -) or intern
+),                          OR
+$(                          Other amino acid
+[                               N:
+$([                               terminal
+NX3H2                                neutral
+,                                    or
+NX4H3+])                             + charged
+,                                 or
+$([NX3H](C)(C))]                  internal
+[CX4H]                          C: alpha
+([*])                              any side chain
+[CX3]                           C: of COOH
+(=[OX1])                        O: =O of COOH
+[OX2H,OX1-,N]                   O: term COOH (neutral or -) or intern
+)]
+</pre>
+<br><br>
+<a NAME="ES_AM"><h2> Ester or Amide </h2></a>
+<b>[#6][CX3](=O)[$([OX2H0]([#6])[#6]),$([#7])] </b>
+<pre>
+[#6]                    An atom that is a carbon
+[CX3]                   Connected to an atom that is a three-connected carbon
+(=O)                         Which is double bonded to an oxygen
+[                       Connected to an atom
+$(                           That is in an environment where
+[OX2H0]                         An atom that is a two-connected oxygen, without hydrogens
+([#6])[#6])                       Is connected to two carbons, one of them being the carbonyl C
+,                            Or
+$(                           That is in an environment where
+[#7]                            An atom is a nitrogen.
+)]
+</pre>
+<br><br>
+<a NAME="EXMPL"></a>
+<H2>
+ 7. Interesting Example SMARTS
+</H2>
+<dl>
+<p><dt> Oxygen double bonded to aliphatic carbon or nitrogen, single bonded to an aromatic ring, with a
+halogen in meta position
+   <dd> [#8]=[C,N]-aaa[F,Cl,Br,I]
+<p><dt> Aliphatic carbon attached to oxygen with any bond
+   <dd> C~O
+<p><dt> Oxygen or nitrogen, with at least one hydrogen attached and not in a ring
+   <dd> [O,N;!H0;R0]
+<p><dt> Oxygen double bonded to aliphatic carbon or nitrogen
+   <dd> [#8]=[C,N] or O=[C,N]
+<p><dt> Aliphatic atom single-bonded to any carbon which isn't a trifluoromethyl carbon
+   <dd> A[#6;!$(C(F)(F)F)]
+<p><dt> PCB
+   <dd> [$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]-[$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]
+   <dd> Polychlorinated Biphenyls. Overall SMARTS is atom-bond-atom.  Note that ":" is explicit aromatic bond, and "-" is explicit single
+bond. On each side of the single bond, we use three nested SMARTS to represent
+the ortho, meta, and para position.
+<p><dt> Imidazolium Nitrogen
+   <dd> [nX3r5+]:c:n
+<p><dt> 1-methyl-2-hydroxy benzene with either a Cl or H at the 5 position.
+   <dd> [c;$([*Cl]),$([*H1])]1ccc(O)c(C)c1 or Cc1:c(O):c:c:[$(cCl),$([cH])]:c1
+   <dd> The "H" primitive in SMARTS means "total number
+of attached hydrogens", i.e., [C] will match C in [CH4] methane, [CH3]
+methyl, [CH2] methylene, etc., [CH3] will only match methyl. This is similar
+to the use of "H" in SMILES to specify hydrogen count. The default value
+for the SMARTS "H" primitive is 1 (same as SMILES, e.g., [CH2]=[CH]-[OH]
+same as C=CO). This H-specification value includes all attached hydrogens:
+implicit and explicit (e.g., isotopic [2H]).
+<p><dt> Nonstandard atom groups.
+   <dd> [!#1;!#2;!#3;!#5;!#6;!#7;!#8;!#9;!#11;!#12;!#15;!#16;!#17;!#19;!#20;!#35;!#53]</p></dl><br>
+<h2>More Information</h2>
+    <A HREF="/dayhtml/doc/theory/theory.smarts.html">Theory Manual</A><br>
+    <A HREF="/dayhtml_tutorials/languages/smarts/smarts_practice.html">SMARTS Practice</A><br>
+    </td>
+    </tr>
+    <tr>
+    <td><iframe src="/iframes/footer.html" name="iframe3" width="350" height="200"
+       scrolling="no" frameborder="0"></iframe></td>
+    </tr>
+</table>
+</body>
+</html>
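Review note: every entry in the HTML file above follows the same `<dt>` name, first `<dd>` SMARTS, optional second `<dd>` comment layout, so the list can be read into name/SMARTS records with the standard library. The sketch below is an assumption about how one might do that, not how `data/daylight_smarts.yml` was actually produced; `SmartsListParser` and `parse_smarts_list` are hypothetical names.

```python
# Sketch only: read <dt>/<dd> runs into {"name", "smarts", "comment"} records.
from html.parser import HTMLParser


class SmartsListParser(HTMLParser):
    """Collect the text of each <dt> and of the <dd> elements that follow it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.entries = []   # one dict per <dt>
        self._field = None  # "dt", "dd", or None when outside an entry

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.entries.append({"name": "", "dd": []})
            self._field = "dt"
        elif tag == "dd" and self.entries:
            self.entries[-1]["dd"].append("")
            self._field = "dd"
        elif tag == "dl":
            self._field = None

    def handle_endtag(self, tag):
        if tag == "dl":
            self._field = None  # text after </dl> belongs to no entry

    def handle_data(self, data):
        if self._field == "dt":
            self.entries[-1]["name"] += data
        elif self._field == "dd":
            self.entries[-1]["dd"][-1] += data


def parse_smarts_list(html: str) -> list:
    parser = SmartsListParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    rules = []
    for entry in parser.entries:
        dds = [" ".join(text.split()) for text in entry["dd"]]
        dds = [text for text in dds if text]
        if not dds:
            continue  # a <dt> with no SMARTS line
        rules.append({
            "name": " ".join(entry["name"].split()),
            "smarts": dds[0],
            "comment": dds[1] if len(dds) > 1 else None,
        })
    return rules


SAMPLE = """<dl>
<p><dt> Carboxylic acid
   <dd> [CX3](=O)[OX2H1]
   <dd> (-oic acid, COOH)
<p><dt> Carbocation
   <dd> [#6+]</p></dl><br>"""

for rule in parse_smarts_list(SAMPLE):
    print(rule["name"], "->", rule["smarts"])
```

One limitation to keep in mind: whitespace is collapsed to single spaces, so a SMARTS that the HTML splits across lines with `<br>` (as in the amino acid entries) would come back with embedded spaces and would need them removed before use.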

data/smarts_examples.txt ADDED Viewed

	@@ -0,0 +1,1272 @@

+<H2>
+2. Functional Groups by Element
+</H2>
+<a NAME="C"></a><h2>C</h2>
+<h3> alkane </h3><dl>
+<p><dt> Alkyl Carbon
+   <dd> [CX4]</p></dl><br>
+<h3> alkene (-ene) </h3><dl>
+<p><dt> Allenic Carbon
+   <dd> [$([CX2](=C)=C)]
+<p><dt> Vinylic Carbon
+   <dd> [$([CX3]=[CX3])]
+   <dd> Ethenyl carbon </p></dl><br>
+<h3> alkyne (-yne) </h3><dl>
+<p><dt> Acetylenic Carbon
+   <dd> [$([CX2]#C)]</p></dl><br>
+<h3> arene (Ar , aryl-, aromatic hydrocarbons) </h3><dl>
+<p><dt> Arene
+   <dd> c </p></dl><br>
+<a NAME="CO"></a><h2>C &amp; O</h2>
+<h3>carbonyl</h3><dl>
+<p><dt> Carbonyl group. Low specificity
+   <dd> [CX3]=[OX1]
+   <dd> Hits carboxylic acid, ester, ketone, aldehyde, carbonic
+        acid/ester,anhydride, carbamic acid/ester, acyl halide, amide.
+<p><dt> Carbonyl group
+   <dd> [$([CX3]=[OX1]),$([CX3+]-[OX1-])]
+   <dd> Hits either resonance structure
+<p><dt> Carbonyl with Carbon
+   <dd> [CX3](=[OX1])C
+   <dd> Hits aldehyde, ketone, carboxylic acid (except formic), anhydride
+        (except formic), acyl halides (acid halides). Won't hit carbamic
+        acid/ester, carbonic acid/ester.
+<p><dt> Carbonyl with Nitrogen.
+   <dd> [OX1]=CN
+   <dd> Hits amide, carbamic acid/ester, poly peptide
+<p><dt> Carbonyl with Oxygen.
+   <dd> [CX3](=[OX1])O
+   <dd> Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid
+        or ester, anhydride  Won't hit aldehyde or ketone.
+<p><dt> Acyl Halide
+   <dd> [CX3](=[OX1])[F,Cl,Br,I]
+   <dd> acid halide, -oyl halide
+<p><dt> Aldehyde
+   <dd> [CX3H1](=O)[#6]
+   <dd> -al
+<p><dt> Anhydride
+   <dd> [CX3](=[OX1])[OX2][CX3](=[OX1])
+<p><dt> Amide
+   <dd> [NX3][CX3](=[OX1])[#6]
+   <dd> -amide
+<p><dt> Amidinium
+   <dd> [NX3][CX3]=[NX3+]
+<p><dt> Carbamate.
+   <dd> [NX3,NX4+][CX3](=[OX1])[OX2,OX1-]
+   <dd> Hits carbamic esters, acids, and zwitterions
+<p><dt> Carbamic ester
+   <dd> [NX3][CX3](=[OX1])[OX2H0]
+<p><dt> Carbamic acid.
+   <dd> [NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]
+   <dd> Hits carbamic acids and zwitterions.
+<p><dt> Carboxylate Ion.
+   <dd> [CX3](=O)[O-]
+   <dd> Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
+<p><dt> Carbonic Acid or Carbonic Ester
+   <dd> [CX3](=[OX1])(O)O
+   <dd> Carbonic Acid, Carbonic Ester, or combination
+<p><dt> Carbonic Acid or Carbonic Acid-Ester
+   <dd> [CX3](=[OX1])([OX2])[OX2H,OX1H0-1]
+   <dd> Hits acid and conjugate base. Won't hit carbonic acid diester
+<p><dt> Carbonic Ester (carbonic acid diester)
+   <dd> C[OX2][CX3](=[OX1])[OX2]C
+   <dd> Won't hit carbonic acid or combination carbonic acid/ester
+<p><dt> Carboxylic acid
+   <dd> [CX3](=O)[OX2H1]
+   <dd> -oic acid, COOH
+<p><dt> Carboxylic acid or conjugate base.
+   <dd> [CX3](=O)[OX1H0-,OX2H1]
+<p><dt> Cyanamide
+   <dd> [NX3][CX2]#[NX1]
+<p><dt> Ester Also hits anhydrides
+   <dd> [#6][CX3](=O)[OX2H0][#6]
+   <dd> won't hit formic anhydride.
+<p><dt> Ketone
+   <dd> [#6][CX3](=O)[#6]
+   <dd> -one </p></dl><br>
+<h3> ether</h3><dl>
+<p><dt> Ether
+   <dd> [OD2]([#6])[#6]</p></dl><br>
+<a NAME="H"></a><h2>H</h2>
+<h3> hydrogen atoms</h3><dl>
+<p><dt> Hydrogen Atom
+   <dd> [H]
+   <dd> Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
+<p><dt> Not a Hydrogen Atom
+   <dd> [!#1]
+   <dd> Hits SMILES that are not hydrogen atoms.
+<p><dt> Proton
+   <dd> [H+]
+   <dd> Hits positively charged hydrogen atoms: [H+]</p></dl><br>
+<h3> hydrogen count</h3><dl>
+<p><dt> Mono-Hydrogenated Cation
+   <dd> [+H]
+   <dd> Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
+<p><dt> Not Mono-Hydrogenated
+   <dd> [!H] or [!H1]
+   <dd> Hits atoms that don't have exactly one attached hydrogen.</p></dl><br>
+<a NAME="N"></a><h2>N</h2>
+<h3> amide </h3> see carbonyl<br>
+<h3> amine (-amino) </h3><dl>
+<p><dt> Primary or secondary amine, not amide.
+   <dd> [NX3;H2,H1;!$(NC=O)]
+   <dd> Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 &amp; H1 respectively).  Also note that "&amp;" (and) is the default operator and is higher precedence than "," (or), which is higher precedence than ";" (and). Will hit cyanamides and thioamides.
+<p><dt> Enamine
+   <dd> [NX3][CX3]=[CX3]
+<p><dt> Primary amine, not amide.
+   <dd> [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6]
+   <dd> Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)
+<p><dt> Two primary or secondary amines
+   <dd> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
+   <dd> Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
+<p><dt> Enamine or Aniline Nitrogen
+   <dd> [NX3][$(C=C),$(cc)]</p></dl><br>
+<h3> amino acids</h3><dl>
+<p><dt> Generic amino acid: low specificity.
+   <dd> [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
+   <dd> For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases.  Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
+<p><dt> Dipeptide group. generic amino acid: low specificity.
+   <dd> [NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]
+   <dd> Won't hit pro or gly. Hits acids and conjugate bases.
+<p><dt> Amino Acid
+   <dd> [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
+   <dd> Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine; they have their own SMARTS (see side chain list). Hits acids and conjugate bases.  Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal). {e.g. usage:  Alanine side chain is  [CH3X4] . Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}</p></dl><br>
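The substitution described in the usage note above can be sketched in a few lines of Python. This is only an illustration: `amino_acid_smarts` and `TEMPLATE` are hypothetical names, not part of the data file or of any library, and the helper performs plain string replacement without validating the resulting SMARTS.

```python
# Hypothetical helper for illustration; the names are assumptions, not an existing API.
TEMPLATE = "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"


def amino_acid_smarts(side_chain: str, template: str = TEMPLATE) -> str:
    """Return the template with its single [*] placeholder replaced by side_chain."""
    if template.count("[*]") != 1:
        raise ValueError("template must contain exactly one [*] placeholder")
    return template.replace("[*]", side_chain)


# Alanine side chain, as listed in the side chain section.
alanine = amino_acid_smarts("[CH3X4]")
print(alanine)
```

The resulting string is the same "Alanine search" pattern quoted in the entry above; any other side chain from the list can be passed in the same way.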
+<h3> amino acid side chains</h3><dl>
+<p><dt> Alanine side chain
+   <dd> [CH3X4]
+<p><dt> Arginine side chain.
+   <dd> [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
+   <dd> Hits acid and conjugate base.
+<p><dt> Asparagine side chain.
+   <dd> [CH2X4][CX3](=[OX1])[NX3H2]
+   <dd> Also hits Gln side chain when used alone.
+<p><dt> Aspartate (or Aspartic acid) side chain.
+   <dd> [CH2X4][CX3](=[OX1])[OH0-,OH]
+   <dd> Hits acid and conjugate base. Also hits Glu side chain when used alone.
+<p><dt> Cysteine side chain.
+   <dd> [CH2X4][SX2H,SX1H0-]
+   <dd> Hits acid and conjugate base
+<p><dt> Glutamate (or Glutamic acid) side chain.
+   <dd> [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
+   <dd> Hits acid and conjugate base.
+<p><dt> Glycine
+   <dd> [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
+<p><dt> Histidine side chain.
+   <dd> [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:<br>[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
+   <dd> Hits acid &amp; conjugate base for either Nitrogen.  Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral
+2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
+<p><dt> Isoleucine side chain
+   <dd> [CHX4]([CH3X4])[CH2X4][CH3X4]
+<p><dt> Leucine side chain
+   <dd> [CH2X4][CHX4]([CH3X4])[CH3X4]
+<p><dt> Lysine side chain.
+   <dd> [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
+   <dd> Acid and conjugate base
+<p><dt> Methionine side chain
+   <dd> [CH2X4][CH2X4][SX2][CH3X4]
+<p><dt> Phenylalanine side chain
+   <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
+<p><dt> Proline
+   <dd> [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
+<p><dt> Serine side chain
+   <dd> [CH2X4][OX2H]
+<p><dt> Thioamide
+   <dd> [NX3][CX3]=[SX1]
+<p><dt> Threonine side chain
+   <dd> [CHX4]([CH3X4])[OX2H]
+<p><dt> Tryptophan side chain
+   <dd> [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
+<p><dt> Tyrosine side chain.
+   <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
+   <dd> Acid and conjugate base
+<p><dt> Valine side chain
+   <dd> [CHX4]([CH3X4])[CH3X4]
+<p><dt> Glycine (low specificity)
+   <dd> N[CX4H2][CX3](=[OX1])[O,N]
+<p><dt> Proline (low specificity)
+   <dd> N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N]</p></dl><br>
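Some entries in this list (the Histidine side chain, for example) carry a display-only `<br>` inside the SMARTS, and others use HTML entities such as `&amp;`. A minimal sketch of recovering the bare pattern from the text of a `<dd>` entry follows; `clean_smarts` is a hypothetical helper name, and it assumes the entry contains only the SMARTS, with no trailing prose.

```python
import html
import re


def clean_smarts(dd_html: str) -> str:
    """Strip display-only markup from a <dd> entry and return the bare SMARTS."""
    # Remove <br> tags that were inserted only to wrap long patterns on screen.
    text = re.sub(r"<br\s*/?>", "", dd_html, flags=re.IGNORECASE)
    # Decode entities such as &amp; back to the SMARTS operator '&'.
    text = html.unescape(text)
    # SMARTS strings contain no whitespace, so any remaining whitespace is layout.
    return "".join(text.split())


histidine = clean_smarts(
    "[CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:<br>"
    "[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1"
)
sp3_nitrogen = clean_smarts("[$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)]")
print(histidine)
print(sp3_nitrogen)
```

A cleaned string can then be handed to a SMARTS parser; whether it compiles still depends on the toolkit in use.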
+<h3> azide (-azido) </h3><dl>
+<p><dt> Azide group.
+   <dd> [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
+   <dd> Hits any atom with an attached azide.
+<p><dt> Azide ion.
+   <dd> [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
+   <dd> Hits N in azide ion</p></dl><br>
+<h3> azo </h3><dl>
+<p><dt> Nitrogen.
+   <dd> [#7]
+   <dd> Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of "azo"
+<p><dt> Azo Nitrogen. Low specificity.
+   <dd> [NX2]=N
+   <dd> Hits diazene, azoxy and some diazo structures
+<p><dt> Azo Nitrogen. Diazene
+   <dd> [NX2]=[NX2]
+   <dd> (diaza alkene)
+<p><dt> Azoxy Nitrogen.
+   <dd> [$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]
+<p><dt> Diazo Nitrogen
+   <dd> [$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]
+<p><dt> Azole.
+   <dd> [$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]
+   <dd> 5-membered aromatic heterocycle w/ 2 double bonds. Contains N &amp; another non-C atom (N, O, S). Subclasses are furo-, thio-, pyrro- (replace CH of furan, thiophene, pyrrole w/ N)</p></dl><br>
+<h3> hydrazine</h3><dl>
+<p><dt> Hydrazine H2NNH2
+   <dd> [NX3][NX3]</p></dl><br>
+<h3> hydrazone </h3><dl>
+<p><dt> Hydrazone C=NNH2
+   <dd> [NX3][NX2]=[*]</p></dl><br>
+<h3> imine </h3><dl>
+<p><dt> Substituted imine
+   <dd> [CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]
+   <dd> Schiff base
+<p><dt> Substituted or un-substituted imine
+   <dd> [$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]
+<p><dt> Iminium
+   <dd> [NX3+]=[CX3]</p></dl><br>
+<h3> imide </h3><dl>
+<p><dt> Unsubstituted dicarboximide
+   <dd> [CX3](=[OX1])[NX3H][CX3](=[OX1])
+<p><dt> Substituted dicarboximide
+   <dd> [CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])
+<p><dt> Dicarboxdiimide
+   <dd> [CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])</p></dl><br>
+<h3> nitrate </h3><dl>
+<p><dt> Nitrate group
+   <dd> [$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]
+   <dd> Also hits nitrate anion
+<p><dt> Nitrate Anion
+   <dd> [$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]</p></dl><br>
+<h3> nitrile </h3><dl>
+<p><dt> Nitrile
+   <dd> [NX1]#[CX2]
+<p><dt> Isonitrile
+   <dd> [CX1-]#[NX2+]</p></dl><br>
+<h3> nitro </h3><dl>
+<p><dt> Nitro group.
+   <dd> [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]
+   <dd> Hits both forms.
+<p><dt> Two Nitro groups
+   <dd> [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]</p></dl><br>
+<h3> nitroso </h3><dl>
+<p><dt> Nitroso-group
+   <dd> [NX2]=[OX1]</p></dl><br>
+<h3> n-oxide </h3><dl>
+<p><dt> N-Oxide
+   <dd> [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]
+   <dd> Hits both forms. Won't hit azoxy, nitro, nitroso, or nitrate.</p></dl><br>
+<a NAME="O"></a><h2>O</h2>
+<h3> hydroxyl (includes alcohol, phenol) </h3><dl>
+<p><dt> Hydroxyl
+   <dd> [OX2H]
+<p><dt> Hydroxyl in Alcohol
+   <dd> [#6][OX2H]
+<p><dt> Hydroxyl in Carboxylic Acid
+   <dd> [OX2H][CX3]=[OX1]
+<p><dt> Hydroxyl in H-O-P-
+   <dd> [OX2H]P
+<p><dt> Enol
+   <dd> [OX2H][#6X3]=[#6]
+<p><dt> Phenol
+   <dd> [OX2H][cX3]:[c]
+<p><dt> Enol or Phenol
+   <dd> [OX2H][$(C=C),$(cc)]
+<p><dt>  Hydroxyl_acidic
+   <dd> [$([OH]-*=[!#6])]
+   <dd> An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom. This includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids.</p></dl><br>
+<h3> peroxide </h3><dl>
+<p><dt> Peroxide groups.
+   <dd> [OX2,OX1-][OX2,OX1-]
+   <dd> Also hits anions.</p></dl><br>
+<a NAME="P"></a><h2>P</h2>
+<h3> phosphoric compounds </h3><dl>
+<p><dt> Phosphoric_acid groups.
+   <dd> [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
+   <dd> Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides.  Doesn't hit monophosphoric acid anhydride
+esters (including acidic mono- &amp; di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid
+ and longer, di-  esters on linear triphosphoric acid and longer).
+<p><dt> Phosphoric_ester groups.
+   <dd> [$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]
+   <dd> Hits both depiction forms.  Doesn't hit non-ester phosphoric_acid groups.</p></dl><br>
+<a NAME="S"></a><h2>S</h2>
+<h3>thio groups ( thio-, thi-, sulpho-, mercapto- )</h3><dl>
+<p><dt> Carbo-Thiocarboxylate
+   <dd> [S-][CX3](=S)[#6]
+<p><dt> Carbo-Thioester
+   <dd> S([#6])[CX3](=O)[#6]
+<p><dt> Thio analog of carbonyl
+   <dd> [#6X3](=[SX1])([!N])[!N]
+   <dd> Where S replaces O.  Not a thioamide.
+<p><dt> Thiol, Sulfide or Disulfide Sulfur
+   <dd> [SX2]
+<p><dt> Thiol
+   <dd> [#16X2H]
+<p><dt> Sulfur with at-least one hydrogen.
+   <dd> [#16!H0]
+<p><dt> Thioamide
+   <dd> [NX3][CX3]=[SX1]</p></dl><br>
+<h3>sulfide</h3><dl>
+<p><dt> Sulfide
+   <dd> [#16X2H0]
+   <dd> -alkylthio  Won't hit thiols. Hits disulfides.
+<p><dt> Mono-sulfide
+   <dd> [#16X2H0][!#16]
+   <dd> alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides.
+<p><dt> Di-sulfide
+   <dd> [#16X2H0][#16X2H0]
+   <dd> Won't hit thiols. Won't hit mono-sulfides.
+<p><dt> Two Sulfides
+   <dd> [#16X2H0][!#16].[#16X2H0][!#16]
+   <dd> Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.</p></dl><br>
+<h3>sulfinate</h3><dl>
+<p><dt> Sulfinate
+   <dd> [$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]
+   <dd> Won't hit Sulfinic Acid.  Hits Both Depiction Forms.
+<p><dt> Sulfinic Acid
+   <dd> [$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]
+   <dd> Won't hit substituted Sulfinates.  Hits Both Depiction Forms.
+        Hits acid and conjugate base (sulfinate).</p></dl><br>
+<h3>sulfone</h3><dl>
+<p><dt> Sulfone.  Low specificity.
+   <dd> [$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]
+   <dd> Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- &amp; di- esters, sulfamic
+acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
+<p><dt> Sulfone. High specificity.
+   <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]
+   <dd> Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules).  Hits Both Depiction Forms.
+<p><dt> Sulfonic acid.  High specificity.
+   <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
+   <dd> Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules).
+        Hits acid and conjugate base.  Hits Both Depiction Forms. Hits Arene sulfonic acids.
+<p><dt> Sulfonate
+   <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]
+   <dd> (sulfonic ester) Only hits carbon-substituted sulfur
+        (Oxygen may be heteroatom-substituted).  Hits Both Depiction Forms.
+<p><dt> Sulfonamide.
+   <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]
+   <dd> Only hits carbo- sulfonamide. Hits Both Depiction Forms.
+<p><dt> Carbo-azosulfone
+   <dd> [SX4](C)(C)(=O)=N
+   <dd> Partial N-Analog of Sulfone
+<p><dt> Sulfonamide
+   <dd> [$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]
+   <dd> (sulfa drugs)  Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.</p></dl><br>
+<h3>sulfoxide</h3><dl>
+<p><dt> Sulfoxide Low specificity.
+   <dd> [$([#16X3]=[OX1]),$([#16X3+][OX1-])]
+   <dd> ( sulfinyl, thionyl )   Analog of carbonyl where S replaces C.
+        Hits all sulfoxides, including heteroatom-substituted sulfoxides,
+        dialkylsulfoxides carbo-sulfoxides, sulfinate, sulfinic acids...
+        Hits Both Depiction Forms. Won't hit sulfones.
+<p><dt> Sulfoxide High specificity
+   <dd> [$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]
+   <dd> (sulfinyl , thionyl)  Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides
+        (Won't hit heteroatom-substituted molecules).  Hits Both Depiction Forms. Won't hit sulfones.</p></dl><br>
+<h3>sulfate</h3><dl>
+<p><dt> Sulfate
+   <dd> [$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]
+   <dd> (sulfuric acid monoester)  Only hits when oxygen is carbon-substituted.
+        Hits acid and conjugate base. Hits Both Depiction Forms.
+<p><dt> Sulfuric acid ester (sulfate ester)  Low specificity.
+   <dd> [$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]
+   <dd> Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates).
+        Hits acid and conjugate base. Hits Both Depiction Forms.
+<p><dt> Sulfuric Acid Diester.
+   <dd> [$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]
+   <dd> Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.</p></dl><br>
+<h3>sulfamate</h3><dl>
+<p><dt> Sulfamate.
+   <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]
+   <dd> Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
+<p><dt> Sulfamic Acid.
+   <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])]
+   <dd> Hits acid and conjugate base. Hits Both Depiction Forms.</p></dl><br>
+<h3>sulfene</h3><dl>
+<p><dt> Sulfenic acid.
+   <dd> [#16X2][OX2H,OX1H0-]
+   <dd> Hits acid and conjugate base.
+<p><dt> Sulfenate.
+   <dd> [#16X2][OX2H0]</p></dl><br>
+<a NAME="X"></a><h2>X</h2>
+<h3> halide (-halo -fluoro -chloro -bromo -iodo) </h3><dl>
+<p><dt> Any carbon attached to any halogen
+   <dd> [#6][F,Cl,Br,I]
+<p><dt> Halogen
+   <dd> [F,Cl,Br,I]
+<p><dt> Three_halides groups
+   <dd> [F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]
+   <dd> Hits SMILES that have three halides.</p></dl><br>
+<h3> acyl halide </h3><dl>
+<p><dt> Acyl Halide
+   <dd> [CX3](=[OX1])[F,Cl,Br,I]
+   <dd> (acid halide, -oyl halide)</p></dl><br>
+<a NAME="STRUCTUAL"></a>
+<H2>
+  3. Gross Structural Features
+</H2><br><br>
+<a NAME="CHIRALITY"></a><h2>Chirality</h2>
+<dl>
+<p><dt> Specified chiral carbon.
+   <dd> [$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]
+   <dd> Matches carbons whose chirality is specified (clockwise or anticlockwise).  Will not match molecules whose chirality is unspecified but that could otherwise be considered chiral. Therefore, it also won't match molecules that would be chiral due to an implicit connection (i.e. implicit H).
+<p><dt> "No-conflict" chiral match
+   <dd> C[C@?](F)(Cl)Br
+   <dd> Will match molecules with chiralities as specified or unspecified.
+<p><dt> "No-conflict" chiral match where an H is present
+   <dd> C[C@?H](Cl)Br
+   <dd> Will match molecules with chiralities as specified or unspecified.</p></dl><br>
+<a NAME="ORBITAL"></a><h2>Orbital Configuration</h2>
+<dl>
+<p><dt> sp2 cationic carbon
+   <dd> [$([cX2+](:*):*)]
+   <dd> Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
+<p><dt> Aromatic sp2 carbon.
+   <dd> [$([cX3](:*):*),$([cX2+](:*):*)]
+   <dd> The first recursive SMARTS matches carbons that are three-connected; the second case matches two-connected carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital).
+<p><dt> Any sp2 carbon.
+   <dd> [$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]
+   <dd> The first recursive SMARTS matches carbons that are three-connected and aromatic.  The second case matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital).  The third case matches three-connected non-aromatic carbons (alkenes). The fourth case matches non-aromatic cationic alkene carbons.
+<p><dt> Any sp2 nitrogen.
+   <dd> [$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]
+   <dd> Can be aromatic 3-connected with 2 aromatic bonds (eg pyrrole,Pyridine-N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free
+pair of electrons in a nonbonding orbital, e.g.Pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair
+of electrons in a nonbonding orbital, e.g. C=N ), non aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist
+in reality, SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic
+3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group, here the individual charge-separated resonance structures are
+specified),  either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen),
+respectively.
+<p><dt> Explicit Hydrogen on sp2-Nitrogen
+   <dd> [$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]
+   <dd> (H must be an isotope or ion)
+<p><dt> sp3 nitrogen
+   <dd> [$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)]
+   <dd> One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded.
+<p><dt> Explicit Hydrogen on an sp3 N.
+   <dd> [$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)])]
+   <dd> One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
+<p><dt> sp2 N in N-Oxide
+   <dd> [$([$([NX3]=O),$([NX3+][O-])])]
+<p><dt> sp3 N in N-Oxide   Exclusive:
+   <dd> [$([$([NX4]=O),$([NX4+][O-])])]
+   <dd> Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
+<p><dt> sp3 N in N-Oxide Inclusive:
+   <dd> [$([$([NX4]=O),$([NX4+][O-,#0])])]
+   <dd> Hits if O could be present. Hits if * is used in place of O in SMILES.</p></dl><br>
+<a NAME="CONNECT"></a><h2>Connectivity</h2>
+<dl>
+<p><dt> Quaternary Nitrogen
+   <dd> [$([NX4+]),$([NX4]=*)]
+   <dd> Hits non-aromatic Ns.
+<p><dt> Tricoordinate S double bonded to N.
+   <dd> [$([SX3]=N)]
+<p><dt> S double-bonded to Carbon
+   <dd> [$([SX1]=[#6])]
+   <dd> Hits terminal (1-connected S)
+<p><dt> Triply bonded N
+   <dd> [$([NX1]#*)]
+<p><dt> Divalent Oxygen
+   <dd> [$([OX2])]</p></dl><br>
+<a NAME="CHAIN"></a><h2>Chains &amp; Branching </h2>
+<dl>
+<p><dt> Unbranched_alkane groups.
+   <dd> [R0;D2][R0;D2][R0;D2][R0;D2]
+   <dd> Only hits alkanes (single-bond chains).  Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches
+ (e.g. halide substituted chains count as branched).
+<p><dt> Unbranched_chain groups.
+   <dd> [R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]
+   <dd> Hits any bond (single, double, triple).  Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches
+ (e.g. halide substituted chains count as branched).
+<p><dt> Long_chain groups.
+   <dd> [AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]
+   <dd> Aliphatic chains at-least 8 members long.
+<p><dt> Atom_fragment
+   <dd> [!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
+   <dd> (CLOGP definition) A fragment atom is not an isolating carbon.
+<p><dt> Carbon_isolating
+   <dd> [$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
+   <dd> This definition is based on that in CLOGP, so it is a charge-neutral carbon, which is not a CF3 or an aromatic C between two aromatic hetero atoms (e.g. in tetrazole), and is not multiply bonded to a hetero atom.
+<p><dt> Terminal S bonded to P
+   <dd> [$([SX1]~P)]
+<p><dt> Nitrogen on -N-C=N-
+   <dd> [$([NX3]C=N)]
+<p><dt> Nitrogen on -N-N=C-
+   <dd> [$([NX3]N=C)]
+<p><dt> Nitrogen on -N-N=N-
+   <dd> [$([NX3]N=N)]
+<p><dt> Oxygen in -O-C=N-
+   <dd> [$([OX2]C=N)] </p></dl><br>
+<a NAME="ROTATE"></a><h2>Rotation</h2>
+<dl>
+<p><dt> Rotatable bond
+   <dd> [!$(*#*)&amp;!D1]-!@[!$(*#*)&amp;!D1]
+   <dd> An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note that logical operators can be applied to bonds ("-&amp;!@"). Here, the overall SMARTS consists of two atoms and one bond. The bond is "single and not ring". *#* is any atom triple bonded to any atom.  By enclosing this SMARTS in parentheses and preceding it with $, we can use $(*#*) to write a recursive SMARTS using that string as an atom primitive. The purpose is to avoid bonds such as c1ccccc1-C#C which would be considered rotatable without this specification.</p></dl><br>
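As a rough illustration of how the rotatable-bond pattern might be applied, the sketch below decodes the HTML entity and counts matches with RDKit, the toolkit that app.py already imports as `Chem`. The function name `count_rotatable_bonds` is hypothetical, and the import guard is only there so the snippet still loads where RDKit is not installed.

```python
import html

try:
    from rdkit import Chem  # third-party dependency, assumed available as in app.py
except ImportError:  # keep the sketch importable without RDKit
    Chem = None

# The pattern as it appears in the HTML source, with '&' escaped as '&amp;'.
ROTATABLE_HTML = "[!$(*#*)&amp;!D1]-!@[!$(*#*)&amp;!D1]"
ROTATABLE_SMARTS = html.unescape(ROTATABLE_HTML)


def count_rotatable_bonds(smiles):
    """Count matches of the rotatable-bond SMARTS, or return None without RDKit."""
    if Chem is None:
        return None
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    pattern = Chem.MolFromSmarts(ROTATABLE_SMARTS)
    # Matches are uniquified by atom set, so each bond is counted once.
    return len(mol.GetSubstructMatches(pattern))


print(count_rotatable_bonds("CCCC"))         # butane: only the central C-C bond qualifies
print(count_rotatable_bonds("c1ccccc1C#C"))  # the ring-to-alkyne bond is excluded
```

The second call exercises the case the description mentions: the bond next to the triple bond is filtered out by the `!$(*#*)` term.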
+<a NAME="CYCLE"></a><h2>Cyclic Features</h2>
+<dl>
+<p><dt> Bicyclic
+   <dd> [$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]
+   <dd> Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
+<p><dt> Ortho
+   <dd> *-!:aa-!:*
+   <dd> Ortho-substituted ring
+<p><dt> Meta
+   <dd> *-!:aaa-!:*
+   <dd> Meta-substituted ring
+<p><dt> Para
+   <dd> *-!:aaaa-!:*
+   <dd> Para-substituted ring
+<p><dt> Acyclic-bonds
+   <dd> *!@*
+<p><dt> Single bond and not in a ring
+   <dd> *-!@*
+<p><dt> Non-ring atom
+   <dd> [R0] or [!R]
+<p><dt> Macrocycle groups.
+   <dd> [r;!r3;!r4;!r5;!r6;!r7]
+<p><dt> S in aromatic 5-ring with lone pair
+   <dd> [sX2r5]
+<p><dt> Aromatic 5-Ring O with Lone Pair
+   <dd> [oX2r5]
+<p><dt> N in 5-sided aromatic ring
+   <dd> [nX2r5]
+<p><dt> Spiro-ring center
+   <dd> [X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6]
+   <dd> Ring sizes 4-6.
+<p><dt> N in 5-ring arom
+   <dd> [$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])]
+   <dd> Anion.
+<p><dt> CIS or TRANS double bond in a ring
+   <dd> */,\[R]=;@[R]/,\*
+   <dd> An isomeric SMARTS consisting of four atoms and three bonds.
+<p><dt> CIS or TRANS double or aromatic bond in a ring
+   <dd> */,\[R]=,:;@[R]/,\*
+<p><dt> Unfused benzene ring
+   <dd> [cR1]1[cR1][cR1][cR1][cR1][cR1]1
+   <dd> To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring.
+<p><dt> Multiple non-fused benzene rings
+   <dd> [cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1
+<p><dt> Fused benzene rings
+   <dd> c12ccccc1cccc2</p></dl><br>
+<a NAME="META"></a>
+<H2>
+   4. Meta-SMARTS
+</H2><br><br>
+<a NAME="AA"></a><h2>Amino Acids</h2>
+<dl>
+<p><dt> Generic amino acid: low specificity.
+   <dd> [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
+   <dd> For use w/ non-standard a.a. search. Hits pro but not gly. Hits acids and conjugate bases.  Hits single a.a.s and specific residues
+w/in polypeptides (internal, or terminal).
+<p><dt>  A.A. Template for 20 standard a.a.s
+   <dd> [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),<br>$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]
+   <dd> Pro, Gly, Other.  Replace * w/  the entire 18_standard_side_chains list to get "any standard a.a." Hits acids and conjugate bases.
+Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
+<p><dt> Proline
+   <dd> [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
+<p><dt> Glycine
+   <dd> [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
+<p><dt> Other a.a.
+   <dd> [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
+   <dd> Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine; they have their own SMARTS (see side chain list). Hits acids and conjugate bases.  Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Example usage:<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Alanine side chain is  [CH3X4] <br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Alanine Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
+<p><dt> 18_standard_aa_side_chains.
+      <dd> ([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),<br>
+$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
+[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),<br>
+$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
+$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),<br>
+$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])
+<dd>Can be any of the standard 18 (Pro &amp; Gly are treated separately) Hits acids and conjugate bases.
+<p><dt> N in Any_standard_amino_acid.
+      <dd> [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3]<br>
+(=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3]<br>
+(=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),<br>
+$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$<br>
+([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
+[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),<br>
+$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
+$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),<br>
+$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),<br>
+$([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])]
+<dd> Format is A.A.Template for 20 standard a.a.s. where * is replaced by the entire 18_standard_side_chains list (or'd together).  A generic amino acid with any of the 18 side chains, or proline or glycine. Hits "standard" amino acids that have terminally appended groups (i.e. "standard" refers to the side chains).  (Pro, Gly, or 18 normal a.a.s.)  Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
+<p><dt> Non-standard amino acid.
+   <dd> [$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),<br>
+$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),<br>
+$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),<br>
+$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3]<br>
+(=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
+$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:<br>
+[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
+[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),<br>
+$([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
+$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),<br>
+$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
+$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),<br>
+$([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])]
+   <dd> Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains).  Won't hit amino acids that are
+ non-standard due solely to the fact that groups are terminally-appended to the polypeptide chain (N or C term). format is [$(generic a.a.);
+!$(not a standard one)] Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).</p></dl><br>
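The long patterns in this section are assembled mechanically from the shorter ones: side chains are or'd together as recursive SMARTS and the result replaces `*` in the template. A minimal sketch of that assembly, using only three of the 18 side chains for brevity, might look like this. `any_of`, `SIDE_CHAINS` and the other names are hypothetical, and the output is a reduced illustration rather than the full "any standard a.a." pattern.

```python
def any_of(patterns):
    """Or a list of SMARTS together as recursive SMARTS: [$(A),$(B),...]."""
    return "[" + ",".join(f"$({p})" for p in patterns) + "]"


GENERIC_AA = "[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]"
OTHER_AA_TEMPLATE = (
    "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
)

# Three side chains copied from the list above; the real list has 18 entries.
SIDE_CHAINS = {
    "Ala": "[CH3X4]",
    "Ser": "[CH2X4][OX2H]",
    "Val": "[CHX4]([CH3X4])[CH3X4]",
}

side_chain_list = any_of(SIDE_CHAINS.values())
# "Other a.a." template restricted to the chosen side chains.
subset_aa = OTHER_AA_TEMPLATE.replace("[*]", side_chain_list)
# Same shape as the "Non-standard amino acid" entry: [$(generic a.a.);!$(standard one)]
not_in_subset = f"[$({GENERIC_AA});!$({subset_aa})]"

print(side_chain_list)
print(subset_aa)
print(not_in_subset)
```

Extending `SIDE_CHAINS` to all 18 entries and adding the Proline and Glycine patterns as further alternatives would follow the same steps.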
+<a NAME="RECUR"></a><h2>Recursive or Multiple </h2>
+<h3> Recursive SMARTS: Atoms connected to particular SMARTS</h3><dl>
+<p><dt> Ortho
+   <dd>[SMARTS_expression]-!:aa-!:[SMARTS_expression]
+<p><dt> Meta
+   <dd> [SMARTS_expression]-!:aaa-!:[SMARTS_expression]
+<p><dt> Para
+   <dd> [SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
+<p><dt> Hydrogen
+   <dd> [$([#1][SMARTS_expression])]
+   <dd> Hydrogen must be explicit i.e. an isotope or charged
+<p><dt> Nitrogen
+   <dd> [$([#7][SMARTS_expression])]
+<p><dt> Oxygen
+   <dd> [$([#8][SMARTS_expression])]
+<p><dt> Fluorine
+   <dd> [$([#9][SMARTS_expression])]</p></dl><br>
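The ortho, meta and para templates above differ only in the number of aromatic atoms between the two substituents, so filling them in can be sketched with simple string formatting. `RING_TEMPLATES` and `substituted_ring` are hypothetical names used for illustration; the substituent SMARTS are taken from other entries in this file.

```python
# Templates from the entries above, with the two [SMARTS_expression] slots named a and b.
RING_TEMPLATES = {
    "ortho": "{a}-!:aa-!:{b}",
    "meta": "{a}-!:aaa-!:{b}",
    "para": "{a}-!:aaaa-!:{b}",
}


def substituted_ring(relation: str, a: str, b: str) -> str:
    """Fill one of the ring templates with two substituent SMARTS."""
    if relation not in RING_TEMPLATES:
        raise ValueError(f"unknown relation: {relation!r}")
    return RING_TEMPLATES[relation].format(a=a, b=b)


# A hydroxyl para to a halogen.
para_halo_hydroxyl = substituted_ring("para", "[OX2H]", "[F,Cl,Br,I]")
# Using '*' for both slots reproduces the generic Ortho pattern under Cyclic Features.
generic_ortho = substituted_ring("ortho", "*", "*")
print(para_halo_hydroxyl)
print(generic_ortho)
```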
+<h3> Recursive SMARTS: Multiple groups</h3><dl>
+<p><dt> Two possible groups
+   <dd> [$(SMARTS_expression_A),$(SMARTS_expression_B)]
+   <dd> Hits atoms in either environment or group of interest, A or B.<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Example usages:<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Azide group is : [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Azide or azide ion is: [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),$([$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])])]
+<p><dt> Recursive SMARTS
+   <dd> [$([atom_that_gets_hit][other_atom][other_atom])]
+   <dd> Hits first atom within parentheses.<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;Example usages:<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;[$([CX3]=[OX1])] hits Carbonyl Carbon<br>
+        &nbsp;&nbsp;&nbsp;&nbsp;[$([OX1]=[CX3])] hits Carbonyl Oxygen </p></dl><br>
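The "two possible groups" construction nests cleanly, which is how the azide examples above are built up. The sketch below uses a hypothetical helper, `either`, to rebuild them from their individual depiction forms; it is string assembly only and does not check that the inputs are valid SMARTS.

```python
def either(*patterns: str) -> str:
    """Combine SMARTS as alternatives in one recursive atom: [$(A),$(B),...]."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    return "[" + ",".join(f"$({p})" for p in patterns) + "]"


# The two depiction forms of an attached azide, then of the azide ion.
azide_group = either("*-[NX2-]-[NX2+]#[NX1]", "*-[NX2]=[NX2+]=[NX1-]")
azide_ion = either("[NX1-]=[NX2+]=[NX1-]", "[NX1]#[NX2+]-[NX1-2]")
# Nesting the helper gives the combined "azide or azide ion" pattern.
azide_any = either(azide_group, azide_ion)

print(azide_group)
print(azide_ion)
print(azide_any)
```

Because each alternative is wrapped in `$(...)`, the combined pattern still matches a single atom: the first atom of whichever alternative applies.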
+<h3>   Single only, Double only, Single or Double</h3><dl>
+<p><dt> Sulfide
+   <dd> [#16X2H0]
+   <dd> (-alkylthio)  Won't hit thiols. Hits disulfides too.
+<p><dt> Mono-sulfide
+   <dd> [#16X2H0][!#16]
+   <dd> (alkylthio- or alkoxy-) R-S-R  Won't hit thiols. Won't hit disulfides.
+<p><dt> Di-sulfide
+   <dd> [#16X2H0][#16X2H0]
+   <dd> Won't hit thiols. Won't hit mono-sulfides.
+<p><dt> Two sulfides
+   <dd> [#16X2H0][!#16].[#16X2H0][!#16]
+   <dd> Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
+<p><dt> Acid/conj-base
+   <dd> [OX2H,OX1H0-]
+   <dd> Hits acid and conjugate base. acid/base
+<p><dt> Non-acid Oxygen
+   <dd> [OX2H0]
+<p><dt> Acid/base
+   <dd> [H1,H0-]
+   <dd> Works for any atom if base form has no Hs &amp; acid has only one.</p></dl><br>
+<h3> Multiple Disconnected Groups</h3><dl>
+<p><dt> Two disconnected SMARTS fragments
+   <dd> ([Cl!$(Cl~c)].[c!$(c~Cl)])
+   <dd> A molecule that contains a chlorine and an aromatic carbon that are not connected to each other. Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
+<p><dt> Two disconnected SMARTS fragments
+   <dd> ([Cl]).([c])
+   <dd> Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES fragments.
+<p><dt> Two not-necessarily connected SMARTS fragments
+   <dd> ([Cl].[c])
+   <dd> Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
+<p><dt> Two SMARTS fragments in different target fragments
+   <dd> ([SMARTS_expression]).([SMARTS_expression])
+   <dd> Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
+<p><dt> Two primary or secondary amines
+   <dd> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
+   <dd> Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical patterns.</p></dl><br>
+<a NAME="TOOL"></a><h2>Tools &amp; Tricks</h2>
+<h3> Alternative/Equivalent Representations </h3><dl>
+<p><dt> Any carbon aromatic or non-aromatic
+   <dd> [#6] or [c,C]
+<p><dt> SMILES wildcard
+   <dd> [#0]
+   <dd> This SMARTS hits the SMILES *
+<p><dt> Factoring
+   <dd> [OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-]
+   <dd> Factor out common atomic expressions in the recursive SMARTS.  May improve human readability.
+<p><dt> High-precedence "and"
+   <dd> [N&amp;X4&amp;+,N&amp;X3&amp;+0] or [NX4+,NX3+0]
+   <dd> High-precedence "and" (&amp;) is the default logical operator. It binds more tightly than "or" (,), which in turn binds more tightly than low-precedence "and" (;). </p></dl><br>
+<h3> Hydrogens </h3><dl>
+<p><dt> Any atom w/ at-least 1 H
+   <dd> [*!H0,#1]
+   <dd> In SMILES and SMARTS, hydrogen is not considered an atom (unless it is specified as an isotope). The hydrogen count is instead considered a property of an atom. This SMARTS provides a way to effectively hit Hs themselves.
+<p><dt> Hs on Carbons
+   <dd> [#6!H0,#1]
+<p><dt> Atoms w/ 1 H
+   <dd> [H,#1] </p></dl><br>
+<a NAME="E-"></a>
+<H2>
+ 5. Electron &amp; Proton Features
+</H2><br><br>
+<a NAME="ACID"></a><h2> Acids &amp; Bases </h2>
+<dl>
+<p><dt> Acid
+   <dd> [!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]
+   <dd> Proton donor
+<p><dt> Carboxylic acid
+   <dd> [CX3](=O)[OX2H1]
+   <dd> (-oic acid, COOH)
+<p><dt> Carboxylic acid or conjugate base.
+   <dd> [CX3](=O)[OX1H0-,OX2H1]
+<p><dt> Hydroxyl_acidic
+   <dd> [$([OH]-*=[!#6])]
+   <dd> An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids.
+<p><dt> Phosphoric_Acid
+   <dd> [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
+   <dd> Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- &amp; di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer). Hits acid and conjugate base.
+<p><dt> Sulfonic Acid.  High specificity.
+   <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
+   <dd> Only hits carbo- sulfonic acids (won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits both depiction forms. Hits arene sulfonic acids.
+<p><dt> Acyl Halide
+   <dd> [CX3](=[OX1])[F,Cl,Br,I]
+   <dd> (acid halide, -oyl halide)</p></dl><br>
+<a NAME="CHARGE"></a><h2>Charge </h2>
+<dl>
+<p><dt> Anionic divalent Nitrogen
+   <dd> [NX2-]
+<p><dt> Oxenium Oxygen
+   <dd> [OX2H+]=*
+<p><dt> Oxonium Oxygen
+   <dd> [OX3H2+]
+<p><dt> Carbocation
+   <dd> [#6+]
+<p><dt> sp2 cationic carbon.
+   <dd> [$([cX2+](:*):*)]
+   <dd> Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
+<p><dt> Azide ion.
+   <dd> [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
+   <dd> Hits N in azide ion
+<p><dt> Zwitterion High Specificity
+   <dd> [+1]~*~*~[-1]
+   <dd> +1 charged atom separated by any 3 bonds from a -1 charged atom.
+<p><dt> Zwitterion Low Specificity, Crude
+   <dd>[$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])]
+   <dd> Variously charged moieties separated by up to ten bonds.
+<p><dt> Zwitterion Low Specificity
+   <dd> ([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4])
+   <dd> Variously charged moieties that are within the same molecule but not-necessarily connected. Uses component-level grouping.</p></dl>
+<br>
+<a NAME="H_BOND"></a><h2> H-bond Donors &amp; Acceptors</h2>
+<dl>
+<p><dt> Hydrogen-bond acceptor
+   <dd> [#6,#7;R0]=[#8]
+   <dd> Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring, double bonded to an oxygen.
+<p><dt> Hydrogen-bond acceptor
+   <dd> [!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]
+   <dd> An H-bond acceptor is a heteroatom with no positive charge; note that negatively charged oxygen or sulphur are included. Excluded are halogens (including F), heteroaromatic oxygen, sulphur and pyrrole N. Higher oxidation levels of N, P, S are excluded. Note P(III) is currently included. Zeneca's work would imply that (O=S=O) should also be excluded.
+<p><dt> Hydrogen-bond donor.
+   <dd> [!$([#6,H0,-,-2,-3])]
+   <dd> A H-bond donor is a non-negatively charged heteroatom with at least one H
+<p><dt> Hydrogen-bond donor.
+   <dd> [!H0;#7,#8,#9]
+   <dd> Must have an N-H bond, an O-H bond, or a F-H bond
+<p><dt> Possible intramolecular H-bond
+   <dd> [O,N;!H0]-*~*-*=[$([C,N;R0]=O)]
+   <dd> Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive SMARTS", where "$()" encloses a valid nested SMARTS and acts syntactically like an atom-primitive in the overall SMARTS. Multiple nesting is allowed.</p></dl><br>
+<a NAME="RAD"></a><h2>Radicals </h2>
+<dl>
+<p><dt> Carbon Free-Radical
+   <dd> [#6;X3v3+0]
+   <dd> Hits a neutral carbon with three single bonds.
+<p><dt> Nitrogen Free-Radical
+   <dd> [#7;X2v4+0]
+   <dd> Hits a neutral nitrogen with two single bonds or with a single and a triple bond.  </p></dl><br>
+<a NAME="BREAK"></a>
+<H2>
+   6. Breakdown of Complex SMARTS
+</H2></center><br><br>
+<a NAME="AM_AC"><h2>Amino Acid </h2></a>
+<b>[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]</b>
+<pre>
+[$(                         Proline
+[                               N:
+$([                               terminal
+NX3H                                  neutral
+,                                     or
+NX4H2+])                              + charged
+,                                 or
+$([NX3](C)(C)(C))]1               internal
+[CX4H]                          C: alpha
+([CH2][CH2][CH2]1)                 pro side chain
+[CX3]                           C: of COOH
+(=[OX1])                        O: =O of COOH
+[OX2H,OX1-,N]                   O: term COOH (neutral or -) or intern
+),                          OR
+$(                          Glycine
+[                               N:
+$([                               terminal
+NX3H2                                neutral
+,                                    or
+NX4H3+])                             + charged
+,                                 or
+$([NX3H](C)(C))]                  internal
+[CX4H2]                         C: alpha (w/ H side chain)
+[CX3]                           C: of COOH
+(=[OX1])                        O: =O of COOH
+[OX2H,OX1-,N]                   O: term COOH (neutral or -) or intern
+),                          OR
+$(                          Other amino acid
+[                               N:
+$([                               terminal
+NX3H2                                neutral
+,                                    or
+NX4H3+])                             + charged
+,                                 or
+$([NX3H](C)(C))]                  internal
+[CX4H]                          C: alpha
+([*])                              any side chain
+[CX3]                           C: of COOH
+(=[OX1])                        O: =O of COOH
+[OX2H,OX1-,N]                   O: term COOH (neutral or -) or intern
+)]
+</pre>
+<br><br>
+<a NAME="ES_AM"><h2> Ester or Amide </h2></a>
+<b>[#6][CX3](=O)[$([OX2H0]([#6])[#6]),$([#7])] </b>
+<pre>
+[#6]                    An atom that is a carbon
+[CX3]                   Connected to an atom that is a three-connected carbon
+(=O)                         Which is double bonded to an oxygen
+[                       Connected to an atom
+$(                           That is in an environment where
+[OX2H0]                         An atom that is a two-connected oxygen, without hydrogens
+([#6])[#6])                       Is connected to two carbons, one of them being the carbonyl C
+,                            Or
+$(                           That is in an environment where
+[#7]                            An atom is a nitrogen.
+)]
+</pre>
+<br><br>
+<a NAME="EXMPL"></a>
+<H2>
+ 7. Interesting Example SMARTS
+</H2>
+<dl>
+<p><dt> Oxygen double bonded to aliphatic carbon or nitrogen, single bonded to an aromatic ring, with a
+halogen in meta position
+   <dd> [#8]=[C,N]-aaa[F,Cl,Br,I]
+<p><dt> Aliphatic carbon attached to oxygen with any bond
+   <dd> C~O
+<p><dt> Oxygen or nitrogen, with at least one hydrogen attached and not in a ring
+   <dd> [O,N;!H0;R0]
+<p><dt> Oxygen double bonded to aliphatic carbon or nitrogen
+   <dd> [#8]=[C,N] or O=[C,N]
+<p><dt> Aliphatic atom single-bonded to any carbon which isn't a trifluoromethyl carbon
+   <dd> A[#6;!$(C(F)(F)F)]
+<p><dt> PCB
+   <dd> [$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]-[$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]
+   <dd> Polychlorinated biphenyls. The overall SMARTS is atom-bond-atom. Note that ":" is the explicit aromatic bond and "-" is the explicit single bond. On each side of the single bond, three nested SMARTS represent the ortho, meta, and para positions.
+<p><dt> Imidazolium Nitrogen
+   <dd> [nX3r5+]:c:n
+<p><dt> 1-methyl-2-hydroxy benzene with either a Cl or H at the 5 position.
+   <dd> [c;$([*Cl]),$([*H1])]1ccc(O)c(C)c1 or Cc1:c(O):c:c:[$(cCl),$([cH])]:c1
+   <dd> The "H" primitive in SMARTS means "total number of attached hydrogens": [C] will match C in [CH4] methane, [CH3] methyl, [CH2] methylene, etc., whereas [CH3] will only match methyl. This is similar to the use of "H" in SMILES to specify hydrogen count. The default value for the SMARTS "H" primitive is 1 (same as SMILES, e.g., [CH2]=[CH]-[OH] is the same as C=CO). This H-specification value includes all attached hydrogens: implicit and explicit (e.g., isotopic [2H]).
+<p><dt> Nonstandard atom groups.
+   <dd> [!#1;!#2;!#3;!#5;!#6;!#7;!#8;!#9;!#11;!#12;!#15;!#16;!#17;!#19;!#20;!#35;!#53]</p></dl><br>
+<h2>More Information</h2>
+    <A HREF="/dayhtml/doc/theory/theory.smarts.html">Theory Manual</A><br>
+    <A HREF="/dayhtml_tutorials/languages/smarts/smarts_practice.html">SMARTS Practice</A><br>
+    </td>
+    </tr>
+    <tr>
+    <td><iframe src="/iframes/footer.html" name="iframe3" width="350" height="200"
+       scrolling="no" frameborder="0"></iframe></td>
+    </tr>
+</table>
+</body>
+</html>
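
Note on reading data/smarts_examples.html: the entries in the added file follow a loose `<dt>` name, `<dd>` SMARTS, optional `<dd>` comment pattern with no closing tags. The sketch below is one hypothetical way to pull those triples out using only the standard library. The names `SmartsEntryParser` and `parse_entries` are illustrative assumptions and do not appear in app.py or anywhere else in this commit.

```python
from html.parser import HTMLParser


class SmartsEntryParser(HTMLParser):
    """Collect (name, smarts, comment) triples from <dt>/<dd> runs.

    Hypothetical helper: assumes each <dt> is followed by one <dd>
    holding the SMARTS and, optionally, further <dd> comment text.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.entries = []    # finished (name, smarts, comment) tuples
        self._fields = None  # text fields of the entry being read

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self._flush()
            self._fields = [""]          # first field is the name
        elif tag == "dd" and self._fields is not None:
            self._fields.append("")      # next field: SMARTS or comment
        elif tag in ("dl", "h2", "h3"):
            self._flush()                # a new section ends the entry

    def handle_endtag(self, tag):
        if tag == "dl":
            self._flush()

    def handle_data(self, data):
        if self._fields is not None:
            self._fields[-1] += data

    def _flush(self):
        if self._fields:
            # Collapse runs of whitespace, including line breaks.
            parts = [" ".join(f.split()) for f in self._fields]
            smarts = parts[1] if len(parts) > 1 else ""
            self.entries.append((parts[0], smarts, " ".join(parts[2:])))
        self._fields = None

    def close(self):
        super().close()
        self._flush()


def parse_entries(html_text):
    """Return the list of (name, smarts, comment) tuples in html_text."""
    parser = SmartsEntryParser()
    parser.feed(html_text)
    parser.close()
    return parser.entries


sample = """<dl>
<p><dt> Carboxylic acid
   <dd> [CX3](=O)[OX2H1]
   <dd> (-oic acid, COOH)
<p><dt> Carboxylic acid or conjugate base.
   <dd> [CX3](=O)[OX1H0-,OX2H1]
</p></dl>"""
entries = parse_entries(sample)
```

Entries whose SMARTS line holds alternatives joined by "or" (for example `[#6] or [c,C]`) come back as a single string and would need splitting before being passed to a SMARTS compiler.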

daylight-smarts.csv DELETED Viewed
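
Note on the deleted CSV: its header declares six columns, but neither the SMARTS nor the comments are quoted, and both contain commas. The small check below, a sketch using two rows copied from the deleted file, counts the fields that Python's `csv.reader` sees on each line. The helper name `field_counts` is an illustrative assumption, and nothing here is taken from app.py.

```python
import csv
import io

HEADER = "Section ID,Section,Group,Rule Name,Smarts,Comment"
ROWS = [
    "2,Functional Groups by Element,C,alkane,[CX4],Alkyl Carbon",
    "2,Functional Groups by Element,C & O,Acyl Halide,"
    "[CX3](=[OX1])[F,Cl,Br,I],acid halide, -oyl halide",
]


def field_counts(text):
    """Return the number of fields csv.reader finds on each line."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


# The second row splits inside [F,Cl,Br,I] and inside the comment.
counts = field_counts("\n".join([HEADER] + ROWS))
```

Any row whose count differs from the header's would need quoting, or a different container format, before the Smarts column could be read reliably.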

@@ -1,254 +0,0 @@
-Section ID,Section,Group,Rule Name,Smarts,Comment
-2,Functional Groups by Element,C,alkane,[CX4],Alkyl Carbon
-2,Functional Groups by Element,C,alkene (-ene),[$([CX2](=C)=C)],Allenic Carbon
-2,Functional Groups by Element,C,alkene (-ene),[$([CX3]=[CX3])],Vinylic Carbon; Ethenyl carbon
-2,Functional Groups by Element,C,alkyne (-yne),[$([CX2]#C)],Acetylenic Carbon
-2,Functional Groups by Element,C,arene (Ar , aryl-, aromatic hydrocarbons),c,Arene
-2,Functional Groups by Element,C & O,carbonyl,[CX3]=[OX1],Carbonyl group. Low specificity; Hits carboxylic acid, ester, ketone, aldehyde, carbonic acid/ester,anhydride, carbamic acid/ester, acyl halide, amide.
-2,Functional Groups by Element,C & O,Carbonyl group,[$([CX3]=[OX1]),$([CX3+]-[OX1-])],Hits either resonance structure
-2,Functional Groups by Element,C & O,Carbonyl with Carbon,[CX3](=[OX1])C,Hits aldehyde, ketone, carboxylic acid (except formic), anhydride (except formic), acyl halides (acid halides). Won't hit carbamic acid/ester, carbonic acid/ester.
-2,Functional Groups by Element,C & O,Carbonyl with Nitrogen.,[OX1]=CN,Hits amide, carbamic acid/ester, poly peptide
-2,Functional Groups by Element,C & O,Carbonyl with Oxygen.,[CX3](=[OX1])O,Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid or ester, anhydride Won't hit aldehyde or ketone.
-2,Functional Groups by Element,C & O,Acyl Halide,[CX3](=[OX1])[F,Cl,Br,I],acid halide, -oyl halide
-2,Functional Groups by Element,C & O,Aldehyde,[CX3H1](=O)[#6],-al
-2,Functional Groups by Element,C & O,Anhydride,[CX3](=[OX1])[OX2][CX3](=[OX1]),
-2,Functional Groups by Element,C & O,Amide,[NX3][CX3](=[OX1])[#6],-amide
-2,Functional Groups by Element,C & O,Amidinium,[NX3][CX3]=[NX3+],
-2,Functional Groups by Element,C & O,Carbamate.,[NX3,NX4+][CX3](=[OX1])[OX2,OX1-],Hits carbamic esters, acids, and zwitterions
-2,Functional Groups by Element,C & O,Carbamic ester,[NX3][CX3](=[OX1])[OX2H0],
-2,Functional Groups by Element,C & O,Carbamic acid.,[NX3,NX4+][CX3](=[OX1])[OX2H,OX1-],Hits carbamic acids and zwitterions.
-2,Functional Groups by Element,C & O,Carboxylate Ion.,[CX3](=O)[O-],Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
-2,Functional Groups by Element,C & O,Carbonic Acid or Carbonic Ester,[CX3](=[OX1])(O)O,Carbonic Acid, Carbonic Ester, or combination
-2,Functional Groups by Element,C & O,Carbonic Acid or Carbonic Acid-Ester,[CX3](=[OX1])([OX2])[OX2H,OX1H0-1],Hits acid and conjugate base. Won't hit carbonic acid diester
-2,Functional Groups by Element,C & O,Carbonic Ester (carbonic acid diester),C[OX2][CX3](=[OX1])[OX2]C,Won't hit carbonic acid or combination carbonic acid/ester
-2,Functional Groups by Element,C & O,Carboxylic acid,[CX3](=O)[OX2H1],-oic acid, COOH
-2,Functional Groups by Element,C & O,Carboxylic acid or conjugate base.,[CX3](=O)[OX1H0-,OX2H1],
-2,Functional Groups by Element,C & O,Cyanamide,[NX3][CX2]#[NX1],
-2,Functional Groups by Element,C & O,Ester Also hits anhydrides,[#6][CX3](=O)[OX2H0][#6],won't hit formic anhydride.
-2,Functional Groups by Element,C & O,Ketone,[#6][CX3](=O)[#6],-one
-2,Functional Groups by Element,C & O,Ether,[OD2]([#6])[#6],Ether
-2,Functional Groups by Element,H,Hydrogen Atom,[H],Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
-2,Functional Groups by Element,H,Not a Hydrogen Atom,[!#1],Hits SMILES that are not hydrogen atoms.
-2,Functional Groups by Element,H,Proton,[H+],Hits positively charged hydrogen atoms: [H+]
-2,Functional Groups by Element,H,Mono-Hydrogenated Cation,[+H],Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
-2,Functional Groups by Element,H,Not Mono-Hydrogenated,[!H] or [!H1],Hits atoms that don't have exactly one attached hydrogen.
-2,Functional Groups by Element,N,amide see carbonyl,,
-2,Functional Groups by Element,N,mine (-amino),[NX3;H2,H1;!$(NC=O)],Primary or secondary amine, not amide; Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 & H1 respectively). Also note that "&" (and) is the dafault opperator and is higher precedence that "," (or), which is higher precedence than ";" (and). Will hit cyanamides and thioamides
-2,Functional Groups by Element,N,Enamine,[NX3][CX3]=[CX3],
-2,Functional Groups by Element,N,Primary amine, not amide.,[NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6],Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)
-2,Functional Groups by Element,N,Two primary or secondary amines,[NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)],Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
-2,Functional Groups by Element,N,Enamine or Aniline Nitrogen,[NX3][$(C=C),$(cc)],
-2,Functional Groups by Element,N,Generic amino acid: low specificity.,[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N],For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-2,Functional Groups by Element,N,Dipeptide group. generic amino acid: low specificity.,[NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-],Won't hit pro or gly. Hits acids and conjugate bases.
-2,Functional Groups by Element,N,Amino Acid,[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N],Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/i n polypeptides (internal, or terminal). {e.g. usage: Alanine side chain is [CH3X4] . Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([ CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}
-2,Functional Groups by Element,N,Alanine side chain,[CH3X4],
-2,Functional Groups by Element,N,Arginine side chain.,[CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3],Hits acid and conjugate base.
-2,Functional Groups by Element,N,Aspargine side chain.,[CH2X4][CX3](=[OX1])[NX3H2],Also hits Gln side chain when used alone.
-2,Functional Groups by Element,N,Aspartate (or Aspartic acid) side chain.,[CH2X4][CX3](=[OX1])[OH0-,OH],Hits acid and conjugate base. Also hits Glu side chain when used alone.
-2,Functional Groups by Element,N,Cysteine side chain.,[CH2X4][SX2H,SX1H0-],Hits acid and conjugate base
-2,Functional Groups by Element,N,Glutamate (or Glutamic acid) side chain.,[CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH],Hits acid and conjugate base.
-2,Functional Groups by Element,N,Glycine,[$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])],
-2,Functional Groups by Element,N,Histidine side chain.,[CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1,Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
-2,Functional Groups by Element,N,Isoleucine side chain,[CHX4]([CH3X4])[CH2X4][CH3X4],
-2,Functional Groups by Element,N,Leucine side chain,[CH2X4][CHX4]([CH3X4])[CH3X4],
-2,Functional Groups by Element,N,Lysine side chain.,[CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0],Acid and conjugate base
-2,Functional Groups by Element,N,Methionine side chain,[CH2X4][CH2X4][SX2][CH3X4],
-2,Functional Groups by Element,N,Phenylalanine side chain,[CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1,
-2,Functional Groups by Element,N,Proline,[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N],
-2,Functional Groups by Element,N,Serine side chain,[CH2X4][OX2H],
-2,Functional Groups by Element,N,Thioamide,[NX3][CX3]=[SX1],
-2,Functional Groups by Element,N,Threonine side chain,[CHX4]([CH3X4])[OX2H],
-2,Functional Groups by Element,N,Tryptophan side chain,[CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12,
-2,Functional Groups by Element,N,Tyrosine side chain.,[CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1,Acid and conjugate base
-2,Functional Groups by Element,N,Valine side chain,[CHX4]([CH3X4])[CH3X4],
-2,Functional Groups by Element,N,Alanine side chain,[CH3X4],
-2,Functional Groups by Element,N,Arginine side chain.,[CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3],Hits acid and conjugate base.
-2,Functional Groups by Element,N,Aspargine side chain.,[CH2X4][CX3](=[OX1])[NX3H2],Also hits Gln side chain when used alone.
-2,Functional Groups by Element,N,Aspartate (or Aspartic acid) side chain.,[CH2X4][CX3](=[OX1])[OH0-,OH],Hits acid and conjugate base. Also hits Glu side chain when used alone.
-2,Functional Groups by Element,N,Cysteine side chain.,[CH2X4][SX2H,SX1H0-],Hits acid and conjugate base
-2,Functional Groups by Element,N,Glutamate (or Glutamic acid) side chain.,[CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH],Hits acid and conjugate base.
-2,Functional Groups by Element,N,Glycine,N[CX4H2][CX3](=[OX1])[O,N],
-2,Functional Groups by Element,N,Histidine side chain.,[CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1,Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected
-2,Functional Groups by Element,N,Isoleucine side chain,[CHX4]([CH3X4])[CH2X4][CH3X4],
-2,Functional Groups by Element,N,Leucine side chain,[CH2X4][CHX4]([CH3X4])[CH3X4],
-2,Functional Groups by Element,N,Lysine side chain.,[CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0],Acid and conjugate base
-2,Functional Groups by Element,N,Methionine side chain,[CH2X4][CH2X4][SX2][CH3X4],
-2,Functional Groups by Element,N,Phenylalanine side chain,[CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1,
-2,Functional Groups by Element,N,Proline,N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N],
-2,Functional Groups by Element,N,Serine side chain,[CH2X4][OX2H],
-2,Functional Groups by Element,N,Threonine side chain,[CHX4]([CH3X4])[OX2H],
-2,Functional Groups by Element,N,Tryptophan side chain,[CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12,
-2,Functional Groups by Element,N,Tyrosine side chain.,[CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1,Acid and conjugate base
-2,Functional Groups by Element,N,Valine side chain,[CHX4]([CH3X4])[CH3X4],
-2,Functional Groups by Element,N,azide (-azido),Azide group.,[$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])],Hits any atom with an attached azide.
-2,Functional Groups by Element,N,azide (-azido),Azide ion.,[$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])],Hits N in azide ion
-2,Functional Groups by Element,N,azo,Nitrogen.,[#7],Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of "azo"
-2,Functional Groups by Element,N,azo,Azo Nitrogen. Low specificity.,[NX2]=N,Hits diazene, azoxy and some diazo structures
-2,Functional Groups by Element,N,azo,Azo Nitrogen.diazene,[NX2]=[NX2],(diaza alkene)
-2,Functional Groups by Element,N,azo,Azoxy Nitrogen.,[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])],
-2,Functional Groups by Element,N,azo,Diazo Nitrogen,[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])],
-2,Functional Groups by Element,N,azo,Azole.,[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])],5 member aromatic heterocycle w/ 2double bonds. contains N & another non C (N,O,S) subclasses are furo-, thio-, pyrro- (replace CH o' furfuran, thiophene, pyrrol w/ N)
-2,Functional Groups by Element,N,hydrazine,Hydrazine H2NNH2,[NX3][NX3],
-2,Functional Groups by Element,N,hydrazone,Hydrazone C=NNH2,[NX3][NX2]=[*],
-2,Functional Groups by Element,N,imine,Substituted imine,[CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6],Schiff base
-2,Functional Groups by Element,N,imine,Substituted or un-substituted imine,[$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])],
-2,Functional Groups by Element,N,imine,Iminium,[NX3+]=[CX3],
-2,Functional Groups by Element,N,imide,Unsubstituted dicarboximide,[CX3](=[OX1])[NX3H][CX3](=[OX1]),
-2,Functional Groups by Element,N,imide,Substituted dicarboximide,[CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1]),
-2,Functional Groups by Element,N,imide,Dicarboxdiimide,[CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1]),
-2,Functional Groups by Element,N,nitrate,Nitrate group,[$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)],Also hits nitrate anion
-2,Functional Groups by Element,N,nitrate,Nitrate Anion,[$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])],
-2,Functional Groups by Element,N,nitrile,Nitrile,[NX1]#[CX2],
-2,Functional Groups by Element,N,nitrile,Isonitrile,[CX1-]#[NX2+],
-2,Functional Groups by Element,N,nitro,Nitro group.,[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8] Hits both forms.
-2,Functional Groups by Element,N,nitro,Two Nitro groups,[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8],
-2,Functional Groups by Element,N,nitroso,Nitroso-group,[NX2]=[OX1],
-2,Functional Groups by Element,N,n-oxide,N-Oxide,[$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])],Hits both forms. Won't hit azoxy, nitro, nitroso,or nitrate.
-2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl,[OX2H],
-2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl in Alcohol,[#6][OX2H],
-2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl in Carboxylic Acid,[OX2H][CX3]=[OX1],
-2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl in H-O-P-,[OX2H]P,
-2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Enol,[OX2H][#6X3]=[#6],
-2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Phenol,[OX2H][cX3]:[c],
-2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Enol or Phenol,[OX2H][$(C=C),$(cc)],
-2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl_acidic,[$([OH]-*=[!#6])],An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom, this includes carboxylic, sulphur, phosphorous, halogen and nitrogen oxyacids.
-2,Functional Groups by Element,O,peroxide,Peroxide groups.,[OX2,OX1-][OX2,OX1-],Also hits anions.
-2,Functional Groups by Element,P,phosphoric compounds,Phosphoric_acid groups.,[$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])],Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer).
-2,Functional Groups by Element,P,phosphoric compounds,Phosphoric_ester groups.,[$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])],Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups.
-2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Carbo-Thiocarboxylate,[S-][CX3](=S)[#6],
-2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Carbo-Thioester,S([#6])[CX3](=O)[#6],
-2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Thio analog of carbonyl,[#6X3](=[SX1])([!N])[!N],Where S replaces O. Not a thioamide.
-2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Thiol, Sulfide or Disulfide Sulfur,[SX2],
-2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Thiol,[#16X2H],
-2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Sulfur with at-least one hydrogen.,[#16!H0],
-2,Functional Groups by Element,S,sulfide,Sulfide,[#16X2H0],-alkylthio Won't hit thiols. Hits disulfides.
-2,Functional Groups by Element,S,sulfide,Mono-sulfide,[#16X2H0][!#16],alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides.
-2,Functional Groups by Element,S,sulfide,Di-sulfide,[#16X2H0][#16X2H0],Won't hit thiols. Won't hit mono-sulfides.
-2,Functional Groups by Element,S,sulfide,Two Sulfides,[#16X2H0][!#16].[#16X2H0][!#16],Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
-2,Functional Groups by Element,S,sulfinate,Sulfinate,[$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])],Won't hit Sulfinic Acid. Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfinate,Sulfinic Acid,[$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])],Won't hit substituted Sulfinates. Hits Both Depiction Forms. Hits acid and conjugate base (sulfinate).
-2,Functional Groups by Element,S,sulfone,Sulfone. Low specificity.,[$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])],Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- & di- esters, sulfamic acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfone,Sulfone. High specificity.,[$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])],Only hits carbo- sulfones (Won't hit herteroatom-substituted molecules). Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfone,Sulfonic acid. High specificity.,[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])],Only hits carbo- sulfonic acids (Won't hit herteroatom-substituted molecules). Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
-2,Functional Groups by Element,S,sulfone,Sulfonate,[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])],(sulfonic ester) Only hits carbon-substituted sulfur (Oxygen may be herteroatom-substituted). Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfone,Sulfonamide.,[$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])],Only hits carbo- sulfonamide. Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfone,Carbo-azosulfone,[SX4](C)(C)(=O)=N,Partial N-Analog of Sulfone
-2,Functional Groups by Element,S,sulfone,Sulfonamide,[$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])],(sulf drugs) Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfoxide,Sulfoxide Low specificity.,[$([#16X3]=[OX1]),$([#16X3+][OX1-])],( sulfinyl, thionyl ) Analog of carbonyl where S replaces C. Hits all sulfoxides, including heteroatom-substituted sulfoxides, dialkylsulfoxides carbo-sulfoxides, sulfinate, sulfinic acids... Hits Both Depiction Forms. Won't hit sulfones.
-2,Functional Groups by Element,S,sulfoxide,Sulfoxide High specificity,[$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])],(sulfinyl , thionyl) Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides (Won't hit herteroatom-substituted molecules). Hits Both Depiction Forms. Won't hit sulfones.
-2,Functional Groups by Element,S,sulfate,Sulfate,[$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])],(sulfuric acid monoester) Only hits when oxygen is carbon-substituted. Hits acid and conjugate base. Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfate,Sulfuric acid ester (sulfate ester) Low specificity.,[$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)],Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates). Hits acid and conjugate base. Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfate,Sulfuric Acid Diester.,[$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6])],Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfamate,Sulfamate.,[$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])],Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfamate,Sulfamic Acid.,[$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])],Hits acid and conjugate base. Hits Both Depiction Forms.
-2,Functional Groups by Element,S,sulfene,Sulfenic acid.,[#16X2][OX2H,OX1H0-],Hits acid and conjugate base.
-2,Functional Groups by Element,S,sulfene,Sulfenate.,[#16X2][OX2H0],
-2,Functional Groups by Element,X,halide (-halo -fluoro -chloro -bromo -iodo),Any carbon attached to any halogen,[#6][F,Cl,Br,I],
-2,Functional Groups by Element,X,halide (-halo -fluoro -chloro -bromo -iodo),Halogen,[F,Cl,Br,I],
-2,Functional Groups by Element,X,halide (-halo -fluoro -chloro -bromo -iodo),Three_halides groups,[F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I],Hits SMILES that have three halides.
-2,Functional Groups by Element,X,acyl halide,Acyl Halide,[CX3](=[OX1])[F,Cl,Br,I],(acid halide, -oyl halide)
-3,Gross Structual Features,Chirality,Specified chiral carbon.,[$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)],Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified but that could otherwise be considered chiral. Therefore, it also won't match molecules that would be chiral due to an implicit connection (i.e. implicit H).
-3,Gross Structual Features,Chirality,"No-conflict" chiral match,C[C@?](F)(Cl)Br,Will match molecules with chiralities as specified or unspecified.
-3,Gross Structual Features,Chirality,"No-conflict" chiral match where an H is present,C[C@?H](Cl)Br,Will match molecules with chiralities as specified or unspecified.
-3,Gross Structual Features,Orbital Configuration,sp2 cationic carbon,[$([cX2+](:*):*)],Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
-3,Gross Structual Features,Orbital Configuration,Aromatic sp2 carbon.,[$([cX3](:*):*),$([cX2+](:*):*)],The first recursive SMARTS matches carbons that are three-connected, the second case matches two-connected carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital)
-3,Gross Structual Features,Orbital Configuration,Any sp2 carbon.,[$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)],The first recursive SMARTS matches carbons that are three-connected and aromatic. The second case matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital). The third case matches three-connected non-aromatic carbons (alkenes). The fourth case matches non-aromatic cationic alkene carbons.
-3,Gross Structual Features,Orbital Configuration,Any sp2 nitrogen.,[$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)],Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine-N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free pair of electrons in a nonbonding orbital, e.g. pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair of electrons in a nonbonding orbital, e.g. C=N), non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist in reality, SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group, here the individual charge-separated resonance structures are specified), either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen), respectively.
-3,Gross Structual Features,Orbital Configuration,Explicit Hydrogen on sp2-Nitrogen,[$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])],(H must be an isotope or ion)
-3,Gross Structual Features,Orbital Configuration,sp3 nitrogen,[$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)],One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded.
-3,Gross Structual Features,Orbital Configuration,Explicit Hydrogen on an sp3 N.,[$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)])],One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
-3,Gross Structual Features,Orbital Configuration,sp2 N in N-Oxide,[$([$([NX3]=O),$([NX3+][O-])])],
-3,Gross Structual Features,Orbital Configuration,sp3 N in N-Oxide Exclusive:,[$([$([NX4]=O),$([NX4+][O-])])],Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
-3,Gross Structual Features,Orbital Configuration,sp3 N in N-Oxide Inclusive:,[$([$([NX4]=O),$([NX4+][O-,#0])])],Hits if O could be present. Hits if * is used in place of O in SMILES.
-3,Gross Structual Features,Connectivity,Quaternary Nitrogen,[$([NX4+]),$([NX4]=*)],Hits non-aromatic Ns.
-3,Gross Structual Features,Connectivity,Tricoordinate S double bonded to N.,[$([SX3]=N)],
-3,Gross Structual Features,Connectivity,S double-bonded to Carbon,[$([SX1]=[#6])],Hits terminal (1-connected S)
-3,Gross Structual Features,Connectivity,Triply bonded N,[$([NX1]#*)],
-3,Gross Structual Features,Connectivity,Divalent Oxygen,[$([OX2])],
-3,Gross Structual Features,Chains & Branching,Unbranched_alkane groups.,[R0;D2][R0;D2][R0;D2][R0;D2],Only hits alkanes (single-bond chains). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
-3,Gross Structual Features,Chains & Branching,Unbranched_chain groups.,[R0;D2]~[R0;D2]~[R0;D2]~[R0;D2],Hits any bond (single, double, triple). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
-3,Gross Structual Features,Chains & Branching,Long_chain groups.,[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0],Aliphatic chains at-least 8 members long.
-3,Gross Structual Features,Chains & Branching,Atom_fragment,[!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])],(CLOGP definition) A fragment atom is not an isolating carbon
-3,Gross Structual Features,Chains & Branching,Carbon_isolating,[$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])],This definition is based on that in CLOGP, so it is a charge-neutral carbon, which is not a CF3 or an aromatic C between two aromatic hetero atoms (e.g. in tetrazole), and it is not multiply bonded to a hetero atom.
-3,Gross Structual Features,Chains & Branching,Terminal S bonded to P,[$([SX1]~P)],
-3,Gross Structual Features,Chains & Branching,Nitrogen on -N-C=N-,[$([NX3]C=N)],
-3,Gross Structual Features,Chains & Branching,Nitrogen on -N-N=C-,[$([NX3]N=C)],
-3,Gross Structual Features,Chains & Branching,Nitrogen on -N-N=N-,[$([NX3]N=N)],
-3,Gross Structual Features,Chains & Branching,Oxygen in -O-C=N-,[$([OX2]C=N)],
-3,Gross Structual Features,Rotation,Rotatable bond,[!$(*#*)&!D1]-!@[!$(*#*)&!D1],An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note that logical operators can be applied to bonds ("-&!@"). Here, the overall SMARTS consists of two atoms and one bond. The bond is "single and not ring". *#* is any atom triple bonded to any atom. Enclosing this SMARTS in parentheses and preceding it with $ lets us use $(*#*) as an atom primitive in a recursive SMARTS. The purpose is to avoid bonds such as c1ccccc1-C#C, which would be considered rotatable without this specification.
-3,Gross Structual Features,Cyclic Features,Bicyclic,[$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))],Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
-3,Gross Structual Features,Cyclic Features,Ortho,*-!:aa-!:*,Ortho-substituted ring
-3,Gross Structual Features,Cyclic Features,Meta,*-!:aaa-!:*,Meta-substituted ring
-3,Gross Structual Features,Cyclic Features,Para,*-!:aaaa-!:*,Para-substituted ring
-3,Gross Structual Features,Cyclic Features,Acylic-bonds,*!@*,
-3,Gross Structual Features,Cyclic Features,Single bond and not in a ring,*-!@*,
-3,Gross Structual Features,Cyclic Features,Non-ring atom,[R0] or [!R],
-3,Gross Structual Features,Cyclic Features,Macrocycle groups.,[r;!r3;!r4;!r5;!r6;!r7],
-3,Gross Structual Features,Cyclic Features,S in aromatic 5-ring with lone pair,[sX2r5],
-3,Gross Structual Features,Cyclic Features,Aromatic 5-Ring O with Lone Pair,[oX2r5],
-3,Gross Structual Features,Cyclic Features,N in 5-sided aromatic ring,[nX2r5],
-3,Gross Structual Features,Cyclic Features,Spiro-ring center,[X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6],rings size 4-6
-3,Gross Structual Features,Cyclic Features,N in 5-ring arom,[$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])],anion
-3,Gross Structual Features,Cyclic Features,CIS or TRANS double bond in a ring,*/,\[R]=;@[R]/,\*,An isomeric SMARTS consisting of four atoms and three bonds.
-3,Gross Structual Features,Cyclic Features,CIS or TRANS double or aromatic bond in a ring,*/,\[R]=,:;@[R]/,\*
-3,Gross Structual Features,Cyclic Features,Unfused benzene ring,[cR1]1[cR1][cR1][cR1][cR1][cR1]1,To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring:
-3,Gross Structual Features,Cyclic Features,Multiple non-fused benzene rings,[cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1,
-3,Gross Structual Features,Cyclic Features,Fused benzene rings,c12ccccc1cccc2,
-4,Meta-SMARTS,Amino Acids,Generic amino acid: low specificity.,[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N],For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-4,Meta-SMARTS,Amino Acids,A.A. Template for 20 standard a.a.s,[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])],Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get "any standard a.a." Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-4,Meta-SMARTS,Amino Acids,Proline,[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N],
-4,Meta-SMARTS,Amino Acids,Glycine,[$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])],
-4,Meta-SMARTS,Amino Acids,Other a.a.,[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N],Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal). Example usage: Alanine side chain is [CH3X4]; Alanine search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
-4,Meta-SMARTS,Amino Acids,18_standard_aa_side_chains.,([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])]),Can be any of the standard 18 (Pro & Gly are treated separately) Hits acids and conjugate bases.
-4,Meta-SMARTS,Amino Acids,N in Any_standard_amino_acid.,[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])],Format is the A.A. Template for 20 standard a.a.s, where * is replaced by the entire 18_standard_side_chains list (or'd together). A generic amino acid with any of the 18 side chains, or proline or glycine. Hits "standard" amino acids that have terminally appended groups (i.e. "standard" refers to the side chains). (Pro, Gly, or 18 normal a.a.s.) Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-4,Meta-SMARTS,Amino Acids,Non-standard amino acid.,[$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])],Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains). Won't hit amino acids that are non-standard due solely to the fact that groups are terminally-appended to the polypeptide chain (N or C term). format is [$(generic a.a.); !$(not a standard one)] Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Ortho,[SMARTS_expression]-!:aa-!:[SMARTS_expression]
-4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Meta,[SMARTS_expression]-!:aaa-!:[SMARTS_expression]
-4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Para,[SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
-4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Hydrogen,[$([#1][SMARTS_expression])],Hydrogen must be explicit i.e. an isotope or charged
-4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Nitrogen,[$([#7][SMARTS_expression])]
-4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Oxygen,[$([#8][SMARTS_expression])]
-4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Fluorine,[$([#9][SMARTS_expression])]
-4,Meta-SMARTS,Recursive or Multiple,Two possible groups,[$(SMARTS_expression_A),$(SMARTS_expression_B)],Hits atoms in either environment or group of interest, A or B. Example usages: Azide group is : [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])] Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$( [NX1]#[NX2+]-[NX1-2])] Azide or azide ion is: [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),$([$([NX1-]=[NX2+]=[NX1-]),$( [NX1]#[NX2+]-[NX1-2])])]
-4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS,[$([atom_that_gets_hit][other_atom][other_atom])],Hits first atom within parenthesis Example usages: [$([CX3]=[OX1])] hits Carbonyl Carbon [$([OX1]=[CX3])] hits Carbonyl Oxygen
-4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Sulfide,[#16X2H0],(-alkylthio) Won't hit thiols. Hits disulfides too.
-4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Mono-sulfide,[#16X2H0][!#16],(alkylthio- or alkoxy-) R-S-R Won't hit thiols. Won't hit disulfides.
-4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Di-sulfide,[#16X2H0][#16X2H0],Won't hit thiols. Won't hit mono-sulfides.
-4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Two sulfides,[#16X2H0][!#16].[#16X2H0][!#16],Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
-4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Acid/conj-base,[OX2H,OX1H0-],Hits acid and conjugate base. acid/base
-4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Non-acid Oxygen,[OX2H0],
-4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Acid/base,[H1,H0-],Works for any atom if base form has no Hs & acid has only one.
-4,Meta-SMARTS,Muntiple Disconnected Groups,Two disconnected SMARTS fragments,([Cl!$(Cl~c)].[c!$(c~Cl)]),A molecule that contains a chlorine and an aromatic carbon but which are not connected to each other. Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
-4,Meta-SMARTS,Muntiple Disconnected Groups,Two disconnected SMARTS fragments,([Cl]).([c]),Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES fragments.
-4,Meta-SMARTS,Muntiple Disconnected Groups,Two not-necessarily connected SMARTS fragments,([Cl].[c]),Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
-4,Meta-SMARTS,Muntiple Disconnected Groups,Two not-necessarily connected fragments,([SMARTS_expression]).([SMARTS_expression]),Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
-4,Meta-SMARTS,Muntiple Disconnected Groups,Two primary or secondary amines,[NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)],Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical patterns.
-4,Meta-SMARTS,Tools & Tricks,Alternative/Equivalent Representations,Any carbon aromatic or non-aromatic,[#6] or [c,C],
-4,Meta-SMARTS,Tools & Tricks,SMILES wildcard,[#0],This SMARTS hits the SMILES *
-4,Meta-SMARTS,Tools & Tricks,Factoring,[OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-],Factor out common atomic expressions in the recursive SMARTS. May improve human readability.
-4,Meta-SMARTS,Tools & Tricks,High-precedence "and",[N&X4&+,N&X3&+0] or [NX4+,NX3+0],High-precedence "and" (&) is the default logical operator. "Or" (,) is lower precedence than & and higher precedence than low-precedence "and" (;).
-4,Meta-SMARTS,Tools & Tricks,Hs on Carbons,[#6!H0,#1],
-4,Meta-SMARTS,Tools & Tricks,Atoms w/ 1 H,[H,#1],
-5,Electron & Proton Features,Acids & Bases,Acid,[!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+],Proton donor
-5,Electron & Proton Features,Acids & Bases,Carboxylic acid,[CX3](=O)[OX2H1],(-oic acid, COOH)
-5,Electron & Proton Features,Acids & Bases,Carboxylic acid or conjugate base.,[CX3](=O)[OX1H0-,OX2H1],
-5,Electron & Proton Features,Acids & Bases,Hydroxyl_acidic,[$([OH]-*=[!#6])],An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom, this includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids
-5,Electron & Proton Features,Acids & Bases,Phosphoric_Acid,[$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])],Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer). Hits acid and conjugate base.
-5,Electron & Proton Features,Acids & Bases,Sulfonic Acid. High specificity.,[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])],Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
-5,Electron & Proton Features,Acids & Bases,Acyl Halide,[CX3](=[OX1])[F,Cl,Br,I],(acid halide, -oyl halide)
-5,Electron & Proton Features,Charge,Anionic divalent Nitrogen,[NX2-],
-5,Electron & Proton Features,Charge,Oxenium Oxygen,[OX2H+]=*,
-5,Electron & Proton Features,Charge,Oxonium Oxygen,[OX3H2+],
-5,Electron & Proton Features,Charge,Carbocation,[#6+],
-5,Electron & Proton Features,Charge,sp2 cationic carbon.,[$([cX2+](:*):*)],Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
-5,Electron & Proton Features,Charge,Azide ion.,[$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])],Hits N in azide ion
-5,Electron & Proton Features,Charge,Zwitterion High Specificity,[+1]~*~*~[-1],+1 charged atom separated by any 3 bonds from a -1 charged atom.
-5,Electron & Proton Features,Charge,Zwitterion Low Specificity, Crude,[$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])],Variously charged moieties separated by up to ten bonds.
-5,Electron & Proton Features,Charge,Zwitterion Low Specificity,([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4]),Variously charged moieties that are within the same molecule but not-necessarily connected. Uses component-level grouping.
-5,Electron & Proton Features,H-bond Donors & Acceptors,Hydrogen-bond acceptor,[#6,#7;R0]=[#8],Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring, double bonded to an oxygen.
-5,Electron & Proton Features,H-bond Donors & Acceptors,Hydrogen-bond acceptor,[!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3)],A H-bond acceptor is a heteroatom with no positive charge, note that negatively charged oxygen or sulphur are included. Excluded are halogens, including F, heteroaromatic oxygen, sulphur and pyrrole N. Higher oxidation levels of N,P,S are excluded. Note P(III) is currently included. Zeneca's work would imply that (O=S=O) should also be excluded.
-5,Electron & Proton Features,H-bond Donors & Acceptors,Hydrogen-bond donor.,[!$([#6,H0,-,-2,-3])],A H-bond donor is a non-negatively charged heteroatom with at least one H
-5,Electron & Proton Features,H-bond Donors & Acceptors,Hydrogen-bond donor.,[!H0;#7,#8,#9],Must have an N-H bond, an O-H bond, or a F-H bond
-5,Electron & Proton Features,H-bond Donors & Acceptors,Possible intramolecular H-bond,[O,N;!H0]-*~*-*=[$([C,N;R0]=O)],Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive SMARTS", where "$()" encloses a valid nested SMARTS and acts syntactically like an atom-primitive in the overall SMARTS. Multiple nesting is allowed.
-5,Electron & Proton Features,Radicals,Carbon Free-Radical,[#6;X3v3+0],Hits a neutral carbon with three single bonds.
-5,Electron & Proton Features,Radicals,Nitrogen Free-Radical,[#7;X2v4+0],Hits a neutral nitrogen with two single bonds or with a single and a triple bond.
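
A note on the rows deleted above: SMARTS strings contain commas of their own (for example `[F,Cl,Br,I]`), and the rows are not quoted, so a plain CSV reader splits them in the wrong places. The sketch below illustrates the problem on one row copied from the deleted file. The helper `split_outside_brackets` is hypothetical and is not part of `app.py`; it is only a heuristic, and it still mis-splits rows whose comment or name has commas outside any brackets.

```python
import csv

# One row copied from the deleted daylight-smarts.csv (leading diff marker removed).
ROW = ("2,Functional Groups by Element,X,acyl halide,Acyl Halide,"
       "[CX3](=[OX1])[F,Cl,Br,I],(acid halide, -oyl halide)")


def naive_fields(row):
    """Parse the row with the standard csv reader (splits on every comma)."""
    return next(csv.reader([row]))


def split_outside_brackets(row):
    """Hypothetical helper: split on commas that are not inside [] or ()."""
    fields, current, depth = [], [], 0
    for ch in row:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # A top-level comma ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(len(naive_fields(ROW)))          # the SMARTS is broken into pieces
    print(split_outside_brackets(ROW)[5])  # the SMARTS survives as one field
```

A nested format such as the `data/daylight_smarts.yml` file added in this commit keeps each SMARTS as a single quoted string, which avoids this ambiguity altogether.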

rawgroups.txt DELETED Viewed

@@ -1,1145 +0,0 @@
-2. Functional Groups by Element
-C
-alkane
-Alkyl Carbon
-    [CX4]
-alkene (-ene)
-Allenic Carbon
-    [$([CX2](=C)=C)]
-Vinylic Carbon
-    [$([CX3]=[CX3])]
-    Ethenyl carbon
-alkyne (-yne)
-Acetylenic Carbon
-    [$([CX2]#C)]
-arene (Ar , aryl-, aromatic hydrocarbons)
-Arene
-    c
-C & O
-carbonyl
-Carbonyl group. Low specificity
-    [CX3]=[OX1]
-    Hits carboxylic acid, ester, ketone, aldehyde, carbonic acid/ester,anhydride, carbamic acid/ester, acyl halide, amide.
-Carbonyl group
-    [$([CX3]=[OX1]),$([CX3+]-[OX1-])]
-    Hits either resonance structure
-Carbonyl with Carbon
-    [CX3](=[OX1])C
-    Hits aldehyde, ketone, carboxylic acid (except formic), anhydride (except formic), acyl halides (acid halides). Won't hit carbamic acid/ester, carbonic acid/ester.
-Carbonyl with Nitrogen.
-    [OX1]=CN
-    Hits amide, carbamic acid/ester, poly peptide
-Carbonyl with Oxygen.
-    [CX3](=[OX1])O
-    Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid or ester, anhydride Won't hit aldehyde or ketone.
-Acyl Halide
-    [CX3](=[OX1])[F,Cl,Br,I]
-    acid halide, -oyl halide
-Aldehyde
-    [CX3H1](=O)[#6]
-    -al
-Anhydride
-    [CX3](=[OX1])[OX2][CX3](=[OX1])
-Amide
-    [NX3][CX3](=[OX1])[#6]
-    -amide
-Amidinium
-    [NX3][CX3]=[NX3+]
-Carbamate.
-    [NX3,NX4+][CX3](=[OX1])[OX2,OX1-]
-    Hits carbamic esters, acids, and zwitterions
-Carbamic ester
-    [NX3][CX3](=[OX1])[OX2H0]
-Carbamic acid.
-    [NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]
-    Hits carbamic acids and zwitterions.
-Carboxylate Ion.
-    [CX3](=O)[O-]
-    Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
-Carbonic Acid or Carbonic Ester
-    [CX3](=[OX1])(O)O
-    Carbonic Acid, Carbonic Ester, or combination
-Carbonic Acid or Carbonic Acid-Ester
-    [CX3](=[OX1])([OX2])[OX2H,OX1H0-1]
-    Hits acid and conjugate base. Won't hit carbonic acid diester
-Carbonic Ester (carbonic acid diester)
-    C[OX2][CX3](=[OX1])[OX2]C
-    Won't hit carbonic acid or combination carbonic acid/ester
-Carboxylic acid
-    [CX3](=O)[OX2H1]
-    -oic acid, COOH
-Carboxylic acid or conjugate base.
-    [CX3](=O)[OX1H0-,OX2H1]
-Cyanamide
-    [NX3][CX2]#[NX1]
-Ester Also hits anhydrides
-    [#6][CX3](=O)[OX2H0][#6]
-    won't hit formic anhydride.
-Ketone
-    [#6][CX3](=O)[#6]
-    -one
-ether
-Ether
-    [OD2]([#6])[#6]
-H
-hydrogen atoms
-Hydrogen Atom
-    [H]
-    Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
-Not a Hydrogen Atom
-    [!#1]
-    Hits SMILES that are not hydrogen atoms.
-Proton
-    [H+]
-    Hits positively charged hydrogen atoms: [H+]
-hydrogen count
-Mono-Hydrogenated Cation
-    [+H]
-    Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
-Not Mono-Hydrogenated
-    [!H] or [!H1]
-    Hits atoms that don't have exactly one attached hydrogen.
-N
-amide see carbonyl
-amine (-amino)
-Primary or secondary amine, not amide.
-    [NX3;H2,H1;!$(NC=O)]
-    Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 & H1 respectively). Also note that "&" (and) is the default operator and is higher precedence than "," (or), which is higher precedence than ";" (and). Will hit cyanamides and thioamides.
-Enamine
-    [NX3][CX3]=[CX3]
-Primary amine, not amide.
-    [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6] Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)
-Two primary or secondary amines
-    [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
-    Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
-Enamine or Aniline Nitrogen
-    [NX3][$(C=C),$(cc)]
-amino acids
-Generic amino acid: low specificity.
-    [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
-    For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-Dipeptide group. generic amino acid: low specificity.
-    [NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]
-    Won't hit pro or gly. Hits acids and conjugate bases.
-Amino Acid
-    [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
-    Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal). {e.g. usage: Alanine side chain is [CH3X4]. Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}
-amino acid side chains
-Alanine side chain
-    [CH3X4]
-Arginine side chain.
-    [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
-    Hits acid and conjugate base.
-Asparagine side chain.
-    [CH2X4][CX3](=[OX1])[NX3H2]
-    Also hits Gln side chain when used alone.
-Aspartate (or Aspartic acid) side chain.
-    [CH2X4][CX3](=[OX1])[OH0-,OH]
-    Hits acid and conjugate base. Also hits Glu side chain when used alone.
-Cysteine side chain.
-    [CH2X4][SX2H,SX1H0-]
-    Hits acid and conjugate base
-Glutamate (or Glutamic acid) side chain.
-    [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
-    Hits acid and conjugate base.
-Glycine
-    [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
-Histidine side chain.
-    [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:
-    [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
-    Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
-Isoleucine side chain
-    [CHX4]([CH3X4])[CH2X4][CH3X4]
-Leucine side chain
-    [CH2X4][CHX4]([CH3X4])[CH3X4]
-Lysine side chain.
-    [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
-    Acid and conjugate base
-Methionine side chain
-    [CH2X4][CH2X4][SX2][CH3X4]
-Phenylalanine side chain
-    [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
-Proline
-    [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
-Serine side chain
-    [CH2X4][OX2H]
-Thioamide
-    [NX3][CX3]=[SX1]
-Threonine side chain
-    [CHX4]([CH3X4])[OX2H]
-Tryptophan side chain
-    [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
-Tyrosine side chain.
-    [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
-    Acid and conjugate base
-Valine side chain
-    [CHX4]([CH3X4])[CH3X4]
-Alanine side chain
-    [CH3X4]
-Arginine side chain.
-    [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
-    Hits acid and conjugate base.
-Asparagine side chain.
-    [CH2X4][CX3](=[OX1])[NX3H2]
-    Also hits Gln side chain when used alone.
-Aspartate (or Aspartic acid) side chain.
-    [CH2X4][CX3](=[OX1])[OH0-,OH]
-    Hits acid and conjugate base. Also hits Glu side chain when used alone.
-Cysteine side chain.
-    [CH2X4][SX2H,SX1H0-]
-    Hits acid and conjugate base
-Glutamate (or Glutamic acid) side chain.
-    [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
-    Hits acid and conjugate base.
-Glycine
-    N[CX4H2][CX3](=[OX1])[O,N]
-Histidine side chain.
-    [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:
-    [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
-    Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected
-Isoleucine side chain
-    [CHX4]([CH3X4])[CH2X4][CH3X4]
-Leucine side chain
-    [CH2X4][CHX4]([CH3X4])[CH3X4]
-Lysine side chain.
-    [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
-    Acid and conjugate base
-Methionine side chain
-    [CH2X4][CH2X4][SX2][CH3X4]
-Phenylalanine side chain
-    [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
-Proline
-    N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N]
-Serine side chain
-    [CH2X4][OX2H]
-Threonine side chain
-    [CHX4]([CH3X4])[OX2H]
-Tryptophan side chain
-    [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
-Tyrosine side chain.
-    [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
-    Acid and conjugate base
-Valine side chain
-    [CHX4]([CH3X4])[CH3X4]
-azide (-azido)
-Azide group.
-    [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
-    Hits any atom with an attached azide.
-Azide ion.
-    [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
-    Hits N in azide ion
-azo
-Nitrogen.
-    [#7]
-    Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of "azo"
-Azo Nitrogen. Low specificity.
-    [NX2]=N
-    Hits diazene, azoxy and some diazo structures
-Azo Nitrogen. Diazene
-    [NX2]=[NX2]
-    (diaza alkene)
-Azoxy Nitrogen.
-    [$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]
-Diazo Nitrogen
-    [$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]
-Azole.
-    [$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]
-    5-membered aromatic heterocycle w/ 2 double bonds. Contains N & another non-C atom (N, O, S). Subclasses are furo-, thio-, pyrro- (replace CH of furfuran, thiophene, pyrrole w/ N).
-hydrazine
-Hydrazine H2NNH2
-    [NX3][NX3]
-hydrazone
-Hydrazone C=NNH2
-    [NX3][NX2]=[*]
-imine
-Substituted imine
-    [CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]
-    Schiff base
-Substituted or un-substituted imine
-    [$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]
-Iminium
-    [NX3+]=[CX3]
-imide
-Unsubstituted dicarboximide
-    [CX3](=[OX1])[NX3H][CX3](=[OX1])
-Substituted dicarboximide
-    [CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])
-Dicarboxdiimide
-    [CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])
-nitrate
-Nitrate group
-    [$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]
-    Also hits nitrate anion
-Nitrate Anion
-    [$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]
-nitrile
-Nitrile
-    [NX1]#[CX2]
-Isonitrile
-    [CX1-]#[NX2+]
-nitro
-Nitro group.
-    [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]
-    Hits both forms.
-Two Nitro groups
-    [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]
-nitroso
-Nitroso-group
-    [NX2]=[OX1]
-n-oxide
-N-Oxide
-    [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]
-    Hits both forms. Won't hit azoxy, nitro, nitroso, or nitrate.
-O
-hydroxyl (includes alcohol, phenol)
-Hydroxyl
-    [OX2H]
-Hydroxyl in Alcohol
-    [#6][OX2H]
-Hydroxyl in Carboxylic Acid
-    [OX2H][CX3]=[OX1]
-Hydroxyl in H-O-P-
-    [OX2H]P
-Enol
-    [OX2H][#6X3]=[#6]
-Phenol
-    [OX2H][cX3]:[c]
-Enol or Phenol
-    [OX2H][$(C=C),$(cc)]
-Hydroxyl_acidic
-    [$([OH]-*=[!#6])]
-    An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids.
-peroxide
-Peroxide groups.
-    [OX2,OX1-][OX2,OX1-]
-    Also hits anions.
-P
-phosphoric compounds
-Phosphoric_acid groups.
-    [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
-    Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer).
-Phosphoric_ester groups.
-    [$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]
-    Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups.
-S
-thio groups ( thio-, thi-, sulpho-, mercapto- )
-Carbo-Thiocarboxylate
-    [S-][CX3](=S)[#6]
-Carbo-Thioester
-    S([#6])[CX3](=O)[#6]
-Thio analog of carbonyl
-    [#6X3](=[SX1])([!N])[!N]
-    Where S replaces O. Not a thioamide.
-Thiol, Sulfide or Disulfide Sulfur
-    [SX2]
-Thiol
-    [#16X2H]
-Sulfur with at-least one hydrogen.
-    [#16!H0]
-Thioamide
-    [NX3][CX3]=[SX1]
-sulfide
-Sulfide
-    [#16X2H0]
-    -alkylthio Won't hit thiols. Hits disulfides.
-Mono-sulfide
-    [#16X2H0][!#16]
-    alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides.
-Di-sulfide
-    [#16X2H0][#16X2H0]
-    Won't hit thiols. Won't hit mono-sulfides.
-Two Sulfides
-    [#16X2H0][!#16].[#16X2H0][!#16]
-    Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
-sulfinate
-Sulfinate
-    [$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]
-    Won't hit Sulfinic Acid. Hits Both Depiction Forms.
-Sulfinic Acid
-    [$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]
-    Won't hit substituted Sulfinates. Hits Both Depiction Forms. Hits acid and conjugate base (sulfinate).
-sulfone
-Sulfone. Low specificity.
-    [$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]
-    Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- & di- esters, sulfamic acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
-Sulfone. High specificity.
-    [$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]
-    Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms.
-Sulfonic acid. High specificity.
-    [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
-    Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
-Sulfonate
-    [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]
-    (sulfonic ester) Only hits carbon-substituted sulfur (Oxygen may be heteroatom-substituted). Hits Both Depiction Forms.
-Sulfonamide.
-    [$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]
-    Only hits carbo- sulfonamide. Hits Both Depiction Forms.
-Carbo-azosulfone
-    [SX4](C)(C)(=O)=N
-    Partial N-Analog of Sulfone
-Sulfonamide
-    [$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]
-    (sulfa drugs) Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.
-sulfoxide
-Sulfoxide Low specificity.
-    [$([#16X3]=[OX1]),$([#16X3+][OX1-])]
-    (sulfinyl, thionyl) Analog of carbonyl where S replaces C. Hits all sulfoxides, including heteroatom-substituted sulfoxides, dialkylsulfoxides, carbo-sulfoxides, sulfinate, sulfinic acids... Hits Both Depiction Forms. Won't hit sulfones.
-Sulfoxide High specificity
-    [$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]
-    (sulfinyl, thionyl) Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms. Won't hit sulfones.
-sulfate
-Sulfate
-    [$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]
-    (sulfuric acid monoester) Only hits when oxygen is carbon-substituted. Hits acid and conjugate base. Hits Both Depiction Forms.
-Sulfuric acid ester (sulfate ester) Low specificity.
-    [$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]
-    Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates). Hits acid and conjugate base. Hits Both Depiction Forms.
-Sulfuric Acid Diester.
-    [$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]
-    Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
-sulfamate
-Sulfamate.
-    [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]
-    Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
-Sulfamic Acid.
-    [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])]
-    Hits acid and conjugate base. Hits Both Depiction Forms.
-sulfene
-Sulfenic acid.
-    [#16X2][OX2H,OX1H0-]
-    Hits acid and conjugate base.
-Sulfenate.
-    [#16X2][OX2H0]
-X
-halide (-halo -fluoro -chloro -bromo -iodo)
-Any carbon attached to any halogen
-    [#6][F,Cl,Br,I]
-Halogen
-    [F,Cl,Br,I]
-Three_halides groups
-    [F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]
-    Hits SMILES that have three halides.
-acyl halide
-Acyl Halide
-    [CX3](=[OX1])[F,Cl,Br,I]
-    (acid halide, -oyl halide)
-3. Gross Structural Features
-Chirality 	Orbital Configuration 	Connectivity 	Chains & Branching 	Rotation 	Cyclic Features
-Chirality
-Specified chiral carbon.
-    [$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]
-    Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified but that could otherwise be considered chiral. Also, therefore, won't match molecules that would be chiral due to an implicit connection (i.e. implicit H).
-"No-conflict" chiral match
-    C[C@?](F)(Cl)Br
-    Will match molecules with chiralities as specified or unspecified.
-"No-conflict" chiral match where an H is present
-    C[C@?H](Cl)Br
-    Will match molecules with chiralities as specified or unspecified.
-Orbital Configuration
-sp2 cationic carbon
-    [$([cX2+](:*):*)]
-    Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
-Aromatic sp2 carbon.
-    [$([cX3](:*):*),$([cX2+](:*):*)]
-    The first recursive SMARTS matches carbons that are three-connected; the second case matches two-connected carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital).
-Any sp2 carbon.
-    [$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]
-    The first recursive SMARTS matches carbons that are three-connected and aromatic. The second case matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital). The third case matches three-connected non-aromatic carbons (alkenes). The fourth case matches non-aromatic cationic alkene carbons.
-Any sp2 nitrogen.
-    [$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]
-    Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine-N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free pair of electrons in a nonbonding orbital, e.g. pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair of electrons in a nonbonding orbital, e.g. C=N), non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist in reality, but SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group; here the individual charge-separated resonance structures are specified), or either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen), respectively.
-Explicit Hydrogen on sp2-Nitrogen
-    [$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]
-    (H must be an isotope or ion)
-sp3 nitrogen
-    [$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)]
-    One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded.
-Explicit Hydrogen on an sp3 N.
-    [$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)])]
-    One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
-sp2 N in N-Oxide
-    [$([$([NX3]=O),$([NX3+][O-])])]
-sp3 N in N-Oxide Exclusive:
-    [$([$([NX4]=O),$([NX4+][O-])])]
-    Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
-sp3 N in N-Oxide Inclusive:
-    [$([$([NX4]=O),$([NX4+][O-,#0])])]
-    Hits if O could be present. Hits if * is used in place of O in SMILES.
-Connectivity
-Quaternary Nitrogen
-    [$([NX4+]),$([NX4]=*)]
-    Hits non-aromatic Ns.
-Tricoordinate S double bonded to N.
-    [$([SX3]=N)]
-S double-bonded to Carbon
-    [$([SX1]=[#6])]
-    Hits terminal (1-connected S)
-Triply bonded N
-    [$([NX1]#*)]
-Divalent Oxygen
-    [$([OX2])]
-Chains & Branching
-Unbranched_alkane groups.
-    [R0;D2][R0;D2][R0;D2][R0;D2]
-    Only hits alkanes (single-bond chains). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
-Unbranched_chain groups.
-    [R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]
-    Hits any bond (single, double, triple). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
-Long_chain groups.
-    [AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]
-    Aliphatic chains at-least 8 members long.
-Atom_fragment
-    [!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
-    (CLOGP definition) A fragment atom is not an isolating carbon.
-Carbon_isolating
-    [$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
-    This definition is based on that in CLOGP, so it is a charge-neutral carbon which is not a CF3 or an aromatic C between two aromatic hetero atoms (e.g. in tetrazole), and which is not multiply bonded to a hetero atom.
-Terminal S bonded to P
-    [$([SX1]~P)]
-Nitrogen on -N-C=N-
-    [$([NX3]C=N)]
-Nitrogen on -N-N=C-
-    [$([NX3]N=C)]
-Nitrogen on -N-N=N-
-    [$([NX3]N=N)]
-Oxygen in -O-C=N-
-    [$([OX2]C=N)]
-Rotation
-Rotatable bond
-    [!$(*#*)&!D1]-!@[!$(*#*)&!D1]
-    An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note that logical operators can be applied to bonds ("-&!@"). Here, the overall SMARTS consists of two atoms and one bond. The bond is "single and not ring". *#* is any atom triple bonded to any atom. By enclosing this SMARTS in parentheses and preceding with $, this enables us to use $(*#*) to write a recursive SMARTS using that string as an atom primitive. The purpose is to avoid bonds such as c1ccccc1-C#C which would be considered rotatable without this specification.
-Cyclic Features
-Bicyclic
-    [$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]
-    Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
-Ortho
-    *-!:aa-!:*
-    Ortho-substituted ring
-Meta
-    *-!:aaa-!:*
-    Meta-substituted ring
-Para
-    *-!:aaaa-!:*
-    Para-substituted ring
-Acyclic bonds
-    *!@*
-Single bond and not in a ring
-    *-!@*
-Non-ring atom
-    [R0] or [!R]
-Macrocycle groups.
-    [r;!r3;!r4;!r5;!r6;!r7]
-S in aromatic 5-ring with lone pair
-    [sX2r5]
-Aromatic 5-Ring O with Lone Pair
-    [oX2r5]
-N in 5-sided aromatic ring
-    [nX2r5]
-Spiro-ring center
-    [X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6]
-    Ring sizes 4-6.
-N in 5-ring arom
-    [$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])]
-    Anion.
-CIS or TRANS double bond in a ring
-    */,\[R]=;@[R]/,\*
-    An isomeric SMARTS consisting of four atoms and three bonds.
-CIS or TRANS double or aromatic bond in a ring
-    */,\[R]=,:;@[R]/,\*
-Unfused benzene ring
-    [cR1]1[cR1][cR1][cR1][cR1][cR1]1
-    To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring:
-Multiple non-fused benzene rings
-    [cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1
-Fused benzene rings
-    c12ccccc1cccc2
-4. Meta-SMARTS
-Amino Acids 	Recursive or Multiple 	Tools &Tricks
-Amino Acids
-Generic amino acid: low specificity.
-    [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
-    For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-A.A. Template for 20 standard a.a.s
-    [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),
-    $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]
-    Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get "any standard a.a." Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-Proline
-    [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
-Glycine
-    [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
-Other a.a.
-    [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
-    Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine; they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-        Example usage:
-        Alanine side chain is [CH3X4]
-        Alanine Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
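The substitution in the example usage above can be sketched as a plain string replacement. This is only an illustration: `OTHER_AA_TEMPLATE` and `aa_search` are hypothetical names that do not appear in app.py, and the result is just a SMARTS string that would still need to be compiled by a toolkit (for instance with `Chem.MolFromSmarts`, as app.py does for its patterns).

```python
# "Other a.a." template copied from the text above; [*] marks the side-chain slot.
OTHER_AA_TEMPLATE = (
    "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])"
    "[CX3](=[OX1])[OX2H,OX1-,N]"
)


def aa_search(side_chain: str) -> str:
    """Return the template with the [*] placeholder replaced by a side-chain SMARTS.

    Hypothetical helper for illustration; it does no SMARTS validation.
    """
    if OTHER_AA_TEMPLATE.count("[*]") != 1:
        raise ValueError("template must contain exactly one [*] placeholder")
    return OTHER_AA_TEMPLATE.replace("[*]", side_chain)


# Side chains taken from the list above.
alanine_search = aa_search("[CH3X4]")
serine_search = aa_search("[CH2X4][OX2H]")
```

Because the placeholder sits inside a branch, a multi-atom side chain such as serine's simply becomes a longer branch on the alpha carbon.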
-18_standard_aa_side_chains.
-    ([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),
-    $([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
-    $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),
-    $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
-    [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),
-    $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
-    $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
-    $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),
-    $([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
-    $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])
-    Can be any of the standard 18 (Pro & Gly are treated separately) Hits acids and conjugate bases.
-N in Any_standard_amino_acid.
-    [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3]
-    (=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3]
-    (=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),
-    $([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),
-    $([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
-    $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),
-    $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
-    [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),
-    $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
-    $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
-    $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),
-    $([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
-    $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),
-    $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])]
-    Format is A.A. Template for 20 standard a.a.s, where * is replaced by the entire 18_standard_side_chains list (or'd together). A generic amino acid with any of the 18 side chains, or proline or glycine. Hits "standard" amino acids that have terminally appended groups (i.e. "standard" refers to the side chains). (Pro, Gly, or 18 normal a.a.s.) Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
-Non-standard amino acid.
-    [$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),
-    $([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),
-    $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),
-    $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3]
-    (=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
-    $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:
-    [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
-    [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),
-    $([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
-    $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
-    $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),
-    $([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
-    $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),
-    $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])]
-    Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains). Won't hit amino acids that are non-standard due solely to the fact that groups are terminally-appended to the polypeptide chain (N or C term). format is [$(generic a.a.); !$(not a standard one)] Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
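The "[$(generic a.a.); !$(not a standard one)]" format described above can be assembled mechanically. The sketch below uses a much smaller exclusion (proline only) so that it stays readable; `generic_but_not` is a hypothetical helper name, and the output is a SMARTS string, not a compiled pattern.

```python
def generic_but_not(generic: str, excluded: str) -> str:
    """Build [$(generic);!$(excluded)]: match `generic` unless `excluded` also matches.

    Hypothetical helper for illustration; both arguments are SMARTS strings
    whose first atom is the atom that gets hit.
    """
    return f"[$({generic});!$({excluded})]"


# Both patterns are copied from the Amino Acids entries above.
GENERIC_AA = "[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]"
PROLINE = (
    "[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)"
    "[CX3](=[OX1])[OX2H,OX1-,N]"
)

# A generic amino acid N that is not a proline N (stand-in for the full standard list).
non_proline_aa = generic_but_not(GENERIC_AA, PROLINE)
```

Swapping `PROLINE` for the full "N in Any_standard_amino_acid" pattern would give the same shape as the Non-standard amino acid SMARTS above.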
-Recursive or Multiple
-Recursive SMARTS: Atoms connected to particular SMARTS
-Ortho
-    [SMARTS_expression]-!:aa-!:[SMARTS_expression]
-Meta
-    [SMARTS_expression]-!:aaa-!:[SMARTS_expression]
-Para
-    [SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
-Hydrogen
-    [$([#1][SMARTS_expression])]
-    Hydrogen must be explicit i.e. an isotope or charged
-Nitrogen
-    [$([#7][SMARTS_expression])]
-Oxygen
-    [$([#8][SMARTS_expression])]
-Fluorine
-    [$([#9][SMARTS_expression])]
-Recursive SMARTS: Multiple groups
-Two possible groups
-    [$(SMARTS_expression_A),$(SMARTS_expression_B)]
-    Hits atoms in either environment or group of interest, A or B.
-        Example usages:
-        Azide group is : [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
-        Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
-        Azide or azide ion is: [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),$([$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])])]
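The "two possible groups" construction above nests cleanly, which a few lines of Python can illustrate. This is a sketch under the assumption that each argument is a complete SMARTS whose first atom is the one to hit; `either` is a hypothetical helper, not part of app.py.

```python
def either(*patterns: str) -> str:
    """Wrap each SMARTS in $(...) and OR them inside one atom expression."""
    if not patterns:
        raise ValueError("need at least one SMARTS pattern")
    return "[" + ",".join(f"$({p})" for p in patterns) + "]"


# The two depictions of each azide form, as listed in the examples above.
AZIDE_GROUP = either("*-[NX2-]-[NX2+]#[NX1]", "*-[NX2]=[NX2+]=[NX1-]")
AZIDE_ION = either("[NX1-]=[NX2+]=[NX1-]", "[NX1]#[NX2+]-[NX1-2]")

# Nesting the two results reproduces the combined "azide or azide ion" pattern.
AZIDE_ANY = either(AZIDE_GROUP, AZIDE_ION)
```

Since the output of `either` is itself a single-atom SMARTS, it can be passed back in as an argument, which is what the combined azide pattern does.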
-Recursive SMARTS
-    [$([atom_that_gets_hit][other_atom][other_atom])]
-    Hits first atom within parentheses.
-        Example usages:
-        [$([CX3]=[OX1])] hits Carbonyl Carbon
-        [$([OX1]=[CX3])] hits Carbonyl Oxygen
-Single only, Double only, Single or Double
-Sulfide
-    [#16X2H0]
-    (-alkylthio) Won't hit thiols. Hits disulfides too.
-Mono-sulfide
-    [#16X2H0][!#16]
-    (alkylthio- or alkoxy-) R-S-R Won't hit thiols. Won't hit disulfides.
-Di-sulfide
-    [#16X2H0][#16X2H0]
-    Won't hit thiols. Won't hit mono-sulfides.
-Two sulfides
-    [#16X2H0][!#16].[#16X2H0][!#16]
-    Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
-Acid/conj-base
-    [OX2H,OX1H0-]
-    Hits acid and conjugate base. acid/base
-Non-acid Oxygen
-    [OX2H0]
-Acid/base
-    [H1,H0-]
-    Works for any atom if base form has no Hs & acid has only one.
-Multiple Disconnected Groups
-Two disconnected SMARTS fragments
-    ([Cl!$(Cl~c)].[c!$(c~Cl)])
-    A molecule that contains a chlorine and an aromatic carbon but which are not connected to each other. Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
-Two disconnected SMARTS fragments
-    ([Cl]).([c])
-    Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES fragments.
-Two not-necessarily connected SMARTS fragments
-    ([Cl].[c])
-    Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
-Two not-necessarily connected fragments
-    ([SMARTS_expression]).([SMARTS_expression])
-    Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
-Two primary or secondary amines
-    [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
-    Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical patterns.
-Tools & Tricks
-Alternative/Equivalent Representations
-Any carbon aromatic or non-aromatic
-    [#6] or [c,C]
-SMILES wildcard
-    [#0]
-    This SMARTS hits the SMILES *
-Factoring
-    [OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-]
-    Factor out common atomic expressions in the recursive SMARTS. May improve human readability.
-High-precedence "and"
-    [N&X4&+,N&X3&+0] or [NX4+,NX3+0]
-    High-precedence "and" (&) is the default logical operator. "Or" (,) is lower precedence than &, and low-precedence "and" (;) is lower precedence than ",".
-Hydrogens
-Any atom w/ at-least 1 H
-    [*!H0,#1]
-    In SMILES and SMARTS, Hydrogen is not considered an atom (unless it is specified as an isotope). The hydrogen count is instead considered a property of an atom. This SMARTS provides a way to effectively hit Hs themselves.
-Hs on Carbons
-    [#6!H0,#1]
-Atoms w/ 1 H
-    [H,#1]
-5. Electron & Proton Features
-Acids & Bases 	Charge 	H-bond Donors & Acceptors 	Radicals
-Acids & Bases
-Acid
-    [!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]
-    Proton donor
-Carboxylic acid
-    [CX3](=O)[OX2H1]
-    (-oic acid, COOH)
-Carboxylic acid or conjugate base.
-    [CX3](=O)[OX1H0-,OX2H1]
-Hydroxyl_acidic
-    [$([OH]-*=[!#6])]
-    An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids.
-Phosphoric_Acid
-    [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
-    Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer). Hits acid and conjugate base.
-Sulfonic Acid. High specificity.
-    [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
-    Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
-Acyl Halide
-    [CX3](=[OX1])[F,Cl,Br,I]
-    (acid halide, -oyl halide)
-Charge
-Anionic divalent Nitrogen
-    [NX2-]
-Oxenium Oxygen
-    [OX2H+]=*
-Oxonium Oxygen
-    [OX3H2+]
-Carbocation
-    [#6+]
-sp2 cationic carbon.
-    [$([cX2+](:*):*)]
-    Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
-Azide ion.
-    [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
-    Hits N in azide ion
-Zwitterion High Specificity
-    [+1]~*~*~[-1]
-    +1 charged atom separated by any 3 bonds from a -1 charged atom.
-Zwitterion Low Specificity, Crude
-    [$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])]
-    Variously charged moieties separated by up to ten bonds.
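The crude zwitterion pattern above is a regular series: one alternative per separation distance, from two bonds up to ten. A minimal sketch of a generator for it follows; the names `NEG`, `POS` and `crude_zwitterion` are hypothetical, and the two end-atom expressions are copied verbatim from the SMARTS above without reinterpretation.

```python
# End-atom expressions copied verbatim from the crude zwitterion SMARTS above.
NEG = "[!-0!-1!-2!-3!-4]"
POS = "[!+0!+1!+2!+3!+4]"


def crude_zwitterion(max_bonds: int = 10) -> str:
    """Build one recursive alternative per separation of 2..max_bonds bonds."""
    if max_bonds < 2:
        raise ValueError("max_bonds must be at least 2")
    alternatives = []
    for bonds in range(2, max_bonds + 1):
        # A path of `bonds` bonds has bonds - 1 wildcard atoms between the ends.
        spacer = "~*" * (bonds - 1)
        alternatives.append(f"$({NEG}{spacer}~{POS})")
    return "[" + ",".join(alternatives) + "]"


pattern = crude_zwitterion()
```

Generating the string this way makes the separation limit a parameter instead of something that has to be counted by eye in the long literal.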
-Zwitterion Low Specificity
-    ([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4])
-    Variously charged moieties that are within the same molecule but not-necessarily connected. Uses component-level grouping.
-H-bond Donors & Acceptors
-Hydrogen-bond acceptor
-    [#6,#7;R0]=[#8]
-    Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring, double bonded to an oxygen.
-Hydrogen-bond acceptor
-    [!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]
-    A H-bond acceptor is a heteroatom with no positive charge; note that negatively charged oxygen or sulphur are included. Excluded are halogens (including F), heteroaromatic oxygen, sulphur and pyrrole N. Higher oxidation levels of N, P, S are excluded. Note P(III) is currently included. Zeneca's work would imply that (O=S=O) should also be excluded.
-Hydrogen-bond donor.
-    [!$([#6,H0,-,-2,-3])]
-    A H-bond donor is a non-negatively charged heteroatom with at least one H
-Hydrogen-bond donor.
-    [!H0;#7,#8,#9]
-    Must have an N-H bond, an O-H bond, or a F-H bond
-Possible intramolecular H-bond
-    [O,N;!H0]-*~*-*=[$([C,N;R0]=O)]
-    Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive SMARTS", where "$()" encloses a valid nested SMARTS and acts syntactically like an atom-primitive in the overall SMARTS. Multiple nesting is allowed.
-Radicals
-Carbon Free-Radical
-    [#6;X3v3+0]
-    Hits a neutral carbon with three single bonds.
-Nitrogen Free-Radical
-    [#7;X2v4+0]
-    Hits a neutral nitrogen with two single bonds or with a single and a triple bond.