tonigi committed on
Commit a12e7b4 · 1 Parent(s): 6d7b849

move to data dir

app.py CHANGED
@@ -28,7 +28,7 @@ compiled_patterns = {name: Chem.MolFromSmarts(smart)
 def load_interligand_moieties():
     moieties = {}
     try:
-        with open("SMARTS_InteLigand.txt", "r") as f:
+        with open("data/SMARTS_InteLigand.txt", "r") as f:
             for line in f:
                 line = line.strip()
                 if not line or line.startswith("#"):
SMARTS_InteLigand.txt → data/SMARTS_InteLigand.txt RENAMED
File without changes
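
Note on the path change: the new literal "data/SMARTS_InteLigand.txt" is resolved against the current working directory, so the loader only finds the file when the app is started from the repository root. The sketch below shows one possible way to make the lookup independent of the working directory. The helper names `data_path` and `read_pattern_lines` are hypothetical and do not appear in the commit; only the file name, the `data` directory, and the blank/comment-line filter are taken from the hunk above.

```python
from pathlib import Path


def data_path(filename, base_dir=None):
    """Return the path of `filename` inside the `data` directory under `base_dir`.

    `base_dir` defaults to the current working directory, which matches the
    behaviour of the relative path used in the commit.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / "data" / filename


def read_pattern_lines(path):
    """Yield stripped lines, skipping blank lines and '#' comments.

    This mirrors the filter visible in load_interligand_moieties(); how each
    remaining line is parsed is not shown in the diff and is left out here.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            yield line
```

Inside app.py, a call such as `data_path("SMARTS_InteLigand.txt", Path(__file__).resolve().parent)` would anchor the lookup to the directory that contains app.py, assuming `data/` sits next to it as the rename suggests.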
data/daylight_smarts.yml ADDED
@@ -0,0 +1,622 @@
1
+ groups:
2
+ - name: "2. Functional Groups by Element"
3
+ subgroups:
4
+ - name: "C"
5
+ subsubgroups:
6
+ - name: "alkane"
7
+ rules:
8
+ - name: "Alkyl Carbon"
9
+ smarts: "[CX4]"
10
+ - name: "alkene (-ene)"
11
+ rules:
12
+ - name: "Allenic Carbon"
13
+ smarts: "[$([CX2](=C)=C)]"
14
+ - name: "Vinylic Carbon"
15
+ smarts: "[$([CX3]=[CX3])]"
16
+ comment: "Ethenyl carbon"
17
+ - name: "alkyne (-yne)"
18
+ rules:
19
+ - name: "Acetylenic Carbon"
20
+ smarts: "[$([CX2]#C)]"
21
+ - name: "arene (Ar , aryl-, aromatic hydrocarbons)"
22
+ rules:
23
+ - name: "Arene"
24
+ smarts: "c"
25
+ - name: "C & O"
26
+ subsubgroups:
27
+ - name: "carbonyl"
28
+ rules:
29
+ - name: "Carbonyl group. Low specificity"
30
+ smarts: "[CX3]=[OX1]"
31
+ comment: "Hits carboxylic acid, ester, ketone, aldehyde, carbonic acid/ester, anhydride, carbamic acid/ester, acyl halide, amide."
32
+ - name: "Carbonyl group"
33
+ smarts: "[$([CX3]=[OX1]),$([CX3+]-[OX1-])]"
34
+ comment: "Hits either resonance structure"
35
+ - name: "Carbonyl with Carbon"
36
+ smarts: "[CX3](=[OX1])C"
37
+ comment: "Hits aldehyde, ketone, carboxylic acid (except formic), anhydride (except formic), acyl halides (acid halides). Won't hit carbamic acid/ester, carbonic acid/ester."
38
+ - name: "Carbonyl with Nitrogen."
39
+ smarts: "[OX1]=CN"
40
+ comment: "Hits amide, carbamic acid/ester, polypeptide"
41
+ - name: "Carbonyl with Oxygen."
42
+ smarts: "[CX3](=[OX1])O"
43
+ comment: "Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid or ester, anhydride. Won't hit aldehyde or ketone."
44
+ - name: "Acyl Halide"
45
+ smarts: "[CX3](=[OX1])[F,Cl,Br,I]"
46
+ comment: "acid halide, -oyl halide"
47
+ - name: "Aldehyde"
48
+ smarts: "[CX3H1](=O)[#6]"
49
+ comment: "-al"
50
+ - name: "Anhydride"
51
+ smarts: "[CX3](=[OX1])[OX2][CX3](=[OX1])"
52
+ - name: "Amide"
53
+ smarts: "[NX3][CX3](=[OX1])[#6]"
54
+ comment: "-amide"
55
+ - name: "Amidinium"
56
+ smarts: "[NX3][CX3]=[NX3+]"
57
+ - name: "Carbamate."
58
+ smarts: "[NX3,NX4+][CX3](=[OX1])[OX2,OX1-]"
59
+ comment: "Hits carbamic esters, acids, and zwitterions"
60
+ - name: "Carbamic ester"
61
+ smarts: "[NX3][CX3](=[OX1])[OX2H0]"
62
+ - name: "Carbamic acid."
63
+ smarts: "[NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]"
64
+ comment: "Hits carbamic acids and zwitterions."
65
+ - name: "Carboxylate Ion."
66
+ smarts: "[CX3](=O)[O-]"
67
+ comment: "Hits conjugate bases of carboxylic, carbamic, and carbonic acids."
68
+ - name: "Carbonic Acid or Carbonic Ester"
69
+ smarts: "[CX3](=[OX1])(O)O"
70
+ comment: "Carbonic Acid, Carbonic Ester, or combination"
71
+ - name: "Carbonic Acid or Carbonic Acid-Ester"
72
+ smarts: "[CX3](=[OX1])([OX2])[OX2H,OX1H0-1]"
73
+ comment: "Hits acid and conjugate base. Won't hit carbonic acid diester"
74
+ - name: "Carbonic Ester (carbonic acid diester)"
75
+ smarts: "C[OX2][CX3](=[OX1])[OX2]C"
76
+ comment: "Won't hit carbonic acid or combination carbonic acid/ester"
77
+ - name: "Carboxylic acid"
78
+ smarts: "[CX3](=O)[OX2H1]"
79
+ comment: "-oic acid, COOH"
80
+ - name: "Carboxylic acid or conjugate base."
81
+ smarts: "[CX3](=O)[OX1H0-,OX2H1]"
82
+ - name: "Cyanamide"
83
+ smarts: "[NX3][CX2]#[NX1]"
84
+ - name: "Ester. Also hits anhydrides"
85
+ smarts: "[#6][CX3](=O)[OX2H0][#6]"
86
+ comment: "Won't hit formic anhydride."
87
+ - name: "Ketone"
88
+ smarts: "[#6][CX3](=O)[#6]"
89
+ comment: "-one"
90
+ - name: "ether"
91
+ rules:
92
+ - name: "Ether"
93
+ smarts: "[OD2]([#6])[#6]"
94
+ - name: "H"
95
+ subsubgroups:
96
+ - name: "hydrogen atoms"
97
+ rules:
98
+ - name: "Hydrogen Atom"
99
+ smarts: "[H]"
100
+ comment: "Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]"
101
+ - name: "Not a Hydrogen Atom"
102
+ smarts: "[!#1]"
103
+ comment: "Hits SMILES that are not hydrogen atoms."
104
+ - name: "Proton"
105
+ smarts: "[H+]"
106
+ comment: "Hits positively charged hydrogen atoms: [H+]"
107
+ - name: "hydrogen count"
108
+ rules:
109
+ - name: "Mono-Hydrogenated Cation"
110
+ smarts: "[+H]"
111
+ comment: "Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]"
112
+ - name: "Not Mono-Hydrogenated"
113
+ smarts: "[!H] or [!H1]"
114
+ comment: "Hits atoms that don't have exactly one attached hydrogen."
115
+ - name: "N"
116
+ subsubgroups:
117
+ - name: "amine (-amino)"
118
+ rules:
119
+ - name: "Primary or secondary amine, not amide."
120
+ smarts: "[NX3;H2,H1;!$(NC=O)]"
121
+ comment: "Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 & H1 respectively). Also note that '&' is the default operator and is higher precedence than ',' which is higher precedence than ';'. Will hit cyanamides and thioamides"
122
+ - name: "Enamine"
123
+ smarts: "[NX3][CX3]=[CX3]"
124
+ - name: "Primary amine, not amide."
125
+ smarts: "[NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6]"
126
+ comment: "Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)"
127
+ - name: "Two primary or secondary amines"
128
+ smarts: "[NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]"
129
+ comment: "Here we use the disconnection symbol ('.') to match two separate unbonded identical patterns."
130
+ - name: "Enamine or Aniline Nitrogen"
131
+ smarts: "[NX3][$(C=C),$(cc)]"
132
+ - name: "amino acids"
133
+ rules:
134
+ - name: "Generic amino acid: low specificity."
135
+ smarts: "[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]"
136
+ comment: "For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal)."
137
+ - name: "Dipeptide group. generic amino acid: low specificity."
138
+ smarts: "[NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]"
139
+ comment: "Won't hit pro or gly. Hits acids and conjugate bases."
140
+ - name: "Amino Acid"
141
+ smarts: "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
142
+ comment: "Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal). {e.g. usage: Alanine side chain is [CH3X4] . Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}"
143
+ - name: "amino acid side chains"
144
+ rules:
145
+ - name: "Alanine side chain"
146
+ smarts: "[CH3X4]"
147
+ - name: "Arginine side chain."
148
+ smarts: "[CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]"
149
+ comment: "Hits acid and conjugate base."
150
+ - name: "Asparagine side chain."
151
+ smarts: "[CH2X4][CX3](=[OX1])[NX3H2]"
152
+ comment: "Also hits Gln side chain when used alone."
153
+ - name: "Aspartate (or Aspartic acid) side chain."
154
+ smarts: "[CH2X4][CX3](=[OX1])[OH0-,OH]"
155
+ comment: "Hits acid and conjugate base. Also hits Glu side chain when used alone."
156
+ - name: "Cysteine side chain."
157
+ smarts: "[CH2X4][SX2H,SX1H0-]"
158
+ comment: "Hits acid and conjugate base"
159
+ - name: "Glutamate (or Glutamic acid) side chain."
160
+ smarts: "[CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]"
161
+ comment: "Hits acid and conjugate base."
162
+ - name: "Glycine"
163
+ smarts: "[$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]"
164
+ - name: "Histidine side chain."
165
+ smarts: "[CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1"
166
+ comment: "Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H])."
167
+ - name: "Isoleucine side chain"
168
+ smarts: "[CHX4]([CH3X4])[CH2X4][CH3X4]"
169
+ - name: "Leucine side chain"
170
+ smarts: "[CH2X4][CHX4]([CH3X4])[CH3X4]"
171
+ - name: "Lysine side chain."
172
+ smarts: "[CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]"
173
+ comment: "Acid and conjugate base"
174
+ - name: "Methionine side chain"
175
+ smarts: "[CH2X4][CH2X4][SX2][CH3X4]"
176
+ - name: "Phenylalanine side chain"
177
+ smarts: "[CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1"
178
+ - name: "Proline"
179
+ smarts: "[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]"
180
+ - name: "Serine side chain"
181
+ smarts: "[CH2X4][OX2H]"
182
+ - name: "Thioamide"
183
+ smarts: "[NX3][CX3]=[SX1]"
184
+ - name: "Threonine side chain"
185
+ smarts: "[CHX4]([CH3X4])[OX2H]"
186
+ - name: "Tryptophan side chain"
187
+ smarts: "[CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12"
188
+ - name: "Tyrosine side chain."
189
+ smarts: "[CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1"
190
+ comment: "Acid and conjugate base"
191
+ - name: "Valine side chain"
192
+ smarts: "[CHX4]([CH3X4])[CH3X4]"
193
+ - name: "azide (-azido)"
194
+ rules:
195
+ - name: "Azide group."
196
+ smarts: "[$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]"
197
+ comment: "Hits any atom with an attached azide."
198
+ - name: "Azide ion."
199
+ smarts: "[$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]"
200
+ comment: "Hits N in azide ion"
201
+ - name: "azo"
202
+ rules:
203
+ - name: "Nitrogen."
204
+ smarts: "[#7]"
205
+ comment: "Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of 'azo'"
206
+ - name: "Azo Nitrogen. Low specificity."
207
+ smarts: "[NX2]=N"
208
+ comment: "Hits diazene, azoxy and some diazo structures"
209
+ - name: "Azo Nitrogen. Diazene"
210
+ smarts: "[NX2]=[NX2]"
211
+ comment: "(diaza alkene)"
212
+ - name: "Azoxy Nitrogen."
213
+ smarts: "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]"
214
+ - name: "Diazo Nitrogen"
215
+ smarts: "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]"
216
+ - name: "Azole."
217
+ smarts: "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]"
218
+ comment: "5-member aromatic heterocycle w/ 2 double bonds. Contains N & another non-C atom (N, O, S). Subclasses are furo-, thio-, pyrro- (replace CH of furfuran, thiophene, pyrrole w/ N)"
219
+ - name: "hydrazine"
220
+ rules:
221
+ - name: "Hydrazine H2NNH2"
222
+ smarts: "[NX3][NX3]"
223
+ - name: "hydrazone"
224
+ rules:
225
+ - name: "Hydrazone C=NNH2"
226
+ smarts: "[NX3][NX2]=[*]"
227
+ - name: "imine"
228
+ rules:
229
+ - name: "Substituted imine"
230
+ smarts: "[CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]"
231
+ comment: "Schiff base"
232
+ - name: "Substituted or un-substituted imine"
233
+ smarts: "[$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]"
234
+ - name: "Iminium"
235
+ smarts: "[NX3+]=[CX3]"
236
+ - name: "imide"
237
+ rules:
238
+ - name: "Unsubstituted dicarboximide"
239
+ smarts: "[CX3](=[OX1])[NX3H][CX3](=[OX1])"
240
+ - name: "Substituted dicarboximide"
241
+ smarts: "[CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])"
242
+ - name: "Dicarboxdiimide"
243
+ smarts: "[CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])"
244
+ - name: "nitrate"
245
+ rules:
246
+ - name: "Nitrate group"
247
+ smarts: "[$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]"
248
+ comment: "Also hits nitrate anion"
249
+ - name: "Nitrate Anion"
250
+ smarts: "[$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]"
251
+ - name: "nitrile"
252
+ rules:
253
+ - name: "Nitrile"
254
+ smarts: "[NX1]#[CX2]"
255
+ - name: "Isonitrile"
256
+ smarts: "[CX1-]#[NX2+]"
257
+ - name: "nitro"
258
+ rules:
259
+ - name: "Nitro group."
260
+ smarts: "[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]"
261
+ comment: "Hits both forms."
262
+ - name: "Two Nitro groups"
263
+ smarts: "[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]"
264
+ - name: "nitroso"
265
+ rules:
266
+ - name: "Nitroso-group"
267
+ smarts: "[NX2]=[OX1]"
268
+ - name: "n-oxide"
269
+ rules:
270
+ - name: "N-Oxide"
271
+ smarts: "[$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]"
272
+ comment: "Hits both forms. Won't hit azoxy, nitro, nitroso, or nitrate."
273
+ - name: "O"
274
+ subsubgroups:
275
+ - name: "hydroxyl (includes alcohol, phenol)"
276
+ rules:
277
+ - name: "Hydroxyl"
278
+ smarts: "[OX2H]"
279
+ - name: "Hydroxyl in Alcohol"
280
+ smarts: "[#6][OX2H]"
281
+ - name: "Hydroxyl in Carboxylic Acid"
282
+ smarts: "[OX2H][CX3]=[OX1]"
283
+ - name: "Hydroxyl in H-O-P-"
284
+ smarts: "[OX2H]P"
285
+ - name: "Enol"
286
+ smarts: "[OX2H][#6X3]=[#6]"
287
+ - name: "Phenol"
288
+ smarts: "[OX2H][cX3]:[c]"
289
+ - name: "Enol or Phenol"
290
+ smarts: "[OX2H][$(C=C),$(cc)]"
291
+ - name: "Hydroxyl_acidic"
292
+ smarts: "[$([OH]-*=[!#6])]"
293
+ comment: "An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids."
294
+ - name: "peroxide"
295
+ rules:
296
+ - name: "Peroxide groups."
297
+ smarts: "[OX2,OX1-][OX2,OX1-]"
298
+ comment: "Also hits anions."
299
+ - name: "P"
300
+ subsubgroups:
301
+ - name: "phosphoric compounds"
302
+ rules:
303
+ - name: "Phosphoric_acid groups."
304
+ smarts: "[$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]"
305
+ comment: "Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer)."
306
+ - name: "Phosphoric_ester groups."
307
+ smarts: "[$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]"
308
+ comment: "Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups."
309
+ - name: "S"
310
+ subsubgroups:
311
+ - name: "thio groups ( thio-, thi-, sulpho-, mercapto- )"
312
+ rules:
313
+ - name: "Carbo-Thiocarboxylate"
314
+ smarts: "[S-][CX3](=S)[#6]"
315
+ - name: "Carbo-Thioester"
316
+ smarts: "S([#6])[CX3](=O)[#6]"
317
+ - name: "Thio analog of carbonyl"
318
+ smarts: "[#6X3](=[SX1])([!N])[!N]"
319
+ comment: "Where S replaces O. Not a thioamide."
320
+ - name: "Thiol, Sulfide or Disulfide Sulfur"
321
+ smarts: "[SX2]"
322
+ - name: "Thiol"
323
+ smarts: "[#16X2H]"
324
+ - name: "Sulfur with at-least one hydrogen."
325
+ smarts: "[#16!H0]"
326
+ - name: "Thioamide"
327
+ smarts: "[NX3][CX3]=[SX1]"
328
+ - name: "sulfide"
329
+ rules:
330
+ - name: "Sulfide"
331
+ smarts: "[#16X2H0]"
332
+ comment: "-alkylthio. Won't hit thiols. Hits disulfides."
333
+ - name: "Mono-sulfide"
334
+ smarts: "[#16X2H0][!#16]"
335
+ comment: "alkylthio- or alkoxy-. Won't hit thiols. Won't hit disulfides."
336
+ - name: "Di-sulfide"
337
+ smarts: "[#16X2H0][#16X2H0]"
338
+ comment: "Won't hit thiols. Won't hit mono-sulfides."
339
+ - name: "Two Sulfides"
340
+ smarts: "[#16X2H0][!#16].[#16X2H0][!#16]"
341
+ comment: "Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides."
342
+ - name: "sulfinate"
343
+ rules:
344
+ - name: "Sulfinate"
345
+ smarts: "[$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]"
346
+ comment: "Won't hit Sulfinic Acid. Hits Both Depiction Forms."
347
+ - name: "Sulfinic Acid"
348
+ smarts: "[$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]"
349
+ comment: "Won't hit substituted Sulfinates. Hits Both Depiction Forms. Hits acid and conjugate base (sulfinate)."
350
+ - name: "sulfone"
351
+ rules:
352
+ - name: "Sulfone. Low specificity."
353
+ smarts: "[$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]"
354
+ comment: "Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- & di- esters, sulfamic acid, sulfamate, sulfonamide... Hits Both Depiction Forms."
355
+ - name: "Sulfone. High specificity."
356
+ smarts: "[$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]"
357
+ comment: "Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms."
358
+ - name: "Sulfonic acid. High specificity."
359
+ smarts: "[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]"
360
+ comment: "Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids."
361
+ - name: "Sulfonate"
362
+ smarts: "[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]"
363
+ comment: "(sulfonic ester) Only hits carbon-substituted sulfur (Oxygen may be heteroatom-substituted). Hits Both Depiction Forms."
364
+ - name: "Sulfonamide."
365
+ smarts: "[$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]"
366
+ comment: "Only hits carbo- sulfonamide. Hits Both Depiction Forms."
367
+ - name: "Carbo-azosulfone"
368
+ smarts: "[SX4](C)(C)(=O)=N"
369
+ comment: "Partial N-Analog of Sulfone"
370
+ - name: "Sulfonamide"
371
+ smarts: "[$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]"
372
+ comment: "(sulfa drugs) Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms."
373
+ - name: "sulfoxide"
374
+ rules:
375
+ - name: "Sulfoxide Low specificity."
376
+ smarts: "[$([#16X3]=[OX1]),$([#16X3+][OX1-])]"
377
+ comment: "(sulfinyl, thionyl) Analog of carbonyl where S replaces C. Hits all sulfoxides, including heteroatom-substituted sulfoxides, dialkylsulfoxides, carbo-sulfoxides, sulfinate, sulfinic acids... Hits Both Depiction Forms. Won't hit sulfones."
378
+ - name: "Sulfoxide High specificity"
379
+ smarts: "[$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]"
380
+ comment: "(sulfinyl, thionyl) Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms. Won't hit sulfones."
381
+ - name: "sulfate"
382
+ rules:
383
+ - name: "Sulfate"
384
+ smarts: "[$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]"
385
+ comment: "(sulfuric acid monoester) Only hits when oxygen is carbon-substituted. Hits acid and conjugate base. Hits Both Depiction Forms."
386
+ - name: "Sulfuric acid ester (sulfate ester) Low specificity."
387
+ smarts: "[$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]"
388
+ comment: "Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates). Hits acid and conjugate base. Hits Both Depiction Forms."
389
+ - name: "Sulfuric Acid Diester."
390
+ smarts: "[$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]"
391
+ comment: "Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms."
392
+ - name: "sulfamate"
393
+ rules:
394
+ - name: "Sulfamate."
395
+ smarts: "[$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]"
396
+ comment: "Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms."
397
+ - name: "Sulfamic Acid."
398
+ smarts: "[$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])]"
399
+ comment: "Hits acid and conjugate base. Hits Both Depiction Forms."
400
+ - name: "sulfene"
401
+ rules:
402
+ - name: "Sulfenic acid."
403
+ smarts: "[#16X2][OX2H,OX1H0-]"
404
+ comment: "Hits acid and conjugate base."
405
+ - name: "Sulfenate."
406
+ smarts: "[#16X2][OX2H0]"
407
+ - name: "X"
408
+ subsubgroups:
409
+ - name: "halide (-halo -fluoro -chloro -bromo -iodo)"
410
+ rules:
411
+ - name: "Any carbon attached to any halogen"
412
+ smarts: "[#6][F,Cl,Br,I]"
413
+ - name: "Halogen"
414
+ smarts: "[F,Cl,Br,I]"
415
+ - name: "Three_halides groups"
416
+ smarts: "[F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]"
417
+ comment: "Hits SMILES that have three halides."
418
+ - name: "acyl halide"
419
+ rules:
420
+ - name: "Acyl Halide"
421
+ smarts: "[CX3](=[OX1])[F,Cl,Br,I]"
422
+ comment: "(acid halide, -oyl halide)"
423
+ - name: "3. Gross Structural Features"
424
+ subgroups:
425
+ - name: "Chirality"
426
+ rules:
427
+ - name: "Specified chiral carbon."
428
+ smarts: "[$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]"
429
+ comment: "Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified but that could otherwise be considered chiral. Therefore, it also won't match molecules that would be chiral due to an implicit connection (i.e. implicit H)."
430
+ - name: "\"No-conflict\" chiral match"
431
+ smarts: "C[C@?](F)(Cl)Br"
432
+ comment: "Will match molecules with chiralities as specified or unspecified."
433
+ - name: "\"No-conflict\" chiral match where an H is present"
434
+ smarts: "C[C@?H](Cl)Br"
435
+ comment: "Will match molecules with chiralities as specified or unspecified."
436
+ - name: "Orbital Configuration"
437
+ rules:
438
+ - name: "sp2 cationic carbon"
439
+ smarts: "[$([cX2+](:*):*)]"
440
+ comment: "Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital"
441
+ - name: "Aromatic sp2 carbon."
442
+ smarts: "[$([cX3](:*):*),$([cX2+](:*):*)]"
443
+ comment: "The first recursive SMARTS matches carbons that are three-connected, the second case matches two-connected carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital)"
444
+ - name: "Any sp2 carbon."
445
+ smarts: "[$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]"
446
+ comment: "The first recursive SMARTS matches carbons that are three-connected and aromatic. The second case matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital). The third case matches three-connected non-aromatic carbons (alkenes). The fourth case matches non-aromatic cationic alkene carbons."
447
+ - name: "Any sp2 nitrogen."
448
+ smarts: "[$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]"
449
+ comment: "Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine-N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free pair of electrons in a nonbonding orbital, e.g. pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair of electrons in a nonbonding orbital, e.g. C=N), non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist in reality, but SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group; here the individual charge-separated resonance structures are specified), either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen), respectively."
450
+ - name: "Explicit Hydrogen on sp2-Nitrogen"
451
+ smarts: "[$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]"
452
+ comment: "(H must be an isotope or ion)"
453
+ - name: "sp3 nitrogen"
454
+ smarts: "[$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)]"
455
+ comment: "One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded."
456
+ - name: "Explicit Hydrogen on an sp3 N."
457
+ smarts: "[$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)])]"
458
+ comment: "One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)"
459
+ - name: "sp2 N in N-Oxide"
460
+ smarts: "[$([$([NX3]=O),$([NX3+][O-])])]"
461
+ - name: "sp3 N in N-Oxide Exclusive:"
462
+ smarts: "[$([$([NX4]=O),$([NX4+][O-])])]"
463
+ comment: "Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O."
464
+ - name: "sp3 N in N-Oxide Inclusive:"
465
+ smarts: "[$([$([NX4]=O),$([NX4+][O-,#0])])]"
466
+ comment: "Hits if O could be present. Hits if * is used in place of O in SMILES."
467
+ - name: "Connectivity"
468
+ rules:
469
+ - name: "Quaternary Nitrogen"
470
+ smarts: "[$([NX4+]),$([NX4]=*)]"
471
+ comment: "Hits non-aromatic Ns."
472
+ - name: "Tricoordinate S double bonded to N."
473
+ smarts: "[$([SX3]=N)]"
474
+ - name: "S double-bonded to Carbon"
475
+ smarts: "[$([SX1]=[#6])]"
476
+ comment: "Hits terminal (1-connected S)"
477
+ - name: "Triply bonded N"
478
+ smarts: "[$([NX1]#*)]"
479
+ - name: "Divalent Oxygen"
480
+ smarts: "[$([OX2])]"
481
+ - name: "Chains & Branching"
482
+ rules:
483
+ - name: "Unbranched_alkane groups."
484
+ smarts: "[R0;D2][R0;D2][R0;D2][R0;D2]"
485
+ comment: "Only hits alkanes (single-bond chains). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched)."
486
+ - name: "Unbranched_chain groups."
487
+ smarts: "[R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]"
488
+ comment: "Hits any bond (single, double, triple). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched)."
489
+ - name: "Long_chain groups."
490
+ smarts: "[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]"
491
+ comment: "Aliphatic chains at-least 8 members long."
492
+ - name: "Atom_fragment"
493
+ smarts: "[!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]"
494
+ comment: "(CLOGP definition) A fragment atom is not an isolating carbon"
495
+ - name: "Carbon_isolating"
496
+ smarts: "[$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]"
497
+ comment: "This definition is based on that in CLOGP, so it is a charge-neutral carbon, which is not a CF3 or an aromatic C between two aromatic hetero atoms eg in tetrazole, it is not multiply bonded to a hetero atom."
498
+ - name: "Terminal S bonded to P"
499
+ smarts: "[$([SX1]~P)]"
500
+ - name: "Nitrogen on -N-C=N-"
501
+ smarts: "[$([NX3]C=N)]"
502
+ - name: "Nitrogen on -N-N=C-"
503
+ smarts: "[$([NX3]N=C)]"
504
+ - name: "Nitrogen on -N-N=N-"
505
+ smarts: "[$([NX3]N=N)]"
506
+ - name: "Oxygen in -O-C=N-"
507
+ smarts: "[$([OX2]C=N)]"
508
+ - name: "Rotation"
509
+ rules:
510
+ - name: "Rotatable bond"
511
+ smarts: "[!$(*#*)&!D1]-!@[!$(*#*)&!D1]"
512
+ comment: "An atom which is not triply bonded and not one-connected (i.e. not terminal), connected by a single non-ring bond to an equivalent atom. Note that logical operators can be applied to bonds (\"-&!@\"). Here, the overall SMARTS consists of two atoms and one bond. The bond is \"single and not ring\". *#* matches any atom triple bonded to any atom. Enclosing this SMARTS in parentheses and preceding it with $ lets us use $(*#*) as an atom primitive in a recursive SMARTS. The purpose is to avoid bonds such as c1ccccc1-C#C which would be considered rotatable without this specification."
513
+ - name: "Cyclic Features"
514
+ rules:
515
+ - name: "Bicyclic"
516
+ smarts: "[$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]"
517
+ comment: "Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms."
518
+ - name: "Ortho"
519
+ smarts: "*-!:aa-!:*"
520
+ comment: "Ortho-substituted ring"
521
+ - name: "Meta"
522
+ smarts: "*-!:aaa-!:*"
523
+ comment: "Meta-substituted ring"
524
+ - name: "Para"
525
+ smarts: "*-!:aaaa-!:*"
526
+ comment: "Para-substituted ring"
527
+ - name: "Acyclic-bonds"
528
+ smarts: "*!@*"
529
+ - name: "Single bond and not in a ring"
530
+ smarts: "*-!@*"
531
+ - name: "Non-ring atom"
532
+ smarts: "[R0] or [!R]"
533
+ - name: "Macrocycle groups."
534
+ smarts: "[r;!r3;!r4;!r5;!r6;!r7]"
535
+ comment: "Macrocycle groups."
536
+ - name: "S in aromatic 5-ring with lone pair"
537
+ smarts: "[sX2r5]"
538
+ - name: "Aromatic 5-Ring O with Lone Pair"
539
+ smarts: "[oX2r5]"
540
+ - name: "N in 5-sided aromatic ring"
541
+ smarts: "[nX2r5]"
542
+ - name: "Spiro-ring center"
543
+ smarts: "[X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6]"  # ring sizes 4-6
544
+ - name: "N in 5-ring arom"
545
+ smarts: "[$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])]"
546
+ comment: "anion"
547
+ - name: "CIS or TRANS double bond in a ring"
548
+ smarts: "*/,\\[R]=;@[R]/,\\*"
549
+ comment: "An isomeric SMARTS consisting of four atoms and three bonds."
550
+ - name: "CIS or TRANS double or aromatic bond in a ring"
551
+ smarts: "*/,\\[R]=,:;@[R]/,\\*"
552
+ - name: "Unfused benzene ring"
553
+ smarts: "[cR1]1[cR1][cR1][cR1][cR1][cR1]1"
554
+ comment: "To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring:"
555
+ - name: "Multiple non-fused benzene rings"
556
+ smarts: "[cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1"
557
+ comment: "To find multiple non-fused benzene rings"
558
+ - name: "Fused benzene rings"
559
+ smarts: "c12ccccc1cccc2"
560
+ - name: "4. Meta-SMARTS"
561
+ subgroups:
562
+ - name: "Amino Acids"
563
+ rules:
564
+ - name: "Generic amino acid: low specificity."
565
+ smarts: "[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]"
566
+ comment: "For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal)."
567
+ - name: "A.A. Template for 20 standard a.a.s"
568
+ smarts: "[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]"
569
+ comment: "Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get 'any standard a.a.' Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal)."
570
+ - name: "Proline"
571
+ smarts: "[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]"
572
+ - name: "Glycine"
573
+ smarts: "[$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]"
574
+ - name: "Other a.a."
575
+ smarts: "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
576
+ comment: "Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal)."
577
+ - name: "Recursive or Multiple"
578
+ rules:
579
+ - name: "Ortho"
580
+ smarts: "[SMARTS_expression]-!:aa-!:[SMARTS_expression]"
581
+ - name: "Meta"
582
+ smarts: "[SMARTS_expression]-!:aaa-!:[SMARTS_expression]"
583
+ - name: "Para"
584
+ smarts: "[SMARTS_expression]-!:aaaa-!:[SMARTS_expression]"
585
+ - name: "Hydrogen"
586
+ smarts: "[$([#1][SMARTS_expression])]"
587
+ comment: "Hydrogen must be explicit i.e. an isotope or charged"
588
+ - name: "Nitrogen"
589
+ smarts: "[$([#7][SMARTS_expression])]"
590
+ - name: "Oxygen"
591
+ smarts: "[$([#8][SMARTS_expression])]"
592
+ - name: "Fluorine"
593
+ smarts: "[$([#9][SMARTS_expression])]"
594
+ - name: "Two possible groups"
595
+ smarts: "[$(SMARTS_expression_A),$(SMARTS_expression_B)]"
596
+ comment: "Hits atoms in either environment or group of interest, A or B."
597
+ - name: "Tools & Tricks"
598
+ rules:
599
+ - name: "Any carbon aromatic or non-aromatic"
600
+ smarts: "[#6] or [c,C]"
601
+ - name: "SMILES wildcard"
602
+ smarts: "[#0]"
603
+ comment: "This SMARTS hits the SMILES *"
604
+ - name: "Factoring"
605
+ smarts: "[OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-]"
606
+ comment: "Factor out common atomic expressions in the recursive SMARTS. May improve human readability."
607
+ - name: "High-precedence 'and'"
608
+ smarts: "[N&X4&+,N&X3&+0] or [NX4+,NX3+0]"
609
+ comment: "High-precedence 'and' (&) is the default operator. It binds more tightly than 'or' (,), which in turn binds more tightly than low-precedence 'and' (;)."
610
+ - name: "5. Electron & Proton Features"
611
+ subgroups:
612
+ - name: "Acids & Bases"
613
+ rules:
614
+ - name: "Acid"
615
+ smarts: "[!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]"
616
+ comment: "Proton donor"
617
+ - name: "Carboxylic acid"
618
+ smarts: "[CX3](=O)[OX2H1]"
619
+ comment: "(-oic acid, COOH)"
620
+ - name: "Carboxylic acid or conjugate base."
621
+ smarts: "[CX3](=O)[OX..."
622
+ comment: "The file is truncated beyond this point."
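Not every `smarts` value in data/daylight_smarts.yml is a single compilable pattern: some entries list two spellings joined by " or ", the Meta-SMARTS templates contain the literal placeholder `SMARTS_expression`, and the last visible entry is truncated with `...`. The sketch below is one possible way to sort entries into these cases before handing them to a compiler such as the `Chem.MolFromSmarts` call used in app.py. The function names `expand_entry` and `classify` are hypothetical and not part of the commit; the sketch uses only the standard library and does not validate SMARTS syntax.

```python
import re

# Assumption: templates are recognised by this literal placeholder text,
# as used in the "Recursive or Multiple" rules of the YAML file.
PLACEHOLDER = re.compile(r"SMARTS_expression")


def expand_entry(smarts):
    """Split an 'A or B' entry into its separate candidate patterns.

    SMARTS strings contain no spaces, so ' or ' can only be the
    human-readable separator used in the data file.
    """
    return [part.strip() for part in smarts.split(" or ") if part.strip()]


def classify(smarts):
    """Return 'truncated', 'template', 'alternatives' or 'pattern'."""
    if smarts.endswith("..."):
        return "truncated"      # elided in the source; cannot be compiled
    if PLACEHOLDER.search(smarts):
        return "template"       # needs a real expression substituted first
    if len(expand_entry(smarts)) > 1:
        return "alternatives"   # compile each alternative separately
    return "pattern"            # a single candidate pattern


if __name__ == "__main__":
    for entry in ["[#6] or [c,C]", "[$([#7][SMARTS_expression])]", "[#0]"]:
        print(classify(entry), expand_entry(entry))
```

Only entries classified as `pattern`, or the individual pieces of an `alternatives` entry, are candidates for compilation; templates and truncated entries would need manual handling.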
data/smarts_examples.html ADDED
@@ -0,0 +1,1353 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
2
+ "http://www.w3.org/TR/html4/loose.dtd">
3
+ <html>
4
+ <head>
5
+ <title>Daylight&gt;SMARTS Examples</title>
6
+ <link rel="stylesheet" href="/b.css" type="text/css">
7
+ <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
8
+ </head>
9
+ <body>
10
+ <table width=750 cellpadding=0 cellspacing=0 border=0>
11
+ <tr>
12
+ <td align=center> <iframe src="/iframes/header2.html" name="iframe4" width="745" height="170"
13
+ scrolling="no" frameborder="0"></iframe></td>
14
+ </tr>
15
+ </table>
16
+ <table width=750 cellpadding=15>
17
+ <tr><td class="border-bot">
18
+ <center><h1>SMARTS Examples
19
+ </h1></center>
20
+ <a name="TOP"></a><h2>Table of Contents</h2>
21
+
22
+ <a href="#INTRO">1. Introduction</a><br>
23
+ <a href="#GROUP">2. Functional Groups by Element</a><br>
24
+ <a href="#STRUCTUAL">3. Gross Structural Features</a><br>
25
+ <a href="#META">4. Meta-SMARTS</a><br>
26
+ <a href="#E-">5. Electron &amp; Proton Features</a><br>
27
+ <a href="#BREAK">6. Breakdown of Complex SMARTS</a><br>
28
+ <a href="#EXMPL">7. Interesting Example SMARTS</a><br>
29
+ <br>
30
+ <a NAME="INTRO"></a>
31
+ <H2>
32
+ 1. Introduction
33
+ </H2>
34
+ When using SMARTS to do searches, it is often helpful to have
35
+ example queries from which to start. This document contains
36
+ many potentially useful example SMARTS which may be used to
37
+ perform searches, or as templates, examples and ideas.
38
+ <br><br>
39
+ These SMARTS have been tested, but they may still contain errors.
40
+ Please send corrections, improvements, additions, and questions to
41
+ <A HREF="mailto:support@daylight.com">support@daylight.com.</A>
42
+
43
+ <br><br>
44
+ <a NAME="GROUP"></a>
45
+ <H2>
46
+ 2. Functional Groups by Element
47
+ </H2>
48
+
49
+ <table border=1 COLS=8 WIDTH="750"><tr>
50
+ <td align=center><a href="#C">C</a></td>
51
+ <td align=center><a href="#CO">C&amp;O</a></td>
52
+ <td align=center><a href="#H">H</a></td>
53
+ <td align=center><a href="#N">N</a></td>
54
+ <td align=center><a href="#O">O</a></td>
55
+ <td align=center><a href="#P">P</a></td>
56
+ <td align=center><a href="#S">S</a></td>
57
+ <td align=center><a href="#X">X</a></td></tr>
58
+ </table><br>
59
+ <a NAME="C"></a><h2>C</h2>
60
+ <h3> alkane </h3><dl>
61
+ <p><dt> Alkyl Carbon
62
+ <dd> [CX4]</p></dl><br>
63
+ <h3> alkene (-ene) </h3><dl>
64
+ <p><dt> Allenic Carbon
65
+ <dd> [$([CX2](=C)=C)]
66
+ <p><dt> Vinylic Carbon
67
+ <dd> [$([CX3]=[CX3])]
68
+ <dd> Ethenyl carbon </p></dl><br>
69
+ <h3> alkyne (-yne) </h3><dl>
70
+ <p><dt> Acetylenic Carbon
71
+ <dd> [$([CX2]#C)]</p></dl><br>
72
+ <h3> arene (Ar , aryl-, aromatic hydrocarbons) </h3><dl>
73
+ <p><dt> Arene
74
+ <dd> c </p></dl><br>
75
+ <a NAME="CO"></a><h2>C &amp; O</h2>
76
+ <h3>carbonyl</h3><dl>
77
+ <p><dt> Carbonyl group. Low specificity
78
+ <dd> [CX3]=[OX1]
79
+ <dd> Hits carboxylic acid, ester, ketone, aldehyde, carbonic
80
+ acid/ester, anhydride, carbamic acid/ester, acyl halide, amide.
81
+ <p><dt> Carbonyl group
82
+ <dd> [$([CX3]=[OX1]),$([CX3+]-[OX1-])]
83
+ <dd> Hits either resonance structure
84
+ <p><dt> Carbonyl with Carbon
85
+ <dd> [CX3](=[OX1])C
86
+ <dd> Hits aldehyde, ketone, carboxylic acid (except formic), anhydride
87
+ (except formic), acyl halides (acid halides). Won't hit carbamic
88
+ acid/ester, carbonic acid/ester.
89
+ <p><dt> Carbonyl with Nitrogen.
90
+ <dd> [OX1]=CN
91
+ <dd> Hits amide, carbamic acid/ester, polypeptide
92
+ <p><dt> Carbonyl with Oxygen.
93
+ <dd> [CX3](=[OX1])O
94
+ <dd> Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid
95
+ or ester, anhydride. Won't hit aldehyde or ketone.
96
+ <p><dt> Acyl Halide
97
+ <dd> [CX3](=[OX1])[F,Cl,Br,I]
98
+ <dd> acid halide, -oyl halide
99
+ <p><dt> Aldehyde
100
+ <dd> [CX3H1](=O)[#6]
101
+ <dd> -al
102
+ <p><dt> Anhydride
103
+ <dd> [CX3](=[OX1])[OX2][CX3](=[OX1])
104
+ <p><dt> Amide
105
+ <dd> [NX3][CX3](=[OX1])[#6]
106
+ <dd> -amide
107
+ <p><dt> Amidinium
108
+ <dd> [NX3][CX3]=[NX3+]
109
+ <p><dt> Carbamate.
110
+ <dd> [NX3,NX4+][CX3](=[OX1])[OX2,OX1-]
111
+ <dd> Hits carbamic esters, acids, and zwitterions
112
+ <p><dt> Carbamic ester
113
+ <dd> [NX3][CX3](=[OX1])[OX2H0]
114
+ <p><dt> Carbamic acid.
115
+ <dd> [NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]
116
+ <dd> Hits carbamic acids and zwitterions.
117
+ <p><dt> Carboxylate Ion.
118
+ <dd> [CX3](=O)[O-]
119
+ <dd> Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
120
+ <p><dt> Carbonic Acid or Carbonic Ester
121
+ <dd> [CX3](=[OX1])(O)O
122
+ <dd> Carbonic Acid, Carbonic Ester, or combination
123
+ <p><dt> Carbonic Acid or Carbonic Acid-Ester
124
+ <dd> [CX3](=[OX1])([OX2])[OX2H,OX1H0-1]
125
+ <dd> Hits acid and conjugate base. Won't hit carbonic acid diester
126
+ <p><dt> Carbonic Ester (carbonic acid diester)
127
+ <dd> C[OX2][CX3](=[OX1])[OX2]C
128
+ <dd> Won't hit carbonic acid or combination carbonic acid/ester
129
+ <p><dt> Carboxylic acid
130
+ <dd> [CX3](=O)[OX2H1]
131
+ <dd> -oic acid, COOH
132
+ <p><dt> Carboxylic acid or conjugate base.
133
+ <dd> [CX3](=O)[OX1H0-,OX2H1]
134
+ <p><dt> Cyanamide
135
+ <dd> [NX3][CX2]#[NX1]
136
+ <p><dt> Ester. Also hits anhydrides
137
+ <dd> [#6][CX3](=O)[OX2H0][#6]
138
+ <dd> won't hit formic anhydride.
139
+ <p><dt> Ketone
140
+ <dd> [#6][CX3](=O)[#6]
141
+ <dd> -one </p></dl><br>
142
+ <h3> ether</h3><dl>
143
+ <p><dt> Ether
144
+ <dd> [OD2]([#6])[#6]</p></dl><br>
145
+ <a NAME="H"></a><h2>H</h2>
146
+ <h3> hydrogen atoms</h3><dl>
147
+ <p><dt> Hydrogen Atom
148
+ <dd> [H]
149
+ <dd> Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
150
+ <p><dt> Not a Hydrogen Atom
151
+ <dd> [!#1]
152
+ <dd> Hits SMILES that are not hydrogen atoms.
153
+ <p><dt> Proton
154
+ <dd> [H+]
155
+ <dd> Hits positively charged hydrogen atoms: [H+]</p></dl><br>
156
+ <h3> hydrogen count</h3><dl>
157
+ <p><dt> Mono-Hydrogenated Cation
158
+ <dd> [+H]
159
+ <dd> Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
160
+ <p><dt> Not Mono-Hydrogenated
161
+ <dd> [!H] or [!H1]
162
+ <dd> Hits atoms that don't have exactly one attached hydrogen.</p></dl><br>
163
+ <a NAME="N"></a><h2>N</h2>
164
+ <h3> amide </h3> see carbonyl<br>
165
+ <h3> amine (-amino) </h3><dl>
166
+ <p><dt> Primary or secondary amine, not amide.
167
+ <dd> [NX3;H2,H1;!$(NC=O)]
168
+ <dd> Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 &amp; H1 respectively). Also note that "&amp;" (and) is the default operator and is higher precedence than "," (or), which is higher precedence than ";" (and). Will hit cyanamides and thioamides.
169
+ <p><dt> Enamine
170
+ <dd> [NX3][CX3]=[CX3]
171
+ <p><dt> Primary amine, not amide.
172
+ <dd> [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6]
+ <dd> Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)
173
+ <p><dt> Two primary or secondary amines
174
+ <dd> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
175
+ <dd> Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
176
+ <p><dt> Enamine or Aniline Nitrogen
177
+ <dd> [NX3][$(C=C),$(cc)]</p></dl><br>
178
+ <h3> amino acids</h3><dl>
179
+ <p><dt> Generic amino acid: low specificity.
180
+ <dd> [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
181
+ <dd> For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
182
+ <p><dt> Dipeptide group. generic amino acid: low specificity.
183
+ <dd> [NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]
184
+ <dd> Won't hit pro or gly. Hits acids and conjugate bases.
185
+ <p><dt> Amino Acid
186
+ <dd> [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
187
+ <dd> Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline
188
+ or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues
189
+ w/in polypeptides (internal, or terminal). {e.g. usage: Alanine side chain is [CH3X4] . Search is
190
+ [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}</p></dl><br>
191
+ <h3> amino acid side chains</h3><dl>
192
+ <p><dt> Alanine side chain
193
+ <dd> [CH3X4]
194
+
195
+ <p><dt> Arginine side chain.
196
+ <dd> [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
197
+ <dd> Hits acid and conjugate base.
198
+
199
+ <p><dt> Asparagine side chain.
200
+ <dd> [CH2X4][CX3](=[OX1])[NX3H2]
201
+ <dd> Also hits Gln side chain when used alone.
202
+
203
+ <p><dt> Aspartate (or Aspartic acid) side chain.
204
+ <dd> [CH2X4][CX3](=[OX1])[OH0-,OH]
205
+ <dd> Hits acid and conjugate base. Also hits Glu side chain when used alone.
206
+
207
+ <p><dt> Cysteine side chain.
208
+ <dd> [CH2X4][SX2H,SX1H0-]
209
+ <dd> Hits acid and conjugate base
210
+
211
+ <p><dt> Glutamate (or Glutamic acid) side chain.
212
+ <dd> [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
213
+ <dd> Hits acid and conjugate base.
214
+
215
+ <p><dt> Glycine
216
+ <dd> [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
217
+ <p><dt> Histidine side chain.
218
+ <dd> [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:<br>[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
219
+ <dd> Hits acid &amp; conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral
220
+ 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
221
+
222
+ <p><dt> Isoleucine side chain
223
+ <dd> [CHX4]([CH3X4])[CH2X4][CH3X4]
224
+
225
+ <p><dt> Leucine side chain
226
+ <dd> [CH2X4][CHX4]([CH3X4])[CH3X4]
227
+
228
+ <p><dt> Lysine side chain.
229
+ <dd> [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
230
+ <dd> Acid and conjugate base
231
+
232
+ <p><dt> Methionine side chain
233
+ <dd> [CH2X4][CH2X4][SX2][CH3X4]
234
+
235
+ <p><dt> Phenylalanine side chain
236
+ <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
237
+
238
+ <p><dt> Proline
239
+ <dd> [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
240
+
241
+ <p><dt> Serine side chain
242
+ <dd> [CH2X4][OX2H]
243
+
244
+ <p><dt> Thioamide
245
+ <dd> [NX3][CX3]=[SX1]
246
+
247
+ <p><dt> Threonine side chain
248
+ <dd> [CHX4]([CH3X4])[OX2H]
249
+
250
+ <p><dt> Tryptophan side chain
251
+ <dd> [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
252
+
253
+ <p><dt> Tyrosine side chain.
254
+ <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
255
+ <dd> Acid and conjugate base
256
+
257
+ <p><dt> Valine side chain
258
+ <dd> [CHX4]([CH3X4])[CH3X4]
259
+
260
+ <p><dt> Alanine side chain
261
+ <dd> [CH3X4]
262
+
263
+ <p><dt> Arginine side chain.
264
+ <dd> [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
265
+ <dd> Hits acid and conjugate base.
266
+
267
+ <p><dt> Asparagine side chain.
268
+ <dd> [CH2X4][CX3](=[OX1])[NX3H2]
269
+ <dd> Also hits Gln side chain when used alone.
270
+
271
+ <p><dt> Aspartate (or Aspartic acid) side chain.
272
+ <dd> [CH2X4][CX3](=[OX1])[OH0-,OH]
273
+ <dd> Hits acid and conjugate base. Also hits Glu side chain when used alone.
274
+
275
+ <p><dt> Cysteine side chain.
276
+ <dd> [CH2X4][SX2H,SX1H0-]
277
+ <dd> Hits acid and conjugate base
278
+
279
+ <p><dt> Glutamate (or Glutamic acid) side chain.
280
+ <dd> [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
281
+ <dd> Hits acid and conjugate base.
282
+
283
+ <p><dt> Glycine
284
+ <dd> N[CX4H2][CX3](=[OX1])[O,N]
285
+
286
+ <p><dt> Histidine side chain.
287
+ <dd> [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:<br>[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
288
+ <dd> Hits acid &amp; conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral
289
+ 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
290
+
291
+ <p><dt> Isoleucine side chain
292
+ <dd> [CHX4]([CH3X4])[CH2X4][CH3X4]
293
+
294
+ <p><dt> Leucine side chain
295
+ <dd> [CH2X4][CHX4]([CH3X4])[CH3X4]
296
+
297
+ <p><dt> Lysine side chain.
298
+ <dd> [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
299
+ <dd> Acid and conjugate base
300
+
301
+ <p><dt> Methionine side chain
302
+ <dd> [CH2X4][CH2X4][SX2][CH3X4]
303
+
304
+ <p><dt> Phenylalanine side chain
305
+ <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
306
+
307
+ <p><dt> Proline
308
+ <dd> N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N]
309
+
310
+ <p><dt> Serine side chain
311
+ <dd> [CH2X4][OX2H]
312
+
313
+ <p><dt> Threonine side chain
314
+ <dd> [CHX4]([CH3X4])[OX2H]
315
+
316
+ <p><dt> Tryptophan side chain
317
+ <dd> [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
318
+
319
+ <p><dt> Tyrosine side chain.
320
+ <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
321
+ <dd> Acid and conjugate base
322
+
323
+ <p><dt> Valine side chain
324
+ <dd> [CHX4]([CH3X4])[CH3X4]</p></dl><br>
325
+
326
+ <h3> azide (-azido) </h3><dl>
327
+
328
+ <p><dt> Azide group.
329
+ <dd> [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
330
+ <dd> Hits any atom with an attached azide.
331
+
332
+ <p><dt> Azide ion.
333
+ <dd> [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
334
+ <dd> Hits N in azide ion</p></dl><br>
335
+
336
+ <h3> azo </h3><dl>
337
+
338
+ <p><dt> Nitrogen.
339
+ <dd> [#7]
340
+ <dd> Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of "azo"
341
+
342
+ <p><dt> Azo Nitrogen. Low specificity.
343
+ <dd> [NX2]=N
344
+ <dd> Hits diazene, azoxy and some diazo structures
345
+
346
+ <p><dt> Azo Nitrogen. Diazene
347
+ <dd> [NX2]=[NX2]
348
+ <dd> (diaza alkene)
349
+
350
+ <p><dt> Azoxy Nitrogen.
351
+ <dd> [$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]
352
+
353
+ <p><dt> Diazo Nitrogen
354
+ <dd> [$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]
355
+
356
+ <p><dt> Azole.
357
+ <dd> [$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]
358
+ <dd> 5-member aromatic heterocycle w/ 2 double bonds. Contains N &amp; another non-C atom (N, O, S). Subclasses are furo-, thio-, pyrro- (replace
359
+ CH of furfuran, thiophene, pyrrole w/ N)</p></dl><br>
360
+
361
+ <h3> hydrazine</h3><dl>
362
+
363
+ <p><dt> Hydrazine H2NNH2
364
+ <dd> [NX3][NX3]</p></dl><br>
365
+
366
+ <h3> hydrazone </h3><dl>
367
+
368
+ <p><dt> Hydrazone C=NNH2
369
+ <dd> [NX3][NX2]=[*]</p></dl><br>
370
+
371
+ <h3> imine </h3><dl>
372
+
373
+ <p><dt> Substituted imine
374
+ <dd> [CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]
375
+ <dd> Schiff base
376
+
377
+ <p><dt> Substituted or un-substituted imine
378
+ <dd> [$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]
379
+
380
+ <p><dt> Iminium
381
+ <dd> [NX3+]=[CX3]</p></dl><br>
382
+
383
+ <h3> imide </h3><dl>
384
+
385
+ <p><dt> Unsubstituted dicarboximide
386
+ <dd> [CX3](=[OX1])[NX3H][CX3](=[OX1])
387
+
388
+ <p><dt> Substituted dicarboximide
389
+ <dd> [CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])
390
+
391
+ <p><dt> Dicarboxdiimide
392
+ <dd> [CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])</p></dl><br>
393
+
394
+ <h3> nitrate </h3><dl>
395
+
396
+ <p><dt> Nitrate group
397
+ <dd> [$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]
398
+ <dd> Also hits nitrate anion
399
+
400
+ <p><dt> Nitrate Anion
401
+ <dd> [$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]</p></dl><br>
402
+
403
+ <h3> nitrile </h3><dl>
404
+
405
+ <p><dt> Nitrile
406
+ <dd> [NX1]#[CX2]
407
+
408
+ <p><dt> Isonitrile
409
+ <dd> [CX1-]#[NX2+]</p></dl><br>
410
+
411
+ <h3> nitro </h3><dl>
412
+
413
+ <p><dt> Nitro group.
414
+ <dd> [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]
+ <dd> Hits both forms.
415
+
416
+ <p><dt> Two Nitro groups
417
+ <dd> [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]</p></dl><br>
418
+
419
+ <h3> nitroso </h3><dl>
420
+
421
+ <p><dt> Nitroso-group
422
+ <dd> [NX2]=[OX1]</p></dl><br>
423
+
424
+ <h3> n-oxide </h3><dl>
425
+
426
+ <p><dt> N-Oxide
427
+ <dd> [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]
428
+ <dd> Hits both forms. Won't hit azoxy, nitro, nitroso, or nitrate.</p></dl><br>
429
+
430
+
431
+ <a NAME="O"></a><h2>O</h2>
432
+
433
+
434
+ <h3> hydroxyl (includes alcohol, phenol) </h3><dl>
435
+
436
+ <p><dt> Hydroxyl
437
+ <dd> [OX2H]
438
+
439
+ <p><dt> Hydroxyl in Alcohol
440
+ <dd> [#6][OX2H]
441
+
442
+ <p><dt> Hydroxyl in Carboxylic Acid
443
+ <dd> [OX2H][CX3]=[OX1]
444
+
445
+ <p><dt> Hydroxyl in H-O-P-
446
+ <dd> [OX2H]P
447
+
448
+ <p><dt> Enol
449
+ <dd> [OX2H][#6X3]=[#6]
450
+
451
+ <p><dt> Phenol
452
+ <dd> [OX2H][cX3]:[c]
453
+
454
+ <p><dt> Enol or Phenol
455
+ <dd> [OX2H][$(C=C),$(cc)]
456
+
457
+ <p><dt> Hydroxyl_acidic
458
+ <dd> [$([OH]-*=[!#6])]
459
+ <dd> An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur, phosphorus,
460
+ halogen and nitrogen oxyacids.</p></dl><br>
461
+
462
+ <h3> peroxide </h3><dl>
463
+
464
+ <p><dt> Peroxide groups.
465
+ <dd> [OX2,OX1-][OX2,OX1-]
466
+ <dd> Also hits anions.</p></dl><br>
467
+
468
+
469
+ <a NAME="P"></a><h2>P</h2>
470
+
471
+
472
+ <h3> phosphoric compounds </h3><dl>
473
+
474
+ <p><dt> Phosphoric_acid groups.
475
+ <dd> [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
476
+ <dd> Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride
477
+ esters (including acidic mono- &amp; di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid
478
+ and longer, di- esters on linear triphosphoric acid and longer).
479
+
480
+ <p><dt> Phosphoric_ester groups.
481
+ <dd> [$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]
482
+ <dd> Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups.</p></dl><br>
483
+
484
+ <a NAME="S"></a><h2>S</h2>
485
+
486
+
487
+ <h3>thio groups ( thio-, thi-, sulpho-, mercapto- )</h3><dl>
488
+
489
+
490
+ <p><dt> Carbo-Thiocarboxylate
491
+ <dd> [S-][CX3](=S)[#6]
492
+
493
+ <p><dt> Carbo-Thioester
494
+ <dd> S([#6])[CX3](=O)[#6]
495
+
496
+ <p><dt> Thio analog of carbonyl
497
+ <dd> [#6X3](=[SX1])([!N])[!N]
498
+ <dd> Where S replaces O. Not a thioamide.
499
+
500
+ <p><dt> Thiol, Sulfide or Disulfide Sulfur
501
+ <dd> [SX2]
502
+
503
+ <p><dt> Thiol
504
+ <dd> [#16X2H]
505
+
506
+ <p><dt> Sulfur with at-least one hydrogen.
507
+ <dd> [#16!H0]
508
+
509
+ <p><dt> Thioamide
510
+ <dd> [NX3][CX3]=[SX1]</p></dl><br>
511
+
512
+ <h3>sulfide</h3><dl>
513
+
514
+ <p><dt> Sulfide
515
+ <dd> [#16X2H0]
516
+ <dd> -alkylthio Won't hit thiols. Hits disulfides.
517
+
518
+ <p><dt> Mono-sulfide
519
+ <dd> [#16X2H0][!#16]
520
+ <dd> alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides.
521
+
522
+ <p><dt> Di-sulfide
523
+ <dd> [#16X2H0][#16X2H0]
524
+ <dd> Won't hit thiols. Won't hit mono-sulfides.
525
+
526
+ <p><dt> Two Sulfides
527
+ <dd> [#16X2H0][!#16].[#16X2H0][!#16]
528
+ <dd> Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.</p></dl><br>
529
+
530
+ <h3>sulfinate</h3><dl>
531
+
532
+ <p><dt> Sulfinate
533
+ <dd> [$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]
534
+ <dd> Won't hit Sulfinic Acid. Hits Both Depiction Forms.
535
+
536
+ <p><dt> Sulfinic Acid
537
+ <dd> [$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]
538
+ <dd> Won't hit substituted Sulfinates. Hits Both Depiction Forms.
539
+ Hits acid and conjugate base (sulfinate).</p></dl><br>
540
+
541
+ <h3>sulfone</h3><dl>
542
+
543
+ <p><dt> Sulfone. Low specificity.
544
+ <dd> [$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]
545
+ <dd> Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- &amp; di- esters, sulfamic
546
+ acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
547
+
548
+ <p><dt> Sulfone. High specificity.
549
+ <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]
550
+ <dd> Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms.
551
+
552
+ <p><dt> Sulfonic acid. High specificity.
553
+ <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
554
+ <dd> Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules).
555
+ Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
556
+
557
+ <p><dt> Sulfonate
558
+ <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]
559
+ <dd> (sulfonic ester) Only hits carbon-substituted sulfur
560
+ (Oxygen may be heteroatom-substituted). Hits Both Depiction Forms.
561
+
562
+ <p><dt> Sulfonamide.
563
+ <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]
564
+ <dd> Only hits carbo- sulfonamide. Hits Both Depiction Forms.
565
+
566
+ <p><dt> Carbo-azosulfone
567
+ <dd> [SX4](C)(C)(=O)=N
568
+ <dd> Partial N-Analog of Sulfone
569
+
570
+ <p><dt> Sulfonamide
571
+ <dd> [$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]
572
+ <dd> (sulfa drugs) Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.</p></dl><br>
573
+
574
+ <h3>sulfoxide</h3><dl>
575
+
576
+ <p><dt> Sulfoxide Low specificity.
577
+ <dd> [$([#16X3]=[OX1]),$([#16X3+][OX1-])]
578
+ <dd> ( sulfinyl, thionyl ) Analog of carbonyl where S replaces C.
579
+ Hits all sulfoxides, including heteroatom-substituted sulfoxides,
580
+ dialkylsulfoxides, carbo-sulfoxides, sulfinate, sulfinic acids...
581
+ Hits Both Depiction Forms. Won't hit sulfones.
582
+
583
+ <p><dt> Sulfoxide High specificity
584
+ <dd> [$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]
585
+ <dd> (sulfinyl , thionyl) Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides
586
+ (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms. Won't hit sulfones.</p></dl><br>
587
+
588
+ <h3>sulfate</h3><dl>
589
+
590
+ <p><dt> Sulfate
591
+ <dd> [$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]
592
+ <dd> (sulfuric acid monoester) Only hits when oxygen is carbon-substituted.
593
+ Hits acid and conjugate base. Hits Both Depiction Forms.
594
+
595
+ <p><dt> Sulfuric acid ester (sulfate ester) Low specificity.
596
+ <dd> [$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]
597
+ <dd> Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates).
598
+ Hits acid and conjugate base. Hits Both Depiction Forms.
599
+ <p><dt> Sulfuric Acid Diester.
600
+ <dd> [$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]
601
+ <dd> Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.</p></dl><br>
602
+
603
+ <h3>sulfamate</h3><dl>
604
+
605
+ <p><dt> Sulfamate.
606
+ <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]
607
+ <dd> Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
608
+
609
+ <p><dt> Sulfamic Acid.
610
+ <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])]
611
+ <dd> Hits acid and conjugate base. Hits Both Depiction Forms.</p></dl><br>
612
+
613
+ <h3>sulfene</h3><dl>
614
+
615
+ <p><dt> Sulfenic acid.
616
+ <dd> [#16X2][OX2H,OX1H0-]
617
+ <dd> Hits acid and conjugate base.
618
+
619
+ <p><dt> Sulfenate.
620
+ <dd> [#16X2][OX2H0]</p></dl><br>
621
+
622
+
623
+ <a NAME="X"></a><h2>X</h2>
624
+
625
+
626
+ <h3> halide (-halo -fluoro -chloro -bromo -iodo) </h3><dl>
627
+
628
+ <p><dt> Any carbon attached to any halogen
629
+ <dd> [#6][F,Cl,Br,I]
630
+
631
+ <p><dt> Halogen
632
+ <dd> [F,Cl,Br,I]
633
+
634
+ <p><dt> Three_halides groups
635
+ <dd> [F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]
636
+ <dd> Hits SMILES that have three halides.</p></dl><br>
637
+
638
+ <h3> acyl halide </h3><dl>
639
+
640
+ <p><dt> Acyl Halide
641
+ <dd> [CX3](=[OX1])[F,Cl,Br,I]
642
+ <dd> (acid halide, -oyl halide)</p></dl><br>
643
+
644
+
645
+ <a NAME="STRUCTUAL"></a>
646
+ <H2>
647
+ 3. Gross Structural Features
648
+ </H2><br><br>
649
+
650
+
651
+ <table BORDER COLS=6 WIDTH="750" NOSAVE ><tr>
652
+ <td align=center><a href="#CHIRALITY">Chirality</a></td>
653
+ <td align=center><a href="#ORBITAL">Orbital Configuration</a></td>
654
+ <td align=center><a href="#CONNECT">Connectivity</a></td>
655
+ <td align=center><a href="#CHAIN"> Chains &amp; Branching</a></td>
656
+ <td align=center><a href="#ROTATE">Rotation</a></td>
657
+ <td align=center><a href="#CYCLE">Cyclic Features</a></td></tr>
658
+ </table><br><br>
659
+
660
+
661
+ <a NAME="CHIRALITY"></a><h2>Chirality</h2>
662
+ <dl>
663
+ <p><dt> Specified chiral carbon.
664
+ <dd> [$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]
665
+ <dd> Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified
666
+ but that could otherwise be considered chiral. Also, therefore won't match molecules that would be chiral due to an implicit connection (i.e.
667
+ implicit H).
668
+
669
+ <p><dt> "No-conflict" chiral match
670
+ <dd> C[C@?](F)(Cl)Br
671
+ <dd> Will match molecules with chiralities as specified or unspecified.
672
+
673
+ <p><dt> "No-conflict" chiral match where an H is present
674
+ <dd> C[C@?H](Cl)Br
675
+ <dd> Will match molecules with chiralities as specified or unspecified.</p></dl><br>
676
+
677
+ <a NAME="ORBITAL"></a><h2>Orbital Configuration</h2>
678
+
679
+ <dl>
680
+ <p><dt> sp2 cationic carbon
681
+ <dd> [$([cX2+](:*):*)]
682
+ <dd> Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
683
+
684
+ <p><dt> Aromatic sp2 carbon.
685
+ <dd> [$([cX3](:*):*),$([cX2+](:*):*)]
686
+ <dd> The first recursive SMARTS matches carbons that are three-connected, the second case matches two-connected carbons (i.e cations with
687
+ a free electron in a non-bonding sp2 hybrid orbital)
688
+
689
+ <p><dt> Any sp2 carbon.
690
+ <dd> [$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]
691
+ <dd> The first recursive SMARTS matches carbons that are three-connected and aromatic. The second case matches two-connected aromatic
692
+ carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital). The third case matches three-connected non-aromatic carbons
693
+ (alkenes). The fourth case matches non-aromatic cationic alkene carbons.
694
+
695
+ <p><dt> Any sp2 nitrogen.
696
+ <dd> [$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]
697
+
698
+ <dd> Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine N-oxide); aromatic 2-connected with 2 aromatic bonds (and a free
699
+ pair of electrons in a nonbonding orbital, e.g. pyridine); either aromatic or non-aromatic 2-connected with a double bond (and a free pair
700
+ of electrons in a nonbonding orbital, e.g. C=N); non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist
701
+ in reality, but SMILES can represent the charge-separated resonance structures as a single uncharged structure); either aromatic or non-aromatic
702
+ 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group; here the individual charge-separated resonance structures are
703
+ specified); or either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen),
704
+ respectively.
705
+
706
+ <p><dt> Explicit Hydrogen on sp2-Nitrogen
707
+ <dd> [$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]
708
+ <dd> (H must be an isotope or ion)
709
+
710
+ <p><dt> sp3 nitrogen
711
+ <dd> [$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)]
712
+ <dd> One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded.
713
+
714
+ <p><dt> Explicit Hydrogen on an sp3 N.
715
+ <dd> [$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)])]
716
+ <dd> One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
717
+
718
+ <p><dt> sp2 N in N-Oxide
719
+ <dd> [$([$([NX3]=O),$([NX3+][O-])])]
720
+
721
+ <p><dt> sp3 N in N-Oxide Exclusive:
722
+ <dd> [$([$([NX4]=O),$([NX4+][O-])])]
723
+ <dd> Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
724
+
725
+ <p><dt> sp3 N in N-Oxide Inclusive:
726
+ <dd> [$([$([NX4]=O),$([NX4+][O-,#0])])]
727
+ <dd> Hits if O could be present. Hits if * is used in place of O in SMILES.</p></dl><br>
728
+
729
+
730
+ <a NAME="CONNECT"></a><h2>Connectivity</h2>
731
+
732
+ <dl>
733
+ <p><dt> Quaternary Nitrogen
734
+ <dd> [$([NX4+]),$([NX4]=*)]
735
+ <dd> Hits non-aromatic Ns.
736
+ <p><dt> Tricoordinate S double bonded to N.
737
+ <dd> [$([SX3]=N)]
738
+
739
+ <p><dt> S double-bonded to Carbon
740
+ <dd> [$([SX1]=[#6])]
741
+ <dd> Hits terminal (1-connected S)
742
+
743
+ <p><dt> Triply bonded N
744
+ <dd> [$([NX1]#*)]
745
+
746
+ <p><dt> Divalent Oxygen
747
+ <dd> [$([OX2])]</p></dl><br>
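The entries in this listing follow a regular layout: a `<dt>` holding the name, a first `<dd>` holding the SMARTS, and an optional second `<dd>` holding a comment. A minimal sketch of pulling (name, SMARTS) pairs out of that layout with Python's standard `html.parser` might look like the following. The names `SmartsListParser` and `extract_smarts` are hypothetical and are not part of app.py, and the sketch assumes the first `<dd>` contains only the pattern, so entries such as "[R0] or [!R]" would need extra handling.

```python
from html.parser import HTMLParser


class SmartsListParser(HTMLParser):
    """Collect [name, smarts] pairs from <dt>name <dd>smarts <dd>comment runs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; to &
        self.entries = []   # [name, smarts] lists, in document order
        self._field = None  # "dt", "dd" (first <dd> only), or None
        self._name = ""
        self._dd_seen = 0

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            # A new entry starts: reset the name and the <dd> counter.
            self._field, self._name, self._dd_seen = "dt", "", 0
        elif tag == "dd":
            self._dd_seen += 1
            if self._dd_seen == 1:
                # Only the first <dd> after a <dt> is treated as the pattern.
                self._field = "dd"
                self.entries.append([self._name.strip(), ""])
            else:
                self._field = None  # later <dd> elements are comments

    def handle_endtag(self, tag):
        if tag in ("p", "dl"):
            self._field = None

    def handle_data(self, data):
        if self._field == "dt":
            self._name += data
        elif self._field == "dd":
            self.entries[-1][1] += data


def extract_smarts(html):
    """Return a list of (name, smarts) tuples with whitespace removed from the pattern."""
    parser = SmartsListParser()
    parser.feed(html)
    parser.close()
    return [(name, "".join(smarts.split())) for name, smarts in parser.entries]


sample = """<dl>
<p><dt> Quaternary Nitrogen
<dd> [$([NX4+]),$([NX4]=*)]
<dd> Hits non-aromatic Ns.
<p><dt> Divalent Oxygen
<dd> [$([OX2])]</p></dl><br>"""

for name, smarts in extract_smarts(sample):
    print(f"{name}: {smarts}")
```

Removing all whitespace from the pattern is a deliberate choice here, since several of the longer patterns in this file are broken across lines with `<br>`.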
748
+
749
+
750
+ <a NAME="CHAIN"></a><h2>Chains &amp; Branching </h2>
751
+
752
+ <dl>
753
+ <p><dt> Unbranched_alkane groups.
754
+ <dd> [R0;D2][R0;D2][R0;D2][R0;D2]
755
+ <dd> Only hits alkanes (single-bond chains). Only hits chains of at least 4 members. All non-(implicit-hydrogen) atoms count as branches
756
+ (e.g. halide-substituted chains count as branched).
757
+
758
+ <p><dt> Unbranched_chain groups.
759
+ <dd> [R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]
760
+ <dd> Hits any bond (single, double, triple). Only hits chains of at least 4 members. All non-(implicit-hydrogen) atoms count as branches
761
+ (e.g. halide-substituted chains count as branched).
762
+
763
+ <p><dt> Long_chain groups.
764
+ <dd> [AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]
765
+ <dd> Aliphatic chains at least 8 members long.
766
+
767
+ <p><dt> Atom_fragment
768
+ <dd> [!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
769
+ <dd> (CLOGP definition) A fragment atom is not an isolating carbon.
770
+
771
+ <p><dt> Carbon_isolating
772
+ <dd> [$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
773
+ <dd> This definition is based on that in CLOGP, so it is a charge-neutral carbon which is not a CF3 carbon or an aromatic C between two aromatic
774
+ hetero atoms (e.g. in tetrazole), and is not multiply bonded to a hetero atom.
775
+
776
+ <p><dt> Terminal S bonded to P
777
+ <dd> [$([SX1]~P)]
778
+
779
+ <p><dt> Nitrogen on -N-C=N-
780
+ <dd> [$([NX3]C=N)]
781
+
782
+ <p><dt> Nitrogen on -N-N=C-
783
+ <dd> [$([NX3]N=C)]
784
+
785
+ <p><dt> Nitrogen on -N-N=N-
786
+ <dd> [$([NX3]N=N)]
787
+
788
+ <p><dt> Oxygen in -O-C=N-
789
+ <dd> [$([OX2]C=N)] </p></dl><br>
790
+
791
+
792
+ <a NAME="ROTATE"></a><h2>Rotation</h2>
793
+
794
+ <dl>
795
+ <p><dt> Rotatable bond
796
+ <dd> [!$(*#*)&amp;!D1]-!@[!$(*#*)&amp;!D1]
797
+ <dd> An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note
798
+ that logical operators can be applied to bonds ("-&amp;!@"). Here, the overall SMARTS consists of two atoms and one bond. The bond is "single
799
+ and not ring". *#* is any atom triple bonded to any atom. Enclosing this SMARTS in parentheses and preceding it with $ enables us to
800
+ use $(*#*) as a recursive SMARTS, with that string acting as an atom primitive. The purpose is to avoid bonds such as c1ccccc1-C#C, which would
801
+ be considered rotatable without this specification.</p></dl><br>
802
+
803
+
804
+ <a NAME="CYCLE"></a><h2>Cyclic Features</h2>
805
+
806
+ <dl>
807
+ <p><dt> Bicyclic
808
+ <dd> [$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]
809
+ <dd> Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
810
+
811
+ <p><dt> Ortho
812
+ <dd> *-!:aa-!:*
813
+ <dd> Ortho-substituted ring
814
+
815
+ <p><dt> Meta
816
+ <dd> *-!:aaa-!:*
817
+ <dd> Meta-substituted ring
818
+
819
+ <p><dt> Para
820
+ <dd> *-!:aaaa-!:*
821
+ <dd> Para-substituted ring
822
+
823
+ <p><dt> Acyclic bonds
824
+ <dd> *!@*
825
+
826
+ <p><dt> Single bond and not in a ring
827
+ <dd> *-!@*
828
+
829
+ <p><dt> Non-ring atom
830
+ <dd> [R0] or [!R]
831
+
832
+ <p><dt> Macrocycle groups.
833
+ <dd> [r;!r3;!r4;!r5;!r6;!r7]
834
+
835
+ <p><dt> S in aromatic 5-ring with lone pair
836
+ <dd> [sX2r5]
837
+
838
+ <p><dt> Aromatic 5-Ring O with Lone Pair
839
+ <dd> [oX2r5]
840
+
841
+ <p><dt> N in 5-sided aromatic ring
842
+ <dd> [nX2r5]
843
+
844
+ <p><dt> Spiro-ring center
845
+ <dd> [X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6] (ring sizes 4-6)
846
+
847
+ <p><dt> N in 5-ring arom
848
+ <dd> [$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])] anion
849
+
850
+ <p><dt> CIS or TRANS double bond in a ring
851
+ <dd> */,\[R]=;@[R]/,\*
852
+ <dd> An isomeric SMARTS consisting of four atoms and three bonds.
853
+
854
+ <p><dt> CIS or TRANS double or aromatic bond in a ring
855
+ <dd> */,\[R]=,:;@[R]/,\*
856
+
857
+ <p><dt> Unfused benzene ring
858
+ <dd> [cR1]1[cR1][cR1][cR1][cR1][cR1]1
859
+ <dd> To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring.
860
+
861
+ <p><dt> Multiple non-fused benzene rings
862
+ <dd> [cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1
863
+
864
+ <p><dt> Fused benzene rings
865
+ <dd> c12ccccc1cccc2</p></dl><br>
866
+
867
+
868
+ <a NAME="META"></a>
869
+ <H2>
870
+ 4. Meta-SMARTS
871
+ </H2><br><br>
872
+
873
+ <table BORDER COLS=3 WIDTH="750" NOSAVE ><tr>
874
+ <td align=center><a href="#AA">Amino Acids </a></td>
875
+ <td align=center><a href="#RECUR"> Recursive or Multiple </a></td>
876
+ <td align=center><a href="#TOOL">Tools &amp;Tricks </a></td>
877
+ </table><br><br>
878
+
879
+
880
+ <a NAME="AA"></a><h2>Amino Acids</h2>
881
+
882
+ <dl>
883
+ <p><dt> Generic amino acid: low specificity.
884
+ <dd> [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
885
+ <dd> For use w/ non-standard a.a. search. Hits Pro but not Gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues
886
+ w/in polypeptides (internal, or terminal).
887
+
888
+ <p><dt> A.A. Template for 20 standard a.a.s
889
+ <dd> [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),<br>$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]
890
+
891
+ <dd> Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get "any standard a.a." Hits acids and conjugate bases.
892
+ Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
893
+
894
+ <p><dt> Proline
895
+ <dd> [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
896
+
897
+ <p><dt> Glycine
898
+ <dd> [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
899
+
900
+ <p><dt> Other a.a.
901
+ <dd> [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
902
+ <dd> Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline
903
+ or Glycine; they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in
904
+ polypeptides (internal, or terminal).<br>
905
+ &nbsp;&nbsp;&nbsp;&nbsp;Example usage:<br>
906
+ &nbsp;&nbsp;&nbsp;&nbsp;Alanine side chain is [CH3X4] <br>
907
+ &nbsp;&nbsp;&nbsp;&nbsp;Alanine Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
908
+
909
+ <p><dt> 18_standard_aa_side_chains.
910
+ <dd> ([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),<br>
911
+ $([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
912
+ $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
913
+ $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
914
+ [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),<br>
915
+ $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
916
+ $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
917
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),<br>
918
+ $([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
919
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])
920
+ <dd>Can be any of the standard 18 (Pro &amp; Gly are treated separately) Hits acids and conjugate bases.
921
+
922
+ <p><dt> N in Any_standard_amino_acid.
923
+ <dd> [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3]<br>
924
+ (=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3]<br>
925
+ (=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),<br>
926
+ $([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$<br>
927
+ ([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
928
+ $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
929
+ $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
930
+ [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),<br>
931
+ $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
932
+ $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
933
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),<br>
934
+ $([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
935
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),<br>
936
+ $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])]
937
+ <dd> Format is A.A. Template for 20 standard a.a.s, where * is replaced by the entire 18_standard_side_chains list (or'd together). A generic
938
+ amino acid with any of the 18 side chains, or proline or glycine. Hits "standard" amino acids that have terminally appended groups (i.e.
939
+ "standard" refers to the side chains). (Pro, Gly, or 18 normal a.a.s.) Hits single a.a.s and specific residues w/in polypeptides (internal,
940
+ or terminal).
941
+
942
+ <p><dt> Non-standard amino acid.
943
+ <dd> [$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),<br>
944
+ $([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),<br>
945
+ $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),<br>
946
+ $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3]<br>
947
+ (=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
948
+ $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:<br>
949
+ [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
950
+ [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),<br>
951
+ $([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
952
+ $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
953
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),<br>
954
+ $([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
955
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),<br>
956
+ $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])]
957
+ <dd> Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains). Won't hit amino acids that are
958
+ non-standard due solely to the fact that groups are terminally-appended to the polypeptide chain (N or C term). format is [$(generic a.a.);
959
+ !$(not a standard one)] Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).</p></dl><br>
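The "Other a.a." entry above describes building a residue-specific search by replacing the `[*]` placeholder in the template with a side-chain SMARTS. A minimal sketch of that substitution as plain string handling follows. The names `OTHER_AA_TEMPLATE`, `SIDE_CHAINS`, and `aa_smarts` are hypothetical. The alanine side chain is the one given in the listing; labeling `[CH2X4][OX2H]` as serine is an annotation added here, since the side-chain list itself does not name its members.

```python
# Template copied from the "Other a.a." entry; [*] marks the side-chain slot.
OTHER_AA_TEMPLATE = (
    "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
)

# Side-chain patterns taken from the 18_standard_aa_side_chains entry.
# The residue labels are annotations for this sketch, not part of the source list.
SIDE_CHAINS = {
    "Ala": "[CH3X4]",
    "Ser": "[CH2X4][OX2H]",
}


def aa_smarts(side_chain):
    """Return the template with its single [*] slot replaced by side_chain."""
    if OTHER_AA_TEMPLATE.count("[*]") != 1:
        raise ValueError("template must contain exactly one [*] placeholder")
    return OTHER_AA_TEMPLATE.replace("[*]", side_chain)


for residue, side_chain in SIDE_CHAINS.items():
    print(residue, aa_smarts(side_chain))
```

As the entry notes, this approach does not cover proline or glycine, which have their own patterns.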
960
+
961
+
962
+ <a NAME="RECUR"></a><h2>Recursive or Multiple </h2>
963
+
964
+ <h3> Recursive SMARTS: Atoms connected to particular SMARTS</h3><dl>
965
+
966
+ <p><dt> Ortho
967
+ <dd>[SMARTS_expression]-!:aa-!:[SMARTS_expression]
968
+
969
+ <p><dt> Meta
970
+ <dd> [SMARTS_expression]-!:aaa-!:[SMARTS_expression]
971
+
972
+ <p><dt> Para
973
+ <dd> [SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
974
+
975
+ <p><dt> Hydrogen
976
+ <dd> [$([#1][SMARTS_expression])]
977
+ <dd> Hydrogen must be explicit i.e. an isotope or charged
978
+
979
+ <p><dt> Nitrogen
980
+ <dd> [$([#7][SMARTS_expression])]
981
+
982
+ <p><dt> Oxygen
983
+ <dd> [$([#8][SMARTS_expression])]
984
+
985
+ <p><dt> Fluorine
986
+ <dd> [$([#9][SMARTS_expression])]</p></dl><br>
987
+
988
+ <h3> Recursive SMARTS: Multiple groups</h3><dl>
989
+
990
+ <p><dt> Two possible groups
991
+ <dd> [$(SMARTS_expression_A),$(SMARTS_expression_B)]
992
+ <dd> Hits atoms in either environment or group of interest, A or B.<br>
993
+ &nbsp;&nbsp;&nbsp;&nbsp;Example usages:<br>
994
+ &nbsp;&nbsp;&nbsp;&nbsp;Azide group is : [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]<br>
995
+ &nbsp;&nbsp;&nbsp;&nbsp;Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]<br>
996
+ &nbsp;&nbsp;&nbsp;&nbsp;Azide or azide ion is:
997
+ [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),$([$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])])]
998
+
999
+ <p><dt> Recursive SMARTS
1000
+ <dd> [$([atom_that_gets_hit][other_atom][other_atom])]
1001
+ <dd> Hits the first atom within the parentheses.<br>
1002
+ &nbsp;&nbsp;&nbsp;&nbsp;Example usages:<br>
1003
+ &nbsp;&nbsp;&nbsp;&nbsp;[$([CX3]=[OX1])] hits Carbonyl Carbon<br>
1004
+ &nbsp;&nbsp;&nbsp;&nbsp;[$([OX1]=[CX3])] hits Carbonyl Oxygen </p></dl><br>
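The "Two possible groups" entry above combines alternatives as `[$(A),$(B)]`, and the azide example nests that form one level deeper. A minimal sketch of the composition as string building follows. The helper name `either` is hypothetical, and it only assembles text: it does not check that the pieces are valid SMARTS.

```python
def either(*patterns):
    """Wrap each pattern in $(...) and join them with the SMARTS 'or' operator."""
    return "[" + ",".join(f"$({p})" for p in patterns) + "]"


# The two resonance depictions of each species, as given in the listing.
azide_group = either("*-[NX2-]-[NX2+]#[NX1]", "*-[NX2]=[NX2+]=[NX1-]")
azide_ion = either("[NX1-]=[NX2+]=[NX1-]", "[NX1]#[NX2+]-[NX1-2]")

# Nesting the helper gives the "azide or azide ion" pattern.
azide_any = either(azide_group, azide_ion)
print(azide_any)
```

Because each alternative is a recursive SMARTS, the combined pattern still matches a single atom: the first atom of whichever environment applies.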
1005
+
1006
+ <h3> Single only, Double only, Single or Double</h3><dl>
1007
+
1008
+ <p><dt> Sulfide
1009
+ <dd> [#16X2H0]
1010
+ <dd> (-alkylthio) Won't hit thiols. Hits disulfides too.
1011
+
1012
+ <p><dt> Mono-sulfide
1013
+ <dd> [#16X2H0][!#16]
1014
+ <dd> (alkylthio- or alkoxy-) R-S-R Won't hit thiols. Won't hit disulfides.
1015
+
1016
+ <p><dt> Di-sulfide
1017
+ <dd> [#16X2H0][#16X2H0]
1018
+ <dd> Won't hit thiols. Won't hit mono-sulfides.
1019
+
1020
+ <p><dt> Two sulfides
1021
+ <dd> [#16X2H0][!#16].[#16X2H0][!#16]
1022
+
1023
+ <dd> Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
1024
+
1025
+ <p><dt> Acid/conj-base
1026
+ <dd> [OX2H,OX1H0-]
1027
+ <dd> Hits acid and conjugate base. acid/base
1028
+
1029
+ <p><dt> Non-acid Oxygen
1030
+ <dd> [OX2H0]
1031
+
1032
+ <p><dt> Acid/base
1033
+ <dd> [H1,H0-]
1034
+ <dd> Works for any atom if base form has no Hs &amp; acid has only one.</p></dl><br>
1035
+
1036
+ <h3> Multiple Disconnected Groups</h3><dl>
1037
+
1038
+ <p><dt> Two disconnected SMARTS fragments
1039
+ <dd> ([Cl!$(Cl~c)].[c!$(c~Cl)])
1040
+ <dd> A molecule that contains a chlorine and an aromatic carbon which are not connected to each other. Uses component-level SMARTS.
1041
+ Both SMARTS fragments must be in the same SMILES target fragment.
1042
+
1043
+ <p><dt> Two disconnected SMARTS fragments
1044
+ <dd> ([Cl]).([c])
1045
+ <dd> Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES fragments.
1046
+
1047
+ <p><dt> Two not-necessarily connected SMARTS fragments
1048
+ <dd> ([Cl].[c])
1049
+ <dd> Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
1050
+
1051
+ <p><dt> Two not-necessarily connected fragments
1052
+ <dd> ([SMARTS_expression]).([SMARTS_expression])
1053
+ <dd> Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
1054
+
1055
+ <p><dt> Two primary or secondary amines
1056
+ <dd> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
1057
+ <dd> Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical patterns.</p></dl><br>
1058
+
1059
+
1060
+ <a NAME="TOOL"></a><h2>Tools &amp;Tricks</h2>
1061
+
1062
+ <h3> Alternative/Equivalent Representations </h3><dl>
1063
+
1064
+ <p><dt> Any carbon aromatic or non-aromatic
1065
+ <dd> [#6] or [c,C]
1066
+
1067
+ <p><dt> SMILES wildcard
1068
+ <dd> [#0]
1069
+ <dd> This SMARTS hits the SMILES *
1070
+
1071
+ <p><dt> Factoring
1072
+ <dd> [OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-]
1073
+ <dd> Factor out common atomic expressions in the recursive SMARTS. May improve human readability.
1074
+
1075
+ <p><dt> High-precedence "and"
1076
+ <dd> [N&amp;X4&amp;+,N&amp;X3&amp;+0] or [NX4+,NX3+0]
1077
+ <dd> High-precedence "and" (&amp;) is the default logical operator. "Or" (,) is lower precedence than &amp;, and low-precedence "and" (;)
1078
+ is lower precedence than both. </p></dl><br>
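The grouping implied by the logical operators can be made visible with a small sketch. In SMARTS, `&` binds tightest, then `,`, then `;`, so splitting on `;` first and `&` last reproduces the grouping. The function name `group_atom_expression` is hypothetical, and the sketch assumes every "and" is written explicitly and that the expression contains no recursive `$(...)` parts or negated groups, both of which a real parser would have to handle.

```python
def group_atom_expression(expr):
    """Group the body of a bracket atom by operator precedence.

    Returns a list of ';'-joined terms, each a list of ','-joined
    alternatives, each a list of '&'-joined primitives.
    """
    return [
        [alternative.split("&") for alternative in term.split(",")]
        for term in expr.split(";")
    ]


# Body of [N&X4&+,N&X3&+0]: (N and X4 and +) or (N and X3 and +0)
print(group_atom_expression("N&X4&+,N&X3&+0"))
# [[['N', 'X4', '+'], ['N', 'X3', '+0']]]

# Body of [O,N;!H0;R0]: (O or N) and (not H0) and (R0)
print(group_atom_expression("O,N;!H0;R0"))
# [[['O'], ['N']], [['!H0']], [['R0']]]
```

The second call shows why `;` is useful: it lets an "or" list be combined with further conditions without repeating them in every alternative.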
1079
+
1080
+ <h3> Hydrogens </h3><dl>
1081
+
1082
+ <p><dt> Any atom w/ at-least 1 H
1083
+ <dd> [*!H0,#1]
1084
+ <dd> In SMILES and SMARTS, Hydrogen is not considered an atom (unless it is specified as an isotope). The hydrogen count is instead
1085
+ considered a property of an atom. This SMARTS provides a way to effectively hit Hs themselves.
1086
+
1087
+ <p><dt> Hs on Carbons
1088
+ <dd> [#6!H0,#1]
1089
+
1090
+ <p><dt> Atoms w/ 1 H
1091
+ <dd> [H,#1] </p></dl><br>
1092
+
1093
+
1094
+ <a NAME="E-"></a>
1095
+ <H2>
1096
+ 5. Electron &amp; Proton Features
1097
+ </H2><br><br>
1098
+
1099
+ <table BORDER COLS=3 WIDTH="750" NOSAVE ><tr>
1100
+ <td align=center><a href="#ACID">Acids &amp; Bases </a></td>
1101
+ <td align=center><a href="#CHARGE">Charge</a></td>
1102
+ <td align=center><a href="#H_BOND"> H-bond Donors &amp; Acceptors</a></td>
1103
+ <td align=center><a href="#RAD"> Radicals </a></td>
1104
+ </table><br><br>
1105
+
1106
+
1107
+ <a NAME="ACID"></a><h2> Acids &amp; Bases </h2>
1108
+
1109
+ <dl>
1110
+ <p><dt> Acid
1111
+ <dd> [!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]
1112
+ <dd> Proton donor
1113
+
1114
+ <p><dt> Carboxylic acid
1115
+ <dd> [CX3](=O)[OX2H1]
1116
+ <dd> (-oic acid, COOH)
1117
+
1118
+ <p><dt> Carboxylic acid or conjugate base.
1119
+ <dd> [CX3](=O)[OX1H0-,OX2H1]
1120
+
1121
+ <p><dt> Hydroxyl_acidic
1122
+ <dd> [$([OH]-*=[!#6])]
1123
+ <dd> An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur,
1124
+ phosphorus, halogen and nitrogen oxyacids.
1125
+
1126
+ <p><dt> Phosphoric_Acid
1127
+ <dd> [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
1128
+ <dd> Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters
1129
+ (including acidic mono- &amp; di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and
1130
+ longer, di- esters on linear triphosphoric acid and longer). Hits acid and conjugate base.
1131
+
1132
+ <p><dt> Sulfonic Acid. High specificity.
1133
+ <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
1134
+ <dd> Only hits carbo- sulfonic acids (won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits both depiction
1135
+ forms. Hits arene sulfonic acids.
1136
+
1137
+ <p><dt> Acyl Halide
1138
+ <dd> [CX3](=[OX1])[F,Cl,Br,I]
1139
+ <dd> (acid halide, -oyl halide)</p></dl><br>
1140
+
1141
+
1142
+ <a NAME="CHARGE"></a><h2>Charge </h2>
1143
+
1144
+ <dl>
1145
+ <p><dt> Anionic divalent Nitrogen
1146
+ <dd> [NX2-]
1147
+
1148
+ <p><dt> Oxenium Oxygen
1149
+ <dd> [OX2H+]=*
1150
+
1151
+ <p><dt> Oxonium Oxygen
1152
+ <dd> [OX3H2+]
1153
+
1154
+ <p><dt> Carbocation
1155
+ <dd> [#6+]
1156
+
1157
+ <p><dt> sp2 cationic carbon.
1158
+ <dd> [$([cX2+](:*):*)]
1159
+ <dd> Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
1160
+
1161
+ <p><dt> Azide ion.
1162
+ <dd> [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
1163
+ <dd> Hits N in azide ion
1164
+
1165
+ <p><dt> Zwitterion High Specificity
1166
+ <dd> [+1]~*~*~[-1]
1167
+ <dd> +1 charged atom separated by any 3 bonds from a -1 charged atom.
1168
+
1169
+ <p><dt> Zwitterion Low Specificity, Crude
1170
+ <dd>[$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])]
1171
+ <dd> Variously charged moieties separated by up to ten bonds.
1172
+
1173
+ <p><dt> Zwitterion Low Specificity
1174
+ <dd> ([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4])
1175
+ <dd> Variously charged moieties that are within the same molecule but not-necessarily connected. Uses component-level grouping.</p></dl>
1176
+ <br>
1177
+
1178
+
1179
+ <a NAME="H_BOND"></a><h2> H-bond Donors &amp; Acceptors</h2>
1180
+
1181
+ <dl>
1182
+ <p><dt> Hydrogen-bond acceptor
1183
+ <dd> [#6,#7;R0]=[#8]
1184
+ <dd> Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring, double bonded to an
1185
+ oxygen.
1186
+
1187
+ <p><dt> Hydrogen-bond acceptor
1188
+ <dd> [!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]
1189
+ <dd> A H-bond acceptor is a heteroatom with no positive charge; note that negatively charged oxygen or sulphur are included. Excluded are
1190
+ halogens, including F, heteroaromatic oxygen, sulphur and pyrrole N. Higher oxidation levels of N, P, S are excluded. Note P(III) is
1191
+ currently included. Zeneca's work would imply that (O=S=O) should also be excluded.
1192
+
1193
+ <p><dt> Hydrogen-bond donor.
1194
+ <dd> [!$([#6,H0,-,-2,-3])]
1195
+ <dd> A H-bond donor is a non-negatively charged heteroatom with at least one H
1196
+
1197
+ <p><dt> Hydrogen-bond donor.
1198
+ <dd> [!H0;#7,#8,#9]
1199
+ <dd> Must have an N-H bond, an O-H bond, or a F-H bond
1200
+
1201
+ <p><dt> Possible intramolecular H-bond
1202
+ <dd> [O,N;!H0]-*~*-*=[$([C,N;R0]=O)]
1203
+ <dd> Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive SMARTS", where "$()" encloses a valid
1204
+ nested SMARTS and acts syntactically like an atom-primitive in the overall SMARTS. Multiple nesting is allowed.</p></dl><br>
1205
+
1206
+ <a NAME="RAD"></a><h2>Radicals </h2>
1207
+
1208
+ <dl>
1209
+ <p><dt> Carbon Free-Radical
1210
+ <dd> [#6;X3v3+0]
1211
+ <dd> Hits a neutral carbon with three single bonds.
1212
+
1213
+ <p><dt> Nitrogen Free-Radical
1214
+ <dd> [#7;X2v4+0]
1215
+ <dd> Hits a neutral nitrogen with two single bonds or with a single and a triple bond. </p></dl><br>
1216
+
1217
+
1218
+ <a NAME="BREAK"></a>
1219
+ <H2>
1220
+ 6. Breakdown of Complex SMARTS
1221
+ </H2></center><br><br>
1222
+
1223
+
1224
+ <table BORDER COLS=2 WIDTH="750" NOSAVE ><tr>
1225
+ <td align=center><a href="#AM_AC"> Amino Acid </a></td>
1226
+ <td align=center><a href="#ES_AM"> Ester or Amide </a></td>
1227
+ <!--th><!--a href="#"> <!--/a></td>
1228
+ </table><br><br>
1229
+
1230
+
1231
+ <a NAME="AM_AC"><h2>Amino Acid </h2></a>
1232
+
1233
+ <b>[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]</b>
1234
+
1235
+ <pre>
1236
+ [$( Proline
1237
+ [ N:
1238
+ $([ terminal
1239
+ NX3H neutral
1240
+ , or
1241
+ NX4H2+]) + charged
1242
+ , or
1243
+ $([NX3](C)(C)(C))]1 internal
1244
+ [CX4H] C: alpha
1245
+ ([CH2][CH2][CH2]1) pro side chain
1246
+ [CX3] C: of COOH
1247
+ (=[OX1]) O: =O of COOH
1248
+ [OX2H,OX1-,N] O: term COOH (neutral or -) or intern
1249
+ ), OR
1250
+ $( Glycine
1251
+ [ N:
1252
+ $([ terminal
1253
+ NX3H2 neutral
1254
+ , or
1255
+ NX4H3+]) + charged
1256
+ , or
1257
+ $([NX3H](C)(C)) internal
1258
+ [CX4H2] C: alpha (w/ H side chain)
1259
+ [CX3] C: of COOH
1260
+ (=[OX1]) O: =O of COOH
1261
+ [OX2H,OX1-,N] O: term COOH (neutral or -) or intern
1262
+ ), OR
1263
+ $( Other amino acid
1264
+ [ N:
1265
+ $([ terminal
1266
+ NX3H2 neutral
1267
+ , or
1268
+ NX4H3+]) + charged
1269
+ , or
1270
+ $([NX3H](C)(C))] internal
1271
+ [CX4H] C: alpha
1272
+ ([*]) any side chain
1273
+ [CX3] C: of COOH
1274
+ (=[OX1]) O: =O of COOH
1275
+ [OX2H,OX1-,N] O: term COOH (neutral or -) or intern
1276
+ )]
1277
+ </pre>
1278
+
1279
+ <br><br>
1280
+ <a NAME="ES_AM"><h2> Ester or Amide </h2></a>
1281
+
1282
+
1283
+ <b>[#6][CX3](=O)[$([OX2H0]([#6])[#6]),$([#7])] </b>
1284
+ <pre>
1285
+ [#6] An atom that is a carbon
1286
+ [CX3] Connected to an atom that is a three-connected carbon
1287
+ (=O) Which is double bonded to an oxygen
1288
+ [ Connected to an atom
1289
+ $( That is in an environment where
1290
+ [OX2H0] An atom that is a two-connected oxygen, without hydrogens
1291
+ ([#6])[#6]) Is connected to two carbons, one of them being the carbonyl C
1292
+ , Or
1293
+ $( That is in an environment where
1294
+ [#7] An atom is a nitrogen.
1295
+ )]
1296
+ </pre>
1297
+ <br><br>
1298
+ <a NAME="EXMPL"></a>
1299
+ <H2>
1300
+ 7. Interesting Example SMARTS
1301
+ </H2>
1302
+
1303
+ <dl>
1304
+ <p><dt> Oxygen double bonded to aliphatic carbon or nitrogen, single bonded to an aromatic ring, with a
1305
+ halogen in meta position
1306
+ <dd> [#8]=[C,N]-aaa[F,Cl,Br,I]
1307
+
1308
+ <p><dt> Aliphatic carbon attached to oxygen with any bond
1309
+ <dd> C~O
1310
+
1311
+ <p><dt> Oxygen or nitrogen, with at least one hydrogen attached and not in a ring
1312
+ <dd> [O,N;!H0;R0]
1313
+
1314
+ <p><dt> Oxygen double bonded to aliphatic carbon or nitrogen
1315
+ <dd> [#8]=[C,N] or O=[C,N]
1316
+
1317
+ <p><dt> Aliphatic atom single-bonded to any carbon which isn't a trifluoromethyl carbon
1318
+ <dd> A[#6;!$(C(F)(F)F)]
1319
+
1320
+ <p><dt> PCB
1321
+ <dd> [$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]-[$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]
1322
+ <dd> Polychlorinated Biphenyls. Overall SMARTS is atom-bond-atom. Note that ":" is explicit aromatic bond, and "-" is explicit single
1323
+ bond. On each side of the single bond, we use three nested SMARTS to represent
1324
+ the ortho, meta, and para position.
1325
+
1326
+ <p><dt> Imidazolium Nitrogen
1327
+ <dd> [nX3r5+]:c:n
1328
+
1329
+ <p><dt> 1-methyl-2-hydroxy benzene with either a Cl or H at the 5 position.
1330
+ <dd> [c;$([*Cl]),$([*H1])]1ccc(O)c(C)c1 or Cc1:c(O):c:c:[$(cCl),$([cH])]:c1
1331
+ <dd> The "H" primitive in SMARTS means "total number
1332
+ of attached hydrogens", i.e., [C] will match C in [CH4] methane, [CH3]
1333
+ methyl, [CH2] methylene, etc., [CH3] will only match methyl. This is similar
1334
+ to the use of "H" in SMILES to specify hydrogen count. The default value
1335
+ for the SMARTS "H" primitive is 1 (same as SMILES, e.g., [CH2]=[CH]-[OH]
1336
+ same as C=CO). This H-specification value includes all attached hydrogens:
1337
+ implicit and explicit (e.g., isotopic [2H]).
1338
+
1339
+ <p><dt> Nonstandard atom groups.
1340
+ <dd> [!#1;!#2;!#3;!#5;!#6;!#7;!#8;!#9;!#11;!#12;!#15;!#16;!#17;!#19;!#20;!#35;!#53]</p></dl><br>
1341
+ <h2>More Information</h2>
1342
+ <A HREF="/dayhtml/doc/theory/theory.smarts.html">Theory Manual</A><br>
1343
+ <A HREF="/dayhtml_tutorials/languages/smarts/smarts_practice.html">SMARTS Practice</A><br>
1344
+ </td>
1345
+ </tr>
1346
+ <tr>
1347
+ <td><iframe src="/iframes/footer.html" name="iframe3" width="350" height="200"
1348
+ scrolling="no" frameborder="0"></iframe></td>
1349
+ </tr>
1350
+ </table>
1351
+ </body>
1352
+ </html>
1353
+
data/smarts_examples.txt ADDED
@@ -0,0 +1,1272 @@
1
+
2
+ <H2>
3
+ 2. Functional Groups by Element
4
+ </H2>
5
+
6
+ <a NAME="C"></a><h2>C</h2>
7
+ <h3> alkane </h3><dl>
8
+ <p><dt> Alkyl Carbon
9
+ <dd> [CX4]</p></dl><br>
10
+ <h3> alkene (-ene) </h3><dl>
11
+ <p><dt> Allenic Carbon
12
+ <dd> [$([CX2](=C)=C)]
13
+ <p><dt> Vinylic Carbon
14
+ <dd> [$([CX3]=[CX3])]
15
+ <dd> Ethenyl carbon </p></dl><br>
16
+ <h3> alkyne (-yne) </h3><dl>
17
+ <p><dt> Acetylenic Carbon
18
+ <dd> [$([CX2]#C)]</p></dl><br>
19
+ <h3> arene (Ar , aryl-, aromatic hydrocarbons) </h3><dl>
20
+ <p><dt> Arene
21
+ <dd> c </p></dl><br>
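The entries in this file follow a regular shape: a `<dt>` line carries the name, the first `<dd>` after it carries the SMARTS, and any further `<dd>` lines carry notes. A minimal sketch of reading that shape with the standard library is shown below. The names `DtDdParser` and `parse_entries` are hypothetical, and treating the first `<dd>` as the pattern is an assumption that does not hold for entries that put the pattern and its note on one line.

```python
from html.parser import HTMLParser


class DtDdParser(HTMLParser):
    """Collect each <dt> name and the <dd> lines that follow it."""

    def __init__(self):
        super().__init__()
        self.entries = []    # one dict per <dt>
        self._target = None  # where text currently goes: "dt", "dd" or None

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.entries.append({"name": "", "dd": []})
            self._target = "dt"
        elif tag == "dd" and self.entries:
            self.entries[-1]["dd"].append("")
            self._target = "dd"

    def handle_endtag(self, tag):
        if tag == "dl":      # end of a group: stop collecting text
            self._target = None

    def handle_data(self, data):
        if self._target == "dt":
            self.entries[-1]["name"] += data
        elif self._target == "dd":
            self.entries[-1]["dd"][-1] += data


def parse_entries(html_text):
    """Return a list of {"name", "smarts", "notes"} dicts (hypothetical layout)."""
    parser = DtDdParser()
    parser.feed(html_text)
    parser.close()
    result = []
    for entry in parser.entries:
        dd = [text.strip() for text in entry["dd"]]
        result.append({
            "name": entry["name"].strip(),
            "smarts": dd[0] if dd else None,  # assumption: first <dd> is the pattern
            "notes": dd[1:],
        })
    return result


FRAGMENT = """<h3> alkene (-ene) </h3><dl>
<p><dt> Allenic Carbon
<dd> [$([CX2](=C)=C)]
<p><dt> Vinylic Carbon
<dd> [$([CX3]=[CX3])]
<dd> Ethenyl carbon </p></dl><br>"""

for entry in parse_entries(FRAGMENT):
    print(entry)
```

The parser is event-based, so the unclosed `<dt>` and `<dd>` tags used throughout the file do not need to be repaired first.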
22
+ <a NAME="CO"></a><h2>C &amp; O</h2>
23
+ <h3>carbonyl</h3><dl>
24
+ <p><dt> Carbonyl group. Low specificity
25
+ <dd> [CX3]=[OX1]
26
+ <dd> Hits carboxylic acid, ester, ketone, aldehyde, carbonic
27
+ acid/ester, anhydride, carbamic acid/ester, acyl halide, amide.
28
+ <p><dt> Carbonyl group
29
+ <dd> [$([CX3]=[OX1]),$([CX3+]-[OX1-])]
30
+ <dd> Hits either resonance structure
31
+ <p><dt> Carbonyl with Carbon
32
+ <dd> [CX3](=[OX1])C
33
+ <dd> Hits aldehyde, ketone, carboxylic acid (except formic), anhydride
34
+ (except formic), acyl halides (acid halides). Won't hit carbamic
35
+ acid/ester, carbonic acid/ester.
36
+ <p><dt> Carbonyl with Nitrogen.
37
+ <dd> [OX1]=CN
38
+ <dd> Hits amide, carbamic acid/ester, polypeptide
39
+ <p><dt> Carbonyl with Oxygen.
40
+ <dd> [CX3](=[OX1])O
41
+ <dd> Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid
42
+ or ester, anhydride. Won't hit aldehyde or ketone.
43
+ <p><dt> Acyl Halide
44
+ <dd> [CX3](=[OX1])[F,Cl,Br,I]
45
+ <dd> acid halide, -oyl halide
46
+ <p><dt> Aldehyde
47
+ <dd> [CX3H1](=O)[#6]
48
+ <dd> -al
49
+ <p><dt> Anhydride
50
+ <dd> [CX3](=[OX1])[OX2][CX3](=[OX1])
51
+ <p><dt> Amide
52
+ <dd> [NX3][CX3](=[OX1])[#6]
53
+ <dd> -amide
54
+ <p><dt> Amidinium
55
+ <dd> [NX3][CX3]=[NX3+]
56
+ <p><dt> Carbamate.
57
+ <dd> [NX3,NX4+][CX3](=[OX1])[OX2,OX1-]
58
+ <dd> Hits carbamic esters, acids, and zwitterions
59
+ <p><dt> Carbamic ester
60
+ <dd> [NX3][CX3](=[OX1])[OX2H0]
61
+ <p><dt> Carbamic acid.
62
+ <dd> [NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]
63
+ <dd> Hits carbamic acids and zwitterions.
64
+ <p><dt> Carboxylate Ion.
65
+ <dd> [CX3](=O)[O-]
66
+ <dd> Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
67
+ <p><dt> Carbonic Acid or Carbonic Ester
68
+ <dd> [CX3](=[OX1])(O)O
69
+ <dd> Carbonic Acid, Carbonic Ester, or combination
70
+ <p><dt> Carbonic Acid or Carbonic Acid-Ester
71
+ <dd> [CX3](=[OX1])([OX2])[OX2H,OX1H0-1]
72
+ <dd> Hits acid and conjugate base. Won't hit carbonic acid diester
73
+ <p><dt> Carbonic Ester (carbonic acid diester)
74
+ <dd> C[OX2][CX3](=[OX1])[OX2]C
75
+ <dd> Won't hit carbonic acid or combination carbonic acid/ester
76
+ <p><dt> Carboxylic acid
77
+ <dd> [CX3](=O)[OX2H1]
78
+ <dd> -oic acid, COOH
79
+ <p><dt> Carboxylic acid or conjugate base.
80
+ <dd> [CX3](=O)[OX1H0-,OX2H1]
81
+ <p><dt> Cyanamide
82
+ <dd> [NX3][CX2]#[NX1]
83
+ <p><dt> Ester. Also hits anhydrides
84
+ <dd> [#6][CX3](=O)[OX2H0][#6]
85
+ <dd> Won't hit formic anhydride.
86
+ <p><dt> Ketone
87
+ <dd> [#6][CX3](=O)[#6]
88
+ <dd> -one </p></dl><br>
89
+ <h3> ether</h3><dl>
90
+ <p><dt> Ether
91
+ <dd> [OD2]([#6])[#6]</p></dl><br>
92
+ <a NAME="H"></a><h2>H</h2>
93
+ <h3> hydrogen atoms</h3><dl>
94
+ <p><dt> Hydrogen Atom
95
+ <dd> [H]
96
+ <dd> Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
97
+ <p><dt> Not a Hydrogen Atom
98
+ <dd> [!#1]
99
+ <dd> Hits SMILES that are not hydrogen atoms.
100
+ <p><dt> Proton
101
+ <dd> [H+]
102
+ <dd> Hits positively charged hydrogen atoms: [H+]</p></dl><br>
103
+ <h3> hydrogen count</h3><dl>
104
+ <p><dt> Mono-Hydrogenated Cation
105
+ <dd> [+H]
106
+ <dd> Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
107
+ <p><dt> Not Mono-Hydrogenated
108
+ <dd> [!H] or [!H1]
109
+ <dd> Hits atoms that don't have exactly one attached hydrogen.</p></dl><br>
110
+ <a NAME="N"></a><h2>N</h2>
111
+ <h3> amide </b> see carbonyl</p><br>
112
+ amine (-amino) </h3><dl>
113
+ <p><dt> Primary or secondary amine, not amide.
114
+ <dd> [NX3;H2,H1;!$(NC=O)]
115
+ <dd> Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 &amp; H1 respectively). Also note that "&amp;" (and) is the default operator and has higher precedence than "," (or), which has higher precedence than ";" (and). Will hit cyanamides and thioamides.
116
+ <p><dt> Enamine
117
+ <dd> [NX3][CX3]=[CX3]
118
+ <p><dt> Primary amine, not amide.
119
+ <dd> [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6] Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)
120
+ <p><dt> Two primary or secondary amines
121
+ <dd> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
122
+ <dd> Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
123
+ <p><dt> Enamine or Aniline Nitrogen
124
+ <dd> [NX3][$(C=C),$(cc)]</p></dl><br>
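The precedence note under "Primary or secondary amine, not amide" can be read as an ordinary boolean expression. The sketch below spells out `[NX3;H2,H1;!$(NC=O)]` for a single atom in plain Python. The `atom` dictionary and its keys are hypothetical, filled in by hand for illustration, and no cheminformatics toolkit is involved.

```python
def is_primary_or_secondary_amine_n(atom):
    """Boolean reading of [NX3;H2,H1;!$(NC=O)] for one atom.

    `atom` is a hypothetical hand-filled dict, not a toolkit object.
    """
    # "NX3": adjacent primitives are joined by the implicit high-precedence "and".
    aliphatic_n_three_connected = (
        atom["symbol"] == "N"
        and not atom["aromatic"]
        and atom["connections"] == 3
    )
    # "H2,H1": "," (or) binds tighter than ";", so this group is one unit.
    two_or_one_h = atom["h_count"] == 2 or atom["h_count"] == 1
    # "!$(NC=O)": the recursive environment, negated.
    not_amide = not atom["n_on_carbonyl_carbon"]
    # ";" is the low-precedence "and" that joins the three groups.
    return aliphatic_n_three_connected and two_or_one_h and not_amide


# Hand-written descriptors for the N atom of three small molecules.
methylamine_n = {"symbol": "N", "aromatic": False, "connections": 3,
                 "h_count": 2, "n_on_carbonyl_carbon": False}
acetamide_n = {"symbol": "N", "aromatic": False, "connections": 3,
               "h_count": 2, "n_on_carbonyl_carbon": True}
trimethylamine_n = {"symbol": "N", "aromatic": False, "connections": 3,
                    "h_count": 0, "n_on_carbonyl_carbon": False}

for label, atom in [("methylamine", methylamine_n),
                    ("acetamide", acetamide_n),
                    ("trimethylamine", trimethylamine_n)]:
    print(label, is_primary_or_secondary_amine_n(atom))
```

Had "," bound more loosely than ";", the pattern would instead read as "(NX3 and H2) or (H1 and not amide)", which is not what the entry describes.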
125
+ <h3> amino acids</h3><dl>
126
+ <p><dt> Generic amino acid: low specificity.
127
+ <dd> [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
128
+ <dd> For use w/ non-standard a.a. search. Hits Pro but not Gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
129
+ <p><dt> Dipeptide group. Generic amino acid: low specificity.
130
+ <dd> [NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]
131
+ <dd> Won't hit pro or gly. Hits acids and conjugate bases.
132
+ <p><dt> Amino Acid
133
+ <dd> [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
134
+ <dd> Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline
135
+ or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues
136
+ w/in polypeptides (internal, or terminal). {e.g. usage: Alanine side chain is [CH3X4] . Search is
137
+ [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}</p></dl><br>
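The "replace * w/ a specific a.a. side chain" instruction in the Amino Acid entry is a plain text substitution. A minimal sketch, assuming the template is held as a Python string; the names `AMINO_ACID_TEMPLATE` and `amino_acid_smarts` are hypothetical, and the function only builds the string without checking that it is valid SMARTS.

```python
# Template copied from the "Amino Acid" entry; [*] marks the side-chain slot.
AMINO_ACID_TEMPLATE = (
    "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))]"
    "[CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
)


def amino_acid_smarts(side_chain, template=AMINO_ACID_TEMPLATE):
    """Substitute a side-chain SMARTS for the single [*] placeholder."""
    if template.count("[*]") != 1:
        raise ValueError("template must contain exactly one [*] placeholder")
    return template.replace("[*]", side_chain)


ALANINE_SIDE_CHAIN = "[CH3X4]"
print(amino_acid_smarts(ALANINE_SIDE_CHAIN))
```

As the entry notes, this does not apply to Proline or Glycine, which have their own patterns.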
138
+ <h3> amino acid side chains</h3><dl>
139
+ <p><dt> Alanine side chain
140
+ <dd> [CH3X4]
141
+
142
+ <p><dt> Arginine side chain.
143
+ <dd> [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
144
+ <dd> Hits acid and conjugate base.
145
+
146
+ <p><dt> Asparagine side chain.
147
+ <dd> [CH2X4][CX3](=[OX1])[NX3H2]
148
+ <dd> Also hits Gln side chain when used alone.
149
+
150
+ <p><dt> Aspartate (or Aspartic acid) side chain.
151
+ <dd> [CH2X4][CX3](=[OX1])[OH0-,OH]
152
+ <dd> Hits acid and conjugate base. Also hits Glu side chain when used alone.
153
+
154
+ <p><dt> Cysteine side chain.
155
+ <dd> [CH2X4][SX2H,SX1H0-]
156
+ <dd> Hits acid and conjugate base
157
+
158
+ <p><dt> Glutamate (or Glutamic acid) side chain.
159
+ <dd> [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
160
+ <dd> Hits acid and conjugate base.
161
+
162
+ <p><dt> Glycine
163
+ <dd> [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
164
+ <p><dt> Histidine side chain.
165
+ <dd> [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:<br>[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
166
+ <dd> Hits acid &amp; conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral
167
+ 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
168
+
169
+ <p><dt> Isoleucine side chain
170
+ <dd> [CHX4]([CH3X4])[CH2X4][CH3X4]
171
+
172
+ <p><dt> Leucine side chain
173
+ <dd> [CH2X4][CHX4]([CH3X4])[CH3X4]
174
+
175
+ <p><dt> Lysine side chain.
176
+ <dd> [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
177
+ <dd> Acid and conjugate base
178
+
179
+ <p><dt> Methionine side chain
180
+ <dd> [CH2X4][CH2X4][SX2][CH3X4]
181
+
182
+ <p><dt> Phenylalanine side chain
183
+ <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
184
+
185
+ <p><dt> Proline
186
+ <dd> [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
187
+
188
+ <p><dt> Serine side chain
189
+ <dd> [CH2X4][OX2H]
190
+
191
+ <p><dt> Thioamide
192
+ <dd> [NX3][CX3]=[SX1]
193
+
194
+ <p><dt> Threonine side chain
195
+ <dd> [CHX4]([CH3X4])[OX2H]
196
+
197
+ <p><dt> Tryptophan side chain
198
+ <dd> [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
199
+
200
+ <p><dt> Tyrosine side chain.
201
+ <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
202
+ <dd> Acid and conjugate base
203
+
204
+ <p><dt> Valine side chain
205
+ <dd> [CHX4]([CH3X4])[CH3X4]
206
+
207
+ <p><dt> Alanine side chain
208
+ <dd> [CH3X4]
209
+
210
+ <p><dt> Arginine side chain.
211
+ <dd> [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
212
+ <dd> Hits acid and conjugate base.
213
+
214
+ <p><dt> Asparagine side chain.
215
+ <dd> [CH2X4][CX3](=[OX1])[NX3H2]
216
+ <dd> Also hits Gln side chain when used alone.
217
+
218
+ <p><dt> Aspartate (or Aspartic acid) side chain.
219
+ <dd> [CH2X4][CX3](=[OX1])[OH0-,OH]
220
+ <dd> Hits acid and conjugate base. Also hits Glu side chain when used alone.
221
+
222
+ <p><dt> Cysteine side chain.
223
+ <dd> [CH2X4][SX2H,SX1H0-]
224
+ <dd> Hits acid and conjugate base
225
+
226
+ <p><dt> Glutamate (or Glutamic acid) side chain.
227
+ <dd> [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
228
+ <dd> Hits acid and conjugate base.
229
+
230
+ <p><dt> Glycine
231
+ <dd> N[CX4H2][CX3](=[OX1])[O,N]
232
+
233
+ <p><dt> Histidine side chain.
234
+ <dd> [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:<br>[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
235
+ <dd> Hits acid &amp; conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral
236
+ 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
237
+
238
+ <p><dt> Isoleucine side chain
239
+ <dd> [CHX4]([CH3X4])[CH2X4][CH3X4]
240
+
241
+ <p><dt> Leucine side chain
242
+ <dd> [CH2X4][CHX4]([CH3X4])[CH3X4]
243
+
244
+ <p><dt> Lysine side chain.
245
+ <dd> [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
246
+ <dd> Acid and conjugate base
247
+
248
+ <p><dt> Methionine side chain
249
+ <dd> [CH2X4][CH2X4][SX2][CH3X4]
250
+
251
+ <p><dt> Phenylalanine side chain
252
+ <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
253
+
254
+ <p><dt> Proline
255
+ <dd> N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N]
256
+
257
+ <p><dt> Serine side chain
258
+ <dd> [CH2X4][OX2H]
259
+
260
+ <p><dt> Threonine side chain
261
+ <dd> [CHX4]([CH3X4])[OX2H]
262
+
263
+ <p><dt> Tryptophan side chain
264
+ <dd> [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
265
+
266
+ <p><dt> Tyrosine side chain.
267
+ <dd> [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
268
+ <dd> Acid and conjugate base
269
+
270
+ <p><dt> Valine side chain
271
+ <dd> [CHX4]([CH3X4])[CH3X4]</p></dl><br>
272
+
273
+ <h3> azide (-azido) </h3><dl>
274
+
275
+ <p><dt> Azide group.
276
+ <dd> [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
277
+ <dd> Hits any atom with an attached azide.
278
+
279
+ <p><dt> Azide ion.
280
+ <dd> [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
281
+ <dd> Hits N in azide ion</p></dl><br>
282
+
283
+ <h3> azo </h3><dl>
284
+
285
+ <p><dt> Nitrogen.
286
+ <dd> [#7]
287
+ <dd> Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of "azo"
288
+
289
+ <p><dt> Azo Nitrogen. Low specificity.
290
+ <dd> [NX2]=N
291
+ <dd> Hits diazene, azoxy and some diazo structures
292
+
293
+ <p><dt> Azo Nitrogen. Diazene
294
+ <dd> [NX2]=[NX2]
295
+ <dd> (diaza alkene)
296
+
297
+ <p><dt> Azoxy Nitrogen.
298
+ <dd> [$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]
299
+
300
+ <p><dt> Diazo Nitrogen
301
+ <dd> [$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]
302
+
303
+ <p><dt> Azole.
304
+ <dd> [$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]
305
+ <dd> 5-membered aromatic heterocycle w/ 2 double bonds. Contains N &amp; another non-C atom (N, O, S). Subclasses are furo-, thio-, pyrro- (replace
306
+ CH of furfuran, thiophene, pyrrole w/ N)</p></dl><br>
307
+
308
+ <h3> hydrazine</h3><dl>
309
+
310
+ <p><dt> Hydrazine H2NNH2
311
+ <dd> [NX3][NX3]</p></dl><br>
312
+
313
+ <h3> hydrazone </h3><dl>
314
+
315
+ <p><dt> Hydrazone C=NNH2
316
+ <dd> [NX3][NX2]=[*]</p></dl><br>
317
+
318
+ <h3> imine </h3><dl>
319
+
320
+ <p><dt> Substituted imine
321
+ <dd> [CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]
322
+ <dd> Schiff base
323
+
324
+ <p><dt> Substituted or un-substituted imine
325
+ <dd> [$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]
326
+
327
+ <p><dt> Iminium
328
+ <dd> [NX3+]=[CX3]</p></dl><br>
329
+
330
+ <h3> imide </h3><dl>
331
+
332
+ <p><dt> Unsubstituted dicarboximide
333
+ <dd> [CX3](=[OX1])[NX3H][CX3](=[OX1])
334
+
335
+ <p><dt> Substituted dicarboximide
336
+ <dd> [CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])
337
+
338
+ <p><dt> Dicarboxdiimide
339
+ <dd> [CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])</p></dl><br>
340
+
341
+ <h3> nitrate </h3><dl>
342
+
343
+ <p><dt> Nitrate group
344
+ <dd> [$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]
345
+ <dd> Also hits nitrate anion
346
+
347
+ <p><dt> Nitrate Anion
348
+ <dd> [$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]</p></dl><br>
349
+
350
+ <h3> nitrile </h3><dl>
351
+
352
+ <p><dt> Nitrile
353
+ <dd> [NX1]#[CX2]
354
+
355
+ <p><dt> Isonitrile
356
+ <dd> [CX1-]#[NX2+]</p></dl><br>
357
+
358
+ <h3> nitro </h3><dl>
359
+
360
+ <p><dt> Nitro group.
361
+ <dd> [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8] Hits both forms.
362
+
363
+ <p><dt> Two Nitro groups
364
+ <dd> [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]</p></dl><br>
365
+
366
+ <h3> nitroso </h3><dl>
367
+
368
+ <p><dt> Nitroso-group
369
+ <dd> [NX2]=[OX1]</p></dl><br>
370
+
371
+ <h3> n-oxide </h3><dl>
372
+
373
+ <p><dt> N-Oxide
374
+ <dd> [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]
375
+ <dd> Hits both forms. Won't hit azoxy, nitro, nitroso, or nitrate.</p></dl><br>
376
+
377
+
378
+ <a NAME="O"></a><h2>O</h2>
379
+
380
+
381
+ <h3> hydroxyl (includes alcohol, phenol) </h3><dl>
382
+
383
+ <p><dt> Hydroxyl
384
+ <dd> [OX2H]
385
+
386
+ <p><dt> Hydroxyl in Alcohol
387
+ <dd> [#6][OX2H]
388
+
389
+ <p><dt> Hydroxyl in Carboxylic Acid
390
+ <dd> [OX2H][CX3]=[OX1]
391
+
392
+ <p><dt> Hydroxyl in H-O-P-
393
+ <dd> [OX2H]P
394
+
395
+ <p><dt> Enol
396
+ <dd> [OX2H][#6X3]=[#6]
397
+
398
+ <p><dt> Phenol
399
+ <dd> [OX2H][cX3]:[c]
400
+
401
+ <p><dt> Enol or Phenol
402
+ <dd> [OX2H][$(C=C),$(cc)]
403
+
404
+ <p><dt> Hydroxyl_acidic
405
+ <dd> [$([OH]-*=[!#6])]
406
+ <dd> An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this includes carboxylic, sulphur, phosphorus,
407
+ halogen and nitrogen oxyacids.</p></dl><br>
408
+
409
+ <h3> peroxide </h3><dl>
410
+
411
+ <p><dt> Peroxide groups.
412
+ <dd> [OX2,OX1-][OX2,OX1-]
413
+ <dd> Also hits anions.</p></dl><br>
414
+
415
+
416
+ <a NAME="P"></a><h2>P</h2>
417
+
418
+
419
+ <h3> phosphoric compounds </h3><dl>
420
+
421
+ <p><dt> Phosphoric_acid groups.
422
+ <dd> [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
423
+ <dd> Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride
424
+ esters (including acidic mono- &amp; di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid
425
+ and longer, di- esters on linear triphosphoric acid and longer).
426
+
427
+ <p><dt> Phosphoric_ester groups.
428
+ <dd> [$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]
429
+ <dd> Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups.</p></dl><br>
430
+
431
+ <a NAME="S"></a><h2>S</h2>
432
+
433
+
434
+ <h3>thio groups ( thio-, thi-, sulpho-, mercapto- )</h3><dl>
435
+
436
+
437
+ <p><dt> Carbo-Thiocarboxylate
438
+ <dd> [S-][CX3](=S)[#6]
439
+
440
+ <p><dt> Carbo-Thioester
441
+ <dd> S([#6])[CX3](=O)[#6]
442
+
443
+ <p><dt> Thio analog of carbonyl
444
+ <dd> [#6X3](=[SX1])([!N])[!N]
445
+ <dd> Where S replaces O. Not a thioamide.
446
+
447
+ <p><dt> Thiol, Sulfide or Disulfide Sulfur
448
+ <dd> [SX2]
449
+
450
+ <p><dt> Thiol
451
+ <dd> [#16X2H]
452
+
453
+ <p><dt> Sulfur with at-least one hydrogen.
454
+ <dd> [#16!H0]
455
+
456
+ <p><dt> Thioamide
457
+ <dd> [NX3][CX3]=[SX1]</p></dl><br>
458
+
459
+ <h3>sulfide</h3><dl>
460
+
461
+ <p><dt> Sulfide
462
+ <dd> [#16X2H0]
463
+ <dd> -alkylthio Won't hit thiols. Hits disulfides.
464
+
465
+ <p><dt> Mono-sulfide
466
+ <dd> [#16X2H0][!#16]
467
+ <dd> alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides.
468
+
469
+ <p><dt> Di-sulfide
470
+ <dd> [#16X2H0][#16X2H0]
471
+ <dd> Won't hit thiols. Won't hit mono-sulfides.
472
+
473
+ <p><dt> Two Sulfides
474
+ <dd> [#16X2H0][!#16].[#16X2H0][!#16]
475
+ <dd> Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.</p></dl><br>
476
+
477
+ <h3>sulfinate</h3><dl>
478
+
479
+ <p><dt> Sulfinate
480
+ <dd> [$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]
481
+ <dd> Won't hit Sulfinic Acid. Hits Both Depiction Forms.
482
+
483
+ <p><dt> Sulfinic Acid
484
+ <dd> [$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]
485
+ <dd> Won't hit substituted Sulfinates. Hits Both Depiction Forms.
486
+ Hits acid and conjugate base (sulfinate).</p></dl><br>
487
+
488
+ <h3>sulfone</h3><dl>
489
+
490
+ <p><dt> Sulfone. Low specificity.
491
+ <dd> [$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]
492
+ <dd> Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- &amp; di- esters, sulfamic
493
+ acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
494
+
495
+ <p><dt> Sulfone. High specificity.
496
+ <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]
497
+ <dd> Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms.
498
+
499
+ <p><dt> Sulfonic acid. High specificity.
500
+ <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
501
+ <dd> Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules).
502
+ Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
503
+
504
+ <p><dt> Sulfonate
505
+ <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]
506
+ <dd> (sulfonic ester) Only hits carbon-substituted sulfur
507
+ (Oxygen may be heteroatom-substituted). Hits Both Depiction Forms.
508
+
509
+ <p><dt> Sulfonamide.
510
+ <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]
511
+ <dd> Only hits carbo- sulfonamide. Hits Both Depiction Forms.
512
+
513
+ <p><dt> Carbo-azosulfone
514
+ <dd> [SX4](C)(C)(=O)=N
515
+ <dd> Partial N-Analog of Sulfone
516
+
517
+ <p><dt> Sulfonamide
518
+ <dd> [$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]
519
+ <dd> (sulfa drugs) Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.</p></dl><br>
520
+
521
+ <h3>sulfoxide</h3><dl>
522
+
523
+ <p><dt> Sulfoxide Low specificity.
524
+ <dd> [$([#16X3]=[OX1]),$([#16X3+][OX1-])]
525
+ <dd> ( sulfinyl, thionyl ) Analog of carbonyl where S replaces C.
526
+ Hits all sulfoxides, including heteroatom-substituted sulfoxides,
527
+ dialkylsulfoxides, carbo-sulfoxides, sulfinate, sulfinic acids...
528
+ Hits Both Depiction Forms. Won't hit sulfones.
529
+
530
+ <p><dt> Sulfoxide High specificity
531
+ <dd> [$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]
532
+ <dd> (sulfinyl , thionyl) Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides
533
+ (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms. Won't hit sulfones.</p></dl><br>
534
+
535
+ <h3>sulfate</h3><dl>
536
+
537
+ <p><dt> Sulfate
538
+ <dd> [$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]
539
+ <dd> (sulfuric acid monoester) Only hits when oxygen is carbon-substituted.
540
+ Hits acid and conjugate base. Hits Both Depiction Forms.
541
+
542
+ <p><dt> Sulfuric acid ester (sulfate ester) Low specificity.
543
+ <dd> [$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]
544
+ <dd> Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates).
545
+ Hits acid and conjugate base. Hits Both Depiction Forms.
546
+ <p><dt> Sulfuric Acid Diester.
547
+ <dd> [$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]
548
+ <dd> Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.</p></dl><br>
549
+
550
+ <h3>sulfamate</h3><dl>
551
+
552
+ <p><dt> Sulfamate.
553
+ <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]
554
+ <dd> Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
555
+
556
+ <p><dt> Sulfamic Acid.
557
+ <dd> [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])]
558
+ <dd> Hits acid and conjugate base. Hits Both Depiction Forms.</p></dl><br>
559
+
560
+ <h3>sulfene</h3><dl>
561
+
562
+ <p><dt> Sulfenic acid.
563
+ <dd> [#16X2][OX2H,OX1H0-]
564
+ <dd> Hits acid and conjugate base.
565
+
566
+ <p><dt> Sulfenate.
567
+ <dd> [#16X2][OX2H0]</p></dl><br>
568
+
569
+
570
+ <a NAME="X"></a><h2>X</h2>
571
+
572
+
573
+ <h3> halide (-halo -fluoro -chloro -bromo -iodo) </h3><dl>
574
+
575
+ <p><dt> Any carbon attached to any halogen
576
+ <dd> [#6][F,Cl,Br,I]
577
+
578
+ <p><dt> Halogen
579
+ <dd> [F,Cl,Br,I]
580
+
581
+ <p><dt> Three_halides groups
582
+ <dd> [F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]
583
+ <dd> Hits SMILES that have three halides.</p></dl><br>
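The "Three_halides groups" pattern, like the earlier "Two primary or secondary amines" and "Two Nitro groups" entries, is one sub-pattern repeated and joined with the disconnection symbol ".". A minimal sketch of generating such patterns, assuming plain string handling is enough; `repeat_disconnected` is a hypothetical helper name and it performs no SMARTS validation.

```python
def repeat_disconnected(pattern, count):
    """Join `count` copies of `pattern` with "." (the disconnection symbol)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return ".".join([pattern] * count)


HALOGEN = "[F,Cl,Br,I]"
NITRO = "[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]"

print(repeat_disconnected(HALOGEN, 3))
print(repeat_disconnected(NITRO, 2))
```

Generating the repeated form from a single source pattern avoids the two halves drifting apart when the sub-pattern is edited.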
584
+
585
+ <h3> acyl halide </h3><dl>
586
+
587
+ <p><dt> Acyl Halide
588
+ <dd> [CX3](=[OX1])[F,Cl,Br,I]
589
+ <dd> (acid halide, -oyl halide)</p></dl><br>
590
+
591
+
592
+ <a NAME="STRUCTUAL"></a>
593
+ <H2>
594
+ 3. Gross Structural Features
595
+ </H2><br><br>
596
+
597
+
598
+
599
+ <a NAME="CHIRALITY"></a><h2>Chirality</h2>
600
+ <dl>
601
+ <p><dt> Specified chiral carbon.
602
+ <dd> [$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]
603
+ <dd> Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified
604
+ but that could otherwise be considered chiral. Therefore, it also won't match molecules that would be chiral due to an implicit connection (i.e.
605
+ implicit H).
606
+
607
+ <p><dt> "No-conflict" chiral match
608
+ <dd> C[C@?](F)(Cl)Br
609
+ <dd> Will match molecules with chiralities as specified or unspecified.
610
+
611
+ <p><dt> "No-conflict" chiral match where an H is present
612
+ <dd> C[C@?H](Cl)Br
613
+ <dd> Will match molecules with chiralities as specified or unspecified.</p></dl><br>
614
+
615
+ <a NAME="ORBITAL"></a><h2>Orbital Configuration</h2>
616
+
617
+ <dl>
618
+ <p><dt> sp2 cationic carbon
619
+ <dd> [$([cX2+](:*):*)]
620
+ <dd> Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
621
+
622
+ <p><dt> Aromatic sp2 carbon.
623
+ <dd> [$([cX3](:*):*),$([cX2+](:*):*)]
624
+ <dd> The first recursive SMARTS matches carbons that are three-connected; the second case matches two-connected carbons (i.e. cations with
625
+ a free electron in a non-bonding sp2 hybrid orbital)
626
+
627
+ <p><dt> Any sp2 carbon.
628
+ <dd> [$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]
629
+ <dd> The first recursive SMARTS matches carbons that are three-connected and aromatic. The second case matches two-connected aromatic
630
+ carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital). The third case matches three-connected non-aromatic carbons
631
+ (alkenes). The fourth case matches non-aromatic cationic alkene carbons.
632
+
633
+ <p><dt> Any sp2 nitrogen.
634
+ <dd> [$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]
635
+
636
+ <dd> Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine-N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free
637
+ pair of electrons in a nonbonding orbital, e.g. pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair
638
+ of electrons in a nonbonding orbital, e.g. C=N ), non aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist
639
+ in reality, SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic
640
+ 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group, here the individual charge-separated resonance structures are
641
+ specified), either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen),
642
+ respectively.
643
+
644
+ <p><dt> Explicit Hydrogen on sp2-Nitrogen
645
+ <dd> [$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]
646
+ <dd> (H must be an isotope or ion)
647
+
648
+ <p><dt> sp3 nitrogen
649
+ <dd> [$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)]
650
+ <dd> One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded.
651
+
652
+ <p><dt> Explicit Hydrogen on an sp3 N.
653
+ <dd> [$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)])]
654
+ <dd> One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
655
+
656
+ <p><dt> sp2 N in N-Oxide
657
+ <dd> [$([$([NX3]=O),$([NX3+][O-])])]
658
+
659
+ <p><dt> sp3 N in N-Oxide Exclusive:
660
+ <dd> [$([$([NX4]=O),$([NX4+][O-])])]
661
+ <dd> Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
662
+
663
+ <p><dt> sp3 N in N-Oxide Inclusive:
664
+ <dd> [$([$([NX4]=O),$([NX4+][O-,#0])])]
665
+ <dd> Hits if O could be present. Hits if * is used in place of O in SMILES.</p></dl><br>
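Several patterns in this file are stored in HTML-escaped form, for example `&amp;` in the sp3 nitrogen entry, and the Histidine pattern carries a `<br>` inserted for display. Such text would need to be undone before the string is used as SMARTS. A minimal sketch with the standard library follows; `clean_smarts` is a hypothetical helper name, and the assumption is that `<br>` is the only tag that appears inside a pattern.

```python
import html


def clean_smarts(raw):
    """Turn a pattern as written in the HTML source into plain SMARTS text."""
    without_breaks = raw.replace("<br>", "")  # display-only line break
    return html.unescape(without_breaks).strip()


SP3_NITROGEN_RAW = "[$([NX4+]),$([NX3]);!$(*=*)&amp;!$(*:*)]"
print(clean_smarts(SP3_NITROGEN_RAW))
```

Skipping this step would leave the literal characters `amp;` inside the atom expression.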
666
+
667
+
668
+ <a NAME="CONNECT"></a><h2>Connectivity</h2>
669
+
670
+ <dl>
671
+ <p><dt> Quaternary Nitrogen
672
+ <dd> [$([NX4+]),$([NX4]=*)]
673
+ <dd> Hits non-aromatic Ns.
674
+ <p><dt> Tricoordinate S double bonded to N.
675
+ <dd> [$([SX3]=N)]
676
+
677
+ <p><dt> S double-bonded to Carbon
678
+ <dd> [$([SX1]=[#6])]
679
+ <dd> Hits terminal (1-connected S)
680
+
681
+ <p><dt> Triply bonded N
682
+ <dd> [$([NX1]#*)]
683
+
684
+ <p><dt> Divalent Oxygen
685
+ <dd> [$([OX2])]</p></dl><br>
686
+
687
+
688
+ <a NAME="CHAIN"></a><h2>Chains &amp; Branching </h2>
689
+
690
+ <dl>
691
+ <p><dt> Unbranched_alkane groups.
692
+ <dd> [R0;D2][R0;D2][R0;D2][R0;D2]
693
+ <dd> Only hits alkanes (single-bond chains). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches
694
+ (e.g. halide substituted chains count as branched).
695
+
696
+ <p><dt> Unbranched_chain groups.
697
+ <dd> [R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]
698
+ <dd> Hits any bond (single, double, triple). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches
699
+ (e.g. halide substituted chains count as branched).
700
+
701
+ <p><dt> Long_chain groups.
702
+ <dd> [AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]
703
+ <dd> Aliphatic chains at-least 8 members long.
704
+
705
+ <p><dt> Atom_fragment
706
+ <dd> [!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
707
+ <dd> (CLOGP definition) A fragment atom is not an isolating carbon
708
+
709
+ <p><dt> Carbon_isolating
710
+ <dd> [$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
711
+ <dd> This definition is based on that in CLOGP, so it is a charge-neutral carbon which is not a CF3 or an aromatic C between two aromatic
712
+ hetero atoms (e.g. in tetrazole), and it is not multiply bonded to a hetero atom.
713
+
714
+ <p><dt> Terminal S bonded to P
715
+ <dd> [$([SX1]~P)]
716
+
717
+ <p><dt> Nitrogen on -N-C=N-
718
+ <dd> [$([NX3]C=N)]
719
+
720
+ <p><dt> Nitrogen on -N-N=C-
721
+ <dd> [$([NX3]N=C)]
722
+
723
+ <p><dt> Nitrogen on -N-N=N-
724
+ <dd> [$([NX3]N=N)]
725
+
726
+ <p><dt> Oxygen in -O-C=N-
727
+ <dd> [$([OX2]C=N)] </p></dl><br>
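The chain patterns in this group ("Unbranched_alkane", "Unbranched_chain", "Long_chain") are a single atom expression repeated a fixed number of times, so the "at least N members" threshold can be treated as a parameter. A minimal sketch under that reading; `chain_smarts` is a hypothetical helper name and it only assembles text.

```python
def chain_smarts(atom, length, bond="~"):
    """Repeat `atom` `length` times, joined by `bond` ("" means the default bond)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return bond.join([atom] * length)


print(chain_smarts("[R0;D2]", 4, bond=""))  # unbranched alkane form
print(chain_smarts("[R0;D2]", 4))           # unbranched chain, any bond
print(chain_smarts("[AR0]", 8))             # long chain
```

Changing the `length` argument gives the same family of patterns for other minimum chain lengths.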
728
+
729
+
730
+ <a NAME="ROTATE"></a><h2>Rotation</h2>
731
+
732
+ <dl>
733
+ <p><dt> Rotatable bond
734
+ <dd> [!$(*#*)&amp;!D1]-!@[!$(*#*)&amp;!D1]
735
+ <dd> An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note
736
+ that logical operators can be applied to bonds ("-&amp;!@"). Here, the overall SMARTS consists of two atoms and one bond. The bond is "single
737
+ and not ring". *#* is any atom triple bonded to any atom. Enclosing this SMARTS in parentheses and preceding it with $ enables us to
738
+ use $(*#*) to write a recursive SMARTS using that string as an atom primitive. The purpose is to avoid bonds such as c1ccccc1-C#C which would
739
+ be considered rotatable without this specification.</p></dl><br>
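The rotatable-bond pattern is described as two identical atom expressions around one bond expression. Writing it out piece by piece can make that structure easier to see. The constant names below are hypothetical, and the sketch only assembles the string given in the entry.

```python
# Atom constraints described in the entry.
NOT_TRIPLY_BONDED = "!$(*#*)"  # recursive SMARTS, negated
NOT_TERMINAL = "!D1"           # not one-connected

# Both end atoms share the same bracketed expression, joined with explicit "&".
END_ATOM = "[" + "&".join([NOT_TRIPLY_BONDED, NOT_TERMINAL]) + "]"

# Bond expression: single ("-") and not in a ring ("!@").
SINGLE_NON_RING_BOND = "-!@"

ROTATABLE_BOND = END_ATOM + SINGLE_NON_RING_BOND + END_ATOM
print(ROTATABLE_BOND)
```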
740
+
741
+
742
+ <a NAME="CYCLE"></a><h2>Cyclic Features</h2>
743
+
744
+ <dl>
745
+ <p><dt> Bicyclic
746
+ <dd> [$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]
747
+ <dd> Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
748
+
749
+ <p><dt> Ortho
750
+ <dd> *-!:aa-!:*
751
+ <dd> Ortho-substituted ring
752
+
753
+ <p><dt> Meta
754
+ <dd> *-!:aaa-!:*
755
+ <dd> Meta-substituted ring
756
+
757
+ <p><dt> Para
758
+ <dd> *-!:aaaa-!:*
759
+ <dd> Para-substituted ring
760
+
761
+ <p><dt> Acyclic bonds
762
+ <dd> *!@*
763
+
764
+ <p><dt> Single bond and not in a ring
765
+ <dd> *-!@*
766
+
767
+ <p><dt> Non-ring atom
768
+ <dd> [R0] or [!R]
769
+
770
+ <p><dt> Macrocycle groups.
771
+ <dd> [r;!r3;!r4;!r5;!r6;!r7]
772
+
773
+ <p><dt> S in aromatic 5-ring with lone pair
774
+ <dd> [sX2r5]
775
+
776
+ <p><dt> Aromatic 5-Ring O with Lone Pair
777
+ <dd> [oX2r5]
778
+
779
+ <p><dt> N in 5-sided aromatic ring
780
+ <dd> [nX2r5]
781
+
782
+ <p><dt> Spiro-ring center
783
+ <dd> [X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6] rings size 4-6
784
+
785
+ <p><dt> N in 5-ring arom
786
+ <dd> [$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])] anion
787
+
788
+ <p><dt> CIS or TRANS double bond in a ring
789
+ <dd> */,\[R]=;@[R]/,\*
790
+ <dd> An isomeric SMARTS consisting of four atoms and three bonds.
791
+
792
+ <p><dt> CIS or TRANS double or aromatic bond in a ring
793
+ <dd> */,\[R]=,:;@[R]/,\*
794
+
795
+ <p><dt> Unfused benzene ring
796
+ <dd> [cR1]1[cR1][cR1][cR1][cR1][cR1]1
797
+ <dd> To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring:
798
+
799
+ <p><dt> Multiple non-fused benzene rings
800
+ <dd> [cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1
801
+
802
+ <p><dt> Fused benzene rings
803
+ <dd> c12ccccc1cccc2</p></dl><br>
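The Ortho, Meta and Para entries differ only in how many aromatic atoms sit between the two substituent bonds. A minimal sketch that generates all three from that count; `ring_substitution_smarts` and the `POSITIONS` table are hypothetical names, and nothing here checks the patterns against a toolkit.

```python
# Number of aromatic atoms ("a") on the path between the two substituents.
POSITIONS = {"ortho": 2, "meta": 3, "para": 4}


def ring_substitution_smarts(position):
    """Build the pattern for two non-aromatic bonds ("-!:") onto an aromatic path."""
    aromatic_path = "a" * POSITIONS[position]
    return "*-!:" + aromatic_path + "-!:*"


for name in POSITIONS:
    print(name, ring_substitution_smarts(name))
```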
804
+
805
+
806
+ <a NAME="META"></a>
807
+ <H2>
808
+ 4. Meta-SMARTS
809
+ </H2><br><br>
810
+
811
+
812
+ <a NAME="AA"></a><h2>Amino Acids</h2>
813
+
814
+ <dl>
815
+ <p><dt> Generic amino acid: low specificity.
816
+ <dd> [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
817
+ <dd> For use w/ non-standard a.a. search. Hits Pro but not Gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues
818
+ w/in polypeptides (internal, or terminal).
819
+
820
+ <p><dt> A.A. Template for 20 standard a.a.s
821
+ <dd> [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),<br>$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]
822
+
823
+ <dd> Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get "any standard a.a." Hits acids and conjugate bases.
824
+ Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
825
+
826
+ <p><dt> Proline
827
+ <dd> [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
828
+
829
+ <p><dt> Glycine
830
+ <dd> [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
831
+
832
+ <p><dt> Other a.a.
833
+ <dd> [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
834
+ <dd> Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline
835
+ or Glycine; they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in
836
+ polypeptides (internal, or terminal).<br>
837
+ &nbsp;&nbsp;&nbsp;&nbsp;Example usage:<br>
838
+ &nbsp;&nbsp;&nbsp;&nbsp;Alanine side chain is [CH3X4] <br>
839
+ &nbsp;&nbsp;&nbsp;&nbsp;Alanine Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
840
+
841
+ <p><dt> 18_standard_aa_side_chains.
842
+ <dd> ([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),<br>
843
+ $([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
844
+ $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
845
+ $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
846
+ [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),<br>
847
+ $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
848
+ $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
849
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),<br>
850
+ $([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
851
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])
852
+ <dd>Can be any of the standard 18 (Pro &amp; Gly are treated separately) Hits acids and conjugate bases.
853
+
854
+ <p><dt> N in Any_standard_amino_acid.
855
+ <dd> [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3]<br>
856
+ (=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3]<br>
857
+ (=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),<br>
858
+ $([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$<br>
859
+ ([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
860
+ $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
861
+ $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
862
+ [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),<br>
863
+ $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
864
+ $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
865
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),<br>
866
+ $([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
867
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),<br>
868
+ $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])]
869
+ <dd> Format is A.A. Template for 20 standard a.a.s, where * is replaced by the entire 18_standard_side_chains list (or'd together).
870
+ A generic amino acid with any of the 18 side chains, or proline or glycine. Hits "standard" amino acids that have terminally appended groups
871
+ (i.e. "standard" refers to the side chains). (Pro, Gly, or 18 normal a.a.s.) Hits single a.a.s and specific residues w/in polypeptides
872
+ (internal, or terminal).
873
+
874
+ <p><dt> Non-standard amino acid.
875
+ <dd> [$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),<br>
876
+ $([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),<br>
877
+ $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),<br>
878
+ $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3]<br>
879
+ (=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),<br>
880
+ $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:<br>
881
+ [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:<br>
882
+ [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),<br>
883
+ $([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),<br>
884
+ $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),<br>
885
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),<br>
886
+ $([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),<br>
887
+ $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),<br>
888
+ $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])]
889
+ <dd> Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains). Won't hit amino acids that are
890
+ non-standard due solely to the fact that groups are terminally-appended to the polypeptide chain (N or C term). format is [$(generic a.a.);
891
+ !$(not a standard one)] Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).</p></dl><br>
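
The "Other a.a." entry above describes building a residue-specific query by replacing the `*` placeholder with a side-chain SMARTS. A minimal sketch of that substitution as plain string handling follows; the helper name `with_side_chain` is hypothetical and not part of the repository, and the template and alanine side chain are copied from the entries above.

```python
# Template copied from the "Other a.a." entry; ([*]) marks the side-chain slot.
OTHER_AA_TEMPLATE = (
    "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]"
)


def with_side_chain(template: str, side_chain: str) -> str:
    """Return the template with its single ([*]) slot filled by side_chain.

    side_chain must already be a bracketed SMARTS fragment, e.g. "[CH3X4]"
    or a recursive list such as "[$(...),$(...)]".
    """
    placeholder = "([*])"
    if template.count(placeholder) != 1:
        raise ValueError("expected exactly one ([*]) placeholder in the template")
    return template.replace(placeholder, f"({side_chain})")


# Alanine side chain as listed in the example usage above.
alanine_query = with_side_chain(OTHER_AA_TEMPLATE, "[CH3X4]")
print(alanine_query)
```

Passing the whole 18_standard_aa_side_chains expression (without its outer parentheses) as `side_chain` should give the third alternative of the "N in Any_standard_amino_acid" pattern; this sketch only checks the placeholder count and does not validate the resulting SMARTS.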
892
+
893
+
894
+ <a NAME="RECUR"></a><h2>Recursive or Multiple </h2>
895
+
896
+ <h3> Recursive SMARTS: Atoms connected to particular SMARTS</h3><dl>
897
+
898
+ <p><dt> Ortho
899
+ <dd>[SMARTS_expression]-!:aa-!:[SMARTS_expression]
900
+
901
+ <p><dt> Meta
902
+ <dd> [SMARTS_expression]-!:aaa-!:[SMARTS_expression]
903
+
904
+ <p><dt> Para
905
+ <dd> [SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
906
+
907
+ <p><dt> Hydrogen
908
+ <dd> [$([#1][SMARTS_expression])]
909
+ <dd> Hydrogen must be explicit i.e. an isotope or charged
910
+
911
+ <p><dt> Nitrogen
912
+ <dd> [$([#7][SMARTS_expression])]
913
+
914
+ <p><dt> Oxygen
915
+ <dd> [$([#8][SMARTS_expression])]
916
+
917
+ <p><dt> Fluorine
918
+ <dd> [$([#9][SMARTS_expression])]</p></dl><br>
919
+
920
+ <h3> Recursive SMARTS: Multiple groups</h3><dl>
921
+
922
+ <p><dt> Two possible groups
923
+ <dd> [$(SMARTS_expression_A),$(SMARTS_expression_B)]
924
+ <dd> Hits atoms in either environment or group of interest, A or B.<br>
925
+ &nbsp;&nbsp;&nbsp;&nbsp;Example usages:<br>
926
+ &nbsp;&nbsp;&nbsp;&nbsp;Azide group is : [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]<br>
927
+ &nbsp;&nbsp;&nbsp;&nbsp;Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]<br>
928
+ &nbsp;&nbsp;&nbsp;&nbsp;Azide or azide ion is:
929
+ [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),$([$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])])]
930
+
931
+ <p><dt> Recursive SMARTS
932
+ <dd> [$([atom_that_gets_hit][other_atom][other_atom])]
933
+ <dd> Hits the first atom within the parentheses.<br>
933
+ &nbsp;&nbsp;&nbsp;&nbsp;Example usages:<br>
934
+ &nbsp;&nbsp;&nbsp;&nbsp;[$([CX3]=[OX1])] hits Carbonyl Carbon<br>
936
+ &nbsp;&nbsp;&nbsp;&nbsp;[$([OX1]=[CX3])] hits Carbonyl Oxygen </p></dl><br>
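
The templates in this section (`[$(A),$(B)]` and the ortho/meta/para linkers) are mechanical enough to compose programmatically. The sketch below does so with plain strings; the function names `either` and `substituted` are hypothetical helpers introduced for illustration, and the azide patterns are copied from the example usages above.

```python
def either(*patterns: str) -> str:
    """Build [$(p1),$(p2),...]: an atom in any one of the given environments."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    return "[" + ",".join(f"$({p})" for p in patterns) + "]"


def substituted(position: str, left: str, right: str) -> str:
    """Join two atom expressions through an ortho, meta or para ring path."""
    ring_path = {"ortho": "aa", "meta": "aaa", "para": "aaaa"}[position]
    return f"{left}-!:{ring_path}-!:{right}"


# Rebuild the azide examples from their two resonance forms.
azide_group = either("*-[NX2-]-[NX2+]#[NX1]", "*-[NX2]=[NX2+]=[NX1-]")
azide_ion = either("[NX1-]=[NX2+]=[NX1-]", "[NX1]#[NX2+]-[NX1-2]")
azide_any = either(azide_group, azide_ion)

print(azide_any)
print(substituted("para", "*", "*"))
```

Nesting `either` calls mirrors the "Azide or azide ion" entry, where each alternative is itself a recursive expression. The helpers do no SMARTS validation, so the output should still be checked with a toolkit parser before use.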
937
+
938
+ <h3> Single only, Double only, Single or Double</h3><dl>
939
+
940
+ <p><dt> Sulfide
941
+ <dd> [#16X2H0]
942
+ <dd> (-alkylthio) Won't hit thiols. Hits disulfides too.
943
+
944
+ <p><dt> Mono-sulfide
945
+ <dd> [#16X2H0][!#16]
946
+ <dd> (alkylthio- or alkoxy-) R-S-R Won't hit thiols. Won't hit disulfides.
947
+
948
+ <p><dt> Di-sulfide
949
+ <dd> [#16X2H0][#16X2H0]
950
+ <dd> Won't hit thiols. Won't hit mono-sulfides.
951
+
952
+ <p><dt> Two sulfides
953
+ <dd> [#16X2H0][!#16].[#16X2H0][!#16]
954
+
955
+ <dd> Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
956
+
957
+ <p><dt> Acid/conj-base
958
+ <dd> [OX2H,OX1H0-]
959
+ <dd> Hits acid and conjugate base. acid/base
960
+
961
+ <p><dt> Non-acid Oxygen
962
+ <dd> [OX2H0]
963
+
964
+ <p><dt> Acid/base
965
+ <dd> [H1,H0-]
966
+ <dd> Works for any atom if base form has no Hs &amp; acid has only one.</p></dl><br>
967
+
968
+ <h3> Multiple Disconnected Groups</h3><dl>
969
+
970
+ <p><dt> Two disconnected SMARTS fragments
971
+ <dd> ([Cl!$(Cl~c)].[c!$(c~Cl)])
972
+ <dd> A molecule that contains a chlorine and an aromatic carbon that are not connected to each other. Uses component-level SMARTS.
973
+ Both SMARTS fragments must be in the same SMILES target fragment.
974
+
975
+ <p><dt> Two disconnected SMARTS fragments
976
+ <dd> ([Cl]).([c])
977
+ <dd> Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES fragments.
978
+
979
+ <p><dt> Two not-necessarily connected SMARTS fragments
980
+ <dd> ([Cl].[c])
981
+ <dd> Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
982
+
983
+ <p><dt> Two not-necessarily connected fragments
984
+ <dd> ([SMARTS_expression]).([SMARTS_expression])
985
+ <dd> Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
986
+
987
+ <p><dt> Two primary or secondary amines
988
+ <dd> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
989
+ <dd> Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical patterns.</p></dl><br>
990
+
991
+
992
+ <a NAME="TOOL"></a><h2>Tools &amp; Tricks</h2>
993
+
994
+ <h3> Alternative/Equivalent Representations </h3><dl>
995
+
996
+ <p><dt> Any carbon aromatic or non-aromatic
997
+ <dd> [#6] or [c,C]
998
+
999
+ <p><dt> SMILES wildcard
1000
+ <dd> [#0]
1001
+ <dd> This SMARTS hits the SMILES *
1002
+
1003
+ <p><dt> Factoring
1004
+ <dd> [OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-]
1005
+ <dd> Factor out common atomic expressions in the recursive SMARTS. May improve human readability.
1006
+
1007
+ <p><dt> High-precedence "and"
1008
+ <dd> [N&amp;X4&amp;+,N&amp;X3&amp;+0] or [NX4+,NX3+0]
1009
+ <dd> High-precedence "and" (&amp;) is the default logical operator. "Or" (,) is lower precedence than &amp;, and low-precedence "and" (;)
1010
+ is lower precedence than ",". </p></dl><br>
1011
+
1012
+ <h3> Hydrogens </h3><dl>
1013
+
1014
+ <p><dt> Any atom w/ at-least 1 H
1015
+ <dd> [*!H0,#1]
1016
+ <dd> In SMILES and SMARTS, hydrogen is not considered an atom (unless it is specified as an isotope). The hydrogen count is instead
1017
+ considered a property of an atom. This SMARTS provides a way to effectively hit Hs themselves.
1018
+
1019
+ <p><dt> Hs on Carbons
1020
+ <dd> [#6!H0,#1]
1021
+
1022
+ <p><dt> Atoms w/ 1 H
1023
+ <dd> [H,#1] </p></dl><br>
1024
+
1025
+
1026
+ <a NAME="E-"></a>
1027
+ <H2>
1028
+ 5. Electron &amp; Proton Features
1029
+ </H2><br><br>
1030
+
1031
+
1032
+ <a NAME="ACID"></a><h2> Acids &amp; Bases </h2>
1033
+
1034
+ <dl>
1035
+ <p><dt> Acid
1036
+ <dd> [!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]
1037
+ <dd> Proton donor
1038
+
1039
+ <p><dt> Carboxylic acid
1040
+ <dd> [CX3](=O)[OX2H1]
1041
+ <dd> (-oic acid, COOH)
1042
+
1043
+ <p><dt> Carboxylic acid or conjugate base.
1044
+ <dd> [CX3](=O)[OX1H0-,OX2H1]
1045
+
1046
+ <p><dt> Hydroxyl_acidic
1047
+ <dd> [$([OH]-*=[!#6])]
1048
+ <dd> An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a heteroatom; this includes carboxylic, sulphur,
1049
+ phosphorus, halogen and nitrogen oxyacids.
1050
+
1051
+ <p><dt> Phosphoric_Acid
1052
+ <dd> [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
1053
+ <dd> Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters
1054
+ (including acidic mono- &amp; di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and
1055
+ longer, di- esters on linear triphosphoric acid and longer). Hits acid and conjugate base.
1056
+
1057
+ <p><dt> Sulfonic Acid. High specificity.
1058
+ <dd> [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
1059
+ <dd> Only hits carbo- sulfonic acids (won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits both depiction
1060
+ forms. Hits arene sulfonic acids.
1061
+
1062
+ <p><dt> Acyl Halide
1063
+ <dd> [CX3](=[OX1])[F,Cl,Br,I]
1064
+ <dd> (acid halide, -oyl halide)</p></dl><br>
1065
+
1066
+
1067
+ <a NAME="CHARGE"></a><h2>Charge </h2>
1068
+
1069
+ <dl>
1070
+ <p><dt> Anionic divalent Nitrogen
1071
+ <dd> [NX2-]
1072
+
1073
+ <p><dt> Oxenium Oxygen
1074
+ <dd> [OX2H+]=*
1075
+
1076
+ <p><dt> Oxonium Oxygen
1077
+ <dd> [OX3H2+]
1078
+
1079
+ <p><dt> Carbocation
1080
+ <dd> [#6+]
1081
+
1082
+ <p><dt> sp2 cationic carbon.
1083
+ <dd> [$([cX2+](:*):*)]
1084
+ <dd> Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
1085
+
1086
+ <p><dt> Azide ion.
1087
+ <dd> [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
1088
+ <dd> Hits N in azide ion
1089
+
1090
+ <p><dt> Zwitterion High Specificity
1091
+ <dd> [+1]~*~*~[-1]
1092
+ <dd> +1 charged atom separated by any 3 bonds from a -1 charged atom.
1093
+
1094
+ <p><dt> Zwitterion Low Specificity, Crude
1095
+ <dd>[$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])]
1096
+ <dd> Variously charged moieties separated by up to ten bonds.
1097
+
1098
+ <p><dt> Zwitterion Low Specificity
1099
+ <dd> ([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4])
1100
+ <dd> Variously charged moieties that are within the same molecule but not-necessarily connected. Uses component-level grouping.</p></dl>
1101
+ <br>
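
The crude zwitterion pattern above is one recursive alternative per chain length, so it can be generated instead of typed by hand. The following is a sketch under that reading; `crude_zwitterion` and the constant names are hypothetical, and the two end-atom expressions are copied verbatim from the entry.

```python
# End-atom expressions copied verbatim from the "Zwitterion Low Specificity, Crude" entry.
FIRST_END = "[!-0!-1!-2!-3!-4]"
SECOND_END = "[!+0!+1!+2!+3!+4]"


def crude_zwitterion(max_bonds: int = 10) -> str:
    """Build the recursive SMARTS with the two ends separated by 2..max_bonds bonds."""
    if max_bonds < 2:
        raise ValueError("max_bonds must be at least 2")
    alternatives = []
    # n intermediate wildcard atoms give n + 1 bonds between the two ends.
    for n_between in range(1, max_bonds):
        chain = "~" + "*~" * n_between
        alternatives.append(f"$({FIRST_END}{chain}{SECOND_END})")
    return "[" + ",".join(alternatives) + "]"


pattern = crude_zwitterion()
print(pattern)
```

With the default `max_bonds=10` this yields nine alternatives, from one to nine intermediate atoms, which appears to match the hand-written entry. Shorter or longer separations only need a different argument.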
1102
+
1103
+
1104
+ <a NAME="H_BOND"></a><h2> H-bond Donors &amp; Acceptors</h2>
1105
+
1106
+ <dl>
1107
+ <p><dt> Hydrogen-bond acceptor
1108
+ <dd> [#6,#7;R0]=[#8]
1109
+ <dd> Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring, double bonded to an
1110
+ oxygen.
1111
+
1112
+ <p><dt> Hydrogen-bond acceptor
1113
+ <dd> [!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]
1114
+ <dd> An H-bond acceptor is a heteroatom with no positive charge; note that negatively charged oxygen or sulphur are included. Excluded are
1115
+ halogens, including F, heteroaromatic oxygen, sulphur and pyrrole N. Higher oxidation levels of N, P, S are excluded. Note P(III) is
1116
+ currently included. Zeneca's work would imply that (O=S=O) should also be excluded.
1117
+
1118
+ <p><dt> Hydrogen-bond donor.
1119
+ <dd> [!$([#6,H0,-,-2,-3])]
1120
+ <dd> A H-bond donor is a non-negatively charged heteroatom with at least one H
1121
+
1122
+ <p><dt> Hydrogen-bond donor.
1123
+ <dd> [!H0;#7,#8,#9]
1124
+ <dd> Must have an N-H bond, an O-H bond, or a F-H bond
1125
+
1126
+ <p><dt> Possible intramolecular H-bond
1127
+ <dd> [O,N;!H0]-*~*-*=[$([C,N;R0]=O)]
1128
+ <dd> Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive SMARTS", where "$()" encloses a valid
1129
+ nested SMARTS and acts syntactically like an atom-primitive in the overall SMARTS. Multiple nesting is allowed.</p></dl><br>
1130
+
1131
+ <a NAME="RAD"></a><h2>Radicals </h2>
1132
+
1133
+ <dl>
1134
+ <p><dt> Carbon Free-Radical
1135
+ <dd> [#6;X3v3+0]
1136
+ <dd> Hits a neutral carbon with three single bonds.
1137
+
1138
+ <p><dt> Nitrogen Free-Radical
1139
+ <dd> [#7;X2v4+0]
1140
+ <dd> Hits a neutral nitrogen with two single bonds or with a single and a triple bond. </p></dl><br>
1141
+
1142
+
1143
+ <a NAME="BREAK"></a>
1144
+ <H2>
1145
+ 6. Breakdown of Complex SMARTS
1146
+ </H2></center><br><br>
1147
+
1148
+
1149
+
1150
+ <a NAME="AM_AC"><h2>Amino Acid </h2></a>
1151
+
1152
+ <b>[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]</b>
1153
+
1154
+ <pre>
1155
+ [$( Proline
1156
+ [ N:
1157
+ $([ terminal
1158
+ NX3H neutral
1159
+ , or
1160
+ NX4H2+]) + charged
1161
+ , or
1162
+ $([NX3](C)(C)(C))]1 internal
1163
+ [CX4H] C: alpha
1164
+ ([CH2][CH2][CH2]1) pro side chain
1165
+ [CX3] C: of COOH
1166
+ (=[OX1]) O: =O of COOH
1167
+ [OX2H,OX1-,N] O: term COOH (neutral or -) or intern
1168
+ ), OR
1169
+ $( Glycine
1170
+ [ N:
1171
+ $([ terminal
1172
+ NX3H2 neutral
1173
+ , or
1174
+ NX4H3+]) + charged
1175
+ , or
1176
+ $([NX3H](C)(C))] internal
1177
+ [CX4H2] C: alpha (w/ H side chain)
1178
+ [CX3] C: of COOH
1179
+ (=[OX1]) O: =O of COOH
1180
+ [OX2H,OX1-,N] O: term COOH (neutral or -) or intern
1181
+ ), OR
1182
+ $( Other amino acid
1183
+ [ N:
1184
+ $([ terminal
1185
+ NX3H2 neutral
1186
+ , or
1187
+ NX4H3+]) + charged
1188
+ , or
1189
+ $([NX3H](C)(C))] internal
1190
+ [CX4H] C: alpha
1191
+ ([*]) any side chain
1192
+ [CX3] C: of COOH
1193
+ (=[OX1]) O: =O of COOH
1194
+ [OX2H,OX1-,N] O: term COOH (neutral or -) or intern
1195
+ )]
1196
+ </pre>
1197
+
1198
+ <br><br>
1199
+ <a NAME="ES_AM"><h2> Ester or Amide </h2></a>
1200
+
1201
+
1202
+ <b>[#6][CX3](=O)[$([OX2H0]([#6])[#6]),$([#7])] </b>
1203
+ <pre>
1204
+ [#6] An atom that is a carbon
1205
+ [CX3] Connected to an atom that is a three-connected carbon
1206
+ (=O) Which is double bonded to an oxygen
1207
+ [ Connected to an atom
1208
+ $( That is in an environment where
1209
+ [OX2H0] An atom that is a two-connected oxygen, without hydrogens
1210
+ ([#6])[#6]) Is connected to two carbons, one of them being the carbonyl C
1211
+ , Or
1212
+ $( That is in an environment where
1213
+ [#7] An atom is a nitrogen.
1214
+ )]
1215
+ </pre>
1216
+ <br><br>
1217
+ <a NAME="EXMPL"></a>
1218
+ <H2>
1219
+ 7. Interesting Example SMARTS
1220
+ </H2>
1221
+
1222
+ <dl>
1223
+ <p><dt> Oxygen double bonded to aliphatic carbon or nitrogen, single bonded to an aromatic ring, with a
1224
+ halogen in meta position
1225
+ <dd> [#8]=[C,N]-aaa[F,Cl,Br,I]
1226
+
1227
+ <p><dt> Aliphatic carbon attached to oxygen with any bond
1228
+ <dd> C~O
1229
+
1230
+ <p><dt> Oxygen or nitrogen, with at least one hydrogen attached and not in a ring
1231
+ <dd> [O,N;!H0;R0]
1232
+
1233
+ <p><dt> Oxygen double bonded to aliphatic carbon or nitrogen
1234
+ <dd> [#8]=[C,N] or O=[C,N]
1235
+
1236
+ <p><dt> Aliphatic atom single-bonded to any carbon which isn't a trifluoromethyl carbon
1237
+ <dd> A[#6;!$(C(F)(F)F)]
1238
+
1239
+ <p><dt> PCB
1240
+ <dd> [$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]-[$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]
1241
+ <dd> Polychlorinated Biphenyls. Overall SMARTS is atom-bond-atom. Note that ":" is explicit aromatic bond, and "-" is explicit single
1242
+ bond. On each side of the single bond, we use three nested SMARTS to represent
1243
+ the ortho, meta, and para position.
1244
+
1245
+ <p><dt> Imidazolium Nitrogen
1246
+ <dd> [nX3r5+]:c:n
1247
+
1248
+ <p><dt> 1-methyl-2-hydroxy benzene with either a Cl or H at the 5 position.
1249
+ <dd> [c;$([*Cl]),$([*H1])]1ccc(O)c(C)c1 or Cc1:c(O):c:c:[$(cCl),$([cH])]:c1
1250
+ <dd> The "H" primitive in SMARTS means "total number
1251
+ of attached hydrogens", i.e., [C] will match C in [CH4] methane, [CH3]
1252
+ methyl, [CH2] methylene, etc.; [CH3] will only match methyl. This is similar
1253
+ to the use of "H" in SMILES to specify hydrogen count. The default value
1254
+ for the SMARTS "H" primitive is 1 (same as SMILES, e.g., [CH2]=[CH]-[OH]
1255
+ same as C=CO). This H-specification value includes all attached hydrogens:
1256
+ implicit and explicit (e.g., isotopic [2H]).
1257
+
1258
+ <p><dt> Nonstandard atom groups.
1259
+ <dd> [!#1;!#2;!#3;!#5;!#6;!#7;!#8;!#9;!#11;!#12;!#15;!#16;!#17;!#19;!#20;!#35;!#53]</p></dl><br>
1260
+ <h2>More Information</h2>
1261
+ <A HREF="/dayhtml/doc/theory/theory.smarts.html">Theory Manual</A><br>
1262
+ <A HREF="/dayhtml_tutorials/languages/smarts/smarts_practice.html">SMARTS Practice</A><br>
1263
+ </td>
1264
+ </tr>
1265
+ <tr>
1266
+ <td><iframe src="/iframes/footer.html" name="iframe3" width="350" height="200"
1267
+ scrolling="no" frameborder="0"></iframe></td>
1268
+ </tr>
1269
+ </table>
1270
+ </body>
1271
+ </html>
1272
+
daylight-smarts.csv DELETED
@@ -1,254 +0,0 @@
1
- Section ID,Section,Group,Rule Name,Smarts,Comment
2
- 2,Functional Groups by Element,C,alkane,[CX4],Alkyl Carbon
3
- 2,Functional Groups by Element,C,alkene (-ene),[$([CX2](=C)=C)],Allenic Carbon
4
- 2,Functional Groups by Element,C,alkene (-ene),[$([CX3]=[CX3])],Vinylic Carbon; Ethenyl carbon
5
- 2,Functional Groups by Element,C,alkyne (-yne),[$([CX2]#C)],Acetylenic Carbon
6
- 2,Functional Groups by Element,C,arene (Ar , aryl-, aromatic hydrocarbons),c,Arene
7
- 2,Functional Groups by Element,C & O,carbonyl,[CX3]=[OX1],Carbonyl group. Low specificity; Hits carboxylic acid, ester, ketone, aldehyde, carbonic acid/ester,anhydride, carbamic acid/ester, acyl halide, amide.
8
- 2,Functional Groups by Element,C & O,Carbonyl group,[$([CX3]=[OX1]),$([CX3+]-[OX1-])],Hits either resonance structure
9
- 2,Functional Groups by Element,C & O,Carbonyl with Carbon,[CX3](=[OX1])C,Hits aldehyde, ketone, carboxylic acid (except formic), anhydride (except formic), acyl halides (acid halides). Won't hit carbamic acid/ester, carbonic acid/ester.
10
- 2,Functional Groups by Element,C & O,Carbonyl with Nitrogen.,[OX1]=CN,Hits amide, carbamic acid/ester, poly peptide
11
- 2,Functional Groups by Element,C & O,Carbonyl with Oxygen.,[CX3](=[OX1])O,Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid or ester, anhydride Won't hit aldehyde or ketone.
12
- 2,Functional Groups by Element,C & O,Acyl Halide,[CX3](=[OX1])[F,Cl,Br,I],acid halide, -oyl halide
13
- 2,Functional Groups by Element,C & O,Aldehyde,[CX3H1](=O)[#6],-al
14
- 2,Functional Groups by Element,C & O,Anhydride,[CX3](=[OX1])[OX2][CX3](=[OX1]),
15
- 2,Functional Groups by Element,C & O,Amide,[NX3][CX3](=[OX1])[#6],-amide
16
- 2,Functional Groups by Element,C & O,Amidinium,[NX3][CX3]=[NX3+],
17
- 2,Functional Groups by Element,C & O,Carbamate.,[NX3,NX4+][CX3](=[OX1])[OX2,OX1-],Hits carbamic esters, acids, and zwitterions
18
- 2,Functional Groups by Element,C & O,Carbamic ester,[NX3][CX3](=[OX1])[OX2H0],
19
- 2,Functional Groups by Element,C & O,Carbamic acid.,[NX3,NX4+][CX3](=[OX1])[OX2H,OX1-],Hits carbamic acids and zwitterions.
20
- 2,Functional Groups by Element,C & O,Carboxylate Ion.,[CX3](=O)[O-],Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
21
- 2,Functional Groups by Element,C & O,Carbonic Acid or Carbonic Ester,[CX3](=[OX1])(O)O,Carbonic Acid, Carbonic Ester, or combination
22
- 2,Functional Groups by Element,C & O,Carbonic Acid or Carbonic Acid-Ester,[CX3](=[OX1])([OX2])[OX2H,OX1H0-1],Hits acid and conjugate base. Won't hit carbonic acid diester
23
- 2,Functional Groups by Element,C & O,Carbonic Ester (carbonic acid diester),C[OX2][CX3](=[OX1])[OX2]C,Won't hit carbonic acid or combination carbonic acid/ester
24
- 2,Functional Groups by Element,C & O,Carboxylic acid,[CX3](=O)[OX2H1],-oic acid, COOH
25
- 2,Functional Groups by Element,C & O,Carboxylic acid or conjugate base.,[CX3](=O)[OX1H0-,OX2H1],
26
- 2,Functional Groups by Element,C & O,Cyanamide,[NX3][CX2]#[NX1],
27
- 2,Functional Groups by Element,C & O,Ester Also hits anhydrides,[#6][CX3](=O)[OX2H0][#6],won't hit formic anhydride.
28
- 2,Functional Groups by Element,C & O,Ketone,[#6][CX3](=O)[#6],-one
29
- 2,Functional Groups by Element,C & O,Ether,[OD2]([#6])[#6],Ether
30
- 2,Functional Groups by Element,H,Hydrogen Atom,[H],Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
31
- 2,Functional Groups by Element,H,Not a Hydrogen Atom,[!#1],Hits SMILES that are not hydrogen atoms.
32
- 2,Functional Groups by Element,H,Proton,[H+],Hits positively charged hydrogen atoms: [H+]
33
- 2,Functional Groups by Element,H,Mono-Hydrogenated Cation,[+H],Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
34
- 2,Functional Groups by Element,H,Not Mono-Hydrogenated,[!H] or [!H1],Hits atoms that don't have exactly one attached hydrogen.
35
- 2,Functional Groups by Element,N,amide see carbonyl,,
36
- 2,Functional Groups by Element,N,amine (-amino),[NX3;H2,H1;!$(NC=O)],Primary or secondary amine, not amide; Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 & H1 respectively). Also note that "&" (and) is the default operator and is higher precedence than "," (or), which is higher precedence than ";" (and). Will hit cyanamides and thioamides
37
- 2,Functional Groups by Element,N,Enamine,[NX3][CX3]=[CX3],
38
- 2,Functional Groups by Element,N,Primary amine, not amide.,[NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6],Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)
39
- 2,Functional Groups by Element,N,Two primary or secondary amines,[NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)],Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
40
- 2,Functional Groups by Element,N,Enamine or Aniline Nitrogen,[NX3][$(C=C),$(cc)],
41
- 2,Functional Groups by Element,N,Generic amino acid: low specificity.,[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N],For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
42
- 2,Functional Groups by Element,N,Dipeptide group. generic amino acid: low specificity.,[NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-],Won't hit pro or gly. Hits acids and conjugate bases.
43
- 2,Functional Groups by Element,N,Amino Acid,[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N],Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/i n polypeptides (internal, or terminal). {e.g. usage: Alanine side chain is [CH3X4] . Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([ CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}
44
- 2,Functional Groups by Element,N,Alanine side chain,[CH3X4],
45
- 2,Functional Groups by Element,N,Arginine side chain.,[CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3],Hits acid and conjugate base.
46
- 2,Functional Groups by Element,N,Asparagine side chain.,[CH2X4][CX3](=[OX1])[NX3H2],Also hits Gln side chain when used alone.
47
- 2,Functional Groups by Element,N,Aspartate (or Aspartic acid) side chain.,[CH2X4][CX3](=[OX1])[OH0-,OH],Hits acid and conjugate base. Also hits Glu side chain when used alone.
48
- 2,Functional Groups by Element,N,Cysteine side chain.,[CH2X4][SX2H,SX1H0-],Hits acid and conjugate base
49
- 2,Functional Groups by Element,N,Glutamate (or Glutamic acid) side chain.,[CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH],Hits acid and conjugate base.
50
- 2,Functional Groups by Element,N,Glycine,[$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])],
51
- 2,Functional Groups by Element,N,Histidine side chain.,[CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1,Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
52
- 2,Functional Groups by Element,N,Isoleucine side chain,[CHX4]([CH3X4])[CH2X4][CH3X4],
53
- 2,Functional Groups by Element,N,Leucine side chain,[CH2X4][CHX4]([CH3X4])[CH3X4],
54
- 2,Functional Groups by Element,N,Lysine side chain.,[CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0],Acid and conjugate base
55
- 2,Functional Groups by Element,N,Methionine side chain,[CH2X4][CH2X4][SX2][CH3X4],
56
- 2,Functional Groups by Element,N,Phenylalanine side chain,[CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1,
57
- 2,Functional Groups by Element,N,Proline,[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N],
58
- 2,Functional Groups by Element,N,Serine side chain,[CH2X4][OX2H],
59
- 2,Functional Groups by Element,N,Thioamide,[NX3][CX3]=[SX1],
60
- 2,Functional Groups by Element,N,Threonine side chain,[CHX4]([CH3X4])[OX2H],
61
- 2,Functional Groups by Element,N,Tryptophan side chain,[CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12,
62
- 2,Functional Groups by Element,N,Tyrosine side chain.,[CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1,Acid and conjugate base
63
- 2,Functional Groups by Element,N,Valine side chain,[CHX4]([CH3X4])[CH3X4],
64
- 2,Functional Groups by Element,N,Alanine side chain,[CH3X4],
65
- 2,Functional Groups by Element,N,Arginine side chain.,[CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3],Hits acid and conjugate base.
66
- 2,Functional Groups by Element,N,Asparagine side chain.,[CH2X4][CX3](=[OX1])[NX3H2],Also hits Gln side chain when used alone.
67
- 2,Functional Groups by Element,N,Aspartate (or Aspartic acid) side chain.,[CH2X4][CX3](=[OX1])[OH0-,OH],Hits acid and conjugate base. Also hits Glu side chain when used alone.
68
- 2,Functional Groups by Element,N,Cysteine side chain.,[CH2X4][SX2H,SX1H0-],Hits acid and conjugate base
69
- 2,Functional Groups by Element,N,Glutamate (or Glutamic acid) side chain.,[CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH],Hits acid and conjugate base.
70
- 2,Functional Groups by Element,N,Glycine,N[CX4H2][CX3](=[OX1])[O,N],
71
- 2,Functional Groups by Element,N,Histidine side chain.,[CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1,Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected
72
- 2,Functional Groups by Element,N,Isoleucine side chain,[CHX4]([CH3X4])[CH2X4][CH3X4],
73
- 2,Functional Groups by Element,N,Leucine side chain,[CH2X4][CHX4]([CH3X4])[CH3X4],
74
- 2,Functional Groups by Element,N,Lysine side chain.,[CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0],Acid and conjugate base
75
- 2,Functional Groups by Element,N,Methionine side chain,[CH2X4][CH2X4][SX2][CH3X4],
76
- 2,Functional Groups by Element,N,Phenylalanine side chain,[CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1,
77
- 2,Functional Groups by Element,N,Proline,N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N],
78
- 2,Functional Groups by Element,N,Serine side chain,[CH2X4][OX2H],
79
- 2,Functional Groups by Element,N,Threonine side chain,[CHX4]([CH3X4])[OX2H],
80
- 2,Functional Groups by Element,N,Tryptophan side chain,[CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12,
81
- 2,Functional Groups by Element,N,Tyrosine side chain.,[CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1,Acid and conjugate base
82
- 2,Functional Groups by Element,N,Valine side chain,[CHX4]([CH3X4])[CH3X4],
83
- 2,Functional Groups by Element,N,azide (-azido),Azide group.,[$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])],Hits any atom with an attached azide.
84
- 2,Functional Groups by Element,N,azide (-azido),Azide ion.,[$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])],Hits N in azide ion
85
- 2,Functional Groups by Element,N,azo,Nitrogen.,[#7],Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of "azo"
86
- 2,Functional Groups by Element,N,azo,Azo Nitrogen. Low specificity.,[NX2]=N,Hits diazene, azoxy and some diazo structures
87
- 2,Functional Groups by Element,N,azo,Azo Nitrogen.diazene,[NX2]=[NX2],(diaza alkene)
88
- 2,Functional Groups by Element,N,azo,Azoxy Nitrogen.,[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])],
89
- 2,Functional Groups by Element,N,azo,Diazo Nitrogen,[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])],
90
- 2,Functional Groups by Element,N,azo,Azole.,[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])],5-membered aromatic heterocycle w/ 2 double bonds. Contains N & another non-C atom (N,O,S). Subclasses are furo-, thio-, pyrro- (replace CH of furan, thiophene, pyrrole w/ N)
91
- 2,Functional Groups by Element,N,hydrazine,Hydrazine H2NNH2,[NX3][NX3],
92
- 2,Functional Groups by Element,N,hydrazone,Hydrazone C=NNH2,[NX3][NX2]=[*],
93
- 2,Functional Groups by Element,N,imine,Substituted imine,[CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6],Schiff base
94
- 2,Functional Groups by Element,N,imine,Substituted or un-substituted imine,[$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])],
95
- 2,Functional Groups by Element,N,imine,Iminium,[NX3+]=[CX3],
96
- 2,Functional Groups by Element,N,imide,Unsubstituted dicarboximide,[CX3](=[OX1])[NX3H][CX3](=[OX1]),
97
- 2,Functional Groups by Element,N,imide,Substituted dicarboximide,[CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1]),
98
- 2,Functional Groups by Element,N,imide,Dicarboxdiimide,[CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1]),
99
- 2,Functional Groups by Element,N,nitrate,Nitrate group,[$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)],Also hits nitrate anion
100
- 2,Functional Groups by Element,N,nitrate,Nitrate Anion,[$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])],
101
- 2,Functional Groups by Element,N,nitrile,Nitrile,[NX1]#[CX2],
102
- 2,Functional Groups by Element,N,nitrile,Isonitrile,[CX1-]#[NX2+],
103
- 2,Functional Groups by Element,N,nitro,Nitro group.,[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8],Hits both forms.
104
- 2,Functional Groups by Element,N,nitro,Two Nitro groups,[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8],
105
- 2,Functional Groups by Element,N,nitroso,Nitroso-group,[NX2]=[OX1],
106
- 2,Functional Groups by Element,N,n-oxide,N-Oxide,[$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])],Hits both forms. Won't hit azoxy, nitro, nitroso,or nitrate.
107
- 2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl,[OX2H],
108
- 2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl in Alcohol,[#6][OX2H],
109
- 2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl in Carboxylic Acid,[OX2H][CX3]=[OX1],
110
- 2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl in H-O-P-,[OX2H]P,
111
- 2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Enol,[OX2H][#6X3]=[#6],
112
- 2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Phenol,[OX2H][cX3]:[c],
113
- 2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Enol or Phenol,[OX2H][$(C=C),$(cc)],
114
- 2,Functional Groups by Element,O,hydroxyl (includes alcohol, phenol),Hydroxyl_acidic,[$([OH]-*=[!#6])],An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom, this includes carboxylic, sulphur, phosphorous, halogen and nitrogen oxyacids.
115
- 2,Functional Groups by Element,O,peroxide,Peroxide groups.,[OX2,OX1-][OX2,OX1-],Also hits anions.
116
- 2,Functional Groups by Element,P,phosphoric compounds,Phosphoric_acid groups.,[$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])],Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer).
117
- 2,Functional Groups by Element,P,phosphoric compounds,Phosphoric_ester groups.,[$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])],Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups.
118
- 2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Carbo-Thiocarboxylate,[S-][CX3](=S)[#6],
119
- 2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Carbo-Thioester,S([#6])[CX3](=O)[#6],
120
- 2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Thio analog of carbonyl,[#6X3](=[SX1])([!N])[!N],Where S replaces O. Not a thioamide.
121
- 2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Thiol, Sulfide or Disulfide Sulfur,[SX2],
122
- 2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Thiol,[#16X2H],
123
- 2,Functional Groups by Element,S,thio groups ( thio-, thi-, sulpho-, mercapto- ),Sulfur with at-least one hydrogen.,[#16!H0],
124
- 2,Functional Groups by Element,S,sulfide,Sulfide,[#16X2H0],-alkylthio Won't hit thiols. Hits disulfides.
125
- 2,Functional Groups by Element,S,sulfide,Mono-sulfide,[#16X2H0][!#16],alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides.
126
- 2,Functional Groups by Element,S,sulfide,Di-sulfide,[#16X2H0][#16X2H0],Won't hit thiols. Won't hit mono-sulfides.
127
- 2,Functional Groups by Element,S,sulfide,Two Sulfides,[#16X2H0][!#16].[#16X2H0][!#16],Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
128
- 2,Functional Groups by Element,S,sulfinate,Sulfinate,[$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])],Won't hit Sulfinic Acid. Hits Both Depiction Forms.
129
- 2,Functional Groups by Element,S,sulfinate,Sulfinic Acid,[$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])],Won't hit substituted Sulfinates. Hits Both Depiction Forms. Hits acid and conjugate base (sulfinate).
130
- 2,Functional Groups by Element,S,sulfone,Sulfone. Low specificity.,[$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])],Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- & di- esters, sulfamic acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
131
- 2,Functional Groups by Element,S,sulfone,Sulfone. High specificity.,[$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])],Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms.
132
- 2,Functional Groups by Element,S,sulfone,Sulfonic acid. High specificity.,[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])],Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
133
- 2,Functional Groups by Element,S,sulfone,Sulfonate,[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])],(sulfonic ester) Only hits carbon-substituted sulfur (Oxygen may be heteroatom-substituted). Hits Both Depiction Forms.
134
- 2,Functional Groups by Element,S,sulfone,Sulfonamide.,[$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])],Only hits carbo- sulfonamide. Hits Both Depiction Forms.
135
- 2,Functional Groups by Element,S,sulfone,Carbo-azosulfone,[SX4](C)(C)(=O)=N,Partial N-Analog of Sulfone
136
- 2,Functional Groups by Element,S,sulfone,Sulfonamide,[$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])],(sulfa drugs) Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.
137
- 2,Functional Groups by Element,S,sulfoxide,Sulfoxide Low specificity.,[$([#16X3]=[OX1]),$([#16X3+][OX1-])],( sulfinyl, thionyl ) Analog of carbonyl where S replaces C. Hits all sulfoxides, including heteroatom-substituted sulfoxides, dialkylsulfoxides carbo-sulfoxides, sulfinate, sulfinic acids... Hits Both Depiction Forms. Won't hit sulfones.
138
- 2,Functional Groups by Element,S,sulfoxide,Sulfoxide High specificity,[$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])],(sulfinyl, thionyl) Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms. Won't hit sulfones.
139
- 2,Functional Groups by Element,S,sulfate,Sulfate,[$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])],(sulfuric acid monoester) Only hits when oxygen is carbon-substituted. Hits acid and conjugate base. Hits Both Depiction Forms.
140
- 2,Functional Groups by Element,S,sulfate,Sulfuric acid ester (sulfate ester) Low specificity.,[$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)],Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates). Hits acid and conjugate base. Hits Both Depiction Forms.
141
- 2,Functional Groups by Element,S,sulfate,Sulfuric Acid Diester.,[$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6])],Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
142
- 2,Functional Groups by Element,S,sulfamate,Sulfamate.,[$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])],Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
143
- 2,Functional Groups by Element,S,sulfamate,Sulfamic Acid.,[$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])],Hits acid and conjugate base. Hits Both Depiction Forms.
144
- 2,Functional Groups by Element,S,sulfene,Sulfenic acid.,[#16X2][OX2H,OX1H0-],Hits acid and conjugate base.
145
- 2,Functional Groups by Element,S,sulfene,Sulfenate.,[#16X2][OX2H0],
146
- 2,Functional Groups by Element,X,halide (-halo -fluoro -chloro -bromo -iodo),Any carbon attached to any halogen,[#6][F,Cl,Br,I],
147
- 2,Functional Groups by Element,X,halide (-halo -fluoro -chloro -bromo -iodo),Halogen,[F,Cl,Br,I],
148
- 2,Functional Groups by Element,X,halide (-halo -fluoro -chloro -bromo -iodo),Three_halides groups,[F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I],Hits SMILES that have three halides.
149
- 2,Functional Groups by Element,X,acyl halide,Acyl Halide,[CX3](=[OX1])[F,Cl,Br,I],(acid halide, -oyl halide)
150
- 3,Gross Structual Features,Chirality,Specified chiral carbon.,[$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)],Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified but that could otherwise be considered chiral. Therefore, it also won't match molecules that would be chiral due to an implicit connection (i.e. implicit H).
151
- 3,Gross Structual Features,Chirality,"No-conflict" chiral match,C[C@?](F)(Cl)Br,Will match molecules with chiralities as specified or unspecified.
152
- 3,Gross Structual Features,Chirality,"No-conflict" chiral match where an H is present,C[C@?H](Cl)Br,Will match molecules with chiralities as specified or unspecified.
153
- 3,Gross Structual Features,Orbital Configuration,sp2 cationic carbon,[$([cX2+](:*):*)],Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
154
- 3,Gross Structual Features,Orbital Configuration,Aromatic sp2 carbon.,[$([cX3](:*):*),$([cX2+](:*):*)],The first recursive SMARTS matches carbons that are three-connected; the second case matches two-connected carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital)
155
- 3,Gross Structual Features,Orbital Configuration,Any sp2 carbon.,[$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)],The first recursive SMARTS matches carbons that are three-connected and aromatic. The second case matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital). The third case matches three-connected non-aromatic carbons (alkenes). The fourth case matches non-aromatic cationic alkene carbons.
156
- 3,Gross Structual Features,Orbital Configuration,Any sp2 nitrogen.,[$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)],Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine-N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free pair of electrons in a nonbonding orbital, e.g. pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair of electrons in a nonbonding orbital, e.g. C=N), non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist in reality, SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group, here the individual charge-separated resonance structures are specified), either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen), respectively.
157
- 3,Gross Structual Features,Orbital Configuration,Explicit Hydrogen on sp2-Nitrogen,[$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])],(H must be an isotope or ion)
158
- 3,Gross Structual Features,Orbital Configuration,sp3 nitrogen,[$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)],One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded.
159
- 3,Gross Structual Features,Orbital Configuration,Explicit Hydrogen on an sp3 N.,[$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)])],One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
160
- 3,Gross Structual Features,Orbital Configuration,sp2 N in N-Oxide,[$([$([NX3]=O),$([NX3+][O-])])],
161
- 3,Gross Structual Features,Orbital Configuration,sp3 N in N-Oxide Exclusive:,[$([$([NX4]=O),$([NX4+][O-])])],Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
162
- 3,Gross Structual Features,Orbital Configuration,sp3 N in N-Oxide Inclusive:,[$([$([NX4]=O),$([NX4+][O-,#0])])],Hits if O could be present. Hits if * is used in place of O in SMILES.
163
- 3,Gross Structual Features,Connectivity,Quaternary Nitrogen,[$([NX4+]),$([NX4]=*)],Hits non-aromatic Ns.
164
- 3,Gross Structual Features,Connectivity,Tricoordinate S double bonded to N.,[$([SX3]=N)],
165
- 3,Gross Structual Features,Connectivity,S double-bonded to Carbon,[$([SX1]=[#6])],Hits terminal (1-connected S)
166
- 3,Gross Structual Features,Connectivity,Triply bonded N,[$([NX1]#*)],
167
- 3,Gross Structual Features,Connectivity,Divalent Oxygen,[$([OX2])],
168
- 3,Gross Structual Features,Chains & Branching,Unbranched_alkane groups.,[R0;D2][R0;D2][R0;D2][R0;D2],Only hits alkanes (single-bond chains). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
169
- 3,Gross Structual Features,Chains & Branching,Unbranched_chain groups.,[R0;D2]~[R0;D2]~[R0;D2]~[R0;D2],Hits any bond (single, double, triple). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
170
- 3,Gross Structual Features,Chains & Branching,Long_chain groups.,[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0],Aliphatic chains at-least 8 members long.
171
- 3,Gross Structual Features,Chains & Branching,Atom_fragment,[!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])],(CLOGP definition) A fragment atom is not an isolating carbon
172
- 3,Gross Structual Features,Chains & Branching,Carbon_isolating,[$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])],This definition is based on that in CLOGP, so it is a charge-neutral carbon, which is not a CF3 or an aromatic C between two aromatic hetero atoms (e.g. in tetrazole), and is not multiply bonded to a hetero atom.
173
- 3,Gross Structual Features,Chains & Branching,Terminal S bonded to P,[$([SX1]~P)],
174
- 3,Gross Structual Features,Chains & Branching,Nitrogen on -N-C=N-,[$([NX3]C=N)],
175
- 3,Gross Structual Features,Chains & Branching,Nitrogen on -N-N=C-,[$([NX3]N=C)],
176
- 3,Gross Structual Features,Chains & Branching,Nitrogen on -N-N=N-,[$([NX3]N=N)],
177
- 3,Gross Structual Features,Chains & Branching,Oxygen in -O-C=N-,[$([OX2]C=N)],
178
- 3,Gross Structual Features,Rotation,Rotatable bond,[!$(*#*)&!D1]-!@[!$(*#*)&!D1],An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note that logical operators can be applied to bonds ("-&!@"). Here, the overall SMARTS consists of two atoms and one bond. The bond is "single and not ring". *#* matches any atom triple bonded to any atom. Enclosing this SMARTS in parentheses and preceding it with $ lets us use $(*#*) as an atom primitive in a recursive SMARTS. The purpose is to avoid bonds such as c1ccccc1-C#C which would be considered rotatable without this specification.
179
- 3,Gross Structual Features,Cyclic Features,Bicyclic,[$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))],Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
180
- 3,Gross Structual Features,Cyclic Features,Ortho,*-!:aa-!:*,Ortho-substituted ring
181
- 3,Gross Structual Features,Cyclic Features,Meta,*-!:aaa-!:*,Meta-substituted ring
182
- 3,Gross Structual Features,Cyclic Features,Para,*-!:aaaa-!:*,Para-substituted ring
183
- 3,Gross Structual Features,Cyclic Features,Acyclic-bonds,*!@*,
184
- 3,Gross Structual Features,Cyclic Features,Single bond and not in a ring,*-!@*,
185
- 3,Gross Structual Features,Cyclic Features,Non-ring atom,[R0] or [!R],
186
- 3,Gross Structual Features,Cyclic Features,Macrocycle groups.,[r;!r3;!r4;!r5;!r6;!r7],
187
- 3,Gross Structual Features,Cyclic Features,S in aromatic 5-ring with lone pair,[sX2r5],
188
- 3,Gross Structual Features,Cyclic Features,Aromatic 5-Ring O with Lone Pair,[oX2r5],
189
- 3,Gross Structual Features,Cyclic Features,N in 5-sided aromatic ring,[nX2r5],
190
- 3,Gross Structual Features,Cyclic Features,Spiro-ring center,[X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6],Ring sizes 4-6
191
- 3,Gross Structual Features,Cyclic Features,N in 5-ring arom,[$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])],anion
192
- 3,Gross Structual Features,Cyclic Features,CIS or TRANS double bond in a ring,*/,\[R]=;@[R]/,\*,An isomeric SMARTS consisting of four atoms and three bonds.
193
- 3,Gross Structual Features,Cyclic Features,CIS or TRANS double or aromatic bond in a ring,*/,\[R]=,:;@[R]/,\*
194
- 3,Gross Structual Features,Cyclic Features,Unfused benzene ring,[cR1]1[cR1][cR1][cR1][cR1][cR1]1,To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring:
195
- 3,Gross Structual Features,Cyclic Features,Multiple non-fused benzene rings,[cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1,
196
- 3,Gross Structual Features,Cyclic Features,Fused benzene rings,c12ccccc1cccc2,
197
- 4,Meta-SMARTS,Amino Acids,Generic amino acid: low specificity.,[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N],For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
198
- 4,Meta-SMARTS,Amino Acids,A.A. Template for 20 standard a.a.s,[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])],Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get "any standard a.a." Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
199
- 4,Meta-SMARTS,Amino Acids,Proline,[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N],
200
- 4,Meta-SMARTS,Amino Acids,Glycine,[$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])],
201
- 4,Meta-SMARTS,Amino Acids,Other a.a.,[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N],Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/i polypeptides (internal, or terminal). Example usage: Alanine side chain is [CH3X4] Alanine Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
202
- 4,Meta-SMARTS,Amino Acids,18_standard_aa_side_chains.,([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])]),Can be any of the standard 18 (Pro & Gly are treated separately) Hits acids and conjugate bases.
203
- 4,Meta-SMARTS,Amino Acids,N in Any_standard_amino_acid.,[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])],Format is the A.A. Template for 20 standard a.a.s, where * is replaced by the entire 18_standard_side_chains list (or'd together). A generic amino acid with any of the 18 side chains, or proline or glycine. Hits "standard" amino acids that have terminally appended groups (i.e. "standard" refers to the side chains). (Pro, Gly, or 18 normal a.a.s.) Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
204
- 4,Meta-SMARTS,Amino Acids,Non-standard amino acid.,[$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),$([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),$([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),$([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])],Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains). Won't hit amino acids that are non-standard due solely to the fact that groups are terminally-appended to the polypeptide chain (N or C term). format is [$(generic a.a.); !$(not a standard one)] Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
205
- 4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Ortho,[SMARTS_expression]-!:aa-!:[SMARTS_expression]
206
- 4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Meta,[SMARTS_expression]-!:aaa-!:[SMARTS_expression]
207
- 4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Para,[SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
208
- 4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Hydrogen,[$([#1][SMARTS_expression])],Hydrogen must be explicit i.e. an isotope or charged
209
- 4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Nitrogen,[$([#7][SMARTS_expression])]
210
- 4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Oxygen,[$([#8][SMARTS_expression])]
211
- 4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS: Atoms connected to particular SMARTS,Fluorine,[$([#9][SMARTS_expression])]
212
- 4,Meta-SMARTS,Recursive or Multiple,Two possible groups,[$(SMARTS_expression_A),$(SMARTS_expression_B)],Hits atoms in either environment or group of interest, A or B. Example usages: Azide group is : [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])] Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$( [NX1]#[NX2+]-[NX1-2])] Azide or azide ion is: [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),$([$([NX1-]=[NX2+]=[NX1-]),$( [NX1]#[NX2+]-[NX1-2])])]
213
- 4,Meta-SMARTS,Recursive or Multiple,Recursive SMARTS,[$([atom_that_gets_hit][other_atom][other_atom])],Hits first atom within parenthesis Example usages: [$([CX3]=[OX1])] hits Carbonyl Carbon [$([OX1]=[CX3])] hits Carbonyl Oxygen
214
- 4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Sulfide,[#16X2H0],(-alkylthio) Won't hit thiols. Hits disulfides too.
215
- 4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Mono-sulfide,[#16X2H0][!#16],(alkylthio- or alkoxy-) R-S-R Won't hit thiols. Won't hit disulfides.
216
- 4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Di-sulfide,[#16X2H0][#16X2H0],Won't hit thiols. Won't hit mono-sulfides.
217
- 4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Two sulfides,[#16X2H0][!#16].[#16X2H0][!#16],Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
218
- 4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Acid/conj-base,[OX2H,OX1H0-],Hits acid and conjugate base. acid/base
219
- 4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Non-acid Oxygen,[OX2H0],
220
- 4,Meta-SMARTS,Recursive or Multiple,Single only, Double only, Single or Double,Acid/base,[H1,H0-],Works for any atom if base form has no Hs & acid has only one.
221
- 4,Meta-SMARTS,Muntiple Disconnected Groups,Two disconnected SMARTS fragments,([Cl!$(Cl~c)].[c!$(c~Cl)]),A molecule that contains a chlorine and an aromatic carbon but which are not connected to each other. Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
222
- 4,Meta-SMARTS,Muntiple Disconnected Groups,Two disconnected SMARTS fragments,([Cl]).([c]),Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES fragments.
223
- 4,Meta-SMARTS,Muntiple Disconnected Groups,Two not-necessarily connected SMARTS fragments,([Cl].[c]),Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
224
- 4,Meta-SMARTS,Muntiple Disconnected Groups,Two not-necessarily connected fragments,([SMARTS_expression]).([SMARTS_expression]),Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
225
- 4,Meta-SMARTS,Muntiple Disconnected Groups,Two primary or secondary amines,[NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)],Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical patterns.
226
- 4,Meta-SMARTS,Tools & Tricks,Alternative/Equivalent Representations,Any carbon aromatic or non-aromatic,[#6] or [c,C],
227
- 4,Meta-SMARTS,Tools & Tricks,SMILES wildcard,[#0],This SMARTS hits the SMILES *
228
- 4,Meta-SMARTS,Tools & Tricks,Factoring,[OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-],Factor out common atomic expressions in the recursive SMARTS. May improve human readability.
229
- 4,Meta-SMARTS,Tools & Tricks,High-precedence "and",[N&X4&+,N&X3&+0] or [NX4+,NX3+0],High-precedence "and" (&) is the default logical operator. "Or" (,) is lower precedence than & and low-precedence "and" (;) is lower precedence than both.
230
- 4,Meta-SMARTS,Tools & Tricks,Hs on Carbons,[#6!H0,#1],
231
- 4,Meta-SMARTS,Tools & Tricks,Atoms w/ 1 H,[H,#1],
232
- 5,Electron & Proton Features,Acids & Bases,Acid,[!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+],Proton donor
233
- 5,Electron & Proton Features,Acids & Bases,Carboxylic acid,[CX3](=O)[OX2H1],(-oic acid, COOH)
234
- 5,Electron & Proton Features,Acids & Bases,Carboxylic acid or conjugate base.,[CX3](=O)[OX1H0-,OX2H1],
235
- 5,Electron & Proton Features,Acids & Bases,Hydroxyl_acidic,[$([OH]-*=[!#6])],An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom, this includes carboxylic, sulphur, phosphorous, halogen and nitrogen oxyacids
236
- 5,Electron & Proton Features,Acids & Bases,Phosphoric_Acid,[$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])],Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer). Hits acid and conjugate base.
237
- 5,Electron & Proton Features,Acids & Bases,Sulfonic Acid. High specificity.,[$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])],Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
238
- 5,Electron & Proton Features,Acids & Bases,Acyl Halide,[CX3](=[OX1])[F,Cl,Br,I],(acid halide, -oyl halide)
239
- 5,Electron & Proton Features,Charge,Anionic divalent Nitrogen,[NX2-],
240
- 5,Electron & Proton Features,Charge,Oxenium Oxygen,[OX2H+]=*,
241
- 5,Electron & Proton Features,Charge,Oxonium Oxygen,[OX3H2+],
242
- 5,Electron & Proton Features,Charge,Carbocation,[#6+],
243
- 5,Electron & Proton Features,Charge,sp2 cationic carbon.,[$([cX2+](:*):*)],Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
244
- 5,Electron & Proton Features,Charge,Azide ion.,[$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])],Hits N in azide ion
245
- 5,Electron & Proton Features,Charge,Zwitterion High Specificity,[+1]~*~*~[-1],+1 charged atom separated by any 3 bonds from a -1 charged atom.
246
- 5,Electron & Proton Features,Charge,Zwitterion Low Specificity, Crude,[$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])],Variously charged moieties separated by up to ten bonds.
247
- 5,Electron & Proton Features,Charge,Zwitterion Low Specificity,([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4]),Variously charged moieties that are within the same molecule but not-necessarily connected. Uses component-level grouping.
248
- 5,Electron & Proton Features,H-bond Donors & Acceptors,Hydrogen-bond acceptor,[#6,#7;R0]=[#8],Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring, double bonded to an oxygen.
249
- 5,Electron & Proton Features,H-bond Donors & Acceptors,Hydrogen-bond acceptor,[!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])],A H-bond acceptor is a heteroatom with no positive charge, note that negatively charged oxygen or sulphur are included. Excluded are halogens, including F, heteroaromatic oxygen, sulphur and pyrrole N. Higher oxidation levels of N,P,S are excluded. Note P(III) is currently included. Zeneca's work would imply that (O=S=O) should also be excluded.
250
- 5,Electron & Proton Features,H-bond Donors & Acceptors,Hydrogen-bond donor.,[!$([#6,H0,-,-2,-3])],A H-bond donor is a non-negatively charged heteroatom with at least one H
251
- 5,Electron & Proton Features,H-bond Donors & Acceptors,Hydrogen-bond donor.,[!H0;#7,#8,#9],Must have an N-H bond, an O-H bond, or a F-H bond
252
- 5,Electron & Proton Features,H-bond Donors & Acceptors,Possible intramolecular H-bond,[O,N;!H0]-*~*-*=[$([C,N;R0]=O)],Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive SMARTS", where "$()" encloses a valid nested SMARTS and acts syntactically like an atom-primitive in the overall SMARTS. Multiple nesting is allowed.
253
- 5,Electron & Proton Features,Radicals,Carbon Free-Radical,[#6;X3v3+0],Hits a neutral carbon with three single bonds.
254
- 5,Electron & Proton Features,Radicals,Nitrogen Free-Radical,[#7;X2v4+0],Hits a neutral nitrogen with two single bonds or with a single and a triple bond.
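Note on reading these entries: each added line appears to pack six comma-separated fields (level, section, subsection, name, SMARTS, description), but the SMARTS, some names, and most descriptions contain commas of their own, so a plain `split(",")` will cut patterns such as `[F,Cl,Br,I]` apart. The sketch below is one possible way to split an entry safely. The field layout is inferred from the lines above rather than documented, and the helper names `split_top_level` and `parse_entry` are hypothetical; the rule "the SMARTS is the first field after the name that opens with a bracket" is a heuristic, not part of the file format.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside [] or ()."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "[(":
            depth += 1
        elif ch in "])" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_entry(line):
    """Parse one '- level,section,subsection,name,SMARTS,description' line.

    Assumed layout (inferred, not documented): three fixed leading fields,
    then a name that may contain commas, then the SMARTS, then a free-text
    description that may also contain commas.
    """
    body = line.strip()
    if body.startswith("- "):
        body = body[2:]
    fields = split_top_level(body)
    if len(fields) < 5:
        raise ValueError("expected at least five fields: " + line)
    level, section, subsection = fields[0], fields[1], fields[2]
    rest = fields[3:]
    # Heuristic: the SMARTS is the first field after the name's first
    # piece that opens with '[' or '('.
    idx = next(
        (i for i, f in enumerate(rest)
         if i > 0 and f.strip().startswith(("[", "("))),
        None,
    )
    if idx is None:
        raise ValueError("no SMARTS-like field found: " + line)
    return {
        "level": int(level),
        "section": section.strip(),
        "subsection": subsection.strip(),
        "name": ",".join(rest[:idx]).strip(),
        "smarts": rest[idx].strip(),
        "description": ",".join(rest[idx + 1:]).strip(),
    }


if __name__ == "__main__":
    example = ("- 5,Electron & Proton Features,Acids & Bases,Acyl Halide,"
               "[CX3](=[OX1])[F,Cl,Br,I],(acid halide, -oyl halide)")
    entry = parse_entry(example)
    print(entry["name"], "|", entry["smarts"], "|", entry["description"])
```

The bracket-depth counter only works when a pattern's brackets are balanced, so an entry with a missing `]` or `)` would swallow its description into the SMARTS field; checking each parsed pattern with a cheminformatics toolkit afterwards would catch that case.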
rawgroups.txt DELETED
@@ -1,1145 +0,0 @@
1
-
2
- 2. Functional Groups by Element
3
-
4
- C
5
- alkane
6
-
7
- Alkyl Carbon
8
- [CX4]
9
-
10
-
11
- alkene (-ene)
12
-
13
- Allenic Carbon
14
- [$([CX2](=C)=C)]
15
-
16
- Vinylic Carbon
17
- [$([CX3]=[CX3])]
18
- Ethenyl carbon
19
-
20
-
21
- alkyne (-yne)
22
-
23
- Acetylenic Carbon
24
- [$([CX2]#C)]
25
-
26
-
27
- arene (Ar , aryl-, aromatic hydrocarbons)
28
-
29
- Arene
30
- c
31
-
32
-
33
- C & O
34
- carbonyl
35
-
36
- Carbonyl group. Low specificity
37
- [CX3]=[OX1]
38
- Hits carboxylic acid, ester, ketone, aldehyde, carbonic acid/ester, anhydride, carbamic acid/ester, acyl halide, amide.
39
-
40
- Carbonyl group
41
- [$([CX3]=[OX1]),$([CX3+]-[OX1-])]
42
- Hits either resonance structure
43
-
44
- Carbonyl with Carbon
45
- [CX3](=[OX1])C
46
- Hits aldehyde, ketone, carboxylic acid (except formic), anhydride (except formic), acyl halides (acid halides). Won't hit carbamic acid/ester, carbonic acid/ester.
47
-
48
- Carbonyl with Nitrogen.
49
- [OX1]=CN
50
- Hits amide, carbamic acid/ester, poly peptide
51
-
52
- Carbonyl with Oxygen.
53
- [CX3](=[OX1])O
54
- Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid or ester, anhydride. Won't hit aldehyde or ketone.
55
-
56
- Acyl Halide
57
- [CX3](=[OX1])[F,Cl,Br,I]
58
- acid halide, -oyl halide
59
-
60
- Aldehyde
61
- [CX3H1](=O)[#6]
62
- -al
63
-
64
- Anhydride
65
- [CX3](=[OX1])[OX2][CX3](=[OX1])
66
-
67
- Amide
68
- [NX3][CX3](=[OX1])[#6]
69
- -amide
70
-
71
- Amidinium
72
- [NX3][CX3]=[NX3+]
73
-
74
- Carbamate.
75
- [NX3,NX4+][CX3](=[OX1])[OX2,OX1-]
76
- Hits carbamic esters, acids, and zwitterions
77
-
78
- Carbamic ester
79
- [NX3][CX3](=[OX1])[OX2H0]
80
-
81
- Carbamic acid.
82
- [NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]
83
- Hits carbamic acids and zwitterions.
84
-
85
- Carboxylate Ion.
86
- [CX3](=O)[O-]
87
- Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
88
-
89
- Carbonic Acid or Carbonic Ester
90
- [CX3](=[OX1])(O)O
91
- Carbonic Acid, Carbonic Ester, or combination
92
-
93
- Carbonic Acid or Carbonic Acid-Ester
94
- [CX3](=[OX1])([OX2])[OX2H,OX1H0-1]
95
- Hits acid and conjugate base. Won't hit carbonic acid diester
96
-
97
- Carbonic Ester (carbonic acid diester)
98
- C[OX2][CX3](=[OX1])[OX2]C
99
- Won't hit carbonic acid or combination carbonic acid/ester
100
-
101
- Carboxylic acid
102
- [CX3](=O)[OX2H1]
103
- -oic acid, COOH
104
-
105
- Carboxylic acid or conjugate base.
106
- [CX3](=O)[OX1H0-,OX2H1]
107
-
108
- Cyanamide
109
- [NX3][CX2]#[NX1]
110
-
111
- Ester Also hits anhydrides
112
- [#6][CX3](=O)[OX2H0][#6]
113
- won't hit formic anhydride.
114
-
115
- Ketone
116
- [#6][CX3](=O)[#6]
117
- -one
118
-
119
-
120
- ether
121
-
122
- Ether
123
- [OD2]([#6])[#6]
124
-
125
-
126
- H
127
- hydrogen atoms
128
-
129
- Hydrogen Atom
130
- [H]
131
- Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
132
-
133
- Not a Hydrogen Atom
134
- [!#1]
135
- Hits SMILES that are not hydrogen atoms.
136
-
137
- Proton
138
- [H+]
139
- Hits positively charged hydrogen atoms: [H+]
140
-
141
-
142
- hydrogen count
143
-
144
- Mono-Hydrogenated Cation
145
- [+H]
146
- Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
147
-
148
- Not Mono-Hydrogenated
149
- [!H] or [!H1]
150
- Hits atoms that don't have exactly one attached hydrogen.
151
-
152
-
153
- N
154
- amide see carbonyl
155
-
156
-
157
- amine (-amino)
158
-
159
- Primary or secondary amine, not amide.
160
- [NX3;H2,H1;!$(NC=O)]
161
- Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is specified by N's H-count (H2 & H1 respectively). Also note that "&" (and) is the default operator and is higher precedence than "," (or), which is higher precedence than ";" (and). Will hit cyanamides and thioamides
162
-
163
- Enamine
164
- [NX3][CX3]=[CX3]
165
-
166
- Primary amine, not amide.
167
- [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6] Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom)
168
-
169
- Two primary or secondary amines
170
- [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
171
- Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
172
-
173
- Enamine or Aniline Nitrogen
174
- [NX3][$(C=C),$(cc)]
175
-
176
-
177
- amino acids
178
-
179
- Generic amino acid: low specificity.
180
- [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
181
- For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
182
-
183
- Dipeptide group. generic amino acid: low specificity.
184
- [NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]
185
- Won't hit pro or gly. Hits acids and conjugate bases.
186
-
187
- Amino Acid
188
- [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
189
- Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal). {e.g. usage: Alanine side chain is [CH3X4]. Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]}
190
-
191
-
192
- amino acid side chains
193
-
194
- Alanine side chain
195
- [CH3X4]
196
-
197
- Arginine side chain.
198
- [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
199
- Hits acid and conjugate base.
200
-
201
- Asparagine side chain.
202
- [CH2X4][CX3](=[OX1])[NX3H2]
203
- Also hits Gln side chain when used alone.
204
-
205
- Aspartate (or Aspartic acid) side chain.
206
- [CH2X4][CX3](=[OX1])[OH0-,OH]
207
- Hits acid and conjugate base. Also hits Glu side chain when used alone.
208
-
209
- Cysteine side chain.
210
- [CH2X4][SX2H,SX1H0-]
211
- Hits acid and conjugate base
212
-
213
- Glutamate (or Glutamic acid) side chain.
214
- [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
215
- Hits acid and conjugate base.
216
-
217
- Glycine
218
- [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
219
-
220
- Histidine side chain.
221
- [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:
222
- [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
223
- Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected with one H]) or (3-connected with one H).
224
-
225
- Isoleucine side chain
226
- [CHX4]([CH3X4])[CH2X4][CH3X4]
227
-
228
- Leucine side chain
229
- [CH2X4][CHX4]([CH3X4])[CH3X4]
230
-
231
- Lysine side chain.
232
- [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
233
- Acid and conjugate base
234
-
235
- Methionine side chain
236
- [CH2X4][CH2X4][SX2][CH3X4]
237
-
238
- Phenylalanine side chain
239
- [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
240
-
241
- Proline
242
- [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
243
-
244
- Serine side chain
245
- [CH2X4][OX2H]
246
-
247
- Thioamide
248
- [NX3][CX3]=[SX1]
249
-
250
- Threonine side chain
251
- [CHX4]([CH3X4])[OX2H]
252
-
253
- Tryptophan side chain
254
- [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
255
-
256
- Tyrosine side chain.
257
- [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
258
- Acid and conjugate base
259
-
260
- Valine side chain
261
- [CHX4]([CH3X4])[CH3X4]
262
-
263
- Alanine side chain
264
- [CH3X4]
265
-
266
- Arginine side chain.
267
- [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
268
- Hits acid and conjugate base.
269
-
270
- Asparagine side chain.
271
- [CH2X4][CX3](=[OX1])[NX3H2]
272
- Also hits Gln side chain when used alone.
273
-
274
- Aspartate (or Aspartic acid) side chain.
275
- [CH2X4][CX3](=[OX1])[OH0-,OH]
276
- Hits acid and conjugate base. Also hits Glu side chain when used alone.
277
-
278
- Cysteine side chain.
279
- [CH2X4][SX2H,SX1H0-]
280
- Hits acid and conjugate base
281
-
282
- Glutamate (or Glutamic acid) side chain.
283
- [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
284
- Hits acid and conjugate base.
285
-
286
- Glycine
287
- N[CX4H2][CX3](=[OX1])[O,N]
288
-
289
- Histidine side chain.
290
- [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:
291
- [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
292
- Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-connected
293
-
294
- Isoleucine side chain
295
- [CHX4]([CH3X4])[CH2X4][CH3X4]
296
-
297
- Leucine side chain
298
- [CH2X4][CHX4]([CH3X4])[CH3X4]
299
-
300
- Lysine side chain.
301
- [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
302
- Acid and conjugate base
303
-
304
- Methionine side chain
305
- [CH2X4][CH2X4][SX2][CH3X4]
306
-
307
- Phenylalanine side chain
308
- [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
309
-
310
- Proline
311
- N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N]
312
-
313
- Serine side chain
314
- [CH2X4][OX2H]
315
-
316
- Threonine side chain
317
- [CHX4]([CH3X4])[OX2H]
318
-
319
- Tryptophan side chain
320
- [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
321
-
322
- Tyrosine side chain.
323
- [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
324
- Acid and conjugate base
325
-
326
- Valine side chain
327
- [CHX4]([CH3X4])[CH3X4]
328
-
329
-
330
- azide (-azido)
331
-
332
- Azide group.
333
- [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
334
- Hits any atom with an attached azide.
335
-
336
- Azide ion.
337
- [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
338
- Hits N in azide ion
339
-
340
-
341
- azo
342
-
343
- Nitrogen.
344
- [#7]
345
- Nitrogen in N-containing compound. aromatic or aliphatic. Most general interpretation of "azo"
346
-
347
- Azo Nitrogen. Low specificity.
348
- [NX2]=N
349
- Hits diazene, azoxy and some diazo structures
350
-
351
- Azo Nitrogen.diazene
352
- [NX2]=[NX2]
353
- (diaza alkene)
354
-
355
- Azoxy Nitrogen.
356
- [$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]
357
-
358
- Diazo Nitrogen
359
- [$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]
360
-
361
- Azole.
362
- [$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]
363
- 5-member aromatic heterocycle w/ 2 double bonds. Contains N & another non-C atom (N,O,S). Subclasses are furo-, thio-, pyrro- (replace CH o' furfuran, thiophene, pyrrol w/ N)
364
-
365
-
366
- hydrazine
367
-
368
- Hydrazine H2NNH2
369
- [NX3][NX3]
370
-
371
-
372
- hydrazone
373
-
374
- Hydrazone C=NNH2
375
- [NX3][NX2]=[*]
376
-
377
-
378
- imine
379
-
380
- Substituted imine
381
- [CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]
382
- Schiff base
383
-
384
- Substituted or un-substituted imine
385
- [$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]
386
-
387
- Iminium
388
- [NX3+]=[CX3]
389
-
390
-
391
- imide
392
-
393
- Unsubstituted dicarboximide
394
- [CX3](=[OX1])[NX3H][CX3](=[OX1])
395
-
396
- Substituted dicarboximide
397
- [CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])
398
-
399
- Dicarboxdiimide
400
- [CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])
401
-
402
-
403
- nitrate
404
-
405
- Nitrate group
406
- [$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]
407
- Also hits nitrate anion
408
-
409
- Nitrate Anion
410
- [$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]
411
-
412
-
413
- nitrile
414
-
415
- Nitrile
416
- [NX1]#[CX2]
417
-
418
- Isonitrile
419
- [CX1-]#[NX2+]
420
-
421
-
422
- nitro
423
-
424
- Nitro group.
425
- [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8] Hits both forms.
426
-
427
- Two Nitro groups
428
- [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]
429
-
430
-
431
- nitroso
432
-
433
- Nitroso-group
434
- [NX2]=[OX1]
435
-
436
-
437
- n-oxide
438
-
439
- N-Oxide
440
- [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]
441
- Hits both forms. Won't hit azoxy, nitro, nitroso,or nitrate.
442
-
443
-
444
- O
445
- hydroxyl (includes alcohol, phenol)
446
-
447
- Hydroxyl
448
- [OX2H]
449
-
450
- Hydroxyl in Alcohol
451
- [#6][OX2H]
452
-
453
- Hydroxyl in Carboxylic Acid
454
- [OX2H][CX3]=[OX1]
455
-
456
- Hydroxyl in H-O-P-
457
- [OX2H]P
458
-
459
- Enol
460
- [OX2H][#6X3]=[#6]
461
-
462
- Phenol
463
- [OX2H][cX3]:[c]
464
-
465
- Enol or Phenol
466
- [OX2H][$(C=C),$(cc)]
467
-
468
- Hydroxyl_acidic
469
- [$([OH]-*=[!#6])]
470
- An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom, this includes carboxylic, sulphur, phosphorous, halogen and nitrogen oxyacids.
471
-
472
-
473
- peroxide
474
-
475
- Peroxide groups.
476
- [OX2,OX1-][OX2,OX1-]
477
- Also hits anions.
478
-
479
-
480
- P
481
- phosphoric compounds
482
-
483
- Phosphoric_acid groups.
484
- [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
485
- Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer).
486
-
487
- Phosphoric_ester groups.
488
- [$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]
489
- Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups.
490
-
491
-
492
- S
493
- thio groups ( thio-, thi-, sulpho-, mercapto- )
494
-
495
- Carbo-Thiocarboxylate
496
- [S-][CX3](=S)[#6]
497
-
498
- Carbo-Thioester
499
- S([#6])[CX3](=O)[#6]
500
-
501
- Thio analog of carbonyl
502
- [#6X3](=[SX1])([!N])[!N]
503
- Where S replaces O. Not a thioamide.
504
-
505
- Thiol, Sulfide or Disulfide Sulfur
506
- [SX2]
507
-
508
- Thiol
509
- [#16X2H]
510
-
511
- Sulfur with at-least one hydrogen.
512
- [#16!H0]
513
-
514
- Thioamide
515
- [NX3][CX3]=[SX1]
516
-
517
-
518
- sulfide
519
-
520
- Sulfide
521
- [#16X2H0]
522
- -alkylthio Won't hit thiols. Hits disulfides.
523
-
524
- Mono-sulfide
525
- [#16X2H0][!#16]
526
- alkylthio- or alkoxy- Won't hit thiols. Won't hit disulfides.
527
-
528
- Di-sulfide
529
- [#16X2H0][#16X2H0]
530
- Won't hit thiols. Won't hit mono-sulfides.
531
-
532
- Two Sulfides
533
- [#16X2H0][!#16].[#16X2H0][!#16]
534
- Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
535
-
536
-
537
- sulfinate
538
-
539
- Sulfinate
540
- [$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]
541
- Won't hit Sulfinic Acid. Hits Both Depiction Forms.
542
-
543
- Sulfinic Acid
544
- [$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]
545
- Won't hit substituted Sulfinates. Hits Both Depiction Forms. Hits acid and conjugate base (sulfinate).
546
-
547
-
548
- sulfone
549
-
550
- Sulfone. Low specificity.
551
- [$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]
552
- Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono- & di- esters, sulfamic acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
553
-
554
- Sulfone. High specificity.
555
- [$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]
556
- Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms.
557
-
558
- Sulfonic acid. High specificity.
559
- [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
560
- Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
561
-
562
- Sulfonate
563
- [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]
564
- (sulfonic ester) Only hits carbon-substituted sulfur (Oxygen may be heteroatom-substituted). Hits Both Depiction Forms.
565
-
566
- Sulfonamide.
567
- [$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]
568
- Only hits carbo- sulfonamide. Hits Both Depiction Forms.
569
-
570
- Carbo-azosulfone
571
- [SX4](C)(C)(=O)=N
572
- Partial N-Analog of Sulfone
573
-
574
- Sulfonamide
575
- [$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]
576
- (sulfa drugs) Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.
577
-
578
-
579
- sulfoxide
580
-
581
- Sulfoxide Low specificity.
582
- [$([#16X3]=[OX1]),$([#16X3+][OX1-])]
583
- ( sulfinyl, thionyl ) Analog of carbonyl where S replaces C. Hits all sulfoxides, including heteroatom-substituted sulfoxides, dialkylsulfoxides carbo-sulfoxides, sulfinate, sulfinic acids... Hits Both Depiction Forms. Won't hit sulfones.
584
-
585
- Sulfoxide High specificity
586
- [$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]
587
- (sulfinyl, thionyl) Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms. Won't hit sulfones.
588
-
589
-
590
- sulfate
591
-
592
- Sulfate
593
- [$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]
594
- (sulfuric acid monoester) Only hits when oxygen is carbon-substituted. Hits acid and conjugate base. Hits Both Depiction Forms.
595
-
596
- Sulfuric acid ester (sulfate ester) Low specificity.
597
- [$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]
598
- Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates). Hits acid and conjugate base. Hits Both Depiction Forms.
599
-
600
- Sulfuric Acid Diester.
601
- [$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]
602
- Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
603
-
604
-
605
- sulfamate
606
-
607
- Sulfamate.
608
- [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]
609
- Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
610
-
611
- Sulfamic Acid.
612
- [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2H,OX1H0-])]
613
- Hits acid and conjugate base. Hits Both Depiction Forms.
614
-
615
-
616
- sulfene
617
-
618
- Sulfenic acid.
619
- [#16X2][OX2H,OX1H0-]
620
- Hits acid and conjugate base.
621
-
622
- Sulfenate.
623
- [#16X2][OX2H0]
624
-
625
-
626
- X
627
- halide (-halo -fluoro -chloro -bromo -iodo)
628
-
629
- Any carbon attached to any halogen
630
- [#6][F,Cl,Br,I]
631
-
632
- Halogen
633
- [F,Cl,Br,I]
634
-
635
- Three_halides groups
636
- [F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]
637
- Hits SMILES that have three halides.
638
-
639
-
640
- acyl halide
641
-
642
- Acyl Halide
643
- [CX3](=[OX1])[F,Cl,Br,I]
644
- (acid halide, -oyl halide)
645
-
646
-
647
- 3. Gross Structural Features
648
-
649
-
650
- Chirality Orbital Configuration Connectivity Chains & Branching Rotation Cyclic Features
651
-
652
-
653
- Chirality
654
-
655
- Specified chiral carbon.
656
- [$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]
657
- Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules whose chirality is unspecified but that could otherwise be considered chiral. Therefore also won't match molecules that would be chiral due to an implicit connection (i.e. implicit H).
658
-
659
- "No-conflict" chiral match
660
- C[C@?](F)(Cl)Br
661
- Will match molecules with chiralities as specified or unspecified.
662
-
663
- "No-conflict" chiral match where an H is present
664
- C[C@?H](Cl)Br
665
- Will match molecules with chiralities as specified or unspecified.
666
-
667
-
668
- Orbital Configuration
669
-
670
- sp2 cationic carbon
671
- [$([cX2+](:*):*)]
672
- Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
673
-
674
- Aromatic sp2 carbon.
675
- [$([cX3](:*):*),$([cX2+](:*):*)]
676
- The first recursive SMARTS matches carbons that are three-connected, the second case matches two-connected carbons (i.e cations with a free electron in a non-bonding sp2 hybrid orbital)
677
-
678
- Any sp2 carbon.
679
- [$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]
680
- The first recursive SMARTS matches carbons that are three-connected and aromatic. The second case matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital). The third case matches three-connected non-aromatic carbons (alkenes). The fourth case matches non-aromatic cationic alkene carbons.
681
-
682
- Any sp2 nitrogen.
683
- [$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]
684
- Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine N-oxide), aromatic 2-connected with 2 aromatic bonds (and a free pair of electrons in a nonbonding orbital, e.g. pyridine), either aromatic or non-aromatic 2-connected with a double bond (and a free pair of electrons in a nonbonding orbital, e.g. C=N), non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this form does not exist in reality, SMILES can represent the charge-separated resonance structures as a single uncharged structure), either aromatic or non-aromatic 3-connected cation w/ 1 single bond and 1 double bond (e.g. a nitro group, here the individual charge-separated resonance structures are specified), either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous case but R is hydrogen), respectively.
685
-
686
- Explicit Hydrogen on sp2-Nitrogen
687
- [$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]
688
- (H must be an isotope or ion)
689
-
690
- sp3 nitrogen
691
- [$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)]
692
- One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not aromatically bonded.
693
-
694
- Explicit Hydrogen on an sp3 N.
695
- [$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)])]
696
- One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
697
-
698
- sp2 N in N-Oxide
699
- [$([$([NX3]=O),$([NX3+][O-])])]
700
-
701
- sp3 N in N-Oxide Exclusive:
702
- [$([$([NX4]=O),$([NX4+][O-])])]
703
- Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
704
-
705
- sp3 N in N-Oxide Inclusive:
706
- [$([$([NX4]=O),$([NX4+][O-,#0])])]
707
- Hits if O could be present. Hits if * is used in place of O in SMILES.
708
-
709
-
710
- Connectivity
711
-
712
- Quaternary Nitrogen
713
- [$([NX4+]),$([NX4]=*)]
714
- Hits non-aromatic Ns.
715
-
716
- Tricoordinate S double bonded to N.
717
- [$([SX3]=N)]
718
-
719
- S double-bonded to Carbon
720
- [$([SX1]=[#6])]
721
- Hits terminal (1-connected S)
722
-
723
- Triply bonded N
724
- [$([NX1]#*)]
725
-
726
- Divalent Oxygen
727
- [$([OX2])]
728
-
729
-
730
- Chains & Branching
731
-
732
- Unbranched_alkane groups.
733
- [R0;D2][R0;D2][R0;D2][R0;D2]
734
- Only hits alkanes (single-bond chains). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
735
-
736
- Unbranched_chain groups.
737
- [R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]
738
- Hits any bond (single, double, triple). Only hits chains of at-least 4 members. All non-(implicit-hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
739
-
740
- Long_chain groups.
741
- [AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]
742
- Aliphatic chains at-least 8 members long.
743
-
744
- Atom_fragment
745
- [!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
746
- (CLOGP definition) A fragment atom is not an isolating carbon
747
-
748
- Carbon_isolating
749
- [$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
750
- This definition is based on that in CLOGP, so it is a charge-neutral carbon which is not a CF3 or an aromatic C between two aromatic hetero atoms (e.g. in tetrazole), and which is not multiply bonded to a hetero atom.
751
-
752
- Terminal S bonded to P
753
- [$([SX1]~P)]
754
-
755
- Nitrogen on -N-C=N-
756
- [$([NX3]C=N)]
757
-
758
- Nitrogen on -N-N=C-
759
- [$([NX3]N=C)]
760
-
761
- Nitrogen on -N-N=N-
762
- [$([NX3]N=N)]
763
-
764
- Oxygen in -O-C=N-
765
- [$([OX2]C=N)]
766
-
767
-
768
- Rotation
769
-
770
- Rotatable bond
771
- [!$(*#*)&!D1]-!@[!$(*#*)&!D1]
772
- An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring bond to an equivalent atom. Note that logical operators can be applied to bonds ("-&!@"). Here, the overall SMARTS consists of two atoms and one bond. The bond is "single and not ring". *#* is any atom triple bonded to any atom. Enclosing this SMARTS in parentheses and preceding it with $ lets us use $(*#*) in a recursive SMARTS as an atom primitive. The purpose is to avoid bonds such as c1ccccc1-C#C, which would be considered rotatable without this specification.
773
-
774
-
775
- Cyclic Features
776
-
777
- Bicyclic
778
- [$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]
779
- Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
780
-
781
- Ortho
782
- *-!:aa-!:*
783
- Ortho-substituted ring
784
-
785
- Meta
786
- *-!:aaa-!:*
787
- Meta-substituted ring
788
-
789
- Para
790
- *-!:aaaa-!:*
791
- Para-substituted ring
792
-
793
- Acyclic-bonds
794
- *!@*
795
-
796
- Single bond and not in a ring
797
- *-!@*
798
-
799
- Non-ring atom
800
- [R0] or [!R]
801
-
802
- Macrocycle groups.
803
- [r;!r3;!r4;!r5;!r6;!r7]
804
-
805
- S in aromatic 5-ring with lone pair
806
- [sX2r5]
807
-
808
- Aromatic 5-Ring O with Lone Pair
809
- [oX2r5]
810
-
811
- N in 5-sided aromatic ring
812
- [nX2r5]
813
-
814
- Spiro-ring center
815
- [X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6]rings size 4-6
816
-
817
- N in 5-ring arom
818
- [$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])] anion
819
-
820
- CIS or TRANS double bond in a ring
821
- */,\[R]=;@[R]/,\*
822
- An isomeric SMARTS consisting of four atoms and three bonds.
823
-
824
- CIS or TRANS double or aromatic bond in a ring
825
- */,\[R]=,:;@[R]/,\*
826
-
827
- Unfused benzene ring
828
- [cR1]1[cR1][cR1][cR1][cR1][cR1]1
829
- To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where each atom is only in one ring:
830
-
831
- Multiple non-fused benzene rings
832
- [cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1
833
-
834
- Fused benzene rings
835
- c12ccccc1cccc2
836
-
837
-
838
- 4. Meta-SMARTS
839
-
840
-
841
- Amino Acids Recursive or Multiple Tools &Tricks
842
-
843
-
844
- Amino Acids
845
-
846
- Generic amino acid: low specificity.
847
- [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
848
- For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
849
-
850
- A.A. Template for 20 standard a.a.s
851
- [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),
852
- $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]
853
- Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get "any standard a.a." Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
854
-
855
- Proline
856
- [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
857
-
858
- Glycine
859
- [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
860
-
861
- Other a.a.
862
- [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
863
- Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard a.a. Won't work with Proline or Glycine, they have their own SMARTS (see side chain list). Hits acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
864
- Example usage:
865
- Alanine side chain is [CH3X4]
866
- Alanine Search is [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
867
-
868
- 18_standard_aa_side_chains.
869
- ([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),
870
- $([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
871
- $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),
872
- $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
873
- [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),
874
- $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
875
- $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
876
- $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),
877
- $([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
878
- $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])
879
- Can be any of the standard 18 (Pro & Gly are treated separately) Hits acids and conjugate bases.
880
-
881
- N in Any_standard_amino_acid.
882
- [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3]
883
- (=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3]
884
- (=[OX1])[OX2H,OX1-,N]),$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),
885
- $([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),$
886
- ([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
887
- $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),
888
- $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
889
- [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),
890
- $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
891
- $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
892
- $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),
893
- $([CHX4]([CH3X4])[OX2H]),$([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
894
- $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),
895
- $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])]
896
- Format is A.A. Template for 20 standard a.a.s, where * is replaced by the entire 18_standard_side_chains list (or'd together). A generic amino acid with any of the 18 side chains, or proline or glycine. Hits "standard" amino acids that have terminally appended groups (i.e. "standard" refers to the side chains). (Pro, Gly, or 18 normal a.a.s.) Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
897
-
898
- Non-standard amino acid.
899
- [$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),
900
- $([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),
901
- $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),
902
- $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3]
903
- (=[NH2X3+,NHX2+0])[NH2X3]),$([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
904
- $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),$([CH2X4][#6X3]1:
905
- [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
906
- [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),
907
- $([#7X3H])]:[#6X3H]1),$([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
908
- $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
909
- $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),$([CHX4]([CH3X4])[OX2H]),
910
- $([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
911
- $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),
912
- $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])]
913
- Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains). Won't hit amino acids that are non-standard due solely to the fact that groups are terminally appended to the polypeptide chain (N or C term). Format is [$(generic a.a.); !$(not a standard one)]. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
914
-
915
-
916
- Recursive or Multiple
917
- Recursive SMARTS: Atoms connected to particular SMARTS
918
-
919
- Ortho
920
- [SMARTS_expression]-!:aa-!:[SMARTS_expression]
921
-
922
- Meta
923
- [SMARTS_expression]-!:aaa-!:[SMARTS_expression]
924
-
925
- Para
926
- [SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
927
-
928
- Hydrogen
929
- [$([#1][SMARTS_expression])]
930
- Hydrogen must be explicit i.e. an isotope or charged
931
-
932
- Nitrogen
933
- [$([#7][SMARTS_expression])]
934
-
935
- Oxygen
936
- [$([#8][SMARTS_expression])]
937
-
938
- Fluorine
939
- [$([#9][SMARTS_expression])]
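The ortho/meta/para and "atom connected to a SMARTS" entries above are templates with `[SMARTS_expression]` as a placeholder. As a rough sketch, they can be filled in with two small string builders; `ring_relation` and `atom_attached_to` are hypothetical names introduced here only for illustration.

```python
# Number of aromatic ring atoms ("a") spanned by each substitution pattern,
# as in the Ortho / Meta / Para entries.
RING_ATOMS = {"ortho": 2, "meta": 3, "para": 4}


def ring_relation(first: str, second: str, relation: str) -> str:
    """Join two substituent SMARTS to a ring via single, non-aromatic bonds (-!:)."""
    return f"{first}-!:{'a' * RING_ATOMS[relation]}-!:{second}"


def atom_attached_to(atomic_number: int, neighbour: str) -> str:
    """Recursive SMARTS that hits only the atom with the given atomic number."""
    return f"[$([#{atomic_number}]{neighbour})]"


print(ring_relation("[OX2H]", "[Cl]", "para"))
print(atom_attached_to(7, "[CX3]=[OX1]"))
```

Because the second helper wraps everything in `$( )`, only the leading atom is reported as a hit, while the neighbour expression merely constrains its environment.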
940
-
941
-
942
- Recursive SMARTS: Multiple groups
943
-
944
- Two possible groups
945
- [$(SMARTS_expression_A),$(SMARTS_expression_B)]
946
- Hits atoms in either environment or group of interest, A or B.
947
- Example usages:
948
- Azide group is : [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
949
- Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
950
- Azide or azide ion is: [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),$([$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])])]
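The three azide examples follow one rule: wrap each alternative in `$( )` and join the alternatives with commas inside one pair of brackets. A minimal sketch of that rule, using a hypothetical helper named `any_of`:

```python
def any_of(*patterns: str) -> str:
    """OR several SMARTS together as a single recursive atom expression."""
    return "[" + ",".join(f"$({p})" for p in patterns) + "]"


# The two depictions of each species are taken from the example usages above.
azide_group = any_of("*-[NX2-]-[NX2+]#[NX1]", "*-[NX2]=[NX2+]=[NX1-]")
azide_ion = any_of("[NX1-]=[NX2+]=[NX1-]", "[NX1]#[NX2+]-[NX1-2]")

# Nesting: an already combined expression is itself a valid alternative.
azide_any = any_of(azide_group, azide_ion)
print(azide_any)
```

The atom that gets hit is the first atom of whichever alternative matches, so each alternative should start at the atom of interest.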
951
-
952
- Recursive SMARTS
953
- [$([atom_that_gets_hit][other_atom][other_atom])]
954
- Hits the first atom within the parentheses. Example usages:
955
- [$([CX3]=[OX1])] hits the carbonyl carbon; [$([OX1]=[CX3])] hits the carbonyl oxygen.
956
-
957
-
958
- Single only, Double only, Single or Double
959
-
960
- Sulfide
961
- [#16X2H0]
962
- (-alkylthio) Won't hit thiols. Hits disulfides too.
963
-
964
- Mono-sulfide
965
- [#16X2H0][!#16]
966
- (alkylthio- or alkoxy-) R-S-R Won't hit thiols. Won't hit disulfides.
967
-
968
- Di-sulfide
969
- [#16X2H0][#16X2H0]
970
- Won't hit thiols. Won't hit mono-sulfides.
971
-
972
- Two sulfides
973
- [#16X2H0][!#16].[#16X2H0][!#16]
974
- Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
975
-
976
- Acid/conj-base
977
- [OX2H,OX1H0-]
978
- Hits acid and conjugate base.
979
-
980
- Non-acid Oxygen
981
- [OX2H0]
982
-
983
- Acid/base
984
- [H1,H0-]
985
- Works for any atom if base form has no Hs & acid has only one.
986
-
987
-
988
- Multiple Disconnected Groups
989
-
990
- Two disconnected SMARTS fragments
991
- ([Cl!$(Cl~c)].[c!$(c~Cl)])
992
- A molecule that contains a chlorine and an aromatic carbon which are not connected to each other. Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
993
-
994
- Two disconnected SMARTS fragments
995
- ([Cl]).([c])
996
- Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES fragments.
997
-
998
- Two not-necessarily connected SMARTS fragments
999
- ([Cl].[c])
1000
- Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target fragment.
1001
-
1002
- Two not-necessarily connected fragments
1003
- ([SMARTS_expression]).([SMARTS_expression])
1004
- Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
1005
-
1006
- Two primary or secondary amines
1007
- [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
1008
- Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical patterns.
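The entries in this group differ only in where the parentheses sit around the dot. The sketch below spells out the three forms as string builders; the function names are hypothetical, and toolkit support for component-level grouping varies, so the resulting strings should be checked against whichever SMARTS parser is in use.

```python
def same_component(*fragments: str) -> str:
    """All fragments must match within one SMILES component: (A.B)."""
    return "(" + ".".join(fragments) + ")"


def different_components(*fragments: str) -> str:
    """Each fragment must match in a different SMILES component: (A).(B)."""
    return ".".join(f"({f})" for f in fragments)


def anywhere(*fragments: str) -> str:
    """No component constraint; fragments need not be bonded: A.B."""
    return ".".join(fragments)


amine = "[NX3;H2,H1;!$(NC=O)]"
print(same_component("[Cl]", "[c]"))
print(different_components("[Cl]", "[c]"))
print(anywhere(amine, amine))
```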
1009
-
1010
-
1011
- Tools & Tricks
1012
- Alternative/Equivalent Representations
1013
-
1014
- Any carbon aromatic or non-aromatic
1015
- [#6] or [c,C]
1016
-
1017
- SMILES wildcard
1018
- [#0]
1019
- This SMARTS hits the SMILES *
1020
-
1021
- Factoring
1022
- [OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-]
1023
- Factor out common atomic expressions in the recursive SMARTS. May improve human readability.
1024
-
1025
- High-precedence "and"
1026
- [N&X4&+,N&X3&+0] or [NX4+,NX3+0]
1027
- High-precedence "and" (&) is the default logical operator. "Or" (,) is lower precedence than &, and low-precedence "and" (;) is lower precedence than ",".
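The equivalence of the two forms in this entry only holds if & binds more tightly than ",". The toy function below makes that grouping visible by splitting on ";" first, then ",", then "&". It is a simplified sketch, not a SMARTS parser: `group_by_precedence` is a hypothetical name, and the function ignores implicit "and" (juxtaposed primitives), negation, and nested `$( )` expressions.

```python
def group_by_precedence(atom_expr: str) -> list:
    """Group an atom expression: ';' binds loosest, then ',', then '&'."""
    return [
        [alternative.split("&") for alternative in clause.split(",")]
        for clause in atom_expr.split(";")
    ]


# Two alternatives, each an "and" of three primitives.
print(group_by_precedence("N&X4&+,N&X3&+0"))

# The factored form from the previous entry: O and (X2 or X1-).
print(group_by_precedence("O;X2,X1-"))
```

In the nested result, the outer list holds the ";" clauses, each clause holds its "," alternatives, and each alternative holds the primitives joined by "&".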
1028
-
1029
-
1030
- Hydrogens
1031
-
1032
- Any atom w/ at-least 1 H
1033
- [*!H0,#1]
1034
- In SMILES and SMARTS, Hydrogen is not considered an atom (unless it is specified as an isotope). The hydrogen count is instead considered a property of an atom. This SMARTS provides a way to effectively hit Hs themselves.
1035
-
1036
- Hs on Carbons
1037
- [#6!H0,#1]
1038
-
1039
- Atoms w/ 1 H
1040
- [H,#1]
1041
-
1042
-
1043
- 5. Electron & Proton Features
1044
-
1045
-
1046
- Acids & Bases | Charge | H-bond Donors & Acceptors | Radicals
1047
-
1048
-
1049
- Acids & Bases
1050
-
1051
- Acid
1052
- [!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]
1053
- Proton donor
1054
-
1055
- Carboxylic acid
1056
- [CX3](=O)[OX2H1]
1057
- (-oic acid, COOH)
1058
-
1059
- Carboxylic acid or conjugate base.
1060
- [CX3](=O)[OX1H0-,OX2H1]
1061
-
1062
- Hydroxyl_acidic
1063
- [$([OH]-*=[!#6])]
1064
- An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a heteroatom. This includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids.
1065
-
1066
- Phosphoric_Acid
1067
- [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
1068
- Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on linear triphosphoric acid and longer). Hits acid and conjugate base.
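The Phosphoric_Acid pattern is easier to read once the repeated oxygen substituent is factored out. The sketch below rebuilds the same string from that one sub-expression; `OXY` and `phosphoric_acid_smarts` are hypothetical names used only for this illustration.

```python
# Substituent allowed on phosphorus: hydroxyl, oxide anion, or a P-O-P bridge.
OXY = "[$([OX2H]),$([OX1-]),$([OX2]P)]"


def phosphoric_acid_smarts(oxy: str = OXY) -> str:
    """Combine the P=O and the charge-separated P+/O- depictions."""
    neutral = f"P(=[OX1])({oxy})({oxy}){oxy}"
    charged = f"[P+]([OX1-])({oxy})({oxy}){oxy}"
    return f"[$({neutral}),$({charged})]"


pattern = phosphoric_acid_smarts()
print(pattern)
```

Each depiction carries three copies of the substituent, which is why the anhydride case (`[OX2]P`) is accepted in any of the three positions.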
1069
-
1070
- Sulfonic Acid. High specificity.
1071
- [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
1072
- Only hits carbo-sulfonic acids (won't hit heteroatom-substituted molecules). Hits acid and conjugate base. Hits both depiction forms. Hits arene sulfonic acids.
1073
-
1074
- Acyl Halide
1075
- [CX3](=[OX1])[F,Cl,Br,I]
1076
- (acid halide, -oyl halide)
1077
-
1078
-
1079
- Charge
1080
-
1081
- Anionic divalent Nitrogen
1082
- [NX2-]
1083
-
1084
- Oxenium Oxygen
1085
- [OX2H+]=*
1086
-
1087
- Oxonium Oxygen
1088
- [OX3H2+]
1089
-
1090
- Carbocation
1091
- [#6+]
1092
-
1093
- sp2 cationic carbon.
1094
- [$([cX2+](:*):*)]
1095
- Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
1096
-
1097
- Azide ion.
1098
- [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
1099
- Hits N in azide ion
1100
-
1101
- Zwitterion High Specificity
1102
- [+1]~*~*~[-1]
1103
- +1 charged atom separated by any 3 bonds from a -1 charged atom.
1104
-
1105
- Zwitterion Low Specificity, Crude
1106
- [$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),$([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])]
1107
- Variously charged moieties separated by up to ten bonds.
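The crude zwitterion pattern is one term repeated with a growing chain of wildcard atoms between the two charged ends. As a sketch, it can be generated instead of typed out; the two end expressions are copied verbatim from the entry, while the names `FIRST_END`, `SECOND_END` and `crude_zwitterion_smarts` are hypothetical.

```python
# End-atom expressions copied verbatim from the entry above.
FIRST_END = "[!-0!-1!-2!-3!-4]"
SECOND_END = "[!+0!+1!+2!+3!+4]"


def crude_zwitterion_smarts(max_spacers: int = 9) -> str:
    """OR together terms with 1..max_spacers wildcard atoms between the ends."""
    terms = [
        f"$({FIRST_END}{'~*' * n}~{SECOND_END})"
        for n in range(1, max_spacers + 1)
    ]
    return "[" + ",".join(terms) + "]"


crude = crude_zwitterion_smarts()
print(crude.count("$("), "alternatives")
```

With the default of nine spacer atoms, the longest alternative spans ten bonds, which matches the "up to ten bonds" note; a smaller `max_spacers` gives a shorter and cheaper pattern.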
1108
-
1109
- Zwitterion Low Specificity
1110
- ([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4])
1111
- Variously charged moieties that are within the same molecule but not-necessarily connected. Uses component-level grouping.
1112
-
1113
-
1114
- H-bond Donors & Acceptors
1115
-
1116
- Hydrogen-bond acceptor
1117
- [#6,#7;R0]=[#8]
1118
- Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring, double bonded to an oxygen.
1119
-
1120
- Hydrogen-bond acceptor
1121
- [!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]
1122
- A H-bond acceptor is a heteroatom with no positive charge; note that negatively charged oxygen or sulphur are included. Excluded are halogens, including F, heteroaromatic oxygen, sulphur and pyrrole N. Higher oxidation levels of N, P, S are excluded. Note P(III) is currently included. Zeneca's work would imply that (O=S=O) should also be excluded.
1123
-
1124
- Hydrogen-bond donor.
1125
- [!$([#6,H0,-,-2,-3])]
1126
- A H-bond donor is a non-negatively charged heteroatom with at least one H
1127
-
1128
- Hydrogen-bond donor.
1129
- [!H0;#7,#8,#9]
1130
- Must have an N-H bond, an O-H bond, or a F-H bond
1131
-
1132
- Possible intramolecular H-bond
1133
- [O,N;!H0]-*~*-*=[$([C,N;R0]=O)]
1134
- Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive SMARTS", where "$()" encloses a valid nested SMARTS and acts syntactically like an atom-primitive in the overall SMARTS. Multiple nesting is allowed.
1135
-
1136
-
1137
- Radicals
1138
-
1139
- Carbon Free-Radical
1140
- [#6;X3v3+0]
1141
- Hits a neutral carbon with three single bonds.
1142
-
1143
- Nitrogen Free-Radical
1144
- [#7;X2v4+0]
1145
- Hits a neutral nitrogen with two single bonds or with a single and a triple bond.