sollamon committed on
Commit c477432 · verified · 1 Parent(s): bc9c368

Updated README

  1. README.md +329 -0
README.md CHANGED
@@ -1,3 +1,332 @@
  ---
+ language:
+ - ru
+ - uk
+ - en
+ tags:
+ - text-classification
+ - spam-detection
+ - ad-filter
+ - telegram
+ - moderation
+ - anti-spam
+ - obfuscation
+ library_name: transformers
+ pipeline_tag: text-classification
+ base_model: cointegrated/rubert-tiny2
  license: apache-2.0
  ---
+
+ # floxoris/adrash-v0
+
+ **Adrash v0** is a compact binary text classification model for detecting advertisements, promo spam, referral spam, Telegram channel promotion, suspicious job spam, and obfuscated ad-like messages.
+
+ The model is designed for lightweight moderation systems, especially:
+
+ - Telegram bots
+ - Telegram groups
+ - Telegram Mini Apps
+ - marketplaces
+ - comment sections
+ - chat systems
+ - small moderation APIs
+
+ **Adrash** means **Ad + Trash**: a small filter that catches advertising garbage before it reaches users.
+
+ ## Labels
+
+ | ID | Label | Meaning |
+ |---:|---|---|
+ | 0 | `clean` | Normal message |
+ | 1 | `ad_spam` | Advertisement, promo, referral spam, job spam, channel promotion, suspicious commercial message |
+
+ ## What Adrash v0 detects
+
+ Adrash v0 is trained to detect messages like:
+
+ - Telegram channel promotion
+ - referral spam
+ - promo-code spam
+ - suspicious job offers
+ - “work online” spam
+ - salary bait messages
+ - “write me in DM” spam
+ - obfuscated Telegram spam
+ - emoji-heavy salary fragments
+ - messages with mixed Cyrillic, Latin, and Greek letters
+ - messages with hidden Unicode / zero-width characters
+
+ Examples of target spam:
+
+ ```text
+ РАБОТА ОНЛАЙН 💰
+ Ищу людей в команду на обучение
+ Опыт не требуется, всему научу
+ ЗП 2000-5000р/день
+ Связь: @username
+ Подпишись на канал и получи бонус
+ ```
+
+ Obfuscated examples:
+
+ ```text
+ Ηa ceгοдня–зaвтpa нужны 2 чeлοвeκa
+ ⚠️ЗП в m еcяц 2000💵+
+ ➡️Uщy людeй в koмaнду на 0бучenиe
+ Εcли гοтοвы выйти — пишитe «+» в личныe cοοбщeния
+ ```
+
+ ## What Adrash v0 is not for
+
+ Adrash v0 is **not** a general safety model.
+
+ It is not designed to reliably detect:
+
+ - toxicity
+ - hate speech
+ - violent threats
+ - illegal activity
+ - self-harm
+ - sexual content
+ - malware
+ - political manipulation
+ - general abuse
+
+ For those categories, use a separate safety classifier.
+
+ ## Recommended thresholds
+
+ The model outputs probabilities for `clean` and `ad_spam`.
+
+ Recommended moderation policy:
+
+ | ad_spam score | Action |
+ |---:|---|
+ | `>= 0.85` | Block / delete |
+ | `>= 0.65` and `< 0.85` | Send to manual moderation |
+ | `< 0.65` | Allow |
+
+ In production, prefer a conservative policy that keeps false positives low: accidentally deleting normal messages is usually worse than missing a small amount of spam.
+
+ ## Usage with Transformers
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ model_id = "floxoris/adrash-v0"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+ # Switch to evaluation mode (disables dropout).
+ model.eval()
+
+ text = "РАБОТА ОНЛАЙН 💰 ЗП каждый день, пишите в личку"
+
+ # Truncate to the 160-token maximum length used in training.
+ inputs = tokenizer(
+     text,
+     return_tensors="pt",
+     truncation=True,
+     max_length=160,
+ )
+
+ with torch.inference_mode():
+     logits = model(**inputs).logits[0]
+     probs = torch.softmax(logits, dim=-1)
+
+ # Index 0 is `clean`, index 1 is `ad_spam` (see the Labels table).
+ clean_score = float(probs[0])
+ ad_spam_score = float(probs[1])
+
+ label = "ad_spam" if ad_spam_score >= clean_score else "clean"
+
+ print({
+     "label": label,
+     "clean": clean_score,
+     "ad_spam": ad_spam_score,
+ })
+ ```
+
+ ## Usage with pipeline
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline(
+     "text-classification",
+     model="floxoris/adrash-v0",
+     tokenizer="floxoris/adrash-v0",
+     top_k=None,  # scores for every label; replaces the deprecated return_all_scores=True
+ )
+
+ text = "Подпишись на канал и получи бонус"
+ result = classifier(text)
+
+ print(result)
+ ```
+
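Depending on the installed Transformers version and on how all-label scoring is requested, the pipeline may return either a flat list of `{"label", "score"}` dicts or the same list nested one level deeper. The helper below is a minimal sketch for reading the `ad_spam` score in both cases. `extract_ad_spam_score` is a hypothetical name, and the label strings are an assumption: they follow the Labels table, with `LABEL_1` accepted as a fallback in case the published config does not define an `id2label` mapping.

```python
from typing import Any

# Assumption: label names follow the Labels table. "LABEL_1" is the generic
# name Transformers uses when a config has no id2label mapping.
AD_SPAM_LABELS = {"ad_spam", "LABEL_1"}


def extract_ad_spam_score(result: Any) -> float:
    """Return the ad_spam score from pipeline output for a single text."""
    # Unwrap one level of nesting: [[{...}, {...}]] -> [{...}, {...}]
    if result and isinstance(result[0], list):
        result = result[0]
    for item in result:
        if item["label"] in AD_SPAM_LABELS:
            return float(item["score"])
    raise ValueError("no ad_spam label found in pipeline output")


# Hand-written sample outputs (not real model predictions).
nested = [[{"label": "clean", "score": 0.02}, {"label": "ad_spam", "score": 0.98}]]
flat = [{"label": "clean", "score": 0.02}, {"label": "ad_spam", "score": 0.98}]

print(extract_ad_spam_score(nested), extract_ad_spam_score(flat))
```

The returned float can be passed straight to a threshold function such as the one in the moderation example.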
+ ## Telegram bot moderation example
+
+ ```python
+ def moderation_decision(ad_spam_score: float) -> str:
+     """Map an ad_spam probability to an action using the recommended thresholds."""
+     if ad_spam_score >= 0.85:
+         return "block"
+     if ad_spam_score >= 0.65:
+         return "moderate"
+     return "allow"
+ ```
+
+ For Telegram groups, classify a short buffer of recent messages from the same user rather than each message in isolation.
+
+ Example:
+
+ ```text
+ User sends 3 messages within 20 seconds:
+
+ 1. РАБОТА ОНЛАЙН 💰
+ 2. Опыт не требуется, всему научу
+ 3. Связь: @username
+ ```
+
+ Classifying the combined block is usually more reliable than classifying each fragment separately.
+
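One way to build such a combined block is a small per-user sliding window. The sketch below is an illustration only: `UserMessageBuffer`, the 20-second window (taken from the example above), and the cap of five fragments are all assumptions, not part of the model or of any Telegram library.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 20.0  # assumption: mirrors the 20-second example above
MAX_MESSAGES = 5       # assumption: cap on buffered fragments per user


class UserMessageBuffer:
    """Keep each user's recent messages and return them as one text block."""

    def __init__(self, window: float = WINDOW_SECONDS, max_messages: int = MAX_MESSAGES):
        self.window = window
        self.max_messages = max_messages
        self._buffers = defaultdict(deque)  # user_id -> deque of (timestamp, text)

    def add(self, user_id: int, text: str, timestamp: float) -> str:
        """Store a message and return the combined text to classify."""
        buf = self._buffers[user_id]
        buf.append((timestamp, text))
        # Drop fragments that fell out of the time window.
        while buf and timestamp - buf[0][0] > self.window:
            buf.popleft()
        # Enforce the size cap, oldest first.
        while len(buf) > self.max_messages:
            buf.popleft()
        return "\n".join(fragment for _, fragment in buf)


buffer = UserMessageBuffer()
buffer.add(1, "РАБОТА ОНЛАЙН 💰", 0.0)
buffer.add(1, "Опыт не требуется, всему научу", 5.0)
combined = buffer.add(1, "Связь: @username", 12.0)
print(combined)
```

The combined string is what would be passed to the tokenizer. Keep the 160-token limit in mind: a long buffer is truncated, so a small cap on fragments is usually enough.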
+ ## Training data
+
+ Adrash v0 was trained on a mixture of public spam/ham datasets, Telegram-like datasets, synthetic Telegram-style advertisement examples, clean hard-negative examples, and obfuscation-heavy spam samples.
+
+ Training sources include:
+
+ ```text
+ thehamkercat/telegram-spam-ham
+ mshenoda/spam-messages
+ Deysi/spam-detection-dataset
+ SetFit/enron_spam
+ KSE-RESEARCH-Group/UAReviews
+ zefang-liu/phishing-email-dataset
+ ucirvine/sms_spam
+ SmsSpamCollection
+ ScoutieAutoML/russian-news-telegram-dataset
+ ScoutieAutoML/cybersecurity_news_telegram_dataset
+ ```
+
+ The training set also includes hard-negative examples such as:
+
+ ```text
+ як зробити реферальну систему в боті?
+ потрібно додати кнопку підписатися
+ мій Telegram-бот не бачить канал
+ скільки коштує реклама в телеграмі?
+ це реклама чи нормальний пост?
+ ```
+
+ These examples help reduce false positives on developer, moderation, marketplace, and Telegram-bot related conversations.
+
+ ## Obfuscation robustness
+
+ Adrash v0 was trained with examples containing:
+
+ - zero-width Unicode characters
+ - Cyrillic / Latin / Greek homoglyph mixing
+ - digits used as letters
+ - emoji salary fragments
+ - short Telegram spam fragments
+ - suspicious job-spam patterns
+ - mixed-language spam
+ - Telegram invite links
+ - username/contact bait
+
+ Examples:
+
+ ```text
+ ⁠⁠⁠⁠⁠⁠⁠⁠⁠РАБОТА О НЛАЙН 💰
+ ➡️Uщy людeй в koмaнду на 0бучenиe
+ ⚠️ЗП в m еcяц 2000💵+
+ 👀 Bсе что нужно - teлeфoн и жeлаnue paб0taть
+ ✉️ Св⁠язь: @username⁠͏‍
+ ```
+
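Two of the tricks listed above (invisible characters and homoglyph mixing) can also be surfaced with a few lines of standard-library Python, which may be useful as an extra signal next to the model score. The sketch below is a diagnostic under stated assumptions, not a description of how Adrash v0 was trained or preprocessed: `strip_invisible`, `scripts_in_word`, and `mixed_script_words` are hypothetical helpers, and script detection relies on Unicode character names, which is a rough heuristic. Since the model was trained on obfuscated text, whether to classify the raw or the cleaned message is something to test on your own data.

```python
import unicodedata

# U+034F (combining grapheme joiner) is invisible but has category Mn, not Cf,
# so it is listed explicitly.
EXTRA_INVISIBLE = {"\u034f"}


def strip_invisible(text: str) -> str:
    """Remove format characters (category Cf), e.g. zero-width spaces and joiners.

    Note: this also removes the zero-width joiner used inside some emoji sequences.
    """
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cf" and ch not in EXTRA_INVISIBLE
    )


def scripts_in_word(word: str) -> set[str]:
    """Return which of Cyrillic / Latin / Greek appear among the letters of a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            for script in ("CYRILLIC", "LATIN", "GREEK"):
                if name.startswith(script):
                    scripts.add(script)
    return scripts


def mixed_script_words(text: str) -> list[str]:
    """List words whose letters come from more than one script."""
    return [w for w in strip_invisible(text).split() if len(scripts_in_word(w)) > 1]


# Latin "k" followed by Cyrillic letters, then a purely Cyrillic word.
sample = "k\u043e\u043c\u0430\u043d\u0434\u0443 \u043d\u0430"
print(mixed_script_words(sample))
print(strip_invisible("\u0421\u0432\u2060\u044f\u0437\u044c"))
```

A non-empty result from `mixed_script_words` does not prove spam (brand names and code snippets mix scripts too), so it is better treated as a reason for manual review than as a blocking rule.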
+ ## Evaluation
+
+ Metrics have not been filled in yet. Replace the `TODO` placeholders below with real values from the final training run.
+
+ ```json
+ {
+   "validation": {
+     "eval_precision_ad": "TODO",
+     "eval_recall_ad": "TODO",
+     "eval_f1_ad": "TODO",
+     "eval_false_positive_rate": "TODO",
+     "eval_false_negative_rate": "TODO"
+   },
+   "benchmark": {
+     "benchmark_precision_ad": "TODO",
+     "benchmark_recall_ad": "TODO",
+     "benchmark_f1_ad": "TODO",
+     "benchmark_false_positive_rate": "TODO",
+     "benchmark_false_negative_rate": "TODO"
+   },
+   "hard_test": {
+     "hard_test_precision_ad": "TODO",
+     "hard_test_recall_ad": "TODO",
+     "hard_test_f1_ad": "TODO",
+     "hard_test_false_positive_rate": "TODO",
+     "hard_test_false_negative_rate": "TODO"
+   }
+ }
+ ```
+
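For reference, the five quantities named in the template can be computed from gold labels and predictions with plain Python. The sketch below assumes the label convention from the Labels table (`1` = `ad_spam`, `0` = `clean`); `binary_metrics` is a hypothetical helper, and the toy labels are made up purely to exercise the function. They are not results for Adrash v0.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, FPR, and FNR for the positive (ad_spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(numerator, denominator):
        # Report 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "precision_ad": ratio(tp, tp + fp),
        "recall_ad": ratio(tp, tp + fn),
        "f1_ad": ratio(2 * tp, 2 * tp + fp + fn),
        "false_positive_rate": ratio(fp, fp + tn),  # clean messages flagged as spam
        "false_negative_rate": ratio(fn, fn + tp),  # spam messages let through
    }


# Toy data for illustration only (not model output).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

The false positive rate is the number to watch when tuning the blocking threshold, since it measures how often normal messages would be flagged.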
+ ## Limitations
+
+ Adrash v0 may still fail on:
+
+ - very short fragments without context
+ - new spam formats not present in training data
+ - messages that require external context
+ - mixed moderation categories, such as toxic spam or illegal offers
+ - intentionally adversarial text designed to bypass classifiers
+ - messages where spam intent is only clear across multiple user messages
+
+ For best results, use Adrash v0 together with:
+
+ - short user message buffering
+ - repeated-message detection
+ - link/domain checks
+ - rate limits
+ - admin review for medium-confidence cases
+
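As an illustration of the repeated-message item in the list above, the sketch below counts how often the same normalized text shows up within a time window. Everything in it is an assumption made for the example: the `RepeatDetector` name, the 10-minute window, the threshold of three, and the choice of lowercasing plus whitespace collapsing as normalization.

```python
import hashlib
from collections import defaultdict


class RepeatDetector:
    """Flag a message once the same normalized text was seen `threshold` times in `window` seconds."""

    def __init__(self, window: float = 600.0, threshold: int = 3):
        self.window = window
        self.threshold = threshold
        self._seen = defaultdict(list)  # fingerprint -> list of timestamps

    @staticmethod
    def fingerprint(text: str) -> str:
        # Case-fold and collapse whitespace so trivial edits map to the same key.
        normalized = " ".join(text.casefold().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def is_repeated(self, text: str, timestamp: float) -> bool:
        key = self.fingerprint(text)
        # Keep only sightings that are still inside the window, then add this one.
        recent = [t for t in self._seen[key] if timestamp - t <= self.window]
        recent.append(timestamp)
        self._seen[key] = recent
        return len(recent) >= self.threshold


detector = RepeatDetector(window=60.0, threshold=3)
first = detector.is_repeated("Подпишись на канал", 0.0)
second = detector.is_repeated("подпишись  на КАНАЛ", 10.0)
third = detector.is_repeated("Подпишись на канал", 20.0)
print(first, second, third)
```

A check like this is independent of the classifier score, so it can catch copy-pasted spam that scores just under the blocking threshold.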
+ ## Model details
+
+ | Field | Value |
+ |---|---|
+ | Model name | `floxoris/adrash-v0` |
+ | Task | Binary text classification |
+ | Labels | `clean`, `ad_spam` |
+ | Base model | `cointegrated/rubert-tiny2` |
+ | Main languages | Russian, Ukrainian, English |
+ | Max length used in training | 160 tokens |
+ | Framework | Transformers / PyTorch |
+
+ ## Example output
+
+ ```json
+ {
+   "label": "ad_spam",
+   "clean": 0.0214,
+   "ad_spam": 0.9786
+ }
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @misc{floxoris_adrash_v0,
+   title={Adrash v0: Compact Advertisement and Spam Filter},
+   author={Floxoris},
+   year={2026},
+   publisher={Hugging Face},
+   howpublished={https://huggingface.co/floxoris/adrash-v0}
+ }
+ ```
+
+ ## Disclaimer
+
+ Adrash v0 is an experimental moderation model. It should not be used as the only moderation layer in high-risk systems. Always test it on your own real messages before production deployment.