| # Floxoris Harmony v1.2 |
|
|
| **Floxoris Harmony v1.2** is a lightweight binary moderation model for fast toxicity detection in Russian and Ukrainian text. |
|
|
| This version continues fine-tuning from **Floxoris Harmony v1.1**. It focuses on reducing false positives on positive slang, compliments, and short praise phrases while keeping strong detection of insults and rude commands. |
|
|
| Harmony v1.2 is designed for practical moderation systems where speed, low cost, and simple deployment matter. |
|
|
| ## What Is New In v1.2 |
|
|
| Harmony v1.2 improves the behavior of v1.1 in cases where friendly slang or praise could be incorrectly classified as toxic. |
|
|
| Main focus: |
|
|
| - safer handling of positive slang |
| - fewer false positives on compliments |
| - improved distinction between insult patterns and praise patterns |
| - stronger handling of phrases like `харош`, `ну ты харош`, `ты красавчик` |
| - continued support for Russian and Ukrainian moderation |
| - same lightweight binary output: `safe` / `toxic` |
|
|
| This release is mainly an **anti-false-positive patch** for short praise and casual chat slang. |
|
|
| ## Model Task |
|
|
| The model performs binary text classification: |
|
|
| | Class | Label | |
| |---|---| |
| | `0` | `safe` | |
| | `1` | `toxic` | |
|
|
| The model answers: |
|
|
| > Is this message safe or toxic? |
|
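As a minimal sketch of the class mapping above, the snippet below turns a pair of raw logits into a class id and label using a plain-Python softmax. The names `ID2LABEL`, `softmax`, and `classify_logits` are illustrative assumptions, not part of the released model, and the argmax rule shown here does not apply any custom threshold.

```python
import math

# Class ids and labels as listed in the Model Task table.
ID2LABEL = {0: "safe", 1: "toxic"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_logits(logits):
    """Return (class_id, label, probabilities) for one pair of logits."""
    probs = softmax(logits)
    # Pick the index of the highest probability (argmax).
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, ID2LABEL[class_id], probs


# Hypothetical logits for illustration, not real model output.
class_id, label, probs = classify_logits([2.0, -1.0])
print(class_id, label)
```

With argmax, a message is labeled `toxic` as soon as its toxic probability exceeds 0.5, so a stricter threshold has to be applied separately on the toxic probability.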
|
| ## Intended Use |
|
|
| Floxoris Harmony v1.2 is suitable for: |
|
|
| - Telegram bot moderation |
| - chat message filtering |
| - community moderation tools |
| - AI assistant safety checks |
| - lightweight moderation APIs |
| - first-stage toxicity detection |
| - Russian/Ukrainian text moderation |
|
|
| It works best as a fast first-pass classifier before more complex moderation logic. |
|
|
| ## Example Usage |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| import torch |
| |
| model_id = "floxoris/harmony-v1.2" |
| |
| tokenizer = AutoTokenizer.from_pretrained(model_id) |
| model = AutoModelForSequenceClassification.from_pretrained(model_id) |
| |
| text = "ну ты харош" |
| |
| # Tokenize a single message; long inputs are truncated to 128 tokens |
| inputs = tokenizer( |
|     text, |
|     return_tensors="pt", |
|     truncation=True, |
|     padding=True, |
|     max_length=128, |
| ) |
| |
| # Inference only, so gradient tracking is disabled |
| with torch.no_grad(): |
|     outputs = model(**inputs) |
|     probs = torch.softmax(outputs.logits, dim=-1)[0] |
| |
| # Index 0 is `safe`, index 1 is `toxic` |
| safe_score = probs[0].item() |
| toxic_score = probs[1].item() |
| |
| threshold = 0.65 |
| label = "toxic" if toxic_score >= threshold else "safe" |
| |
| print({ |
|     "label": label, |
|     "safe_score": round(safe_score, 4), |
|     "toxic_score": round(toxic_score, 4), |
| }) |
| ``` |
|
|
| ## Recommended Threshold |
|
|
| Suggested default: |
|
|
| ```python |
| TOXIC_THRESHOLD = 0.65 |
| ``` |
|
|
| Suggested behavior: |
|
|
| | Toxic score | Action | |
| |---|---| |
| | below `0.65` | allow | |
| | `0.65` to below `0.88` | warn / review | |
| | `0.88` and above | delete / block | |
|
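The suggested behavior can be sketched as a small helper that maps a toxic score to an action. The function name `action_for_score`, the constant `BLOCK_THRESHOLD`, and the short action strings are assumptions made for illustration. The bands are treated as half-open intervals, so a score such as `0.645` is handled as "allow".

```python
TOXIC_THRESHOLD = 0.65  # suggested default from this card
BLOCK_THRESHOLD = 0.88  # lower bound of the delete / block band


def action_for_score(toxic_score: float) -> str:
    """Map a toxic probability in [0, 1] to a moderation action."""
    if not 0.0 <= toxic_score <= 1.0:
        raise ValueError(f"toxic_score must be in [0, 1], got {toxic_score}")
    if toxic_score >= BLOCK_THRESHOLD:
        return "block"
    if toxic_score >= TOXIC_THRESHOLD:
        return "review"
    return "allow"


# Illustrative scores, not real model output.
for score in (0.18, 0.82, 0.93):
    print(score, action_for_score(score))
```

Both cutoffs are kept as named constants so they can be tuned per community without touching the decision logic.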
|
| ## Training Focus |
|
|
| Harmony v1.2 was fine-tuned from: |
|
|
| ```text |
| floxoris/harmony-v1.1 |
| ``` |
|
|
| The main training focus was: |
|
|
| - positive slang |
| - friendly short praise |
| - safe casual phrases |
| - Russian compliments |
| - Ukrainian compliments |
| - safe context examples |
| - mild toxic phrases |
| - toxic vs safe phrase contrast |
|
|
| Examples of safe praise targeted in v1.2: |
|
|
| ```text |
| харош |
| хорош |
| ну ты харош |
| капец ты харош |
| ты красавчик |
| ты красава |
| ты молодец |
| ты лучший |
| ты мощный |
| ты гений |
| ти молодець |
| ти красень |
| ти крутий |
| ``` |
|
|
| Examples of toxic phrases that should remain toxic: |
|
|
| ```text |
| заткнись |
| отвали |
| закрой рот |
| ты тупой |
| ты дурак |
| ну ты даун |
| замовкни |
| відвали |
| ти тупий |
| ``` |
|
|
| Examples of safe contrast phrases: |
|
|
| ```text |
| закрой окно пожалуйста |
| пошёл в магазин |
| тупой угол в геометрии |
| не пиши пароль сюда |
| рот болит после стоматолога |
| ``` |
|
|
| ## Difference Between Versions |
|
|
| | Version | Focus | |
| |---|---| |
| | `harmony-v1` | base lightweight toxicity classifier | |
| | `harmony-v1.1` | improved mild toxicity detection | |
| | `harmony-v1.2` | reduced false positives on praise and positive slang | |
|
|
| ## Example Behavior |
|
|
| Expected behavior: |
|
|
| ```text |
| "ну ты харош" → safe |
| "капец ты харош" → safe |
| "ты красавчик" → safe |
| "закрой окно пожалуйста" → safe |
| "закрой рот" → toxic |
| "ну ты даун" → toxic |
| "заткнись" → toxic |
| ``` |
|
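The expected behavior listed above lends itself to a small regression check that can be rerun after each fine-tuning round. The sketch below is one possible shape: `run_regression` accepts any callable that returns `"safe"` or `"toxic"`, and `stub_classify` is a keyword stand-in used only so the example runs without downloading the model. Both names are hypothetical.

```python
# Phrase / expected-label pairs copied from the Example Behavior list.
EXPECTED = [
    ("ну ты харош", "safe"),
    ("капец ты харош", "safe"),
    ("ты красавчик", "safe"),
    ("закрой окно пожалуйста", "safe"),
    ("закрой рот", "toxic"),
    ("ну ты даун", "toxic"),
    ("заткнись", "toxic"),
]


def run_regression(classify, cases=EXPECTED):
    """Return (text, expected, got) for every case the classifier gets wrong."""
    failures = []
    for text, expected in cases:
        got = classify(text)
        if got != expected:
            failures.append((text, expected, got))
    return failures


def stub_classify(text):
    """Stand-in for the real model: exact-match lookup, for demonstration only."""
    toxic_phrases = {"закрой рот", "ну ты даун", "заткнись"}
    return "toxic" if text in toxic_phrases else "safe"


failures = run_regression(stub_classify)
print(f"{len(failures)} mismatches out of {len(EXPECTED)} cases")
```

In practice `stub_classify` would be replaced by a wrapper around the model call and threshold from the usage example, and the case list would grow with logged moderation mistakes.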
|
| ## API-Style Output Example |
|
|
| ```json |
| { |
| "model": "floxoris/harmony-v1.2", |
| "text": "закрой рот", |
| "label": "toxic", |
| "class": 1, |
| "safe_score": 0.18, |
| "toxic_score": 0.82, |
| "threshold": 0.65, |
| "latency_ms": 3.1 |
| } |
| ``` |
|
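A response in this shape could be assembled as in the sketch below. The helper `build_response` and its `score_fn` parameter (any callable that returns a toxic probability for a text) are assumptions for illustration, and `safe_score` is derived as one minus the toxic score, which holds for a two-class softmax.

```python
import json
import time


def build_response(text, score_fn, threshold=0.65,
                   model_name="floxoris/harmony-v1.2"):
    """Score one message and return a dict shaped like the JSON example."""
    start = time.perf_counter()
    toxic_score = score_fn(text)
    latency_ms = (time.perf_counter() - start) * 1000.0

    is_toxic = toxic_score >= threshold
    return {
        "model": model_name,
        "text": text,
        "label": "toxic" if is_toxic else "safe",
        "class": int(is_toxic),  # 0 = safe, 1 = toxic
        "safe_score": round(1.0 - toxic_score, 4),
        "toxic_score": round(toxic_score, 4),
        "threshold": threshold,
        "latency_ms": round(latency_ms, 2),
    }


# Stand-in scorer that returns a fixed, illustrative value.
response = build_response("закрой рот", lambda _text: 0.82)
print(json.dumps(response, ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps Cyrillic text readable in the serialized output instead of escaping it as `\uXXXX` sequences.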
|
| ## Limitations |
|
|
| - This is a binary classifier: it does not distinguish hate speech, threats, profanity, spam, or harassment as separate categories. |
| - It may still miss sarcasm, coded abuse, irony, or context-dependent toxicity. |
| - It may produce false positives on unusual slang or jokes. |
| - It may produce false negatives on creative insults. |
| - Very short messages can be ambiguous. |
| - Performance outside Russian and Ukrainian may be weaker. |
| - It should not be the only moderation layer for high-stakes systems. |
|
|
| Human review is recommended for important moderation decisions. |
|
|
| ## Recommended Production Setup |
|
|
| For real-world moderation: |
|
|
| 1. Run Harmony v1.2 on each message. |
| 2. Apply a toxicity threshold; the recommended default is `0.65`. |
| 3. If needed, treat scores from `0.50` up to `0.65` as uncertain. |
| 4. Log false positives and false negatives. |
| 5. Periodically fine-tune future versions on real moderation mistakes. |
| 6. Combine with rule-based checks for extreme cases if necessary. |
|
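The steps above can be sketched as a thin wrapper around the model. Everything named here (`moderate`, `record_mistake`, `HARD_BLOCK_TERMS`, the `score_fn` callable, and the logger name) is a hypothetical illustration rather than a shipped API, and the rule list is a placeholder to be replaced with real terms.

```python
import logging

logger = logging.getLogger("harmony.moderation")

# Hypothetical hard rules for extreme cases (step 6); fill with your own terms.
HARD_BLOCK_TERMS = {"example-banned-term"}


def moderate(text, score_fn, threshold=0.65, uncertain_from=0.50):
    """Return a decision dict for one message.

    score_fn is any callable mapping text to a toxic probability,
    for example a wrapper around Harmony v1.2.
    """
    lowered = text.lower()
    if any(term in lowered for term in HARD_BLOCK_TERMS):
        # Rule match: skip the model entirely.
        return {"decision": "toxic", "source": "rule", "toxic_score": None}

    toxic_score = score_fn(text)         # step 1: run the model
    if toxic_score >= threshold:         # step 2: apply the threshold
        decision = "toxic"
    elif toxic_score >= uncertain_from:  # step 3: optional uncertain band
        decision = "uncertain"
    else:
        decision = "safe"
    return {"decision": decision, "source": "model", "toxic_score": toxic_score}


def record_mistake(store, text, predicted, correct):
    """Step 4: keep moderator corrections as data for later fine-tuning (step 5)."""
    store.append({"text": text, "predicted": predicted, "correct": correct})
    logger.info("moderation mistake recorded: %s -> %s", predicted, correct)


# Usage with a stand-in scorer that always returns 0.57.
feedback = []
result = moderate("пример сообщения", lambda _text: 0.57)
print(result["decision"])  # uncertain
record_mistake(feedback, "пример сообщения", result["decision"], "safe")
```

Keeping the scorer as an injected callable makes the decision logic testable without loading the model, and the `feedback` list stands in for whatever store collects corrected examples.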
|
| ## Why This Model |
|
|
| Floxoris Harmony v1.2 is built for projects that need: |
|
|
| - fast inference |
| - low hosting cost |
| - compact model size |
| - simple API deployment |
| - Russian/Ukrainian moderation |
| - practical binary toxicity detection |
| - better behavior on casual slang |
|
|
| It is especially useful for Telegram bots, small communities, AI tools, and lightweight moderation APIs. |
|
|
| ## Summary |
|
|
| **Floxoris Harmony v1.2** is a compact Russian/Ukrainian moderation model focused on fast binary toxicity detection. |
|
|
| Compared to v1.1, this version is less likely to mark friendly praise like `харош`, `ну ты харош`, or `ты красавчик` as toxic, while preserving detection of direct insults and rude commands. |