# Floxoris Harmony v1.2

**Floxoris Harmony v1.2** is a lightweight binary moderation model for fast toxicity detection in Russian and Ukrainian text.

This version is a continued fine-tuning update of **Floxoris Harmony v1.1**, focused on reducing false positives on positive slang, compliments, and short praise phrases while keeping strong detection of insults and rude commands.

Harmony v1.2 is designed for practical moderation systems where speed, low cost, and simple deployment matter.

## What Is New In v1.2

Harmony v1.2 improves the behavior of v1.1 in cases where friendly slang or praise could be incorrectly classified as toxic.

Main focus:

- safer handling of positive slang
- fewer false positives on compliments
- improved distinction between insult patterns and praise patterns
- stronger handling of phrases like `харош`, `ну ты харош`, `ты красавчик`
- continued support for Russian and Ukrainian moderation
- same lightweight binary output: `safe` / `toxic`

This release is mainly an **anti-false-positive patch** for short praise and casual chat slang.

## Model Task

The model performs binary text classification:

| Class | Label |
|---|---|
| `0` | `safe` |
| `1` | `toxic` |

The model answers:

> Is this message safe or toxic?

## Intended Use

Floxoris Harmony v1.2 is suitable for:

- Telegram bot moderation
- chat message filtering
- community moderation tools
- AI assistant safety checks
- lightweight moderation APIs
- first-stage toxicity detection
- Russian/Ukrainian text moderation

It works best as a fast first-pass classifier before more complex moderation logic.

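
To illustrate the first-pass idea, here is a minimal sketch that lets a cheap scorer clear most messages and escalates the rest to slower logic. The function `first_pass` and the `score_fn` callable are hypothetical names introduced for this example; `score_fn` stands in for any call that returns the model's toxic probability, and the `0.65` default is the threshold recommended later in this card. The scores in the demo are made up for illustration and are not model outputs.

```python
from typing import Callable, Iterable

# Default taken from the "Recommended Threshold" section of this card.
TOXIC_THRESHOLD = 0.65


def first_pass(
    messages: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = TOXIC_THRESHOLD,
) -> tuple[list[str], list[tuple[str, float]]]:
    """Split messages into (passed, escalated) using a toxic-score callable.

    `score_fn` is a hypothetical stand-in for a Harmony v1.2 call that
    returns the toxic probability in [0, 1].
    """
    passed: list[str] = []
    escalated: list[tuple[str, float]] = []
    for message in messages:
        score = score_fn(message)
        if score >= threshold:
            # Hand off to slower logic: rules, a larger model, or a human.
            escalated.append((message, score))
        else:
            passed.append(message)
    return passed, escalated


if __name__ == "__main__":
    # Fake scores for illustration only; these are not model outputs.
    fake_scores = {"ну ты харош": 0.05, "закрой рот": 0.82}
    passed, escalated = first_pass(fake_scores, lambda m: fake_scores[m])
    print("passed:", passed)
    print("escalated:", escalated)
```

Keeping the scorer behind a plain callable means the same routing code can be exercised in tests with fake scores and wired to the real model in production.
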
## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "floxoris/harmony-v1.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "ну ты харош"

# Tokenize a single message; long inputs are truncated to 128 tokens.
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128
)

# Inference only, so gradients are not needed.
with torch.no_grad():
    outputs = model(**inputs)

# Index 0 is `safe`, index 1 is `toxic`.
probs = torch.softmax(outputs.logits, dim=-1)[0]
safe_score = probs[0].item()
toxic_score = probs[1].item()

threshold = 0.65
label = "toxic" if toxic_score >= threshold else "safe"

print({
    "label": label,
    "safe_score": round(safe_score, 4),
    "toxic_score": round(toxic_score, 4)
})
```

## Recommended Threshold

Suggested default:

```python
TOXIC_THRESHOLD = 0.65
```

Suggested behavior:

| Toxic score | Action |
|---|---|
| `0.00–0.64` | allow |
| `0.65–0.87` | warn / review |
| `0.88–1.00` | delete / block |

## Training Focus

Harmony v1.2 was fine-tuned from:

```text
floxoris/harmony-v1.1
```

The main training focus was:

- positive slang
- friendly short praise
- safe casual phrases
- Russian compliments
- Ukrainian compliments
- safe context examples
- mild toxic phrases
- toxic vs safe phrase contrast

Examples of safe praise targeted in v1.2:

```text
харош
хорош
ну ты харош
капец ты харош
ты красавчик
ты красава
ты молодец
ты лучший
ты мощный
ты гений
ти молодець
ти красень
ти крутий
```

Examples of toxic phrases that should remain toxic:

```text
заткнись
отвали
закрой рот
ты тупой
ты дурак
ну ты даун
замовкни
відвали
ти тупий
```

Examples of safe contrast phrases:

```text
закрой окно пожалуйста
пошёл в магазин
тупой угол в геометрии
не пиши пароль сюда
рот болит после стоматолога
```

## Difference Between Versions

| Version | Focus |
|---|---|
| `harmony-v1` | base lightweight toxicity classifier |
| `harmony-v1.1` | improved mild toxicity detection |
| `harmony-v1.2` | reduced false positives on praise and positive slang |

## Example Behavior

Expected behavior:

```text
"ну ты харош" → safe
"капец ты харош" → safe
"ты красавчик" → safe
"закрой окно пожалуйста" → safe
"закрой рот" → toxic
"ну ты даун" → toxic
"заткнись" → toxic
```

## API-Style Output Example

```json
{
  "model": "floxoris/harmony-v1.2",
  "text": "закрой рот",
  "label": "toxic",
  "class": 1,
  "safe_score": 0.18,
  "toxic_score": 0.82,
  "threshold": 0.65,
  "latency_ms": 3.1
}
```

## Limitations

- This is a binary classifier and does not separate toxicity categories.
- It does not classify hate speech, threats, profanity, spam, or harassment separately.
- It may still miss sarcasm, coded abuse, irony, or context-dependent toxicity.
- It may produce false positives on unusual slang or jokes.
- It may produce false negatives on creative insults.
- Very short messages can be ambiguous.
- Performance outside Russian and Ukrainian may be weaker.
- It should not be the only moderation layer for high-stakes systems.

Human review is recommended for important moderation decisions.

## Recommended Production Setup

For real-world moderation:

1. Run Harmony v1.2 on each message.
2. Apply a threshold (recommended default: `0.65`).
3. Treat scores between `0.50` and `0.65` as uncertain if needed.
4. Log false positives and false negatives.
5. Periodically fine-tune future versions on real moderation mistakes.
6. Combine with rule-based checks for extreme cases if necessary.

## Why This Model

Floxoris Harmony v1.2 is built for projects that need:

- fast inference
- low hosting cost
- compact model size
- simple API deployment
- Russian/Ukrainian moderation
- practical binary toxicity detection
- better behavior on casual slang

It is especially useful for Telegram bots, small communities, AI tools, and lightweight moderation APIs.

## Summary

**Floxoris Harmony v1.2** is a compact Russian/Ukrainian moderation model focused on fast binary toxicity detection.
Compared to v1.1, this version is less likely to mark friendly praise like `харош`, `ну ты харош`, or `ты красавчик` as toxic, while preserving detection of direct insults and rude commands.
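
## Sketch: Mapping Scores To Actions

The score bands from the Recommended Threshold table, together with the optional `0.50` to `0.65` uncertainty band from the production setup list, can be turned into a small helper. This is a minimal sketch and not part of the model: the names `action_for_score`, `is_uncertain`, and the action strings are assumptions made for illustration. Because the table lists `0.00–0.64` and `0.65–0.87` with a gap between them, the sketch assumes every score strictly below `0.65` is allowed, which matches the `toxic_score >= threshold` check in the usage example.

```python
# Band edges taken from this card; the constant names are hypothetical.
REVIEW_FROM = 0.65     # scores at or above this are warned / reviewed
BLOCK_FROM = 0.88      # scores at or above this are deleted / blocked
UNCERTAIN_FROM = 0.50  # optional "uncertain" band starts here


def action_for_score(toxic_score: float) -> str:
    """Map a toxic probability to 'allow', 'review', or 'block'."""
    if not 0.0 <= toxic_score <= 1.0:
        raise ValueError(f"toxic_score must be in [0, 1], got {toxic_score}")
    if toxic_score >= BLOCK_FROM:
        return "block"
    if toxic_score >= REVIEW_FROM:
        return "review"
    return "allow"


def is_uncertain(toxic_score: float) -> bool:
    """True for allowed scores close enough to the threshold to log or sample."""
    return UNCERTAIN_FROM <= toxic_score < REVIEW_FROM


if __name__ == "__main__":
    # 0.82 is the toxic_score from the API-style output example above.
    for score in (0.18, 0.60, 0.82, 0.95):
        print(score, action_for_score(score), is_uncertain(score))
```

Messages flagged by `is_uncertain` are still allowed in this sketch; the flag only marks them as candidates for logging or human sampling.
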