| # Floxoris Harmony v0 |
|
|
| **Floxoris Harmony v0** is a lightweight binary toxicity classifier for **Russian and Ukrainian** text. It is designed for fast, low-cost inference in production settings such as Telegram bots, AI assistants, chat filters, and message pre-moderation pipelines. |
|
|
| Built on top of [`gravitee-io/bert-tiny-toxicity`](https://huggingface.co/gravitee-io/bert-tiny-toxicity), the model targets practical toxicity detection with a small footprint of roughly **40-50 MB**, which suits resource-constrained deployments. |
|
|
| ## Features |
|
|
| - Binary toxicity classification (`toxic` / `not_toxic`) |
| - Supports **Russian** and **Ukrainian** |
| - Very small and fast for inference |
| - Suitable for real-time moderation pipelines |
| - Easy to deploy in lightweight production systems |
| - Designed for Telegram bots, assistants, and chat filtering |
|
|
| ## Model Details |
|
|
| - **Task:** Binary text classification |
| - **Base model:** `gravitee-io/bert-tiny-toxicity` |
| - **Languages:** Russian, Ukrainian |
| - **Classes:** `not_toxic`, `toxic` |
| - **Model size:** ~40-50 MB |
| - **License:** Apache License 2.0 |
|
|
| ## Labels |
|
|
| The model returns one of two classes: |
|
|
| - `0` = `not_toxic` |
| - `1` = `toxic` |
|
|
| ## Training Details |
|
|
| The model was fine-tuned for binary toxicity classification on a merged Russian and Ukrainian moderation dataset built from three Parquet files: |
|
|
| - `ru.parquet` |
| - `uk.parquet` |
| - `big-ru.parquet` |
|
|
| ### Data Correction |
|
|
| In `big-ru.parquet`, the labels were originally inverted relative to the scheme this model uses (`0` = `not_toxic`, `1` = `toxic`): |
|
|
| - `0` = toxic |
| - `1` = safe |
|
|
| These labels were flipped to match the model's scheme before final training. |
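
A correction like this amounts to flipping each binary label. The sketch below is illustrative only: the row layout and the `flip_binary_label` helper are hypothetical and are not taken from the project's preprocessing code.

```python
def flip_binary_label(label: int) -> int:
    """Map the inverted scheme (0 = toxic, 1 = safe) onto 0 = not_toxic, 1 = toxic."""
    if label not in (0, 1):
        raise ValueError(f"expected 0 or 1, got {label!r}")
    return 1 - label


# Hypothetical rows, shown with the inverted labels described above.
inverted_rows = [
    {"text": "example toxic message", "label": 0},
    {"text": "example safe message", "label": 1},
]

# Build corrected copies instead of mutating the input rows.
corrected_rows = [
    {**row, "label": flip_binary_label(row["label"])} for row in inverted_rows
]
```

Rejecting values other than `0` and `1` keeps a stray label from being silently turned into a negative or out-of-range class.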
|
|
| ### Final Dataset |
|
|
| After label correction, the datasets were merged, cleaned, and balanced. |
|
|
| - **Total rows:** 122,254 |
| - **Toxic:** 61,127 |
| - **Not toxic:** 61,127 |
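
The card does not say how the classes were balanced. One common approach, shown here purely as an assumption, is to downsample every class to the size of the smallest one. The `balance_by_downsampling` helper and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_downsampling(rows, label_key="label", seed=0):
    """Return a shuffled list with the same number of rows for every label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    # Every class is reduced to the size of the smallest class.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy data: 5 toxic rows and 8 non-toxic rows.
rows = [{"text": f"toxic {i}", "label": 1} for i in range(5)]
rows += [{"text": f"safe {i}", "label": 0} for i in range(8)]

balanced = balance_by_downsampling(rows)
counts = {label: sum(r["label"] == label for r in balanced) for label in (0, 1)}
```

An equal split such as 61,127 / 61,127 is what this kind of downsampling would produce, though other balancing methods give the same counts.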
|
|
| ## Example Usage |
|
|
| ```python |
| import torch |
| from transformers import AutoModelForSequenceClassification, AutoTokenizer |
| |
| model_name = "floxoris/harmony-v0" |
| |
| tokenizer = AutoTokenizer.from_pretrained(model_name) |
| model = AutoModelForSequenceClassification.from_pretrained(model_name) |
| |
| # Class indices as documented in the Labels section. |
| id2label = { |
|     0: "not_toxic", |
|     1: "toxic", |
| } |
| |
| texts = [ |
|     "дарова, как день?",  # roughly: "hey, how's your day?" |
|     "ты дибил?",  # roughly: "are you a moron?" (misspelled) |
| ] |
| |
| # Tokenize the batch: pad to the longest text, truncate overlong inputs. |
| inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt") |
| |
| # Gradients are not needed for inference. |
| with torch.no_grad(): |
|     logits = model(**inputs).logits |
|     probs = torch.softmax(logits, dim=-1) |
|     preds = torch.argmax(probs, dim=-1) |
| |
| for text, pred, prob in zip(texts, preds, probs): |
|     label = id2label[pred.item()] |
|     confidence = prob[pred.item()].item()  # probability of the predicted class |
|     print(f"{text} -> {label} ({confidence:.4f})") |
| ``` |
|
|
| ## Example Outputs |
|
|
| Example model behavior on simple test inputs: |
|
|
| ```text |
| "дарова, как день?" |
| -> not_toxic (~0.91) |
| |
| "ты дибил?" |
| -> toxic (~0.80) |
| ``` |
|
|
| These examples are illustrative and should not be treated as a full benchmark. |
|
|
| ## Intended Use |
|
|
| Floxoris Harmony v0 is intended for fast, lightweight toxicity moderation in: |
|
|
| - Telegram bots |
| - AI assistants |
| - Chat filtering systems |
| - Message pre-moderation pipelines |
| - Lightweight production deployments |
|
|
| Typical use cases include: |
|
|
| - filtering incoming user messages before they reach a model or operator |
| - flagging potentially toxic content for review |
| - reducing moderation cost in high-volume chat environments |
| - adding a first-pass safety layer to conversational systems |
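
A first-pass filter of this kind can be sketched as a threshold on the probability of the `toxic` class. Everything below is an assumption made for illustration: `make_prefilter`, the `0.8` threshold, and the stand-in scorer are hypothetical, and a real scorer would wrap the model and return the softmax probability for class `1`.

```python
from typing import Callable


def make_prefilter(score_toxic: Callable[[str], float], threshold: float = 0.5):
    """Build a function that returns "review" or "pass" for a message."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")

    def prefilter(message: str) -> str:
        # Hold the message for review when the toxic probability is high enough.
        return "review" if score_toxic(message) >= threshold else "pass"

    return prefilter


# Stand-in scorer for demonstration only; it does not call the model.
def fake_score(message: str) -> float:
    return 0.9 if "insult" in message else 0.1


prefilter = make_prefilter(fake_score, threshold=0.8)
decisions = [prefilter(m) for m in ["hello there", "an insult"]]
```

A higher threshold flags fewer messages and lowers the false-positive rate at the cost of missing more toxic content, so the value is best tuned on representative traffic.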
|
|
| ## Limitations |
|
|
| - This is a **binary** moderation model and does not classify toxicity types |
| - It may miss subtle harassment, sarcasm, or context-dependent abuse |
| - It may produce false positives on slang, irony, or emotionally charged messages |
| - Performance may degrade on domain-specific jargon, mixed-language text, or heavily misspelled input |
| - It is intended as a lightweight moderation layer, not a full safety system |
| - Human review is still recommended for high-stakes moderation decisions |
|
|
| ## License |
|
|
| This model is released under the **Apache License 2.0**. |
|
|
| ## Future Versions |
|
|
| Planned directions for future releases: |
|
|
| - **v1:** improved accuracy and calibration |
| - **v2:** broader multilingual coverage and more robust edge-case handling |
| - **Later versions:** possibly better handling of slang, implicit toxicity, and context-aware moderation |
|
|
| ## Summary |
|
|
| Floxoris Harmony v0 is a compact toxicity moderation model optimized for practical deployment where **speed, cost, and simplicity** matter. It is best suited as a lightweight first-stage moderation component for Russian and Ukrainian text pipelines. |
|
|