Floxoris Harmony v0

Floxoris Harmony v0 is a lightweight binary toxicity classifier for moderating Russian and Ukrainian text. It is designed for fast, low-cost inference in production settings such as Telegram bots, AI assistants, chat filters, and message pre-moderation pipelines.

It is built on top of gravitee-io/bert-tiny-toxicity and has a very small footprint of roughly 40-50 MB, which makes it practical for lightweight deployments.

Features

  • Binary toxicity moderation (toxic / not toxic)
  • Supports Russian and Ukrainian
  • Small model size and fast inference
  • Suitable for real-time moderation pipelines
  • Easy to deploy in lightweight production systems
  • Designed for Telegram bots, assistants, and chat filtering

Model Details

  • Task: Binary text classification
  • Base model: gravitee-io/bert-tiny-toxicity
  • Languages: Russian, Ukrainian
  • Classes: not_toxic, toxic
  • Model size: ~40-50 MB
  • License: Apache License 2.0

Labels

The model outputs a score (logit) for each of two classes; the predicted label is the higher-scoring class:

  • 0 = not_toxic
  • 1 = toxic

Training Details

The model was fine-tuned for binary toxicity classification on a merged Russian and Ukrainian moderation dataset built from:

  • ru.parquet
  • uk.parquet
  • big-ru.parquet

Data Correction

In big-ru.parquet, the labels were originally inverted relative to the scheme above:

  • 0 = toxic
  • 1 = safe

These labels were flipped to match the model's scheme (0 = not_toxic, 1 = toxic) before final training.
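A minimal sketch of the label flip in Python. It assumes each dataset has already been loaded (for example with `pandas.read_parquet`) into hypothetical `(text, label)` pairs; the actual column names of big-ru.parquet are not documented here, and the helper name `fix_inverted_labels` is illustrative only.

```python
from typing import List, Tuple

# Hypothetical row format: (text, label). Loading big-ru.parquet into this
# shape (e.g. via pandas.read_parquet) is assumed and not shown.
Row = Tuple[str, int]


def fix_inverted_labels(rows: List[Row]) -> List[Row]:
    """Map big-ru labels (0 = toxic, 1 = safe) to the model's scheme (0 = not_toxic, 1 = toxic)."""
    fixed: List[Row] = []
    for text, label in rows:
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r} for text {text!r}")
        fixed.append((text, 1 - label))  # binary labels: flipping is 1 - label
    return fixed


# Example with made-up rows in big-ru's original encoding:
print(fix_inverted_labels([("дарова, как день?", 1), ("ты дибил?", 0)]))
# [('дарова, как день?', 0), ('ты дибил?', 1)]
```

Raising on any label other than 0 or 1 guards against silently flipping rows from a source that uses a different encoding.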

Final Dataset

After label correction, the datasets were merged, cleaned, and balanced.

  • Total rows: 122,254
  • Toxic: 61,127
  • Safe / Not toxic: 61,127
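The card does not say how the merging, cleaning, and balancing were done. The sketch below shows one plausible approach under stated assumptions, not the author's pipeline: rows are hypothetical `(text, label)` pairs whose labels are already corrected, "cleaning" means stripping whitespace and dropping empty or exact-duplicate texts, and balancing is random downsampling of the larger class. The function name `merge_clean_balance` is illustrative.

```python
import random
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int]  # (text, label) with 0 = not_toxic, 1 = toxic (assumed)


def merge_clean_balance(*datasets: List[Row], seed: int = 42) -> List[Row]:
    """Merge datasets, apply simple cleaning, and downsample to equal class sizes."""
    seen: Set[str] = set()
    by_label: Dict[int, List[Row]] = {0: [], 1: []}
    for rows in datasets:
        for text, label in rows:
            if label not in by_label:
                raise ValueError(f"unexpected label {label!r}")
            text = text.strip()
            # Drop empty texts and exact duplicates (first occurrence wins).
            if not text or text in seen:
                continue
            seen.add(text)
            by_label[label].append((text, label))

    # Balance by randomly downsampling each class to the smaller class size.
    rng = random.Random(seed)
    n = min(len(by_label[0]), len(by_label[1]))
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)  # avoid class-ordered output
    return balanced
```

Downsampling to the minority-class size is one way to arrive at equal per-class counts like the 61,127 reported above; the fixed `seed` keeps the selection reproducible. Oversampling or class-weighted loss would be alternatives that keep more data.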

Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "floxoris/harmony-v0"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Class indices as documented under Labels
id2label = {
    0: "not_toxic",
    1: "toxic",
}

texts = [
    "дарова, как день?",  # "hey, how's your day?"
    "ты дибил?",          # "are you an idiot?" (misspelled insult)
]

# Tokenize as a padded batch; long inputs are truncated to the tokenizer's max length
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (batch_size, 2)
    probs = torch.softmax(logits, dim=-1)  # per-class probabilities
    preds = torch.argmax(probs, dim=-1)    # predicted class index per text

for text, pred, prob in zip(texts, preds, probs):
    label = id2label[pred.item()]
    confidence = prob[pred.item()].item()  # probability of the predicted class
    print(f"{text} -> {label} ({confidence:.4f})")
```

Example Outputs

Example model behavior on simple test inputs:

"дарова, как день?" ("hey, how's your day?")
-> not_toxic (~0.91)

"ты дибил?" ("are you an idiot?")
-> toxic (~0.80)

These examples are illustrative only and are not a benchmark.

Intended Use

Floxoris Harmony v0 is intended for fast, lightweight toxicity moderation in:

  • Telegram bots
  • AI assistants
  • Chat filtering systems
  • Message pre-moderation pipelines
  • Lightweight production deployments

Typical use cases include:

  • filtering incoming user messages before they reach a model or operator
  • flagging potentially toxic content for review
  • reducing moderation cost in high-volume chat environments
  • adding a first-pass safety layer to conversational systems
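For the filtering and review use cases above, one common pattern is to act on the toxic-class probability (`probs[:, 1]` in the usage example) with two thresholds instead of a plain argmax, so that uncertain messages go to a human. The threshold values and the `moderation_action` helper below are hypothetical placeholders, not values published for this model; they should be tuned on labelled traffic from your own deployment.

```python
from typing import Literal

Action = Literal["allow", "review", "block"]

# Hypothetical placeholder thresholds on P(toxic); tune these on your own data.
REVIEW_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.9


def moderation_action(
    p_toxic: float,
    review_threshold: float = REVIEW_THRESHOLD,
    block_threshold: float = BLOCK_THRESHOLD,
) -> Action:
    """Map the model's toxic-class probability to a moderation action."""
    if not 0.0 <= p_toxic <= 1.0:
        raise ValueError(f"p_toxic must be in [0, 1], got {p_toxic}")
    if review_threshold > block_threshold:
        raise ValueError("review_threshold must not exceed block_threshold")
    if p_toxic >= block_threshold:
        return "block"   # confident enough to reject automatically
    if p_toxic >= review_threshold:
        return "review"  # uncertain: flag for a human moderator
    return "allow"       # pass through to the bot, assistant, or operator


for p in (0.09, 0.80, 0.97):
    print(p, moderation_action(p))  # allow, review, block
```

With these placeholder thresholds, the "ты дибил?" example above (toxic, ~0.80) would be routed to review rather than blocked.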

Limitations

  • This is a binary moderation model and does not classify toxicity types
  • It may miss subtle harassment, sarcasm, or context-dependent abuse
  • It may produce false positives on slang, irony, or emotionally charged messages
  • Performance may degrade on domain-specific jargon, mixed-language text, or heavily misspelled input
  • It is intended as a lightweight moderation layer, not a full safety system
  • Human review is still recommended for high-stakes moderation decisions

License

This model is released under the Apache License 2.0.

Future Versions

Planned directions for future releases:

  • v1: improved accuracy and calibration
  • v2: broader multilingual coverage and more robust edge-case handling
  • future iterations may include better handling of slang, implicit toxicity, and context-aware moderation

Summary

Floxoris Harmony v0 is a compact toxicity-moderation model optimized for practical deployments where speed, cost, and simplicity matter. It is best suited as a lightweight first-stage moderation component in Russian and Ukrainian text pipelines.