
Floxoris Harmony v1.2

Floxoris Harmony v1.2 is a lightweight binary moderation model for fast toxicity detection in Russian and Ukrainian text.

This version continues fine-tuning from Floxoris Harmony v1.1. It targets false positives on positive slang, compliments, and short praise phrases while retaining strong detection of insults and rude commands.

Harmony v1.2 is designed for practical moderation systems where speed, low cost, and simple deployment matter.

What Is New In v1.2

Harmony v1.2 improves the behavior of v1.1 in cases where friendly slang or praise could be incorrectly classified as toxic.

Main focus:

  • safer handling of positive slang
  • fewer false positives on compliments
  • improved distinction between insult patterns and praise patterns
  • stronger handling of phrases like харош, ну ты харош, ты красавчик
  • continued support for Russian and Ukrainian moderation
  • same lightweight binary output: safe / toxic

This release is mainly an anti-false-positive patch for short praise and casual chat slang.

Model Task

The model performs binary text classification:

Class   Label
0       safe
1       toxic

The model answers:

Is this message safe or toxic?

Intended Use

Floxoris Harmony v1.2 is suitable for:

  • Telegram bot moderation
  • chat message filtering
  • community moderation tools
  • AI assistant safety checks
  • lightweight moderation APIs
  • first-stage toxicity detection
  • Russian/Ukrainian text moderation

It works best as a fast first-pass classifier before more complex moderation logic.

Example Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "floxoris/harmony-v1.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "ну ты харош"

inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128
)

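with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
    # Turn the two logits into probabilities; [0] selects the single input.
    probs = torch.softmax(outputs.logits, dim=-1)[0]

# Class indices follow the Model Task table: 0 = safe, 1 = toxic.
safe_score = probs[0].item()
toxic_score = probs[1].item()

# Flag the message only when the toxic probability reaches the threshold.
threshold = 0.65
label = "toxic" if toxic_score >= threshold else "safe"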

print({
    "label": label,
    "safe_score": round(safe_score, 4),
    "toxic_score": round(toxic_score, 4)
})

Recommended Threshold

Suggested default:

TOXIC_THRESHOLD = 0.65

Suggested behavior:

Toxic score            Action
below 0.65             allow
0.65 to below 0.88     warn / review
0.88 and above         delete / block
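
The score bands above can be expressed as a small helper. This is a minimal sketch: the function name `action_for_score` and the constant names are illustrative and not part of the model's API, and the cut-offs are simply the suggested values from the table.

```python
WARN_THRESHOLD = 0.65   # suggested default toxic threshold
BLOCK_THRESHOLD = 0.88  # scores at or above this are deleted / blocked


def action_for_score(toxic_score: float) -> str:
    """Map a toxic-class probability to a moderation action."""
    if not 0.0 <= toxic_score <= 1.0:
        raise ValueError("toxic_score must be a probability in [0, 1]")
    if toxic_score >= BLOCK_THRESHOLD:
        return "block"
    if toxic_score >= WARN_THRESHOLD:
        return "review"
    return "allow"


if __name__ == "__main__":
    for score in (0.10, 0.64, 0.65, 0.87, 0.88, 0.99):
        print(f"{score:.2f} -> {action_for_score(score)}")
```

Whether the middle band warns the user or queues the message for human review is a policy choice the table leaves open; the sketch returns "review" for it.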

Training Focus

Harmony v1.2 was fine-tuned from:

floxoris/harmony-v1.1

The main training focus was:

  • positive slang
  • friendly short praise
  • safe casual phrases
  • Russian compliments
  • Ukrainian compliments
  • safe context examples
  • mild toxic phrases
  • toxic vs safe phrase contrast

Examples of safe praise targeted in v1.2:

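харош            (slang spelling of хорош, "nice one")
хорош            ("nice one")
ну ты харош      ("well, you're good")
капец ты харош   ("damn, you're good")
ты красавчик     ("you're the man")
ты красава       ("you rock")
ты молодец       ("well done")
ты лучший        ("you're the best")
ты мощный        ("you're powerful")
ты гений         ("you're a genius")
ти молодець      (Ukrainian, "well done")
ти красень       (Ukrainian, "you're awesome")
ти крутий        (Ukrainian, "you're cool")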

Examples of toxic phrases that should remain toxic:

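заткнись     ("shut up")
отвали       ("get lost")
закрой рот   ("shut your mouth")
ты тупой     ("you're stupid")
ты дурак     ("you're a fool")
ну ты даун   (ableist slur)
замовкни     (Ukrainian, "shut up")
відвали      (Ukrainian, "get lost")
ти тупий     (Ukrainian, "you're stupid")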

Examples of safe contrast phrases:

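закрой окно пожалуйста        ("please close the window")
пошёл в магазин               ("went to the store")
тупой угол в геометрии        ("an obtuse angle in geometry")
не пиши пароль сюда           ("don't write the password here")
рот болит после стоматолога   ("my mouth hurts after the dentist")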
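
Phrase lists like these can double as a regression suite after each fine-tuning round. The sketch below assumes a hypothetical `classify(text) -> str` callable that wraps the model and returns "safe" or "toxic". The `keyword_stub` function is only a stand-in so the example runs without downloading the model; it says nothing about how Harmony actually makes its decisions.

```python
from typing import Callable, Dict, List, Tuple

# A few expected labels drawn from the three phrase groups above.
EXPECTED: Dict[str, str] = {
    "ну ты харош": "safe",
    "ты красавчик": "safe",
    "ти молодець": "safe",
    "заткнись": "toxic",
    "ты тупой": "toxic",
    "замовкни": "toxic",
    "закрой окно пожалуйста": "safe",
    "тупой угол в геометрии": "safe",
}


def find_regressions(
    classify: Callable[[str], str],
    expected: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (text, expected_label, actual_label) for every mismatch."""
    failures = []
    for text, want in expected.items():
        got = classify(text)
        if got != want:
            failures.append((text, want, got))
    return failures


def keyword_stub(text: str) -> str:
    """Placeholder classifier (NOT the model): flags three fixed phrases."""
    toxic_phrases = {"заткнись", "ты тупой", "замовкни"}
    return "toxic" if text in toxic_phrases else "safe"


if __name__ == "__main__":
    for text, want, got in find_regressions(keyword_stub, EXPECTED):
        print(f"MISMATCH: {text!r} expected {want}, got {got}")
```

In practice `keyword_stub` would be replaced by a function that runs the model and applies the chosen threshold, so a non-empty result points at phrases whose behavior changed.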

Difference Between Versions

Version        Focus
harmony-v1     base lightweight toxicity classifier
harmony-v1.1   improved mild toxicity detection
harmony-v1.2   reduced false positives on praise and positive slang

Example Behavior

Expected behavior:

"ну ты харош"             → safe
"капец ты харош"          → safe
"ты красавчик"            → safe
"закрой окно пожалуйста"  → safe
"закрой рот"              → toxic
"ну ты даун"              → toxic
"заткнись"                → toxic

API-Style Output Example

{
  "model": "floxoris/harmony-v1.2",
  "text": "закрой рот",
  "label": "toxic",
  "class": 1,
  "safe_score": 0.18,
  "toxic_score": 0.82,
  "threshold": 0.65,
  "latency_ms": 3.1
}
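
A response of this shape could be assembled as follows. This is a sketch and not the project's actual API: the helper name `build_response` is hypothetical, and the field names are copied from the example above. Latency is passed in by the caller, since how it is measured depends on the serving stack.

```python
import json
from typing import Optional

MODEL_ID = "floxoris/harmony-v1.2"


def build_response(
    text: str,
    toxic_score: float,
    threshold: float = 0.65,
    latency_ms: Optional[float] = None,
) -> dict:
    """Build an API-style payload from the toxic-class probability."""
    # With two softmax classes the probabilities sum to 1.
    safe_score = 1.0 - toxic_score
    is_toxic = toxic_score >= threshold
    response = {
        "model": MODEL_ID,
        "text": text,
        "label": "toxic" if is_toxic else "safe",
        "class": 1 if is_toxic else 0,
        "safe_score": round(safe_score, 4),
        "toxic_score": round(toxic_score, 4),
        "threshold": threshold,
    }
    # Only report latency when the caller actually measured it.
    if latency_ms is not None:
        response["latency_ms"] = round(latency_ms, 1)
    return response


if __name__ == "__main__":
    payload = build_response("закрой рот", 0.82, latency_ms=3.1)
    # ensure_ascii=False keeps Cyrillic text readable in the JSON output.
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Note that `label` and `class` here follow the threshold rather than the larger of the two probabilities, matching the thresholding used in Example Usage.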

Limitations

  • This is a binary classifier and does not separate toxicity categories.
  • It does not classify hate speech, threats, profanity, spam, or harassment separately.
  • It may still miss sarcasm, coded abuse, irony, or context-dependent toxicity.
  • It may produce false positives on unusual slang or jokes.
  • It may produce false negatives on creative insults.
  • Very short messages can be ambiguous.
  • Performance outside Russian and Ukrainian may be weaker.
  • It should not be the only moderation layer for high-stakes systems.

Human review is recommended for important moderation decisions.

Recommended Production Setup

For real-world moderation:

  1. Run Harmony v1.2 on each message.
  2. Apply a toxic-score threshold (recommended default: 0.65).
  3. If needed, treat toxic scores between 0.50 and 0.65 as uncertain.
  4. Log false positives and false negatives.
  5. Periodically fine-tune future versions on real moderation mistakes.
  6. Combine with rule-based checks for extreme cases if necessary.
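
One way these steps might fit together is sketched below. Everything here is illustrative: `score_fn` stands in for a call to the model that returns the toxic-class probability, `blocklist` stands in for the rule-based checks, and the verdict names are made up for this example. The 0.50 and 0.65 bounds come from the steps above.

```python
import logging
from typing import Callable, NamedTuple, Sequence

logger = logging.getLogger("moderation")


class Decision(NamedTuple):
    text: str
    toxic_score: float
    verdict: str  # "toxic", "uncertain", or "safe"
    rule_hit: bool = False


def moderate(
    text: str,
    score_fn: Callable[[str], float],
    blocklist: Sequence[str] = (),
    threshold: float = 0.65,
    uncertain_from: float = 0.50,
) -> Decision:
    # Step 6: rule-based check; blocklist terms are expected in lowercase.
    lowered = text.lower()
    rule_hit = any(term in lowered for term in blocklist)

    # Step 1: score the message (score_fn is a stand-in for the model call).
    toxic_score = score_fn(text)

    # Steps 2-3: apply the threshold and the optional uncertainty band.
    if rule_hit or toxic_score >= threshold:
        verdict = "toxic"
    elif toxic_score >= uncertain_from:
        verdict = "uncertain"
    else:
        verdict = "safe"

    decision = Decision(text, toxic_score, verdict, rule_hit)
    # Step 4: log each decision so moderators can later mark mistakes.
    logger.info("decision=%s", decision)
    return decision


if __name__ == "__main__":
    # Made-up scores for illustration; real values come from the model.
    fake_scores = {"ну ты харош": 0.05, "закрой рот": 0.82}
    for message, score in fake_scores.items():
        print(moderate(message, lambda _text, s=score: s))
```

Step 5 is not shown: the logged decisions, once a moderator has marked which ones were wrong, are the raw material for the next fine-tuning round.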

Why This Model

Floxoris Harmony v1.2 is built for projects that need:

  • fast inference
  • low hosting cost
  • compact model size
  • simple API deployment
  • Russian/Ukrainian moderation
  • practical binary toxicity detection
  • better behavior on casual slang

It is especially useful for Telegram bots, small communities, AI tools, and lightweight moderation APIs.

Summary

Floxoris Harmony v1.2 is a compact Russian/Ukrainian moderation model focused on fast binary toxicity detection.

Compared to v1.1, this version is less likely to mark friendly praise like харош, ну ты харош, or ты красавчик as toxic, while preserving detection of direct insults and rude commands.