# Floxoris Harmony v1.2

**Floxoris Harmony v1.2** is a lightweight binary moderation model for fast toxicity detection in Russian and Ukrainian text.

This version is a continued fine-tuning update of **Floxoris Harmony v1.1**, focused on reducing false positives on positive slang, compliments, and short praise phrases while keeping strong detection of insults and rude commands.

Harmony v1.2 is designed for practical moderation systems where speed, low cost, and simple deployment matter.

## What Is New In v1.2

Harmony v1.2 improves the behavior of v1.1 in cases where friendly slang or praise could be incorrectly classified as toxic.

Main focus:

- safer handling of positive slang
- fewer false positives on compliments
- improved distinction between insult patterns and praise patterns
- stronger handling of phrases like `харош`, `ну ты харош`, `ты красавчик`
- continued support for Russian and Ukrainian moderation
- same lightweight binary output: `safe` / `toxic`

This release is mainly an **anti-false-positive patch** for short praise and casual chat slang.

## Model Task

The model performs binary text classification:

| Class | Label |
|---|---|
| `0` | `safe` |
| `1` | `toxic` |

The model answers:

> Is this message safe or toxic?

## Intended Use

Floxoris Harmony v1.2 is suitable for:

- Telegram bot moderation
- chat message filtering
- community moderation tools
- AI assistant safety checks
- lightweight moderation APIs
- first-stage toxicity detection
- Russian/Ukrainian text moderation

It works best as a fast first-pass classifier before more complex moderation logic.

## Example Usage

```python
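from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "floxoris/harmony-v1.2"

# Load the tokenizer and classifier once and reuse them across requests.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

text = "ну ты харош"

# Tokenize a single message; inputs longer than 128 tokens are truncated.
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128
)

# Run the forward pass without gradients and turn logits into probabilities.
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]

# Index 0 is the `safe` class, index 1 is the `toxic` class.
safe_score = probs[0].item()
toxic_score = probs[1].item()

# Apply the recommended decision threshold to the toxic probability.
threshold = 0.65
label = "toxic" if toxic_score >= threshold else "safe"

print({
    "label": label,
    "safe_score": round(safe_score, 4),
    "toxic_score": round(toxic_score, 4)
})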
```

## Recommended Threshold

Suggested default:

```python
TOXIC_THRESHOLD = 0.65
```

Suggested behavior:

| Toxic score | Action |
|---|---|
| below `0.65` | allow |
| `0.65` to below `0.88` | warn / review |
| `0.88` and above | delete / block |
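The table above can be expressed as a small helper. This is a minimal sketch, not part of the model: the function name `action_for_score` and the constant `BLOCK_THRESHOLD` are illustrative, and the boundaries are assumed to be inclusive at `0.65` and `0.88`, so that any score below `0.65` is allowed.

```python
TOXIC_THRESHOLD = 0.65  # recommended default from this section
BLOCK_THRESHOLD = 0.88  # lower edge of the delete / block band (assumed inclusive)


def action_for_score(toxic_score: float) -> str:
    """Map the toxic-class probability to a moderation action."""
    if not 0.0 <= toxic_score <= 1.0:
        raise ValueError(f"toxic_score must be in [0, 1], got {toxic_score}")
    if toxic_score >= BLOCK_THRESHOLD:
        return "block"
    if toxic_score >= TOXIC_THRESHOLD:
        return "review"
    return "allow"
```

Keeping both cut-offs as named constants makes it easier to tune them per community without touching the decision logic.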

## Training Focus

Harmony v1.2 was fine-tuned from:

```text
floxoris/harmony-v1.1
```

The main training focus was:

- positive slang
- friendly short praise
- safe casual phrases
- Russian compliments
- Ukrainian compliments
- safe context examples
- mild toxic phrases
- toxic vs safe phrase contrast

Examples of safe praise targeted in v1.2:

```text
харош
хорош
ну ты харош
капец ты харош
ты красавчик
ты красава
ты молодец
ты лучший
ты мощный
ты гений
ти молодець
ти красень
ти крутий
```

Examples of toxic phrases that should remain toxic:

```text
заткнись
отвали
закрой рот
ты тупой
ты дурак
ну ты даун
замовкни
відвали
ти тупий
```

Examples of safe contrast phrases:

```text
закрой окно пожалуйста
пошёл в магазин
тупой угол в геометрии
не пиши пароль сюда
рот болит после стоматолога
```

## Difference Between Versions

| Version | Focus |
|---|---|
| `harmony-v1` | base lightweight toxicity classifier |
| `harmony-v1.1` | improved mild toxicity detection |
| `harmony-v1.2` | reduced false positives on praise and positive slang |

## Example Behavior

Expected behavior:

```text
"ну ты харош"             → safe
"капец ты харош"          → safe
"ты красавчик"            → safe
"закрой окно пожалуйста"  → safe
"закрой рот"              → toxic
"ну ты даун"              → toxic
"заткнись"                → toxic
```

## API-Style Output Example

```json
{
  "model": "floxoris/harmony-v1.2",
  "text": "закрой рот",
  "label": "toxic",
  "class": 1,
  "safe_score": 0.18,
  "toxic_score": 0.82,
  "threshold": 0.65,
  "latency_ms": 3.1
}
```
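A response in this shape could be assembled roughly as follows. This is a sketch under stated assumptions: `predict_fn` is a hypothetical callable returning `(safe_score, toxic_score)` for a message, `build_response` is an illustrative name, and the `class` field is assumed to mirror the thresholded label rather than the raw argmax.

```python
import time
from typing import Callable, Tuple


def build_response(
    text: str,
    predict_fn: Callable[[str], Tuple[float, float]],
    model_id: str = "floxoris/harmony-v1.2",
    threshold: float = 0.65,
) -> dict:
    """Score one message and return an API-style result dictionary."""
    start = time.perf_counter()
    safe_score, toxic_score = predict_fn(text)  # hypothetical model wrapper
    latency_ms = (time.perf_counter() - start) * 1000.0

    is_toxic = toxic_score >= threshold
    return {
        "model": model_id,
        "text": text,
        "label": "toxic" if is_toxic else "safe",
        "class": 1 if is_toxic else 0,  # assumption: follows the thresholded label
        "safe_score": round(safe_score, 4),
        "toxic_score": round(toxic_score, 4),
        "threshold": threshold,
        "latency_ms": round(latency_ms, 1),
    }
```

Measured latency covers only the call to `predict_fn`, so it will vary with hardware and batch size.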

## Limitations

- This is a binary classifier: it does not distinguish toxicity categories such as hate speech, threats, profanity, spam, or harassment.
- It may still miss sarcasm, coded abuse, irony, or context-dependent toxicity.
- It may produce false positives on unusual slang or jokes.
- It may produce false negatives on creative insults.
- Very short messages can be ambiguous.
- Performance outside Russian and Ukrainian may be weaker.
- It should not be the only moderation layer for high-stakes systems.

Human review is recommended for important moderation decisions.

## Recommended Production Setup

For real-world moderation:

1. Run Harmony v1.2 on each message.
2. Use a threshold, recommended default: `0.65`.
3. Optionally flag scores from `0.50` up to `0.65` as uncertain, since they fall just below the threshold.
4. Log false positives and false negatives.
5. Periodically fine-tune future versions on real moderation mistakes.
6. Combine with rule-based checks for extreme cases if necessary.
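The steps above might be wired together roughly like this. It is a minimal sketch, not a reference pipeline: `score_fn` is a hypothetical callable returning the toxic-class probability, `blocklist` stands in for whatever rule-based checks a project uses, and running the rule check before the model is a design choice of this example.

```python
from typing import Callable, Iterable

TOXIC_THRESHOLD = 0.65  # recommended default from step 2
UNCERTAIN_LOW = 0.50    # lower edge of the optional uncertain band from step 3


def moderate(
    text: str,
    score_fn: Callable[[str], float],
    blocklist: Iterable[str] = (),
) -> dict:
    """Return a moderation decision for one message."""
    lowered = text.lower()
    # Step 6: rule-based check for extreme cases (illustrative substring match).
    for term in blocklist:
        if term and term.lower() in lowered:
            return {"label": "toxic", "source": "rule",
                    "toxic_score": None, "uncertain": False}

    # Steps 1-2: score the message and apply the threshold.
    toxic_score = score_fn(text)
    label = "toxic" if toxic_score >= TOXIC_THRESHOLD else "safe"
    # Step 3: mark scores just below the threshold as uncertain.
    uncertain = UNCERTAIN_LOW <= toxic_score < TOXIC_THRESHOLD
    return {"label": label, "source": "model",
            "toxic_score": toxic_score, "uncertain": uncertain}
```

Records with `uncertain` set to `True`, along with decisions that moderators later overturn, are natural candidates for the logging in step 4 and the fine-tuning data in step 5.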

## Why This Model

Floxoris Harmony v1.2 is built for projects that need:

- fast inference
- low hosting cost
- compact model size
- simple API deployment
- Russian/Ukrainian moderation
- practical binary toxicity detection
- better behavior on casual slang

It is especially useful for Telegram bots, small communities, AI tools, and lightweight moderation APIs.

## Summary

**Floxoris Harmony v1.2** is a compact Russian/Ukrainian moderation model focused on fast binary toxicity detection.

Compared to v1.1, this version is less likely to mark friendly praise like `харош`, `ну ты харош`, or `ты красавчик` as toxic, while preserving detection of direct insults and rude commands.