	# Floxoris Harmony v0

	Floxoris Harmony v0 is a lightweight binary toxicity classifier for moderating Russian and Ukrainian text. It is designed for fast, low-cost inference in production environments such as Telegram bots, AI assistants, chat filters, and message pre-moderation pipelines.

	Built on top of [`gravitee-io/bert-tiny-toxicity`](https://huggingface.co/gravitee-io/bert-tiny-toxicity), the model targets practical toxicity detection with a small footprint (roughly 40-50 MB), which makes it well suited to lightweight deployments.

	## Features

	- Binary toxicity classification (toxic / not toxic)
	- Supports Russian and Ukrainian
	- Small model size and fast inference
	- Suitable for real-time moderation pipelines
	- Easy to deploy in lightweight production systems
	- Designed for Telegram bots, assistants, and chat filtering

	## Model Details

	- Task: Binary text classification
	- Base model: `gravitee-io/bert-tiny-toxicity`
	- Languages: Russian, Ukrainian
	- Classes: `not_toxic`, `toxic`
	- Model size: ~40-50 MB
	- License: Apache License 2.0

	## Labels

	The model returns one of two classes:

	- `0` = `not_toxic`
	- `1` = `toxic`

	## Training Details

	The model was fine-tuned for binary toxicity classification on a merged Russian and Ukrainian moderation dataset built from the following files:

	- `ru.parquet`
	- `uk.parquet`
	- `big-ru.parquet`

	### Data Correction

	In `big-ru.parquet`, the labels were originally inverted relative to the model's convention:

	- `0` = toxic
	- `1` = safe

	These labels were flipped to match the model's convention (`0` = `not_toxic`, `1` = `toxic`) before final training.
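A minimal sketch of this correction is shown below. It assumes each row is a dict with a `text` field and a binary `label` field; the actual column names in `big-ru.parquet` are not documented here, and `fix_inverted_labels` is a hypothetical helper. Because the labels are binary, flipping a label is simply `1 - label`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def fix_inverted_labels(rows: List[Row]) -> List[Row]:
    """Flip binary labels so that 0 = not_toxic and 1 = toxic.

    Assumes a hypothetical "label" field holding 0 or 1.
    """
    fixed = []
    for row in rows:
        label = row["label"]
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label!r}")
        # Build a new dict so the input rows are left unchanged.
        fixed.append({**row, "label": 1 - label})
    return fixed


# big-ru convention before correction: 0 = toxic, 1 = safe
raw = [{"text": "example A", "label": 0}, {"text": "example B", "label": 1}]
print(fix_inverted_labels(raw))
# [{'text': 'example A', 'label': 1}, {'text': 'example B', 'label': 0}]
```

Validating the label values before flipping guards against silently corrupting rows that use some other encoding.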

	### Final Dataset

	After label correction, the datasets were merged, cleaned, and balanced.

	- Total rows: 122,254
	- Toxic: 61,127
	- Safe / Not toxic: 61,127
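The merge, clean, and balance steps could look roughly like the sketch below. The exact cleaning rules used for Harmony v0 are not documented, so this version assumes that cleaning means stripping whitespace, dropping empty texts, and removing exact duplicates, and that balancing means randomly downsampling the larger class to the size of the smaller one. The function name `merge_clean_balance` and its `seed` parameter are hypothetical.

```python
import random
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_clean_balance(*datasets: List[Row], seed: int = 42) -> List[Row]:
    """Merge row lists, drop empty/duplicate texts, and balance the two classes.

    Rows are assumed (hypothetically) to look like {"text": str, "label": 0 or 1}.
    """
    seen = set()
    by_label: Dict[int, List[Row]] = {0: [], 1: []}
    for rows in datasets:
        for row in rows:
            text = row["text"].strip()
            # Skip empty messages and exact duplicates; if the same text appears
            # with conflicting labels, the first occurrence wins.
            if not text or text in seen:
                continue
            seen.add(text)
            by_label[row["label"]].append({"text": text, "label": row["label"]})

    # Randomly downsample the larger class to the size of the smaller one.
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    merged = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(merged)  # avoid returning rows sorted by label
    return merged


# Placeholder rows standing in for ru.parquet, uk.parquet, and a
# label-corrected big-ru.parquet
ru = [{"text": "safe a", "label": 0}, {"text": "toxic a", "label": 1}]
uk = [{"text": "safe b", "label": 0}, {"text": "safe a", "label": 0}]
big_ru = [{"text": "toxic b", "label": 1}, {"text": "safe c", "label": 0}, {"text": "  ", "label": 1}]
balanced = merge_clean_balance(ru, uk, big_ru)
print(sorted(row["label"] for row in balanced))  # [0, 0, 1, 1]
```

Downsampling is only one way to balance classes; the fixed `seed` keeps the resulting split reproducible across runs.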

	## Example Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_name = "floxoris/harmony-v0"

	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Label mapping used by this model
	id2label = {
	    0: "not_toxic",
	    1: "toxic",
	}

	texts = [
	    "дарова, как день?",  # "hey, how's your day?"
	    "ты дибил?",          # "are you an idiot?" (misspelled insult)
	]

	# Tokenize as a padded batch; truncate inputs longer than the model's max length
	inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

	with torch.no_grad():
	    logits = model(**inputs).logits
	    probs = torch.softmax(logits, dim=-1)  # per-class probabilities
	    preds = torch.argmax(probs, dim=-1)    # most likely class per text

	for text, pred, prob in zip(texts, preds, probs):
	    label = id2label[pred.item()]
	    confidence = prob[pred.item()].item()
	    print(f"{text} -> {label} ({confidence:.4f})")
	```

	## Example Outputs

	Example model behavior on two simple test inputs ("hey, how's your day?" and "are you an idiot?"):

	```text
	"дарова, как день?"
	-> not_toxic (~0.91)

	"ты дибил?"
	-> toxic (~0.80)
	```

	These examples are illustrative and should not be treated as a full benchmark.

	## Intended Use

	Floxoris Harmony v0 is intended for fast and lightweight toxicity moderation in:

	- Telegram bots
	- AI assistants
	- Chat filtering systems
	- Message pre-moderation pipelines
	- Lightweight production deployments

	Typical use cases include:

	- filtering incoming user messages before they reach a model or operator
	- flagging potentially toxic content for review
	- reducing moderation cost in high-volume chat environments
	- adding a first-pass safety layer to conversational systems
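As a first-pass layer, the model's toxicity probability (the softmax value for label `1`, i.e. `probs[:, 1]` in the usage example) can be mapped to actions such as allowing, flagging, or blocking a message. The sketch below assumes a hypothetical `ModerationPolicy` with two thresholds, `review_at` and `block_at`; the default values are placeholders, not calibrated recommendations, and should be tuned on real traffic. Messages in the uncertain band are sent to human review, in line with the Limitations section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModerationPolicy:
    # Placeholder thresholds on P(toxic); tune them on your own data.
    review_at: float = 0.5
    block_at: float = 0.9

    def route(self, p_toxic: float) -> str:
        """Return "block", "review", or "allow" for a single message."""
        if not 0.0 <= p_toxic <= 1.0:
            raise ValueError(f"p_toxic must be in [0, 1], got {p_toxic}")
        if p_toxic >= self.block_at:
            return "block"
        if p_toxic >= self.review_at:
            return "review"  # flag for a human moderator
        return "allow"  # pass through to the bot, assistant, or operator


policy = ModerationPolicy()
for p in (0.09, 0.80, 0.97):
    print(p, policy.route(p))
```

With these placeholder thresholds, the illustrative scores from Example Outputs (P(toxic) ≈ 0.09 for the greeting, since P(not_toxic) ≈ 0.91, and ≈ 0.80 for the insult) would be routed to `allow` and `review` respectively.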

	## Limitations

	- This is a binary moderation model and does not classify toxicity types
	- It may miss subtle harassment, sarcasm, or context-dependent abuse
	- It may produce false positives on slang, irony, or emotionally charged messages
	- Performance may degrade on domain-specific jargon, mixed-language text, or heavily misspelled input
	- It is intended as a lightweight moderation layer, not a full safety system
	- Human review is still recommended for high-stakes moderation decisions

	## License

	This model is released under the Apache License 2.0.

	## Future Versions

	Planned directions for future releases:

	- v1: improved accuracy and calibration
	- v2: broader multilingual coverage and more robust edge-case handling
	- future iterations may include better handling of slang, implicit toxicity, and context-aware moderation

	## Summary

	Floxoris Harmony v0 is a compact toxicity moderation model optimized for practical deployment where speed, cost, and simplicity matter. It is best suited as a lightweight first-stage moderation component for Russian and Ukrainian text pipelines.