---
base_model: minishlab/potion-base-4m
datasets:
- enguard/multi-lingual-prompt-moderation
library_name: model2vec
license: mit
model_name: enguard/tiny-guard-4m-en-prompt-hate-speech-binary-moderation
tags:
- static-embeddings
- text-classification
- model2vec
---

# enguard/tiny-guard-4m-en-prompt-hate-speech-binary-moderation
This model is a fine-tuned Model2Vec classifier based on [minishlab/potion-base-4m](https://huggingface.co/minishlab/potion-base-4m) for the prompt-hate-speech-binary task in the [enguard/multi-lingual-prompt-moderation](https://huggingface.co/datasets/enguard/multi-lingual-prompt-moderation) dataset.
## Installation

```bash
pip install model2vec[inference]
```
## Usage

```python
from model2vec.inference import StaticModelPipeline

model = StaticModelPipeline.from_pretrained(
    "enguard/tiny-guard-4m-en-prompt-hate-speech-binary-moderation"
)

# The pipeline expects a list of texts, even for a single input:
text = "Example sentence"

model.predict([text])
model.predict_proba([text])
```
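These models are tuned for precision, but you can trade precision against recall by thresholding the `predict_proba` output yourself. A minimal sketch of that idea, using hardcoded FAIL probabilities so it runs without downloading the model (the `flag_fail` helper and the 0.8 threshold are illustrative, not part of the model2vec API):

```python
def flag_fail(prob_fail: float, threshold: float = 0.8) -> str:
    """Flag a prompt as FAIL only when the FAIL probability clears the
    threshold; raising the threshold favors precision over recall."""
    return "FAIL" if prob_fail >= threshold else "PASS"

# Stand-in FAIL probabilities for three prompts (in practice, taken
# from the relevant column of model.predict_proba(texts)):
probs = [0.95, 0.55, 0.10]
labels = [flag_fail(p) for p in probs]
print(labels)  # prints ['FAIL', 'PASS', 'PASS']
```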
## Why should you use these models?

- Optimized for precision to reduce false positives.
- Extremely fast inference: up to 500x faster than SetFit.
## This model variant

Below is a quick overview of the model variant and core metrics.

| Field | Value |
|---|---|
| Classifies | prompt-hate-speech-binary |
| Base Model | [minishlab/potion-base-4m](https://huggingface.co/minishlab/potion-base-4m) |
| Precision | 0.9198 |
| Recall | 0.7831 |
| F1 | 0.8460 |
### Confusion Matrix

| True \ Predicted | FAIL | PASS |
| --- | --- | --- |
| **FAIL** | 196 | 54 |
| **PASS** | 17 | 233 |
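The headline metrics follow directly from these counts. A quick sketch of the arithmetic (recomputed values differ marginally from the reported ones, which use a FAIL support of 249):

```python
# Rows of the matrix are true labels, columns are predictions.
tp, fn = 196, 54   # true FAIL rows: predicted FAIL / predicted PASS
fp, tn = 17, 233   # true PASS rows: predicted FAIL / predicted PASS

precision = tp / (tp + fp)                          # ~0.9202
recall = tp / (tp + fn)                             # 0.784
f1 = 2 * precision * recall / (precision + recall)  # ~0.847
```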
<details>
<summary><b>Full metrics (JSON)</b></summary>

```json
{
  "FAIL": {
    "precision": 0.9198113207547169,
    "recall": 0.7831325301204819,
    "f1-score": 0.8459869848156182,
    "support": 249.0
  },
  "PASS": {
    "precision": 0.8118466898954704,
    "recall": 0.932,
    "f1-score": 0.8677839851024208,
    "support": 250.0
  },
  "accuracy": 0.8577154308617234,
  "macro avg": {
    "precision": 0.8658290053250937,
    "recall": 0.8575662650602409,
    "f1-score": 0.8568854849590195,
    "support": 499.0
  },
  "weighted avg": {
    "precision": 0.8657208243322486,
    "recall": 0.8577154308617234,
    "f1-score": 0.8569073256406696,
    "support": 499.0
  }
}
```

</details>
<details>
<summary><b>Sample Predictions</b></summary>

| Text | True Label | Predicted Label |
|------|------------|-----------------|
| It really only works on bitchy women because they get insulted that I think they look old | FAIL | FAIL |
| You mean the Master of Mesopotamia | PASS | PASS |
| Oh fuck that's so stupid. You believed his father to be dead and you acted like a decent person. Fuck this gay earth | FAIL | FAIL |
| I'm taking it. Thanks for suggestion. | PASS | PASS |
| I think you fell for a person, not a gender. | PASS | PASS |

</details>
<details>
<summary><b>Prediction Speed Benchmarks</b></summary>

| Dataset Size | Time (seconds) | Predictions/Second |
|--------------|----------------|---------------------|
| 1 | 0.0004 | 2493.64 |
| 500 | 0.0402 | 12439.95 |
| 500 | 0.0312 | 16042.59 |

</details>
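The predictions/second column is dataset size divided by wall-clock time. A minimal harness sketch of that measurement, with a dummy predict function standing in for `model.predict` so the snippet runs without downloading the model:

```python
import time

def benchmark(predict, texts, repeats=3):
    """Return predictions/second for the best of `repeats` timed runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(texts)
        best = min(best, time.perf_counter() - start)
    return len(texts) / best

# Dummy stand-in; swap in model.predict to reproduce the table above.
dummy_predict = lambda texts: ["PASS"] * len(texts)
rate = benchmark(dummy_predict, ["Example sentence"] * 500)
```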
## Other model variants

Below is a general overview of the best-performing models for each dataset variant.

| Classifies | Model | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| prompt-harassment-binary | [enguard/tiny-guard-2m-en-prompt-harassment-binary-moderation](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-harassment-binary-moderation) | 0.8788 | 0.7180 | 0.7903 |
| prompt-harmfulness-binary | [enguard/tiny-guard-2m-en-prompt-harmfulness-binary-moderation](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-harmfulness-binary-moderation) | 0.8543 | 0.7256 | 0.7847 |
| prompt-harmfulness-multilabel | [enguard/tiny-guard-2m-en-prompt-harmfulness-multilabel-moderation](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-harmfulness-multilabel-moderation) | 0.7687 | 0.5006 | 0.6064 |
| prompt-hate-speech-binary | [enguard/tiny-guard-2m-en-prompt-hate-speech-binary-moderation](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-hate-speech-binary-moderation) | 0.9141 | 0.7269 | 0.8098 |
| prompt-self-harm-binary | [enguard/tiny-guard-2m-en-prompt-self-harm-binary-moderation](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-self-harm-binary-moderation) | 0.8929 | 0.7143 | 0.7937 |
| prompt-sexual-content-binary | [enguard/tiny-guard-2m-en-prompt-sexual-content-binary-moderation](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-sexual-content-binary-moderation) | 0.9256 | 0.8141 | 0.8663 |
| prompt-violence-binary | [enguard/tiny-guard-2m-en-prompt-violence-binary-moderation](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-violence-binary-moderation) | 0.9017 | 0.7645 | 0.8275 |
| prompt-harassment-binary | [enguard/tiny-guard-4m-en-prompt-harassment-binary-moderation](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-harassment-binary-moderation) | 0.8895 | 0.7160 | 0.7934 |
| prompt-harmfulness-binary | [enguard/tiny-guard-4m-en-prompt-harmfulness-binary-moderation](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-harmfulness-binary-moderation) | 0.8565 | 0.7540 | 0.8020 |
| prompt-harmfulness-multilabel | [enguard/tiny-guard-4m-en-prompt-harmfulness-multilabel-moderation](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-harmfulness-multilabel-moderation) | 0.7924 | 0.5663 | 0.6606 |
| prompt-hate-speech-binary | [enguard/tiny-guard-4m-en-prompt-hate-speech-binary-moderation](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-hate-speech-binary-moderation) | 0.9198 | 0.7831 | 0.8460 |
| prompt-self-harm-binary | [enguard/tiny-guard-4m-en-prompt-self-harm-binary-moderation](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-self-harm-binary-moderation) | 0.9062 | 0.8286 | 0.8657 |
| prompt-sexual-content-binary | [enguard/tiny-guard-4m-en-prompt-sexual-content-binary-moderation](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-sexual-content-binary-moderation) | 0.9371 | 0.8468 | 0.8897 |
| prompt-violence-binary | [enguard/tiny-guard-4m-en-prompt-violence-binary-moderation](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-violence-binary-moderation) | 0.8851 | 0.8370 | 0.8603 |
| prompt-harassment-binary | [enguard/tiny-guard-8m-en-prompt-harassment-binary-moderation](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-harassment-binary-moderation) | 0.8895 | 0.7767 | 0.8292 |
| prompt-harmfulness-binary | [enguard/tiny-guard-8m-en-prompt-harmfulness-binary-moderation](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-harmfulness-binary-moderation) | 0.8627 | 0.7912 | 0.8254 |
| prompt-harmfulness-multilabel | [enguard/tiny-guard-8m-en-prompt-harmfulness-multilabel-moderation](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-harmfulness-multilabel-moderation) | 0.7902 | 0.5926 | 0.6773 |
| prompt-hate-speech-binary | [enguard/tiny-guard-8m-en-prompt-hate-speech-binary-moderation](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-hate-speech-binary-moderation) | 0.9152 | 0.8233 | 0.8668 |
| prompt-self-harm-binary | [enguard/tiny-guard-8m-en-prompt-self-harm-binary-moderation](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-self-harm-binary-moderation) | 0.9667 | 0.8286 | 0.8923 |
| prompt-sexual-content-binary | [enguard/tiny-guard-8m-en-prompt-sexual-content-binary-moderation](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-sexual-content-binary-moderation) | 0.9382 | 0.8881 | 0.9125 |
| prompt-violence-binary | [enguard/tiny-guard-8m-en-prompt-violence-binary-moderation](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-violence-binary-moderation) | 0.9042 | 0.8551 | 0.8790 |
| prompt-harassment-binary | [enguard/small-guard-32m-en-prompt-harassment-binary-moderation](https://huggingface.co/enguard/small-guard-32m-en-prompt-harassment-binary-moderation) | 0.8809 | 0.7964 | 0.8365 |
| prompt-harmfulness-binary | [enguard/small-guard-32m-en-prompt-harmfulness-binary-moderation](https://huggingface.co/enguard/small-guard-32m-en-prompt-harmfulness-binary-moderation) | 0.8548 | 0.8239 | 0.8391 |
| prompt-harmfulness-multilabel | [enguard/small-guard-32m-en-prompt-harmfulness-multilabel-moderation](https://huggingface.co/enguard/small-guard-32m-en-prompt-harmfulness-multilabel-moderation) | 0.8065 | 0.6494 | 0.7195 |
| prompt-hate-speech-binary | [enguard/small-guard-32m-en-prompt-hate-speech-binary-moderation](https://huggingface.co/enguard/small-guard-32m-en-prompt-hate-speech-binary-moderation) | 0.9207 | 0.8394 | 0.8782 |
| prompt-self-harm-binary | [enguard/small-guard-32m-en-prompt-self-harm-binary-moderation](https://huggingface.co/enguard/small-guard-32m-en-prompt-self-harm-binary-moderation) | 0.9333 | 0.8000 | 0.8615 |
| prompt-sexual-content-binary | [enguard/small-guard-32m-en-prompt-sexual-content-binary-moderation](https://huggingface.co/enguard/small-guard-32m-en-prompt-sexual-content-binary-moderation) | 0.9328 | 0.8847 | 0.9081 |
| prompt-violence-binary | [enguard/small-guard-32m-en-prompt-violence-binary-moderation](https://huggingface.co/enguard/small-guard-32m-en-prompt-violence-binary-moderation) | 0.9077 | 0.8913 | 0.8995 |
| prompt-harassment-binary | [enguard/medium-guard-128m-xx-prompt-harassment-binary-moderation](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-harassment-binary-moderation) | 0.8660 | 0.8034 | 0.8336 |
| prompt-harmfulness-binary | [enguard/medium-guard-128m-xx-prompt-harmfulness-binary-moderation](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-harmfulness-binary-moderation) | 0.8457 | 0.8074 | 0.8261 |
| prompt-harmfulness-multilabel | [enguard/medium-guard-128m-xx-prompt-harmfulness-multilabel-moderation](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-harmfulness-multilabel-moderation) | 0.7795 | 0.6516 | 0.7098 |
| prompt-hate-speech-binary | [enguard/medium-guard-128m-xx-prompt-hate-speech-binary-moderation](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-hate-speech-binary-moderation) | 0.8826 | 0.8153 | 0.8476 |
| prompt-self-harm-binary | [enguard/medium-guard-128m-xx-prompt-self-harm-binary-moderation](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-self-harm-binary-moderation) | 0.9375 | 0.8571 | 0.8955 |
| prompt-sexual-content-binary | [enguard/medium-guard-128m-xx-prompt-sexual-content-binary-moderation](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-sexual-content-binary-moderation) | 0.9153 | 0.8744 | 0.8944 |
| prompt-violence-binary | [enguard/medium-guard-128m-xx-prompt-violence-binary-moderation](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-violence-binary-moderation) | 0.8821 | 0.8406 | 0.8609 |
## Resources

- Awesome AI Guardrails: https://github.com/enguard-ai/awesome-ai-guardails
- Model2Vec: https://github.com/MinishLab/model2vec
- Docs: https://minish.ai/packages/model2vec/introduction
## Citation

If you use this model, please cite Model2Vec:

```bibtex
@software{minishlab2024model2vec,
  author = {Stephan Tulkens and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.17270888},
  url = {https://github.com/MinishLab/model2vec},
  license = {MIT}
}
```