---
license: mit
tags:
- text-classification
- prompt-filtering
- moderation
- distilbert
- transformers
datasets:
- VerifiedPrompts/cntxt-class-final
language:
- en
pipeline_tag: text-classification
widget:
- text: "Write a LinkedIn post about eco-friendly tech for Gen Z entrepreneurs."
  example_title: Context-rich prompt
- text: "Write something"
  example_title: Vague prompt
---

# Model Card: CNTXT-Filter-Prompt-Opt

## Model Overview

**CNTXT-Filter-Prompt-Opt** is a lightweight, high-accuracy text classification model designed to evaluate the **contextual completeness of user prompts** submitted to LLMs.

It acts as a **gatekeeper** before generation, filtering out vague or spam-like input so that only well-specified prompts proceed to the downstream generation model (LLM2).

- **Base model**: `distilbert-base-uncased`
- **Trained on**: 200,000 labeled prompts
- **Purpose**: Prompt validation, spam filtering, and context enforcement

---

## Intended Use

This model is intended for:

- Pre-processing prompts before LLM2 generation (a minimal gating sketch follows this list)
- Blocking unclear or context-poor requests
- Structuring user input pipelines in AI apps, bots, and assistants
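
Below is a minimal gating sketch built on the `pipeline` API. The `gate_prompt` helper and the 0.9 confidence threshold are illustrative assumptions, not part of the model:

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="VerifiedPrompts/CNTXT-Filter-Prompt-Opt",
)

def gate_prompt(prompt: str, threshold: float = 0.9):
    """Return (should_generate, feedback) for a user prompt."""
    result = classifier(prompt)[0]
    if result["label"] == "has context" and result["score"] >= threshold:
        return True, "ok"
    # Otherwise, surface the predicted label as feedback to the user.
    return False, result["label"]

ok, feedback = gate_prompt("Write something")
if not ok:
    print(f"Prompt rejected: {feedback}")  # e.g. the unclear-intent label
```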

---

## Labels

The model classifies prompts into three categories:

| Label | Description |
|-------|-------------|
| `has context` | Prompt is clear, actionable, and self-contained |
| `missing platform, audience, budget, goal` | Prompt lacks structural details such as platform, audience, budget, or goal |
| `Intent is unclear, Please input more context` | Prompt is vague or incoherent |
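
These exact label strings are what the pipeline returns, so downstream code should branch on them verbatim. A quick way to confirm them is to read the `id2label` mapping stored with the checkpoint (a standard field on Hub model configs):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("VerifiedPrompts/CNTXT-Filter-Prompt-Opt")
# Prints the integer-id -> label-string mapping saved at training time.
print(config.id2label)
```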

---

## Training Details

- **Model**: `distilbert-base-uncased`
- **Training method**: Hugging Face AutoTrain (a rough `TrainingArguments` equivalent is sketched after this list)
- **Dataset size**: 200,000 prompts (curated, curriculum style)
- **Epochs**: 3
- **Batch size**: 8
- **Max sequence length**: 128
- **Mixed precision**: `fp16`
- **LoRA**: Disabled
- **Optimizer**: AdamW
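
For readers reproducing the run outside AutoTrain, the hyperparameters above map roughly onto the sketch below; the output directory and learning rate are assumptions, since AutoTrain's internal defaults are not reported on this card:

```python
from transformers import AutoTokenizer, TrainingArguments

# Tokenization: sequences truncated/padded to the 128-token limit above.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
def encode(batch):
    return tokenizer(batch, truncation=True, padding="max_length", max_length=128)

# Rough equivalent of the AutoTrain configuration.
args = TrainingArguments(
    output_dir="cntxt-filter-prompt-opt",  # hypothetical path
    num_train_epochs=3,
    per_device_train_batch_size=8,
    fp16=True,
    optim="adamw_torch",                   # AdamW optimizer
    learning_rate=5e-5,                    # assumed; not reported on the card
)
```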

---

## Evaluation

| Metric | Score |
|--------|-------|
| Accuracy | 1.0 |
| F1 (macro/micro/weighted) | 1.0 |
| Precision / Recall | 1.0 |
| Validation Loss | 0.0 |

The model classified every sample in the held-out validation set correctly.

---

## How to Use

```python
from transformers import pipeline

# Load the classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="VerifiedPrompts/CNTXT-Filter-Prompt-Opt")

prompt = "Write a business plan for a freelance app in Canada."
result = classifier(prompt)

print(result)
# [{'label': 'has context', 'score': 0.98}]
```
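
By default the pipeline returns only the highest-scoring label. If you need scores for all three classes (for example, to apply your own confidence threshold), recent `transformers` versions accept `top_k=None` at call time:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="VerifiedPrompts/CNTXT-Filter-Prompt-Opt")

# top_k=None returns a score for every label instead of only the argmax.
scores = classifier("Write something", top_k=None)
print(scores)
```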