---
title: YOFO Safety Evaluator
emoji: 🛡️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
short_description: Fast & Cheap LLM Safety Judging with YOFO method
---
# YOFO Safety Evaluator 🛡️
This project implements a more efficient way to evaluate the safety of LLM outputs.
Traditionally, if you want to check a chatbot response for 12 different safety issues (violence, hate speech, illegal advice, etc.), you have to ask a "judge model" 12 separate questions. That means 12 API calls, 12x the tokens, and 12x the cost.
This project replicates the YOFO (You Only Forward Once) method. Instead of 12 calls, we format the prompt so the model answers all 12 requirements in a single forward pass.
Result: It's about 10x cheaper and 4x faster than standard methods, with comparable accuracy.
## How It Works
The core idea is embedding the safety checklist directly into the prompt template.
**Standard Approach (N-Call):**

- "Does this contain violence?" -> Model generates "No"
- "Does this contain hate speech?" -> Model generates "No"
- ... (repeat 12 times)

**YOFO Approach (Ours):** We feed one prompt:

```
User: [Prompt]
Assistant: [Response]
Safety Check:
1. Violence? [MASK]
2. Hate Speech? [MASK]
...
```
We then look at the model's logits at the [MASK] positions to instantly extract the Yes/No probabilities for every category simultaneously.
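The readout step can be sketched in a few lines. This is a minimal illustration with mock logits and hypothetical Yes/No token ids (the real ids depend on the tokenizer, and the project's actual code lives in `src/inference.py`): after one forward pass, we take the logits at each `[MASK]` position and softmax over just the "Yes" and "No" token entries.

```python
import math

# Hypothetical vocabulary ids for the "Yes" and "No" tokens (model-dependent).
YES_ID, NO_ID = 9454, 2753

def yes_probability(logits_at_mask):
    """Softmax over only the Yes/No logits at a single [MASK] position."""
    yes, no = logits_at_mask[YES_ID], logits_at_mask[NO_ID]
    m = max(yes, no)  # subtract the max for numerical stability
    e_yes, e_no = math.exp(yes - m), math.exp(no - m)
    return e_yes / (e_yes + e_no)

# Mock: one forward pass yields logits for every position; we only read
# the rows at the [MASK] positions. Here, two masks with toy logit values.
vocab_size = 10_000
mock_logits = [[0.0] * vocab_size for _ in range(2)]
mock_logits[0][YES_ID], mock_logits[0][NO_ID] = 4.0, 1.0   # category 1 -> likely "Yes"
mock_logits[1][YES_ID], mock_logits[1][NO_ID] = -2.0, 3.0  # category 2 -> likely "No"

probs = [yes_probability(row) for row in mock_logits]
print([round(p, 3) for p in probs])  # -> [0.953, 0.007]
```

Because every `[MASK]` row comes from the same forward pass, adding more safety categories adds only a few prompt tokens, not extra generation calls.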
## Project Structure

- `src/`: Core implementation code.
  - `train.py`: Fine-tuning script (using LoRA).
  - `inference.py`: Single-pass inference logic.
  - `benchmark.py`: Script to measure speed/cost vs. baselines.
- `data/`: Scripts to download and prepare the BeaverTails/Anthropic datasets.
- `app.py`: A Gradio web interface to demo the model.
## Results
Benchmarked on Qwen2.5-1.5B:
| Method | Tokens per Eval | Cost (est. per 1k) | Speedup |
|---|---|---|---|
| YOFO (Ours) | ~350 | $3.52 | 3.8x |
| Standard Baseline | ~3,600 | $37.09 | 1.0x |
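The cost column follows directly from the token counts. A back-of-the-envelope check (assuming a flat rate of about $10 per million tokens; the actual pricing model is not stated in the table):

```python
PRICE_PER_M_TOKENS = 10.0  # assumed flat USD rate; not taken from the benchmark

def cost_per_1k_evals(tokens_per_eval):
    # 1,000 evaluations, priced per million tokens
    return tokens_per_eval * 1000 * PRICE_PER_M_TOKENS / 1_000_000

yofo = cost_per_1k_evals(350)       # ~$3.50 per 1k evaluations
baseline = cost_per_1k_evals(3600)  # ~$36.00 per 1k evaluations
print(yofo, baseline, round(baseline / yofo, 1))  # -> 3.5 36.0 10.3
```

The ~10x cost ratio is set by the token ratio alone (3,600 / 350 ≈ 10.3), so it holds under any per-token pricing.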
## Usage

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Prepare Data

```bash
python scripts/download_datasets.py
python scripts/prepare_data.py
python scripts/map_labels.py
```

### 3. Run the Benchmark

```bash
python src/benchmark.py
```

### 4. Try the Demo

```bash
python app.py
```
## Citation

If you use this project or method, please cite the original paper:

```bibtex
@article{yofo2025,
  title={You Only Forward Once: An Efficient Compositional Judging Paradigm},
  journal={arXiv preprint arXiv:2511.16600},
  year={2025},
  url={https://arxiv.org/abs/2511.16600}
}
```
## License
MIT