RonniRodriguez's picture
Update citation to correct arXiv paper
001cd60

A newer version of the Gradio SDK is available: 6.1.0

Upgrade
metadata
title: YOFO Safety Evaluator
emoji: 🛡️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
short_description: Fast & Cheap LLM Safety Judging with YOFO method

YOFO Safety Evaluator 🛡️

This project implements a more efficient way to evaluate the safety of LLM outputs.

Traditionally, if you want to check a chatbot response for 12 different safety issues (violence, hate speech, illegal advice, etc.), you have to ask a "Judge Model" 12 separate questions. That's 12 API calls, 12x the tokens, and 12x the cost.

This project replicates the YOFO (You Only Forward Once) method. Instead of 12 calls, we format the prompt so the model answers all 12 requirements in a single forward pass.

Result: It's about 10x cheaper and 4x faster than standard methods, with comparable accuracy.

How It Works

The core idea is embedding the safety checklist directly into the prompt template.

Standard Approach (N-Call):

  1. "Does this contain violence?" -> Model generates "No"
  2. "Does this contain hate speech?" -> Model generates "No" ... (repeat 12 times)

YOFO Approach (Ours): We feed one prompt:

User: [Prompt]
Assistant: [Response]

Safety Check:
1. Violence? [MASK]
2. Hate Speech? [MASK]
...

We then look at the model's logits at the [MASK] positions to instantly extract the Yes/No probabilities for every category simultaneously.

Project Structure

  • src/: Core implementation code.
    • train.py: Fine-tuning script (using LoRA).
    • inference.py: Single-pass inference logic.
    • benchmark.py: Script to measure speed/cost vs baselines.
  • data/: Scripts to download and prepare the BeaverTails/Anthropic datasets.
  • app.py: A Gradio web interface to demo the model.

Results

Benchmarked on Qwen2.5-1.5B:

Method Tokens per Eval Cost (est. per 1k) Speedup
YOFO (Ours) ~350 $3.52 3.8x
Standard Baseline ~3,600 $37.09 1.0x

Usage

1. Install dependencies

pip install -r requirements.txt

2. Prepare Data

python scripts/download_datasets.py
python scripts/prepare_data.py
python scripts/map_labels.py

3. Run the Benchmark

python src/benchmark.py

4. Try the Demo

python app.py

Citation

If you use this project or method, please cite the original paper:

@article{yofo2025,
  title={You Only Forward Once: An Efficient Compositional Judging Paradigm},
  journal={arXiv preprint arXiv:2511.16600},
  year={2025},
  url={https://arxiv.org/abs/2511.16600}
}

License

MIT