---
title: YOFO Safety Evaluator
emoji: 🛡️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
short_description: Fast & Cheap LLM Safety Judging with YOFO method
---

# YOFO Safety Evaluator 🛡️

This project implements a more efficient way to evaluate the safety of LLM outputs.

Traditionally, checking a chatbot response for 12 different safety issues (violence, hate speech, illegal advice, etc.) means asking a judge model 12 separate questions. That's 12 API calls, 12x the tokens, and 12x the cost.

This project replicates the **YOFO (You Only Forward Once)** method. Instead of 12 calls, we format the prompt so the model answers all 12 requirements in a **single forward pass**.

**Result:** about **10x cheaper** and **4x faster** than the standard approach, with comparable accuracy.

## How It Works

The core idea is to embed the safety checklist directly into the prompt template.

**Standard Approach (N-Call):**

1. "Does this contain violence?" -> Model generates "No"
2. "Does this contain hate speech?" -> Model generates "No"
... (repeat 12 times)
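
The cost gap comes from repeated context: every baseline call re-sends the full prompt and response, while YOFO sends them once. A back-of-the-envelope sketch (the token counts here are illustrative assumptions, not measurements):

```python
CONTEXT_TOKENS = 280   # prompt + response, re-sent on every baseline call (assumed)
QUESTION_TOKENS = 10   # one safety-checklist question (assumed)
N_CATEGORIES = 12

# 12 separate judge calls, each carrying the full context again
baseline_tokens = N_CATEGORIES * (CONTEXT_TOKENS + QUESTION_TOKENS)
# One combined prompt: the context once, plus all 12 checklist lines
yofo_tokens = CONTEXT_TOKENS + N_CATEGORIES * QUESTION_TOKENS

print(baseline_tokens, yofo_tokens)  # 3480 vs 400: roughly a 9x token reduction
```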

**YOFO Approach (Ours):**

We feed one prompt:

```text
User: [Prompt]
Assistant: [Response]

Safety Check:
1. Violence? [MASK]
2. Hate Speech? [MASK]
...
```

We then read the model's logits at each `[MASK]` position to instantly extract the Yes/No probabilities for every category simultaneously.
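
The extraction step can be sketched in a few lines (a toy vocabulary stands in here for the judge model's real logits; the function name is illustrative, not the repo's API). At each `[MASK]` slot, we renormalize over just the "Yes" and "No" token logits:

```python
import math

def yes_no_prob(logits_at_mask, yes_id, no_id):
    """Softmax over only the Yes/No logits at one [MASK] position."""
    y, n = logits_at_mask[yes_id], logits_at_mask[no_id]
    return math.exp(y) / (math.exp(y) + math.exp(n))

# Toy vocabulary: token id 0 = "Yes", id 1 = "No".
# One row of logits per checklist [MASK] position, all from one forward pass.
mask_logits = {
    "violence":    [4.0, 0.0, -1.0],  # model strongly favors "Yes"
    "hate_speech": [0.0, 4.0, -1.0],  # model strongly favors "No"
}
scores = {cat: yes_no_prob(row, yes_id=0, no_id=1)
          for cat, row in mask_logits.items()}
print(scores)
```

Because all the `[MASK]` positions come from the same forward pass, the loop over categories adds no extra model calls.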

## Project Structure

- `src/`: Core implementation code.
  - `train.py`: Fine-tuning script (using LoRA).
  - `inference.py`: Single-pass inference logic.
  - `benchmark.py`: Script to measure speed/cost vs. baselines.
- `data/`: Scripts to download and prepare the BeaverTails/Anthropic datasets.
- `app.py`: A Gradio web interface to demo the model.

## Results

Benchmarked on Qwen2.5-1.5B:

| Method | Tokens per Eval | Cost (est. per 1k evals) | Speedup |
| :--- | :--- | :--- | :--- |
| **YOFO (Ours)** | **~350** | **$3.52** | **3.8x** |
| Standard Baseline | ~3,600 | $37.09 | 1.0x |
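
As a quick sanity check on the numbers above (values copied from the table; "per 1k" is read as per 1k evaluations), the estimated cost ratio tracks the token ratio, and both land near the headline 10x:

```python
yofo_tokens, base_tokens = 350, 3600   # ~tokens per eval, from the table
yofo_cost, base_cost = 3.52, 37.09     # est. $ per 1k evals, from the table

token_ratio = base_tokens / yofo_tokens
cost_ratio = base_cost / yofo_cost
print(f"{token_ratio:.1f}x fewer tokens, {cost_ratio:.1f}x cheaper")
```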

## Usage

**1. Install dependencies**

```bash
pip install -r requirements.txt
```

**2. Prepare Data**

```bash
python scripts/download_datasets.py
python scripts/prepare_data.py
python scripts/map_labels.py
```

**3. Run the Benchmark**

```bash
python src/benchmark.py
```

**4. Try the Demo**

```bash
python app.py
```

## Citation

If you use this project or method, please cite the original paper:

```bibtex
@article{yofo2025,
  title={You Only Forward Once: An Efficient Compositional Judging Paradigm},
  journal={arXiv preprint arXiv:2511.16600},
  year={2025},
  url={https://arxiv.org/abs/2511.16600}
}
```

## License

MIT