| title: AI Evaluation Toolkit | |
| emoji: 🎯 | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 4.44.0 | |
| app_file: app.py | |
| pinned: true | |
| short_description: RLHF rating, content policy scoring, obs/inference | |
| # AI Evaluation Toolkit | |
| Interactive demos of the AI training data quality control workflows from [github.com/LaelaZorana](https://github.com/LaelaZorana). | |
| Three tools: | |
| 1. **RLHF Pairwise Rater**: Rate AI responses on 4 axes with a self-consistency check | |
| 2. **Content Policy Rater**: Score text against a policy rubric with per-criterion reasoning | |
| 3. **Observation vs Inference**: Practice keeping observations clean of conclusions | |
| Built by [Laela Zorana](https://github.com/LaelaZorana) | [HuggingFace](https://huggingface.co/LaelaZ) | [Kaggle](https://kaggle.com/laelazorana) | |