metadata
title: AI Evaluation Toolkit
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: true
short_description: RLHF rating, content policy scoring, obs/inference

AI Evaluation Toolkit

Interactive demos of the AI training-data quality-control workflows from github.com/LaelaZorana.

Three tools:

  1. RLHF Pairwise Rater: rate AI responses on four axes, with a self-consistency check
  2. Content Policy Rater: score text against a policy rubric, with per-criterion reasoning
  3. Observation vs Inference: practice keeping observations free of conclusions
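
As a rough illustration of what a self-consistency check for pairwise rating could look like, the sketch below assumes the same response pair is rated twice, the second time with the two responses shown in swapped order, and flags any axis where the two passes disagree. This README does not specify how the toolkit's check works; the axis names, the `'A'`/`'B'`/`'tie'` encoding, and the function names are all hypothetical.

```python
# Hypothetical axis names: the README only states that there are four axes.
AXES = ("helpfulness", "accuracy", "safety", "clarity")


def mirror(pref: str) -> str:
    """Map a preference from the swapped presentation back to the original order."""
    return {"A": "B", "B": "A", "tie": "tie"}[pref]


def inconsistent_axes(first: dict, swapped: dict) -> list:
    """Return the axes where the swapped-order pass contradicts the first pass.

    Both arguments map each axis name to 'A', 'B', or 'tie'. In the swapped
    pass, 'A' refers to the response that was 'B' in the first pass.
    """
    return [axis for axis in AXES if mirror(swapped[axis]) != first[axis]]


# First pass: responses shown in their original order.
first_pass = {"helpfulness": "A", "accuracy": "A", "safety": "tie", "clarity": "B"}
# Second pass: same responses, shown in swapped order.
second_pass = {"helpfulness": "B", "accuracy": "A", "safety": "tie", "clarity": "A"}

flagged = inconsistent_axes(first_pass, second_pass)
```

Here `flagged` contains only `"accuracy"`, because the rater preferred whichever response appeared first on that axis in both passes. A rater tool might prompt for a re-rating on flagged axes, though that is a design assumption rather than documented behavior.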

Built by Laela Zorana | HuggingFace | Kaggle