---
title: AI Product Evals Framework
emoji: ⚖️
colorFrom: gray
colorTo: purple
sdk: docker
app_port: 3000
pinned: false
short_description: Your AI Product needs Evals
license: mit
---
# AI Product Evals Framework

**Unit Tests | Model & Human Eval | A/B Testing**

A complete evaluation framework for AI products.
## How It Works

```
LLM Invocations → Logging Traces → Eval & Curation → Improve Model
       ↑                                                  ↓
       └──────────── Fine-Tune + Prompt Eng. ←────────────┘
```
Every production AI system needs a feedback loop. This framework provides three levels of evaluation:
### Level 1: Unit Tests

- **Write Unit Tests** - Define what to test
- **Create Test Cases** - Build a dataset (prompts, expected outputs, criteria)
- **Run & Review** - Run in the UI and view a pass/fail summary
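A test-case file for this level might look like the following sketch. The field names (`prompt`, `expected`, `criteria`) mirror the dataset description above but are illustrative assumptions, not a documented schema:

```json
[
  {
    "prompt": "Summarize in one sentence: The quick brown fox jumps over the lazy dog.",
    "expected": "A single-sentence summary mentioning the fox and the dog.",
    "criteria": "Output is exactly one sentence and stays factual to the input."
  }
]
```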
### Level 2: Model & Human Eval

- **Log Traces** - Upload traces as JSON/CSV
- **Look at Traces** - Browse and inspect them in the UI
- **Model & Human** - Model scoring plus human accept/reject
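An uploaded trace file could be shaped roughly like this; the field names are illustrative assumptions about what a logged LLM invocation carries (input, output, model, timestamp):

```json
[
  {
    "trace_id": "t-001",
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris.",
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "timestamp": "2025-01-15T10:32:00Z"
  }
]
```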
### Level 3: A/B Testing

- **Define Variants** - Two prompt/system configurations
- **Run Comparison** - Same test cases, both variants
- **Analyze Results** - Winner recommendation
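One simple way the winner recommendation could work is comparing per-variant pass rates over the shared test cases. This is a minimal sketch under that assumption; the type and function names (`VariantResults`, `recommendWinner`) are hypothetical, not the framework's API:

```typescript
// One entry per test case: did this variant pass it?
type VariantResults = { name: string; passes: boolean[] };

// Fraction of test cases the variant passed (0 for an empty run).
function passRate(r: VariantResults): number {
  if (r.passes.length === 0) return 0;
  return r.passes.filter(Boolean).length / r.passes.length;
}

// Recommend the variant with the higher pass rate; "tie" when equal.
function recommendWinner(a: VariantResults, b: VariantResults): string {
  const ra = passRate(a);
  const rb = passRate(b);
  if (ra === rb) return "tie";
  return ra > rb ? a.name : b.name;
}

const variantA: VariantResults = { name: "A", passes: [true, true, false, true] };
const variantB: VariantResults = { name: "B", passes: [true, false, false, true] };
console.log(recommendWinner(variantA, variantB)); // "A" (0.75 vs 0.5)
```

A real comparison would likely also report per-case diffs and flag ties or small samples rather than declaring a winner outright.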
## Models

Powered by the HuggingFace Inference API, which supports the latest open-source LLMs.
## Usage

1. Choose a mode: Unit Tests, Model & Human Eval, or A/B Testing
2. Load demo data or upload your own JSON/CSV files
3. Run evaluations and review results
## Development

```bash
# Create .env.local with your HuggingFace token
echo "HF_TOKEN=hf_your_token_here" > .env.local

# Install and run
npm install
npm run dev
```

Then visit http://localhost:3000.