---
title: AI Product Evals Framework
emoji: ⚖️
colorFrom: gray
colorTo: purple
sdk: docker
app_port: 3000
pinned: false
short_description: Your AI Product needs Evals
license: mit
---

# AI Product Evals Framework

**Unit Tests | Model & Human Eval | A/B Testing**

The complete evaluation framework for AI Products.

## How It Works

```text
LLM Invocations → Logging Traces → Eval & Curation → Improve Model
       ↑                                                    ↓
       └──────────── Fine-Tune + Prompt Eng. ←──────────────┘
```

Every production AI system needs a feedback loop. This framework provides three levels of evaluation:

## Level 1: Unit Tests

  1. Write Unit Tests - Define what to test
  2. Create Test Cases - Build a dataset of prompts, expected outputs, and pass criteria
  3. Run & Review - Run in the UI and view a pass/fail summary
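
The test-case shape above can be sketched as follows. This is a minimal illustration, not the framework's actual schema: the `TestCase` fields, `criterion` values, and the stubbed model call are all assumptions.

```typescript
// Hypothetical unit-test case shape: prompt, expected output, pass criterion.
interface TestCase {
  prompt: string;
  expected: string;
  criterion: "exact" | "contains";
}

// Pass/fail check for one model output against one case.
function passes(output: string, tc: TestCase): boolean {
  return tc.criterion === "exact"
    ? output.trim() === tc.expected
    : output.includes(tc.expected);
}

const cases: TestCase[] = [
  { prompt: "What is 2+2?", expected: "4", criterion: "contains" },
  { prompt: "Capital of France?", expected: "Paris", criterion: "contains" },
];

// Stand-in for a real LLM call, so the sketch runs offline.
const fakeModel = (p: string): string =>
  p.includes("2+2") ? "The answer is 4." : "Paris";

const results = cases.map((tc) => passes(fakeModel(tc.prompt), tc));
console.log(results); // logs [ true, true ]
```

In practice the UI runs each case against the configured model and aggregates these booleans into the pass/fail summary.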

## Level 2: Model & Human Eval

  1. Log Traces - Upload production traces as JSON/CSV
  2. Look at Traces - Browse and inspect them in the UI
  3. Model & Human Review - Model-based scoring plus human accept/reject
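
A trace record and the two review passes might look like this sketch. The `Trace` fields and the scoring heuristic are illustrative assumptions, not the actual upload schema:

```typescript
// Hypothetical trace record from a production LLM invocation.
interface Trace {
  id: string;
  input: string;
  output: string;
  modelScore?: number;                 // filled by model-based eval
  humanVerdict?: "accept" | "reject";  // filled by a human reviewer
}

// Example of the kind of JSON payload that could be uploaded.
const raw = `[
  {"id": "t1", "input": "Summarize the report", "output": "A short summary."},
  {"id": "t2", "input": "Translate to French", "output": "Une phrase."}
]`;

const traces: Trace[] = JSON.parse(raw);

// Pass 1: model-based scoring (placeholder heuristic stands in for a judge model).
for (const t of traces) {
  t.modelScore = t.output.length > 0 ? 0.8 : 0.0;
}

// Pass 2: a human verdict recorded for the first trace.
traces[0].humanVerdict = "accept";

console.log(
  traces.map((t) => ({ id: t.id, score: t.modelScore, human: t.humanVerdict ?? "pending" }))
);
```

Human verdicts sit alongside model scores rather than overwriting them, so disagreements between the two can be curated later.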

## Level 3: A/B Testing

  1. Define Variants - Two prompt/system configurations
  2. Run Comparison - The same test cases against both variants
  3. Analyze Results - Per-variant scores and a winner recommendation
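
The comparison step can be sketched as below, assuming per-case scores (e.g. from a judge model) are already available. The variant shape, score values, and majority-wins rule are illustrative assumptions:

```typescript
// Hypothetical A/B setup: two prompt/system configurations.
type Variant = { name: string; systemPrompt: string };

const variantA: Variant = { name: "A", systemPrompt: "Answer tersely." };
const variantB: Variant = { name: "B", systemPrompt: "Answer with reasoning." };

// Stubbed per-case scores, one entry per shared test case.
const scores = [
  { a: 0.6, b: 0.8 },
  { a: 0.7, b: 0.9 },
  { a: 0.9, b: 0.5 },
];

// Recommend the variant that wins on more than half the cases.
const winsB = scores.filter((s) => s.b > s.a).length;
const winner = winsB > scores.length / 2 ? variantB.name : variantA.name;
console.log(`winner: ${winner}`); // winner: B
```

Running both variants over the same test cases keeps the comparison paired, so a per-case win count is meaningful.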

## Models

Powered by the Hugging Face Inference API, which supports the latest open-source LLMs.

## Usage

  1. Choose a mode: Unit Tests, Model & Human Eval, or A/B Testing
  2. Load demo data or upload your own JSON/CSV files
  3. Run evaluations and review results

## Development

```bash
# Create .env.local with your HuggingFace token
echo "HF_TOKEN=hf_your_token_here" > .env.local

# Install and run
npm install
npm run dev
```

Visit http://localhost:3000