Spaces:
Sleeping
A newer version of the Gradio SDK is available: 6.15.2
title: Multi-Model LLM Comparison Tool
emoji: π€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
Multi-Model LLM Comparison Tool
UX-first interface to choose the best answer among different LLM models.
Overview
An exploratory tool for comparing responses from multiple Large Language Models (LLMs) side-by-side. Ask a question, select models, and choose which response works best for you. Built with privacy in mind - all data collection is opt-in only.
Features
- π€ Multi-Model Comparison: Query multiple LLMs simultaneously via Groq API
- βοΈ Side-by-Side View: Compare responses in a clean, readable format
- π Adjustable Output Length: Choose Short, Medium, or Full responses
- π Preference Selection: Mark which response works best for you
- π Privacy-First: Opt-in data collection only
- π¨ Clean UX: Minimalist, user-friendly interface
Supported Models
- GPT-4 (via Llama 3.3 70B on Groq)
- Claude 3 Opus (via Llama 3.1 70B on Groq)
- Gemini Pro (via Mixtral 8x7B)
- Llama 2 70B
- Mistral Large (via Mixtral 8x7B)
Note: Model names are mapped to available Groq models for demonstration purposes
Setup
Local Development
Clone the repository
git clone https://huggingface.co/spaces/rdisipio/multimodel-preference-tool cd multimodel-preference-toolInstall dependencies
pip install -r requirements.txtSet up environment variables
cp .env.example .env # Edit .env and add your Groq API keyGet your Groq API key
- Sign up at Groq Console
- Generate an API key
- Add it to your
.envfile
Run the app
python app.py
Hugging Face Spaces Deployment
This app is designed to run on Hugging Face Spaces with Gradio.
Set up your Space
- Create a new Gradio Space on Hugging Face
- Upload
app.py,requirements.txt, andREADME.md
Configure Secrets
- In your Space settings, add a secret:
- Name:
GROQ_API_KEY - Value: Your Groq API key
- Name:
- In your Space settings, add a secret:
Deploy
- Your Space will automatically build and deploy
Usage
- Enter your question in the text area
- Select output length: Short, Medium, or Full
- Choose models to compare (select at least one)
- Click "Compare answers" to see responses
- Review responses side-by-side
- Click "This one works for me" to record your preference (optional)
Privacy
- No data collection by default: Your questions and responses stay private
- Opt-in preferences: Only recorded when you click preference buttons
- Transparent: All data handling is visible in the open-source code
- Local-first: Run locally for complete privacy
Project Background
This tool is part of an exploration in open human feedback collection for LLM evaluation. The goal is to create a lightweight, UX-first interface that makes it easy for users to compare model outputs and provide feedback.
For more details, see the project documentation in the repository.
Built By
Human Feedback Foundation
Linux Foundation AI & Data member
License
See LICENSE file for details.