---
title: Multi-Model LLM Comparison Tool
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "6.3.0"
app_file: app.py
pinned: false
---

# Multi-Model LLM Comparison Tool

UX-first interface to choose the best answer among different LLM models.

## Overview

An exploratory tool for comparing responses from multiple Large Language Models (LLMs) side-by-side. Ask a question, select models, and choose which response works best for you. Built with privacy in mind - all data collection is opt-in only.

## Features

- 🤖 **Multi-Model Comparison**: Query multiple LLMs simultaneously via Groq API
- ⚖️ **Side-by-Side View**: Compare responses in a clean, readable format
- 📏 **Adjustable Output Length**: Choose Short, Medium, or Full responses
- 👍 **Preference Selection**: Mark which response works best for you
- 🔒 **Privacy-First**: Opt-in data collection only
- 🎨 **Clean UX**: Minimalist, user-friendly interface

## Supported Models

- GPT-4 (via Llama 3.3 70B on Groq)
- Claude 3 Opus (via Llama 3.1 70B on Groq)
- Gemini Pro (via Mixtral 8x7B)
- Llama 2 70B
- Mistral Large (via Mixtral 8x7B)

*Note: Model names are mapped to available Groq models for demonstration purposes*

## Setup

### Local Development

1. **Clone the repository**
   ```bash
   git clone https://huggingface.co/spaces/rdisipio/multimodel-preference-tool
   cd multimodel-preference-tool
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**
   ```bash
   cp .env.example .env
   # Edit .env and add your Groq API key
   ```

4. **Get your Groq API key**
   - Sign up at [Groq Console](https://console.groq.com/)
   - Generate an API key
   - Add it to your `.env` file

5. **Run the app**
   ```bash
   python app.py
   ```

### Hugging Face Spaces Deployment

This app is designed to run on Hugging Face Spaces with Gradio.

1. **Set up your Space**
   - Create a new Gradio Space on Hugging Face
   - Upload `app.py`, `requirements.txt`, and `README.md`

2. **Configure Secrets**
   - In your Space settings, add a secret:
     - Name: `GROQ_API_KEY`
     - Value: Your Groq API key

3. **Deploy**
   - Your Space will automatically build and deploy

## Usage

1. **Enter your question** in the text area
2. **Select output length**: Short, Medium, or Full
3. **Choose models** to compare (select at least one)
4. **Click "Compare answers"** to see responses
5. **Review responses** side-by-side
6. **Click "This one works for me"** to record your preference (optional)

## Privacy

- **No data collection by default**: Your questions and responses stay private
- **Opt-in preferences**: Only recorded when you click preference buttons
- **Transparent**: All data handling is visible in the open-source code
- **Local-first**: Run locally for complete privacy

## Project Background

This tool is part of an exploration in open human feedback collection for LLM evaluation. The goal is to create a lightweight, UX-first interface that makes it easy for users to compare model outputs and provide feedback.

For more details, see the project documentation in the repository.

## Built By

Human Feedback Foundation  
Linux Foundation AI & Data member

## License

See [LICENSE](LICENSE) file for details.