rdisipio
update deps
c9873aa

A newer version of the Gradio SDK is available: 6.15.2

Upgrade
metadata
title: Multi-Model LLM Comparison Tool
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false

Multi-Model LLM Comparison Tool

UX-first interface to choose the best answer among different LLM models.

Overview

An exploratory tool for comparing responses from multiple Large Language Models (LLMs) side-by-side. Ask a question, select models, and choose which response works best for you. Built with privacy in mind - all data collection is opt-in only.

Features

  • πŸ€– Multi-Model Comparison: Query multiple LLMs simultaneously via Groq API
  • βš–οΈ Side-by-Side View: Compare responses in a clean, readable format
  • πŸ“ Adjustable Output Length: Choose Short, Medium, or Full responses
  • πŸ‘ Preference Selection: Mark which response works best for you
  • πŸ”’ Privacy-First: Opt-in data collection only
  • 🎨 Clean UX: Minimalist, user-friendly interface

Supported Models

  • GPT-4 (via Llama 3.3 70B on Groq)
  • Claude 3 Opus (via Llama 3.1 70B on Groq)
  • Gemini Pro (via Mixtral 8x7B)
  • Llama 2 70B
  • Mistral Large (via Mixtral 8x7B)

Note: Model names are mapped to available Groq models for demonstration purposes

Setup

Local Development

  1. Clone the repository

    git clone https://huggingface.co/spaces/rdisipio/multimodel-preference-tool
    cd multimodel-preference-tool
    
  2. Install dependencies

    pip install -r requirements.txt
    
  3. Set up environment variables

    cp .env.example .env
    # Edit .env and add your Groq API key
    
  4. Get your Groq API key

    • Sign up at Groq Console
    • Generate an API key
    • Add it to your .env file
  5. Run the app

    python app.py
    

Hugging Face Spaces Deployment

This app is designed to run on Hugging Face Spaces with Gradio.

  1. Set up your Space

    • Create a new Gradio Space on Hugging Face
    • Upload app.py, requirements.txt, and README.md
  2. Configure Secrets

    • In your Space settings, add a secret:
      • Name: GROQ_API_KEY
      • Value: Your Groq API key
  3. Deploy

    • Your Space will automatically build and deploy

Usage

  1. Enter your question in the text area
  2. Select output length: Short, Medium, or Full
  3. Choose models to compare (select at least one)
  4. Click "Compare answers" to see responses
  5. Review responses side-by-side
  6. Click "This one works for me" to record your preference (optional)

Privacy

  • No data collection by default: Your questions and responses stay private
  • Opt-in preferences: Only recorded when you click preference buttons
  • Transparent: All data handling is visible in the open-source code
  • Local-first: Run locally for complete privacy

Project Background

This tool is part of an exploration in open human feedback collection for LLM evaluation. The goal is to create a lightweight, UX-first interface that makes it easy for users to compare model outputs and provide feedback.

For more details, see the project documentation in the repository.

Built By

Human Feedback Foundation
Linux Foundation AI & Data member

License

See LICENSE file for details.