---
title: LLM Assessment Explorer
emoji: 🫣
colorFrom: purple
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: LLM moderation profiles and judges classification
datasets:
  - PITTI/speechmap-questions
  - PITTI/speechmap-responses-v3
  - PITTI/speechmap-assessments-v3
---

# LLM Assessment Explorer

[Speechmap-judges Demo](https://github.com/user-attachments/assets/f94f0ef9-7ad6-419d-823a-56e828061092)

An interactive TypeScript app for exploring and comparing Large Language Model (LLM) assessments. It visualizes how different "judge" models classify the same LLM-generated responses, which makes inter-rater agreement and differences in judge behavior easy to inspect.

### Core Features

*   **Compare Any Two Judges**: Select any two LLM judges from the dataset to compare their assessments side-by-side.
*   **Filter by Theme**: Narrow down the analysis to specific topics or domains by filtering by question theme.
*   **Sankey Chart**: Visualize the reclassification flow, showing how assessments from Judge 1 are categorized by Judge 2.
*   **Transition Matrix (Heatmap)**: Get a clear, at-a-glance overview of agreement and disagreement between the two selected judges.
*   **Drill-Down to Details**: Click on any chart element to inspect the specific items, including the original question, the LLM's response, and the detailed analysis from both judges.
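
The Sankey chart and the transition matrix are two views of the same counts: how many responses Judge 1 placed in one category and Judge 2 placed in another. The sketch below shows one way to derive both from paired assessments. It is a minimal illustration, not the app's actual code: the `PairedAssessment` shape, the function names, and the example labels are all assumptions.

```typescript
// Hypothetical row shape: one response with the label each judge assigned to it.
type PairedAssessment = { responseId: string; judge1: string; judge2: string };

// matrix.get(from).get(to) = number of responses labelled `from` by Judge 1 and `to` by Judge 2.
type TransitionMatrix = Map<string, Map<string, number>>;

type SankeyLink = { source: string; target: string; value: number };

// Count every (Judge 1 label, Judge 2 label) pair.
function buildTransitionMatrix(rows: PairedAssessment[]): TransitionMatrix {
  const matrix: TransitionMatrix = new Map();
  rows.forEach(({ judge1, judge2 }) => {
    const row = matrix.get(judge1) ?? new Map<string, number>();
    row.set(judge2, (row.get(judge2) ?? 0) + 1);
    matrix.set(judge1, row);
  });
  return matrix;
}

// Flatten the matrix into one link per non-empty cell, the usual input for a Sankey chart.
function toSankeyLinks(matrix: TransitionMatrix): SankeyLink[] {
  const links: SankeyLink[] = [];
  matrix.forEach((row, source) => {
    row.forEach((value, target) => links.push({ source, target, value }));
  });
  return links;
}

// Share of responses on the diagonal, i.e. where both judges chose the same label.
function agreementRate(matrix: TransitionMatrix): number {
  let total = 0;
  let agreed = 0;
  matrix.forEach((row, from) => {
    row.forEach((count, to) => {
      total += count;
      if (from === to) agreed += count;
    });
  });
  return total === 0 ? 0 : agreed / total;
}

// Illustrative data only; the labels are placeholders, not taken from the dataset.
const sampleRows: PairedAssessment[] = [
  { responseId: "r1", judge1: "COMPLETE", judge2: "COMPLETE" },
  { responseId: "r2", judge1: "COMPLETE", judge2: "EVASIVE" },
  { responseId: "r3", judge1: "EVASIVE", judge2: "EVASIVE" },
  { responseId: "r4", judge1: "DENIAL", judge2: "EVASIVE" },
];

const sampleMatrix = buildTransitionMatrix(sampleRows);
console.log(toSankeyLinks(sampleMatrix));
console.log(agreementRate(sampleMatrix));
```

Off-diagonal cells are the disagreements, and they are the cells worth drilling into first.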

## Speechmap Data

This application explores datasets derived from xlr8harder's [Speechmap](https://speechmap.ai/) and [llm-compliance](https://github.com/xlr8harder/llm-compliance) projects. The data has been indexed and aggregated for efficient exploration.

The underlying datasets on Hugging Face include:
*   **2.4k questions**: [speechmap-questions](https://huggingface.co/datasets/PITTI/speechmap-questions)
*   **369k responses**: [speechmap-responses](https://huggingface.co/datasets/PITTI/speechmap-responses-v3)
*   **2.07k LLM-judge assessments**: [speechmap-assessments](https://huggingface.co/datasets/PITTI/speechmap-assessments-v3)
    *   The assessment dataset combines the original `gpt-4o` assessments from the Speechmap project with assessments by `mistral-small-3.1-2503`, `mistral-small-3.2-2506`, `gemma3-27b-it`, `deepseek-v3.2`, and `qwen3-next-80B-A3B-instruct`, plus manual annotations.

## Quick Start

### Prerequisites

You need [Node.js](https://nodejs.org/) (which includes npm) version 20.15.1 or later installed on your machine.

### Installation & Setup

1.  **Clone the repository:**
    ```sh
    git clone https://github.com/pappitti/speechmap-judges.git
    cd speechmap-judges
    ```

2.  **Vite Dev Mode**  
    **Install Dependencies:** 
    ```sh
    npm install
    ```

    **Fetch Data and Build the Database:**  
    This command downloads the Parquet datasets from Hugging Face and creates a local `database.duckdb` file at the root of the project.

    ```sh
    npm run db:rebuild
    ```  
    This project also includes a branch that runs on duckdb-wasm. That branch does not require this database build step: you can run `npm run dev` directly after `npm install` (or `npm run build` followed by `npm run preview` for production). However, it was never merged into the main branch because database persistence is tricky with duckdb-wasm. For now, the database must be rebuilt each time the app starts, which makes for poor UX. IndexedDB is not an option; more work is required on that branch.  
    _Also, duckdb-wasm is not as fast as expected for a database of this size._

    **Run the application:**  
    This command starts the React frontend development server.

    ```sh
    npm run dev
    ```
    Open [http://localhost:5173](http://localhost:5173) (or the URL provided in your terminal) to view it in your browser.

3.  **Production Build (Docker)**  
    **Build the image:**
    ```sh
    docker build -t speechmap-judges-prod .
    ```

    **Run the application:**
    ```sh
    docker run -p 7860:7860 --rm --name speechmap-judges-container speechmap-judges-prod
    ```
    Open [http://localhost:7860](http://localhost:7860) to view it in your browser.


## Acknowledgments

Whether you want to promote free speech or moderation, understanding biases in LLMs (and, in the case of this project, biases in LLM judges) is critical. Against this backdrop, the Speechmap project by xlr8harder is a very important initiative.