---
title: LLM Assessment Explorer
emoji: 🫣
colorFrom: purple
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: LLM moderation profiles and judges classification
datasets:
- PITTI/speechmap-questions
- PITTI/speechmap-responses-v3
- PITTI/speechmap-assessments-v3
---
# LLM Assessment Explorer
[Speechmap-judges Demo](https://github.com/user-attachments/assets/f94f0ef9-7ad6-419d-823a-56e828061092)
An interactive TypeScript app for exploring and comparing Large Language Model (LLM) assessments. It visualizes how different "judge" models classify the same LLM-generated responses, surfacing inter-rater agreement, disagreement, and judge behavior.
### Core Features
* **Compare Any Two Judges**: Select any two LLM judges from the dataset to compare their assessments side-by-side.
* **Filter by Theme**: Narrow down the analysis to specific topics or domains by filtering by question theme.
* **Sankey Chart**: Visualize the reclassification flow, showing how assessments from Judge 1 are categorized by Judge 2.
* **Transition Matrix (Heatmap)**: Get a clear, at-a-glance overview of agreement and disagreement between the two selected judges.
* **Drill-Down to Details**: Click on any chart element to inspect the specific items, including the original question, the LLM's response, and the detailed analysis from both judges.
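To illustrate what the transition matrix view computes, here is a minimal Python sketch that tallies how one judge's labels map onto another's for the same responses. The labels below (`COMPLETE`, `EVASIVE`, `DENIAL`) are placeholders and may not match the project's actual label set.

```python
from collections import Counter

# Hypothetical label set; the actual labels used by the project may differ.
LABELS = ["COMPLETE", "EVASIVE", "DENIAL"]

def transition_matrix(judge1, judge2):
    """Count how Judge 1's labels map onto Judge 2's labels
    for the same ordered list of responses."""
    counts = Counter(zip(judge1, judge2))
    return {a: {b: counts[(a, b)] for b in LABELS} for a in LABELS}

# Paired assessments of the same five responses.
j1 = ["COMPLETE", "COMPLETE", "EVASIVE", "DENIAL", "COMPLETE"]
j2 = ["COMPLETE", "EVASIVE", "EVASIVE", "DENIAL", "COMPLETE"]

matrix = transition_matrix(j1, j2)
# Diagonal cells are agreements; off-diagonal cells are reclassifications.
agreement = sum(matrix[label][label] for label in LABELS) / len(j1)
print(matrix["COMPLETE"])  # {'COMPLETE': 2, 'EVASIVE': 1, 'DENIAL': 0}
print(agreement)           # 0.8
```

The Sankey chart in the app shows the same counts as flows from Judge 1's categories into Judge 2's.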
## Speechmap Data
This application explores datasets derived from xlr8harder's [Speechmap](https://speechmap.ai/) and [llm-compliance](https://github.com/xlr8harder/llm-compliance) projects. The data has been indexed and aggregated for efficient exploration.
The underlying dataset from HuggingFace includes:
* **2.4k questions**: [speechmap-questions](https://huggingface.co/datasets/PITTI/speechmap-questions)
* **369k responses**: [speechmap-responses](https://huggingface.co/datasets/PITTI/speechmap-responses-v3)
* **2.07k LLM-judge assessments**: [speechmap-assessments](https://huggingface.co/datasets/PITTI/speechmap-assessments-v3)
* The assessment dataset combines the Speechmap project's original `gpt-4o` assessments with assessments by `mistral-small-3.1-2503`, `mistral-small-3.2-2506`, `gemma3-27b-it`, `deepseek-v3.2`, and `qwen3-next-80B-A3B-instruct`, plus manual annotations.
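Conceptually, the three datasets link questions to responses to assessments. The sketch below illustrates that relationship in Python with hypothetical field names; the actual Parquet schemas may differ.

```python
# Hypothetical record shapes showing how the three datasets relate;
# the real column names in the Parquet files may differ.
question = {"question_id": "q1", "theme": "politics", "text": "..."}
response = {"response_id": "r1", "question_id": "q1",
            "model": "some-model", "text": "..."}
assessment = {"response_id": "r1", "judge": "gpt-4o", "label": "COMPLETE"}

def assessments_for_question(qid, responses, assessments):
    """Collect every judge assessment attached to responses for one question."""
    rids = {r["response_id"] for r in responses if r["question_id"] == qid}
    return [a for a in assessments if a["response_id"] in rids]

print(assessments_for_question("q1", [response], [assessment]))
```

Each question has many responses (one per answering model), and each response can have several assessments (one per judge), which is what makes judge-vs-judge comparison possible.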
## Quick Start
### Prerequisites
You need [Node.js](https://nodejs.org/) (which includes npm) installed on your machine, version >=20.15.1.
### Installation & Setup
1. **Clone the repository:**
```sh
git clone https://github.com/pappitti/speechmap-judges.git
cd speechmap-judges
```
2. **Vite Dev Mode**
**Install Dependencies:**
```sh
npm install
```
**Fetch Data and Build the Database:**
This command downloads the Parquet datasets from Hugging Face and creates a local `database.duckdb` file at the root of the project.
```sh
npm run db:rebuild
```
This project includes a branch that runs on duckdb-wasm and does not require this database-building step: you can run `npm run dev` directly after `npm install` (or `npm run build` followed by `npm run preview` for production). That branch was never merged into main because database persistence is tricky with duckdb-wasm: as it stands, the database must be rebuilt every time the app starts, which is poor UX, and IndexedDB is not an option. More work is required on that branch.
_Also, duckdb-wasm is not as fast as expected for a database of this size._
**Run the application:**
This command starts the React frontend development server.
```sh
npm run dev
```
Open [http://localhost:5173](http://localhost:5173) (or the URL provided in your terminal) to view it in your browser.
3. **Production Build (Docker)**
```sh
docker build -t speechmap-judges-prod .
```
**Run the application:**
```sh
docker run -p 7860:7860 --rm --name speechmap-judges-container speechmap-judges-prod
```
Open [http://localhost:7860](http://localhost:7860) to view it in your browser.
## Acknowledgments
Whether you want to promote free speech or moderation, understanding biases in LLMs—and in the case of this project, biases in LLM-judges—is critical. Against this backdrop, the Speechmap project by xlr8harder is a very important initiative.