Spaces:

build-small-hackathon
/

InflectionLM

Running on Zero

App Files Files Community

InflectionLM / README.md

cafierom

Update README.md

068e790 verified 13 days ago

preview code

Raw

History Blame Contribute Delete

4.45 kB

	---
	title: InflectionLM
	emoji: 🐨
	colorFrom: purple
	colorTo: blue
	sdk: gradio
	sdk_version: 6.18.0
	python_version: '3.12'
	app_file: app.py
	pinned: false
	license: mit
	short_description: Finds inflection points in model outputs and displays scores
	tags:
	- track:backyard
	---

	# InflectionLM: Output and Token Visualization

	InflectionLM is a tool designed to visualize "inflection points" in Large Language Model (LLM) generation. By leveraging independent sampling, the tool generates multiple alternative output paths and identifies tokens where the model's confidence was low, highlighting these as potential points where the generation could have diverged.

	This is built for all of my colleagues. As University lecturers, we want students to know that AI/LLMs can generate lots of different answers for a given prompt...some right and some wrong. So I built a tool that can demonstrate just that and is easy enough for anyone to use with no coding involved. InflectionLM generates multiple
	responses using top-p and top-k as well as temperature, and displays not only all the different responses, but probability score scores for the entire response as well as for each individual token. The inflections implied in the name our places in the text where a really low probability token was used, so we can see where the response
	took a turn that it might not have taken with a different flip of the coin. The user can run with greedy decoding as as well as temperatures up to 1.5 to see how that affects the variability of responses.

	[Watch a video demo on LinkedIn!](https://www.linkedin.com/posts/mauricio-cafiero-5481259b_buildsmallhackathon-gemma4-huggingface-ugcPost-7472245848584683521-L4hx)

	## 🌟 Features

	- Diverse Response Generation: Generates 3 independent responses for every prompt using the GEMMA 4 model, utilizing Top-P (Nucleus) and Top-K sampling to maximize diversity.
	- Confidence Highlighting: Automatically highlights tokens with a probability score below 0.6 in red, marking them as "inflection points."
	- Interactive Visualization:
	- A Gradio-based GUI to input prompts and view results.
	- Ability to switch between the 3 generated responses.
	- Toggleable detailed view showing exact probability scores for every token in a response.
	- Customizable Generation: Adjustable temperature settings to control the randomness and diversity of the output.
	- UI Preferences: Support for both light and dark modes.

	## 🛠️ Tech Stack

	- Language: Python
	- LLM: [GEMMA 4](https://huggingface.co/google/gemma-4-31B-it) (via Hugging Face `transformers`)
	- Deep Learning Framework: PyTorch
	- UI Framework: Gradio
	- Compute: Optimized for GPU usage (via `@spaces.GPU` for ZeroGPU environments)

	## 📁 Project Structure

	```text
	InflectionLM/
	├── app.py # Gradio web application and UI logic
	├── inflections_funcs.py # Core logic for model loading, generation, and scoring
	├── requirements.txt # Project dependencies
	└── Stochastic_parrot.JPG # UI asset image
	```

	## 🚀 Getting Started

	### Prerequisites

	- Python 3.10+
	- A GPU with sufficient VRAM to load GEMMA 4 (31B)
	- Hugging Face account and access to the GEMMA 4 model

	### Installation

	1. Clone the repository:
	```bash
	git clone <repo-url>
	cd InflectionLM
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	### Running the Application

	Ensure your environment variables for Hugging Face are set (e.g., `HUGGING_FACE_HUB_TOKEN`), then run:

	```bash
	python app.py
	```

	The application will start a Gradio server. Open the provided URL in your browser to begin experimenting.

	## 📖 How it Works

	The tool uses Multinomial Sampling with Top-P and Top-K filtering to generate distinct sequences. Instead of a deterministic beam search, it samples from the probability distribution to find diverse but high-quality responses. After generation, it extracts the softmax probability of each chosen token from the model's output scores.

	Tokens with a probability $< 0.6$ are considered "unconfident." In linguistic terms, these are the "inflections"—moments where the model's internal probability distribution was flatter, meaning alternative tokens were almost as likely, leading to potential variations in the final text.

	--

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference