---
title: EmbeddingGemma Tuning Lab
short_description: Fine-tune EmbeddingGemma to understand your personal taste
emoji: 😻
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
 - manage-repos
license: apache-2.0
---

# 🤖 EmbeddingGemma Tuning Lab: Fine-Tuning and Mood Reader

This project provides tools to fine-tune EmbeddingGemma on your personal taste in Hacker News titles, then use the fine-tuned model to score and rank new articles by their "vibe."

It includes three main applications:
1.  A **Gradio App** for interactive fine-tuning, evaluation, and real-time "vibe checks."
2.  An interactive **Command-Line (CLI) App** for viewing and scrolling through the scored feed directly in your terminal.
3.  A **Flask App** for a simple, deployable web "mood reader" that displays the live HN feed.

---

## ✨ Features

* **Interactive Fine-Tuning:** Use a Gradio interface to select your favorite Hacker News titles and fine-tune the `google/embeddinggemma-300m` model on your preferences.
* **Semantic Search Evaluation:** See the immediate impact of your training by comparing semantic search results before and after fine-tuning.
* **Data & Model Management:** Easily import additional training data, export the generated dataset, and download the fine-tuned model as a ZIP file.
* **Hacker News Similarity Check:** View the live Hacker News feed with each story scored and color-coded based on the current model's understanding of your taste.
* **Similarity Lamp:** Input any news title or text to get a real-time similarity score (its "vibe") against your personalized anchor.
* **Interactive CLI:** A terminal-based mood reader with color-coded output, scrolling, and live refresh capabilities.
* **Standalone Flask App:** A lightweight, read-only web app to continuously display the scored HN feed, perfect for simple deployment.

---

## 🔧 How It Works

The core idea is to measure the "vibe" of a news title by calculating the semantic similarity between its embedding and the embedding of a fixed anchor phrase. The anchor is set by `QUERY_ANCHOR` in `src/config.py`, for example **`MY_FAVORITE_NEWS`**.

1.  **Embedding:** The `sentence-transformers` library is used to convert news titles and the anchor phrase into high-dimensional vectors (embeddings).
2.  **Scoring:** Each title is scored by the cosine similarity between its embedding and the anchor's embedding (equivalently, the dot product when both embeddings are normalized to unit length). A higher score means a better "vibe."
3.  **Fine-Tuning:** The Gradio app generates a contrastive learning dataset from your selections.
    * **Positive Pairs:** (`MY_FAVORITE_NEWS`, `[A title you selected]`)
    * **Negative Pairs:** (`MY_FAVORITE_NEWS`, `[A title you did not select]`)
4.  **Training:** The model is trained using `MultipleNegativesRankingLoss`, which fine-tunes it to pull the embeddings of your "favorite" titles closer to the anchor phrase and push the others away.
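
The scoring and dataset steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: `cosine_similarity` and `build_triplets` are hypothetical helper names, the toy three-dimensional vectors stand in for real EmbeddingGemma embeddings, and the two titles are made up. Arranging the pairs as (anchor, positive, negative) triplets is one input format `MultipleNegativesRankingLoss` accepts; whether the project uses exactly this layout is an assumption.

```python
import math

QUERY_ANCHOR = "MY_FAVORITE_NEWS"  # anchor phrase named in this README


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_triplets(titles, selected):
    """Hypothetical helper: pair each selected title with each unselected one.

    Returns (anchor, positive, negative) tuples built around QUERY_ANCHOR.
    """
    positives = [t for t in titles if t in selected]
    negatives = [t for t in titles if t not in selected]
    return [(QUERY_ANCHOR, p, n) for p in positives for n in negatives]


# Toy vectors standing in for real embeddings (assumed values).
anchor_vec = [1.0, 0.0, 0.0]
liked_vec = [0.9, 0.1, 0.0]
other_vec = [0.0, 1.0, 0.0]

liked_score = cosine_similarity(anchor_vec, liked_vec)
other_score = cosine_similarity(anchor_vec, other_vec)

# Made-up titles; only the first one was "selected" by the user.
triplets = build_triplets(
    ["Show HN: A tiny vector database", "Quarterly earnings roundup"],
    selected={"Show HN: A tiny vector database"},
)
```

In this toy setup the liked title scores higher than the other one, which is the ordering that fine-tuning is meant to reinforce for real titles.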

## 🚀 Getting Started

### 1. Prerequisites
* Python 3.12+
* Git

### 2. Installation

```bash
# Clone the repository
git clone https://huggingface.co/spaces/bebechien/news-vibe-checker
cd news-vibe-checker

# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

# Install the required packages
pip install -r requirements.txt
```

### 3. (Optional) Hugging Face Authentication

If you plan to use gated models or push your fine-tuned model to the Hugging Face Hub, you need to authenticate.

```bash
# Set your Hugging Face token as an environment variable
export HF_TOKEN="your_hf_token_here"
```

-----

## 🖥️ Running the Applications

You can run any of the three applications depending on your needs.

### Option A: Interactive Fine-Tuning (Gradio App)

This is the main application for creating and evaluating a personalized model.

**▶️ To run:**

```bash
python app.py
```

Navigate to the local URL provided (e.g., `http://127.0.0.1:7860`).

### Option B: Interactive Terminal Viewer (CLI App)

This app runs directly in your terminal, allowing you to quickly see and scroll through the scored Hacker News feed.

![Screenshot of the CLI app](cli.png)

**▶️ To run:**

```bash
python cli_mood_reader.py
```

**Interactive Controls:**

  * **[↑|↓]** arrow keys to scroll through the story list.
  * **[SPACE]** to refresh the feed with the latest stories.
  * **[q]** to quit the application.

You can also start it with options:

```bash
# Specify a different model from Hugging Face
python cli_mood_reader.py --model google/embeddinggemma-300m

# Show 10 stories per screen instead of the default 15
python cli_mood_reader.py --top 10
```
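
For readers curious how such options are typically wired up, here is a minimal `argparse` sketch. It only mirrors the two flags and the default of 15 shown above; `build_parser` is a hypothetical name, the `None` default for `--model` is an assumption, and the actual parsing code in `cli_mood_reader.py` may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the --model and --top options."""
    parser = argparse.ArgumentParser(description="Terminal mood reader (sketch)")
    parser.add_argument(
        "--model",
        default=None,  # assumed: fall back to a configured default when omitted
        help="Hugging Face model ID to score titles with",
    )
    parser.add_argument(
        "--top",
        type=int,
        default=15,  # default stated in this README
        help="Number of stories shown per screen",
    )
    return parser


# Equivalent to running: python cli_mood_reader.py --top 10
args = build_parser().parse_args(["--top", "10"])
```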

### Option C: Standalone Web Viewer (Flask App)

This app is a simple, read-only web page that fetches and displays the scored HN feed. It's ideal for deploying a finished model.

![Screenshot of the Flask app](flask.png)

**▶️ To run:**

```bash
# (Optional) Specify a model from the Hugging Face Hub
export MOOD_MODEL="bebechien/embedding-gemma-finetuned-hn"

# Run the Flask server
python flask_app.py
```

Navigate to `http://127.0.0.1:5000` to see the results.
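
Because `MOOD_MODEL` is optional, the app presumably falls back to a configured default when the variable is unset. A minimal sketch of that lookup follows, assuming a hypothetical helper `resolve_model_name` and a placeholder default value; the real logic in `flask_app.py` may be organized differently.

```python
import os

# Placeholder standing in for DEFAULT_MOOD_READER_MODEL from the config (assumed value).
DEFAULT_MOOD_READER_MODEL = "google/embeddinggemma-300m"


def resolve_model_name(environ=None):
    """Return MOOD_MODEL if it is set and non-empty, else the default."""
    environ = os.environ if environ is None else environ
    return environ.get("MOOD_MODEL") or DEFAULT_MOOD_READER_MODEL


# Passing a dict instead of os.environ keeps the example side-effect free.
chosen = resolve_model_name({"MOOD_MODEL": "bebechien/embedding-gemma-finetuned-hn"})
```

Treating an empty string the same as an unset variable avoids trying to load a model with an empty name.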

-----

## ⚙️ Configuration

Key parameters can be adjusted in `src/config.py`:

  * `MODEL_NAME`: The base model to use for fine-tuning (e.g., `'google/embeddinggemma-300m'`).
  * `QUERY_ANCHOR`: The anchor text used for similarity scoring (e.g., `"MY_FAVORITE_NEWS"`).
  * `DEFAULT_MOOD_READER_MODEL`: The default model used by the Flask and CLI apps.
  * `HN_RSS_URL`: The RSS feed URL.
  * `CACHE_DURATION_SECONDS`: How long to cache the RSS feed data.
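
To illustrate what `CACHE_DURATION_SECONDS` controls, here is a minimal time-based cache sketch. `FeedCache` is a hypothetical class, the value of 600 seconds is an assumption, and the fetch function is injected so the example runs without network access; the caching in `data_fetcher.py` may work differently.

```python
import time

CACHE_DURATION_SECONDS = 600  # assumed value for illustration only


class FeedCache:
    """Hypothetical cache that re-fetches only after the TTL has elapsed."""

    def __init__(self, fetch, ttl=CACHE_DURATION_SECONDS, clock=time.monotonic):
        self._fetch = fetch      # callable that returns fresh feed data
        self._ttl = ttl          # cache lifetime in seconds
        self._clock = clock      # injectable clock, handy for testing
        self._value = None
        self._fetched_at = None

    def get(self):
        """Return cached data, calling fetch again once the TTL has passed."""
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with a stand-in fetch function instead of a real RSS request.
cache = FeedCache(fetch=lambda: ["story 1", "story 2"])
stories = cache.get()
```

A longer duration means fewer requests to the feed at the cost of staler stories.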

-----

## 📂 File Structure

```
.
├── app.py                  # Main Gradio application entry point
├── cli_mood_reader.py      # Interactive command-line mood reader
├── cli.png                 # Screenshot for CLI app
├── flask_app.py            # Standalone Flask application for mood reading
├── flask.png               # Screenshot for Flask app
├── src/                    # Source code for the application
│   ├── config.py           # Central configuration for all modules
│   ├── data_fetcher.py     # Fetches and caches the Hacker News RSS feed
│   ├── hn_mood_reader.py   # Core logic for fetching and scoring
│   ├── model_trainer.py    # Handles model loading and fine-tuning
│   ├── session_manager.py  # Manages user sessions and application state
│   ├── ui.py               # Defines the Gradio user interface
│   └── vibe_logic.py       # Calculates similarity scores and "vibe" status
├── requirements.txt        # Python package dependencies
├── example_training_dataset.csv # Example dataset for training
├── README.md               # This file
├── artifacts/              # Stores session-specific fine-tuned models and datasets (generated)
└── templates/              # HTML templates for the Flask app
    ├── index.html
    └── error.html
```