---
title: EmbeddingGemma Tuning Lab
short_description: Fine-tune EmbeddingGemma to understand your personal taste
emoji: π»
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- manage-repos
license: apache-2.0
---
# 🤖 EmbeddingGemma Tuning Lab: Fine-Tuning and Mood Reader
This project provides a set of tools to fine-tune EmbeddingGemma to understand your personal taste in Hacker News titles and then use it to score and rank new articles based on their "vibe."
It includes three main applications:
1. A **Gradio App** for interactive fine-tuning, evaluation, and real-time "vibe checks."
2. An interactive **Command-Line (CLI) App** for viewing and scrolling through the scored feed directly in your terminal.
3. A **Flask App** for a simple, deployable web "mood reader" that displays the live HN feed.
---
## ✨ Features
* **Interactive Fine-Tuning:** Use a Gradio interface to select your favorite Hacker News titles and fine-tune the `google/embeddinggemma-300m` model on your preferences.
* **Semantic Search Evaluation:** See the immediate impact of your training by comparing semantic search results before and after fine-tuning.
* **Data & Model Management:** Easily import additional training data, export the generated dataset, and download the fine-tuned model as a ZIP file.
* **Hacker News Similarity Check:** View the live Hacker News feed with each story scored and color-coded based on the current model's understanding of your taste.
* **Similarity Lamp:** Input any news title or text to get a real-time similarity score (its "vibe") against your personalized anchor.
* **Interactive CLI:** A terminal-based mood reader with color-coded output, scrolling, and live refresh capabilities.
* **Standalone Flask App:** A lightweight, read-only web app to continuously display the scored HN feed, perfect for simple deployment.
---
## 🧠 How It Works
The core idea is to measure the "vibe" of a news title as the semantic similarity between its embedding and the embedding of a fixed anchor phrase, set by `QUERY_ANCHOR` in `config.py` (e.g., **`MY_FAVORITE_NEWS`**).
1. **Embedding:** The `sentence-transformers` library is used to convert news titles and the anchor phrase into high-dimensional vectors (embeddings).
2. **Scoring:** The cosine similarity (or dot product on normalized embeddings) between a title's embedding and the anchor's embedding is calculated. A higher score means a better "vibe."
3. **Fine-Tuning:** The Gradio app generates a contrastive learning dataset from your selections.
   * **Positive Pairs:** (`MY_FAVORITE_NEWS`, `[A title you selected]`)
   * **Negative Pairs:** (`MY_FAVORITE_NEWS`, `[A title you did not select]`)
4. **Training:** The model is trained using `MultipleNegativesRankingLoss`, which fine-tunes it to pull the embeddings of your "favorite" titles closer to the anchor phrase and push the others away.
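
The scoring and dataset steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the helper names `cosine_similarity`, `rank_titles`, and `build_triplets` are hypothetical, the vectors are toy values standing in for real embeddings, and the triplet layout is an assumption (`MultipleNegativesRankingLoss` accepts both (anchor, positive) pairs and (anchor, positive, negative) triplets).

```python
import math

# Anchor phrase named in the text above; everything else here is illustrative.
ANCHOR = "MY_FAVORITE_NEWS"


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as "no vibe"
    return dot / (norm_a * norm_b)


def rank_titles(anchor_vec, title_vecs):
    """Return (title, score) pairs sorted from highest to lowest score."""
    scored = [(title, cosine_similarity(anchor_vec, vec))
              for title, vec in title_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def build_triplets(selected, unselected, anchor=ANCHOR):
    """Pair each selected title with each unselected one.

    Assumed layout: (anchor, positive, negative).
    """
    return [(anchor, pos, neg) for pos in selected for neg in unselected]


# Toy 2-D "embeddings"; real ones would come from the embedding model.
anchor_vec = [1.0, 0.0]
title_vecs = {
    "Toy title A": [0.9, 0.1],  # points almost the same way as the anchor
    "Toy title B": [0.0, 1.0],  # orthogonal to the anchor
}
ranking = rank_titles(anchor_vec, title_vecs)
triplets = build_triplets(["Toy title A"], ["Toy title B"])
```

With `sentence-transformers`, the vectors would instead come from `SentenceTransformer(...).encode(...)`, and the ranking logic would stay the same.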
## 🚀 Getting Started
### 1. Prerequisites
* Python 3.12+
* Git
### 2. Installation
```bash
# Clone the repository
git clone https://huggingface.co/spaces/bebechien/news-vibe-checker
cd news-vibe-checker
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
# Install the required packages
pip install -r requirements.txt
```
### 3. (Optional) Hugging Face Authentication
If you plan to use gated models or push your fine-tuned model to the Hugging Face Hub, you need to authenticate.
```bash
# Set your Hugging Face token as an environment variable
export HF_TOKEN="your_hf_token_here"
```
-----
## 🖥️ Running the Applications
You can run any of the three applications depending on your needs.
### Option A: Interactive Fine-Tuning (Gradio App)
This is the main application for creating and evaluating a personalized model.
**▶️ To run:**
```bash
python app.py
```
Navigate to the local URL provided (e.g., `http://127.0.0.1:7860`).
### Option B: Interactive Terminal Viewer (CLI App)
This app runs directly in your terminal, allowing you to quickly see and scroll through the scored Hacker News feed.

**▶️ To run:**
```bash
python cli_mood_reader.py
```
**Interactive Controls:**
* **[↑|↓]** arrow keys to scroll through the story list.
* **[SPACE]** to refresh the feed with the latest stories.
* **[q]** to quit the application.
You can also start it with options:
```bash
# Specify a different model from Hugging Face
python cli_mood_reader.py --model google/embeddinggemma-300m
# Show 10 stories per screen instead of the default 15
python cli_mood_reader.py --top 10
```
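
For readers curious how such options are typically wired up, here is a minimal `argparse` sketch. It is not taken from `cli_mood_reader.py`: the `parse_args` helper is hypothetical, and using `None` to mean "fall back to the configured default model" is an assumption. Only the flag names and the default of 15 stories come from the text above.

```python
import argparse


def parse_args(argv=None):
    """Parse the two documented flags; argv=None reads from sys.argv."""
    parser = argparse.ArgumentParser(
        description="Terminal mood reader (illustrative sketch)"
    )
    parser.add_argument(
        "--model",
        default=None,  # assumption: None means "use the configured default"
        help="Hugging Face model name to score titles with",
    )
    parser.add_argument(
        "--top",
        type=int,
        default=15,  # default stated in the README
        help="number of stories to show per screen",
    )
    return parser.parse_args(argv)


# Example: equivalent to `python cli_mood_reader.py --top 10`
args = parse_args(["--top", "10"])
```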
### Option C: Standalone Web Viewer (Flask App)
This app is a simple, read-only web page that fetches and displays the scored HN feed. It's ideal for deploying a finished model.

**▶️ To run:**
```bash
# (Optional) Specify a model from the Hugging Face Hub
export MOOD_MODEL="bebechien/embedding-gemma-finetuned-hn"
# Run the Flask server
python flask_app.py
```
Navigate to `http://127.0.0.1:5000` to see the results.
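
One plausible way for the app to honor `MOOD_MODEL` is to read the environment variable and fall back to a default when it is unset. The sketch below is an assumption about that pattern, not the code in `flask_app.py`: `resolve_model_name` is a hypothetical helper, and the default value shown is a placeholder rather than the project's real setting.

```python
import os

# Placeholder value; the real default lives in the project's configuration.
DEFAULT_MOOD_READER_MODEL = "google/embeddinggemma-300m"


def resolve_model_name(env=None):
    """Return MOOD_MODEL if it is set and non-empty, else the default."""
    env = os.environ if env is None else env
    name = env.get("MOOD_MODEL", "").strip()
    return name or DEFAULT_MOOD_READER_MODEL


# Passing a dict makes the behavior easy to check without touching os.environ.
chosen = resolve_model_name({"MOOD_MODEL": "bebechien/embedding-gemma-finetuned-hn"})
fallback = resolve_model_name({})
```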
-----
## ⚙️ Configuration
Key parameters can be adjusted in `src/config.py`:
* `MODEL_NAME`: The base model to use for fine-tuning (e.g., `'google/embeddinggemma-300m'`).
* `QUERY_ANCHOR`: The anchor text used for similarity scoring (e.g., `"MY_FAVORITE_NEWS"`).
* `DEFAULT_MOOD_READER_MODEL`: The default model used by the Flask and CLI apps.
* `HN_RSS_URL`: The RSS feed URL.
* `CACHE_DURATION_SECONDS`: How long to cache the RSS feed data.
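
To illustrate what a setting like `CACHE_DURATION_SECONDS` controls, here is a small time-based cache sketch. It is not the project's `data_fetcher.py`: the `TimedCache` class is hypothetical, and the 600-second value is an assumed placeholder. The injectable `clock` argument exists only so the expiry logic can be exercised without waiting.

```python
import time

CACHE_DURATION_SECONDS = 600  # assumed placeholder, not the project's value


class TimedCache:
    """Cache one fetched value and refetch it once the TTL has elapsed."""

    def __init__(self, fetch, ttl_seconds=CACHE_DURATION_SECONDS,
                 clock=time.monotonic):
        self._fetch = fetch          # zero-argument callable, e.g. an RSS fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None      # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        expired = (self._fetched_at is None
                   or now - self._fetched_at >= self._ttl)
        if expired:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with a fake clock and a counter standing in for a network fetch.
fake_now = [0.0]
fetch_count = [0]


def fake_fetch():
    fetch_count[0] += 1
    return f"feed v{fetch_count[0]}"


cache = TimedCache(fake_fetch, ttl_seconds=10, clock=lambda: fake_now[0])
first = cache.get()       # triggers the first fetch
fake_now[0] = 5.0
second = cache.get()      # still within the TTL, so served from cache
fake_now[0] = 10.0
third = cache.get()       # TTL reached, so the value is refetched
```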
-----
## 📁 File Structure
```
.
├── app.py                        # Main Gradio application entry point
├── cli_mood_reader.py            # Interactive command-line mood reader
├── cli.png                       # Screenshot for CLI app
├── flask_app.py                  # Standalone Flask application for mood reading
├── flask.png                     # Screenshot for Flask app
├── src/                          # Source code for the application
│   ├── config.py                 # Central configuration for all modules
│   ├── data_fetcher.py           # Fetches and caches the Hacker News RSS feed
│   ├── hn_mood_reader.py         # Core logic for fetching and scoring
│   ├── model_trainer.py          # Handles model loading and fine-tuning
│   ├── session_manager.py        # Manages user sessions and application state
│   ├── ui.py                     # Defines the Gradio user interface
│   └── vibe_logic.py             # Calculates similarity scores and "vibe" status
├── requirements.txt              # Python package dependencies
├── example_training_dataset.csv  # Example dataset for training
├── README.md                     # This file
├── artifacts/                    # Stores session-specific fine-tuned models and datasets (generated)
└── templates/                    # HTML templates for the Flask app
    ├── index.html
    └── error.html
```