title: EmbeddingGemma Tuning Lab
short_description: Fine-tune EmbeddingGemma to understand your personal taste
emoji: π»
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- manage-repos
license: apache-2.0
EmbeddingGemma Tuning Lab: Fine-Tuning and Mood Reader
This project provides a set of tools to fine-tune EmbeddingGemma to understand your personal taste in Hacker News titles and then use it to score and rank new articles based on their "vibe."
It includes three main applications:
- A Gradio App for interactive fine-tuning, evaluation, and real-time "vibe checks."
- An interactive Command-Line (CLI) App for viewing and scrolling through the scored feed directly in your terminal.
- A Flask App for a simple, deployable web "mood reader" that displays the live HN feed.
Features
- Interactive Fine-Tuning: Use a Gradio interface to select your favorite Hacker News titles and fine-tune the google/embeddinggemma-300m model on your preferences.
- Semantic Search Evaluation: See the immediate impact of your training by comparing semantic search results before and after fine-tuning.
- Data & Model Management: Easily import additional training data, export the generated dataset, and download the fine-tuned model as a ZIP file.
- Hacker News Similarity Check: View the live Hacker News feed with each story scored and color-coded based on the current model's understanding of your taste.
- Similarity Lamp: Input any news title or text to get a real-time similarity score (its "vibe") against your personalized anchor.
- Interactive CLI: A terminal-based mood reader with color-coded output, scrolling, and live refresh capabilities.
- Standalone Flask App: A lightweight, read-only web app to continuously display the scored HN feed, perfect for simple deployment.
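The color-coding described above can be pictured as a small mapping from a similarity score to a band. The following is a minimal sketch only: the thresholds, labels, and color names are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass

# Hypothetical thresholds and labels, chosen only for illustration;
# the project's real cut-offs live in its own code and may differ.
THRESHOLDS = [
    (0.75, "great", "green"),
    (0.50, "okay", "yellow"),
]
FALLBACK = ("meh", "red")


@dataclass(frozen=True)
class Vibe:
    score: float
    label: str
    color: str


def classify(score: float) -> Vibe:
    """Map a similarity score to the first band whose minimum it reaches."""
    for minimum, label, color in THRESHOLDS:
        if score >= minimum:
            return Vibe(score, label, color)
    return Vibe(score, *FALLBACK)


def rank(scored_titles: dict[str, float]) -> list[tuple[str, Vibe]]:
    """Sort titles from highest to lowest score and attach a band to each."""
    ordered = sorted(scored_titles.items(), key=lambda item: item[1], reverse=True)
    return [(title, classify(score)) for title, score in ordered]
```

Keeping the bands in one ordered table means the Gradio, CLI, and Flask views could share the same cut-offs, assuming they all call a helper like this.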
How It Works
The core idea is to measure the "vibe" of a news title by calculating the semantic similarity between its embedding and the embedding of a fixed anchor phrase, set by QUERY_ANCHOR in src/config.py (for example, "MY_FAVORITE_NEWS").
- Embedding: The sentence-transformers library is used to convert news titles and the anchor phrase into high-dimensional vectors (embeddings).
- Scoring: The cosine similarity (or dot product on normalized embeddings) between a title's embedding and the anchor's embedding is calculated. A higher score means a better "vibe."
- Fine-Tuning: The Gradio app generates a contrastive learning dataset from your selections.
  - Positive Pairs: (MY_FAVORITE_NEWS, [A title you selected])
  - Negative Pairs: (MY_FAVORITE_NEWS, [A title you did not select])
- Training: The model is trained using MultipleNegativesRankingLoss, which fine-tunes it to pull the embeddings of your "favorite" titles closer to the anchor phrase and push the others away.
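The scoring and dataset steps above can be sketched with toy vectors and plain Python. This is an illustration under stated assumptions, not the project's implementation: the three-dimensional vectors stand in for real model embeddings, and `cosine_similarity` and `build_pairs` are hypothetical helper names.

```python
import math

ANCHOR = "MY_FAVORITE_NEWS"  # example anchor text from the configuration section


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def build_pairs(titles: list[str], selected: set[str]) -> tuple[list, list]:
    """Split titles into (anchor, title) positive and negative pairs."""
    positives = [(ANCHOR, t) for t in titles if t in selected]
    negatives = [(ANCHOR, t) for t in titles if t not in selected]
    return positives, negatives


# Toy 3-dimensional "embeddings"; real ones come from the model.
anchor_vec = [1.0, 0.0, 0.0]
title_vecs = {
    "Title A": [0.9, 0.1, 0.0],
    "Title B": [0.0, 1.0, 0.0],
}
scores = {t: cosine_similarity(anchor_vec, v) for t, v in title_vecs.items()}
positives, negatives = build_pairs(list(title_vecs), selected={"Title A"})
```

In sentence-transformers, MultipleNegativesRankingLoss accepts (anchor, positive) pairs or (anchor, positive, negative) triplets. How the app combines the two lists into training examples is not shown in this sketch.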
Getting Started
1. Prerequisites
- Python 3.12+
- Git
2. Installation
# Clone the repository
git clone https://huggingface.co/spaces/bebechien/news-vibe-checker
cd news-vibe-checker
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
# Install the required packages
pip install -r requirements.txt
3. (Optional) Hugging Face Authentication
If you plan to use gated models or push your fine-tuned model to the Hugging Face Hub, you need to authenticate.
# Set your Hugging Face token as an environment variable
export HF_TOKEN="your_hf_token_here"
Running the Applications
You can run any of the three applications depending on your needs.
Option A: Interactive Fine-Tuning (Gradio App)
This is the main application for creating and evaluating a personalized model.
To run:
python app.py
Navigate to the local URL provided (e.g., http://127.0.0.1:7860).
Option B: Interactive Terminal Viewer (CLI App)
This app runs directly in your terminal, allowing you to quickly see and scroll through the scored Hacker News feed.
To run:
python cli_mood_reader.py
Interactive Controls:
- [↑|↓] arrow keys to scroll through the story list.
- [SPACE] to refresh the feed with the latest stories.
- [q] to quit the application.
You can also start it with options:
# Specify a different model from Hugging Face
python cli_mood_reader.py --model google/embeddinggemma-300m
# Show 10 stories per screen instead of the default 15
python cli_mood_reader.py --top 10
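For readers curious how such options are typically wired up, here is a minimal argparse sketch that mirrors the two documented flags. The default of 15 comes from the comment above; the default model value and the `build_parser` name are assumptions made for illustration, not the script's actual code.

```python
import argparse

# Assumed defaults: 15 comes from the text above; the model default is a
# placeholder for whatever DEFAULT_MOOD_READER_MODEL resolves to.
ASSUMED_DEFAULT_MODEL = "google/embeddinggemma-300m"
DEFAULT_TOP = 15


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the documented --model and --top options."""
    parser = argparse.ArgumentParser(description="Terminal mood reader (sketch)")
    parser.add_argument("--model", default=ASSUMED_DEFAULT_MODEL,
                        help="Hugging Face model ID")
    parser.add_argument("--top", type=int, default=DEFAULT_TOP,
                        help="number of stories to show per screen")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--top", "10"])
print(args.model, args.top)
```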
Option C: Standalone Web Viewer (Flask App)
This app is a simple, read-only web page that fetches and displays the scored HN feed. It's ideal for deploying a finished model.
To run:
# (Optional) Specify a model from the Hugging Face Hub
export MOOD_MODEL="bebechien/embedding-gemma-finetuned-hn"
# Run the Flask server
python flask_app.py
Navigate to http://127.0.0.1:5000 to see the results.
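One plausible way for a script to honor the optional MOOD_MODEL variable is to read it and fall back to a configured default when it is unset. The sketch below assumes that behavior; `resolve_model_name` and the fallback value are hypothetical stand-ins for the app's own logic and for DEFAULT_MOOD_READER_MODEL.

```python
import os
from collections.abc import Mapping

# Placeholder for DEFAULT_MOOD_READER_MODEL; the real default is set in config.py.
ASSUMED_DEFAULT = "google/embeddinggemma-300m"


def resolve_model_name(env: Mapping[str, str] = os.environ) -> str:
    """Return MOOD_MODEL if it is set and non-empty, else the fallback."""
    value = env.get("MOOD_MODEL", "").strip()
    return value or ASSUMED_DEFAULT
```

Passing the environment in as a mapping keeps the function easy to test, for example `resolve_model_name({"MOOD_MODEL": "bebechien/embedding-gemma-finetuned-hn"})`.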
Configuration
Key parameters can be adjusted in src/config.py:
- MODEL_NAME: The base model to use for fine-tuning (e.g., 'google/embeddinggemma-300m').
- QUERY_ANCHOR: The anchor text used for similarity scoring (e.g., "MY_FAVORITE_NEWS").
- DEFAULT_MOOD_READER_MODEL: The default model used by the Flask and CLI apps.
- HN_RSS_URL: The RSS feed URL.
- CACHE_DURATION_SECONDS: How long to cache the RSS feed data.
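To make the role of CACHE_DURATION_SECONDS concrete, here is a minimal time-based cache sketch. It is not the project's data fetcher: the `TTLCache` class, the injected `fetch` and `clock` callables, and the value 600 are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

CACHE_DURATION_SECONDS = 600  # assumed value for illustration


class TTLCache(Generic[T]):
    """Cache one fetched value and refresh it once it is older than `ttl`."""

    def __init__(self, fetch: Callable[[], T], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> T:
        """Return the cached value, calling `fetch` only when it has expired."""
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value  # type: ignore[return-value]
```

A feed reader could wrap its download function as `TTLCache(download_feed, ttl=CACHE_DURATION_SECONDS)`, where `download_feed` is a hypothetical function that fetches HN_RSS_URL. Repeated page loads within the window would then reuse the cached feed.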
File Structure
.
├── app.py                        # Main Gradio application entry point
├── cli_mood_reader.py            # Interactive command-line mood reader
├── cli.png                       # Screenshot for CLI app
├── flask_app.py                  # Standalone Flask application for mood reading
├── flask.png                     # Screenshot for Flask app
├── src/                          # Source code for the application
│   ├── config.py                 # Central configuration for all modules
│   ├── data_fetcher.py           # Fetches and caches the Hacker News RSS feed
│   ├── hn_mood_reader.py         # Core logic for fetching and scoring
│   ├── model_trainer.py          # Handles model loading and fine-tuning
│   ├── session_manager.py        # Manages user sessions and application state
│   ├── ui.py                     # Defines the Gradio user interface
│   └── vibe_logic.py             # Calculates similarity scores and "vibe" status
├── requirements.txt              # Python package dependencies
├── example_training_dataset.csv  # Example dataset for training
├── README.md                     # This file
├── artifacts/                    # Stores session-specific fine-tuned models and datasets (generated)
└── templates/                    # HTML templates for the Flask app
    ├── index.html
    └── error.html

