---
title: EmbeddingGemma Tuning Lab
short_description: Fine-tune EmbeddingGemma to understand your personal taste
emoji: 😻
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- manage-repos
license: apache-2.0
---

# 🤖 EmbeddingGemma Tuning Lab: Fine-Tuning and Mood Reader

This project provides a set of tools to fine-tune EmbeddingGemma to understand your personal taste in Hacker News titles and then use it to score and rank new articles based on their "vibe." It includes three main applications:

1. A **Gradio App** for interactive fine-tuning, evaluation, and real-time "vibe checks."
2. An interactive **Command-Line (CLI) App** for viewing and scrolling through the scored feed directly in your terminal.
3. A **Flask App** for a simple, deployable web "mood reader" that displays the live HN feed.

---

## ✨ Features

* **Interactive Fine-Tuning:** Use a Gradio interface to select your favorite Hacker News titles and fine-tune the `google/embeddinggemma-300m` model on your preferences.
* **Semantic Search Evaluation:** See the immediate impact of your training by comparing semantic search results before and after fine-tuning.
* **Data & Model Management:** Easily import additional training data, export the generated dataset, and download the fine-tuned model as a ZIP file.
* **Hacker News Similarity Check:** View the live Hacker News feed with each story scored and color-coded based on the current model's understanding of your taste.
* **Similarity Lamp:** Input any news title or text to get a real-time similarity score (its "vibe") against your personalized anchor.
* **Interactive CLI:** A terminal-based mood reader with color-coded output, scrolling, and live refresh capabilities.
* **Standalone Flask App:** A lightweight, read-only web app to continuously display the scored HN feed, perfect for simple deployment.


---

## 🔧 How It Works

The core idea is to measure the "vibe" of a news title by calculating the semantic similarity between its embedding and the embedding of a fixed anchor phrase (**`MY_FAVORITE_NEWS`**), configured as `QUERY_ANCHOR` in `src/config.py`.

1. **Embedding:** The `sentence-transformers` library is used to convert news titles and the anchor phrase into high-dimensional vectors (embeddings).
2. **Scoring:** The cosine similarity (or dot product on normalized embeddings) between a title's embedding and the anchor's embedding is calculated. A higher score means a better "vibe."
3. **Fine-Tuning:** The Gradio app generates a contrastive learning dataset from your selections.
    * **Positive Pairs:** (`MY_FAVORITE_NEWS`, `[A title you selected]`)
    * **Negative Pairs:** (`MY_FAVORITE_NEWS`, `[A title you did not select]`)
4. **Training:** The model is trained using `MultipleNegativesRankingLoss`, which fine-tunes it to pull the embeddings of your "favorite" titles closer to the anchor phrase and push the others away.

## 🚀 Getting Started

### 1. Prerequisites

* Python 3.12+
* Git

### 2. Installation

```bash
# Clone the repository
git clone https://huggingface.co/spaces/bebechien/news-vibe-checker
cd news-vibe-checker

# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

# Install the required packages
pip install -r requirements.txt
```

### 3. (Optional) Hugging Face Authentication

If you plan to use gated models or push your fine-tuned model to the Hugging Face Hub, you need to authenticate.

```bash
# Set your Hugging Face token as an environment variable
export HF_TOKEN="your_hf_token_here"
```

---

## 🖥️ Running the Applications

You can run any of the three applications depending on your needs.

### Option A: Interactive Fine-Tuning (Gradio App)

This is the main application for creating and evaluating a personalized model.

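
The dataset this app generates follows the positive/negative pairing described under "How It Works." As a rough illustration only, the sketch below pairs each selected title with one randomly chosen unselected title to form `(anchor, positive, negative)` triplets, which is one input layout that `MultipleNegativesRankingLoss` accepts. The helper `build_triplets`, the sample titles, and the triplet layout itself are assumptions made for this example; the project's actual dataset format may differ.

```python
import random

# Anchor phrase named in this README.
ANCHOR = "MY_FAVORITE_NEWS"


def build_triplets(titles, selected, seed=0):
    """Return (anchor, positive, negative) triplets.

    Hypothetical helper for illustration, not the project's code.
    Each selected title becomes a positive and is paired with one
    randomly chosen unselected title as its negative.
    """
    positives = [t for t in titles if t in selected]
    negatives = [t for t in titles if t not in selected]
    if not positives or not negatives:
        return []  # need at least one title of each kind
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [(ANCHOR, pos, rng.choice(negatives)) for pos in positives]


# Made-up titles standing in for a real Hacker News feed.
titles = [
    "A new open-source embedding model",
    "Celebrity gossip roundup",
    "Writing a terminal UI in Python",
    "Quarterly earnings call summary",
]
selected = {
    "A new open-source embedding model",
    "Writing a terminal UI in Python",
}

triplets = build_triplets(titles, selected)
for anchor, positive, negative in triplets:
    print(f"{anchor} | + {positive} | - {negative}")
```

Pairing every positive with an explicit negative keeps the example small; with this loss, the other positives in a training batch are also treated as negatives for a given anchor, so explicit negatives are an addition rather than a requirement.
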

**▶️ To run:**

```bash
python app.py
```

Navigate to the local URL provided (e.g., `http://127.0.0.1:7860`).

### Option B: Interactive Terminal Viewer (CLI App)

This app runs directly in your terminal, allowing you to quickly see and scroll through the scored Hacker News feed.

![image](cli.png)

**▶️ To run:**

```bash
python cli_mood_reader.py
```

**Interactive Controls:**

* **[↑|↓]** arrow keys to scroll through the story list.
* **[SPACE]** to refresh the feed with the latest stories.
* **[q]** to quit the application.

You can also start it with options:

```bash
# Specify a different model from Hugging Face
python cli_mood_reader.py --model google/embeddinggemma-300m

# Show 10 stories per screen instead of the default 15
python cli_mood_reader.py --top 10
```

### Option C: Standalone Web Viewer (Flask App)

This app is a simple, read-only web page that fetches and displays the scored HN feed. It's ideal for deploying a finished model.

![image](flask.png)

**▶️ To run:**

```bash
# (Optional) Specify a model from the Hugging Face Hub
export MOOD_MODEL="bebechien/embedding-gemma-finetuned-hn"

# Run the Flask server
python flask_app.py
```

Navigate to `http://127.0.0.1:5000` to see the results.

---

## ⚙️ Configuration

Key parameters can be adjusted in `src/config.py`:

* `MODEL_NAME`: The base model to use for fine-tuning (e.g., `'google/embeddinggemma-300m'`).
* `QUERY_ANCHOR`: The anchor text used for similarity scoring (e.g., `"MY_FAVORITE_NEWS"`).
* `DEFAULT_MOOD_READER_MODEL`: The default model used by the Flask and CLI apps.
* `HN_RSS_URL`: The URL of the Hacker News RSS feed.
* `CACHE_DURATION_SECONDS`: How long to cache the RSS feed data, in seconds.

---

## 📂 File Structure

```
.
├── app.py                        # Main Gradio application entry point
├── cli_mood_reader.py            # Interactive command-line mood reader
├── cli.png                       # Screenshot for CLI app
├── flask_app.py                  # Standalone Flask application for mood reading
├── flask.png                     # Screenshot for Flask app
├── src/                          # Source code for the application
│   ├── config.py                 # Central configuration for all modules
│   ├── data_fetcher.py           # Fetches and caches the Hacker News RSS feed
│   ├── hn_mood_reader.py         # Core logic for fetching and scoring
│   ├── model_trainer.py          # Handles model loading and fine-tuning
│   ├── session_manager.py        # Manages user sessions and application state
│   ├── ui.py                     # Defines the Gradio user interface
│   └── vibe_logic.py             # Calculates similarity scores and "vibe" status
├── requirements.txt              # Python package dependencies
├── example_training_dataset.csv  # Example dataset for training
├── README.md                     # This file
├── artifacts/                    # Stores session-specific fine-tuned models and datasets (generated)
└── templates/                    # HTML templates for the Flask app
    ├── index.html
    └── error.html
```
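
The scoring step that `src/vibe_logic.py` is described as handling (cosine similarity between a title's embedding and the anchor's embedding) can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: the functions `cosine_similarity` and `rank_by_vibe` are hypothetical names, and the three-dimensional vectors are made-up stand-ins for the embeddings that the real apps obtain from `sentence-transformers`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_vibe(anchor_vec, title_vecs):
    """Score each title against the anchor and sort best-first.

    Hypothetical helper; `title_vecs` maps a title to its vector.
    """
    scored = [
        (title, cosine_similarity(anchor_vec, vec))
        for title, vec in title_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (invented for illustration).
anchor = [1.0, 0.0, 0.0]
stories = {
    "Title A": [0.9, 0.1, 0.0],  # points almost the same way as the anchor
    "Title B": [0.0, 1.0, 0.0],  # orthogonal to the anchor
    "Title C": [0.5, 0.5, 0.0],  # halfway between
}

ranking = rank_by_vibe(anchor, stories)
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

If the embeddings are already normalized to unit length, the division by the norms is redundant and a plain dot product gives the same ranking, which matches the "dot product on normalized embeddings" alternative mentioned under "How It Works."
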