Spaces:
Sleeping
Sleeping
| title: Social Media Virality Assistant | |
| emoji: π | |
| colorFrom: indigo | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.9.0 | |
| app_file: app.py | |
| pinned: false | |
| # π Social Media Virality Assistant | |
| A machine learning-powered tool that helps content creators predict and optimize their video virality potential using trained **XGBoost** model and **Google Gemini AI**. | |
| ## ποΈ Architecture & Pipeline | |
| This project consists of two main components: a training pipeline (`model-prep.py`) and an inference application (`app.py`). | |
| ### 1. Training Pipeline (`model-prep.py`) | |
| the `model-prep.py` script handles the end-to-end model creation process: | |
| 1. **Cloud Data Loading**: It fetches the latest synthetic dataset directly from **Hugging Face** (`MatanKriel/social-assitent-synthetic-data`). | |
| 2. **Embedding Benchmark**: It evaluates 3 state-of-the-art models (`MiniLM`, `mpnet-base`, `bge-small`) using **Silhouette Score** on **Composite Labels** (`Category_ViralClass`). | |
| * *Why?* Instead of just clustering by topic (e.g., "Gaming"), this forces the model to distinguish between "Viral Gaming Videos" and "Average Gaming Videos". | |
| * *Selection*: Automatically picks the best model for this high-resolution task. | |
| 3. **Feature Engineering**: | |
| * Encodes categorical inputs: `category`, `gender`, `day_of_week`, `age`. | |
| * Combines text embeddings with metadata (`followers`, `duration`, `hour`). | |
| 4. **Model Training**: Trains and compares three regression algorithms: | |
| * Linear Regression | |
| * Random Forest | |
| * **XGBoost (Winner)**: Selected for having the lowest RMSE. | |
| 5. **Artifact Generation**: Saves the trained model locally (`viral_model.pkl`) and generates performance plots (`project_plots/`). | |
| ### 2. Inference Application (`app.py`) | |
| The `app.py` script runs a **Gradio** web interface that pulls artifacts from the cloud at startup: | |
| 1. **Initialization**: | |
| * Downloads the trained `viral_model.pkl` from Hugging Face (`MatanKriel/social-assitent-viral-predictor`). | |
| * Downloads the dataset to build a Knowledge Base. | |
| * Generates embeddings on-the-fly for the Knowledge Base. | |
| 2. **Core Features**: | |
| * **Virality Prediction**: Predicts raw view counts based on your draft description and stats. | |
| * **AI Optimization**: Uses **Google Gemini** to rewrite your description with viral hooks and hashtags with the context of top 3 similar videos from the dataset. | |
| * **Semantic Search**: Finds similar successful videos from the knowledge base using Cosine Similarity. | |
| --- | |
| ## π Model Performance | |
| The training script (`model-prep.py`) automatically generates these benchmarks: | |
| ### Embedding Model Comparison | |
| We selected the embedding model that best balances speed and semantic understanding. | |
|  | |
| ### Regression Model Comparison | |
| We chose the regressor with the lowest error (RMSE) and highest explained variance (RΒ²). | |
|  | |
| --- | |
| ## π οΈ Tech Stack | |
| This project is built using: | |
| * **App**: `gradio`, `google-generativeai` | |
| * **ML**: `xgboost`, `scikit-learn`, `sentence-transformers` | |
| * **Data**: `pandas`, `numpy` | |
| * **Cloud**: `huggingface_hub`, `datasets` | |
| --- | |