title: Social Media Virality Assistant
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.9.0
app_file: app.py
pinned: false
Social Media Virality Assistant
A machine learning-powered tool that helps content creators predict and optimize the virality potential of their videos, using a trained XGBoost model and Google Gemini AI.
Architecture & Pipeline
This project consists of two main components: a training pipeline (model-prep.py) and an inference application (app.py).
1. Training Pipeline (model-prep.py)
The model-prep.py script handles the end-to-end model creation process:
- Cloud Data Loading: It fetches the latest synthetic dataset directly from Hugging Face (MatanKriel/social-assitent-synthetic-data).
- Embedding Benchmark: It evaluates 3 state-of-the-art models (MiniLM, mpnet-base, bge-small) using Silhouette Score on Composite Labels (Category_ViralClass).
  - Why? Instead of just clustering by topic (e.g., "Gaming"), this forces the model to distinguish between "Viral Gaming Videos" and "Average Gaming Videos".
  - Selection: Automatically picks the best model for this high-resolution task.
- Feature Engineering:
  - Encodes categorical inputs: category, gender, day_of_week, age.
  - Combines text embeddings with metadata (followers, duration, hour).
- Model Training: Trains and compares three regression algorithms:
  - Linear Regression
  - Random Forest
  - XGBoost (Winner): Selected for having the lowest RMSE.
- Artifact Generation: Saves the trained model locally (viral_model.pkl) and generates performance plots (project_plots/).
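The Feature Engineering step above can be sketched as follows. This is a minimal, dependency-free illustration, not the code in model-prep.py: the helper names (fit_label_encoder, build_feature_row), the toy records, and the 3-dimensional embedding are all assumptions made for the example, and the actual script may well use scikit-learn encoders instead.

```python
# Column names come from the README; everything else here is hypothetical.
CATEGORICAL_COLS = ["category", "gender", "day_of_week", "age"]
NUMERIC_COLS = ["followers", "duration", "hour"]


def fit_label_encoder(values):
    """Map each distinct category value to an integer id (sorted order)."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def build_feature_row(embedding, record, encoders):
    """Concatenate a text embedding with encoded categoricals and numeric metadata."""
    categorical = [encoders[col][record[col]] for col in CATEGORICAL_COLS]
    numeric = [float(record[col]) for col in NUMERIC_COLS]
    return list(embedding) + categorical + numeric


# Toy records standing in for dataset rows (values are made up).
records = [
    {"category": "Gaming", "gender": "F", "day_of_week": "Monday",
     "age": "18-24", "followers": 1200, "duration": 30, "hour": 18},
    {"category": "Food", "gender": "M", "day_of_week": "Friday",
     "age": "25-34", "followers": 50000, "duration": 45, "hour": 9},
]

# One encoder per categorical column, fitted on the values seen in the data.
encoders = {
    col: fit_label_encoder(r[col] for r in records) for col in CATEGORICAL_COLS
}

# A real sentence embedding has hundreds of dimensions; 3 keeps the demo readable.
toy_embedding = [0.1, -0.2, 0.3]
row = build_feature_row(toy_embedding, records[0], encoders)
```

The resulting row has one entry per embedding dimension, followed by four encoded categoricals and three numeric fields, which is the flat layout a tabular regressor expects.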
2. Inference Application (app.py)
The app.py script runs a Gradio web interface that pulls artifacts from the cloud at startup:
- Initialization:
  - Downloads the trained viral_model.pkl from Hugging Face (MatanKriel/social-assitent-viral-predictor).
  - Downloads the dataset to build a Knowledge Base.
  - Generates embeddings on the fly for the Knowledge Base.
- Core Features:
  - Virality Prediction: Predicts raw view counts based on your draft description and stats.
  - AI Optimization: Uses Google Gemini to rewrite your description with viral hooks and hashtags, using the top 3 similar videos from the dataset as context.
  - Semantic Search: Finds similar successful videos from the knowledge base using Cosine Similarity.
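The Semantic Search feature above amounts to ranking knowledge-base embeddings by cosine similarity to the draft's embedding. The sketch below shows that ranking in plain Python. The function names and the 2-dimensional toy vectors are assumptions for illustration; app.py presumably works on real sentence embeddings and may use a vectorized implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, kb_vecs, k=3):
    """Return (index, score) pairs for the k most similar knowledge-base vectors."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(kb_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy knowledge base: four made-up 2-D "embeddings".
kb_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query_vec = [1.0, 0.1]

hits = top_k_similar(query_vec, kb_vecs, k=3)
hit_indices = [idx for idx, _ in hits]
```

The indices in hits can then be used to look up the matching video descriptions, which is the kind of context the AI Optimization step passes along with the draft.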
Model Performance
The training script (model-prep.py) automatically generates these benchmarks:
Embedding Model Comparison
We selected the embedding model that best balances speed and semantic understanding.
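To make the benchmark concrete, here is a small sketch of scoring embeddings with the Silhouette Score against composite Category_ViralClass labels, as described in the training pipeline. It is a plain-Python reimplementation using Euclidean distance, written only for illustration; the model names model_a and model_b and all vectors are made up, and the real script most likely calls a library implementation.

```python
import math
from collections import defaultdict


def composite_label(category, viral_class):
    """Join topic and virality class so clusters must separate both."""
    return f"{category}_{viral_class}"


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two distinct labels")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0.0 else (b - a) / denom)
    return sum(scores) / len(scores)


labels = [
    composite_label("Gaming", "Viral"), composite_label("Gaming", "Viral"),
    composite_label("Gaming", "Average"), composite_label("Gaming", "Average"),
]

# Hypothetical embeddings of the same four videos from two candidate models.
candidate_embeddings = {
    "model_a": [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]],
    "model_b": [[0.0, 0.0], [5.0, 0.0], [0.0, 1.0], [5.0, 1.0]],
}

scores = {name: silhouette_score(vecs, labels)
          for name, vecs in candidate_embeddings.items()}
best_model = max(scores, key=scores.get)
```

In this toy setup model_a places viral and average gaming videos in separate regions and scores higher, while model_b mixes them and scores below zero, so the selection step would keep model_a.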

Regression Model Comparison
We chose the regressor with the lowest error (RMSE) and highest explained variance (R²).
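The selection rule can be illustrated with a short sketch that computes RMSE and R² for each candidate's predictions and keeps the one with the lowest RMSE. The candidate names and all numbers below are invented for the example and are not results from this project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pick_best(y_true, predictions):
    """Score every candidate and return the name with the lowest RMSE."""
    metrics = {
        name: {"rmse": rmse(y_true, y_pred), "r2": r2_score(y_true, y_pred)}
        for name, y_pred in predictions.items()
    }
    best = min(metrics, key=lambda name: metrics[name]["rmse"])
    return best, metrics


# Made-up held-out view counts and predictions from two hypothetical regressors.
y_true = [100.0, 200.0, 300.0, 400.0]
predictions = {
    "candidate_a": [110.0, 190.0, 310.0, 390.0],
    "candidate_b": [150.0, 150.0, 350.0, 350.0],
}

best, metrics = pick_best(y_true, predictions)
```

When all candidates are evaluated on the same targets, the lowest RMSE and the highest R² always point to the same model, since R² is a decreasing function of the squared error.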

Tech Stack
This project is built using:
- App: gradio, google-generativeai
- ML: xgboost, scikit-learn, sentence-transformers
- Data: pandas, numpy
- Cloud: huggingface_hub, datasets