Matan Kriel

updated clustering metric in model test

2f9170f 17 days ago


	---
	title: Social Media Virality Assistant
	emoji: 🚀
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: 5.9.0
	app_file: app.py
	pinned: false
	---

	# 🚀 Social Media Virality Assistant

	A machine learning-powered tool that helps content creators predict and optimize the virality potential of their videos using a trained XGBoost model and Google Gemini AI.

	## 🏗️ Architecture & Pipeline

	This project consists of two main components: a training pipeline (`model-prep.py`) and an inference application (`app.py`).

	### 1. Training Pipeline (`model-prep.py`)
	The `model-prep.py` script handles the end-to-end model creation process:

	1. Cloud Data Loading: Fetches the latest synthetic dataset directly from Hugging Face (`MatanKriel/social-assitent-synthetic-data`).
	2. Embedding Benchmark: Evaluates three embedding models (`MiniLM`, `mpnet-base`, `bge-small`) using the Silhouette Score on composite labels (`Category_ViralClass`).
	   * Why? Instead of clustering by topic alone (e.g., "Gaming"), this forces the model to distinguish "Viral Gaming Videos" from "Average Gaming Videos".
	   * Selection: Automatically picks the highest-scoring model for this finer-grained task.
	3. Feature Engineering:
	   * Encodes categorical inputs: `category`, `gender`, `day_of_week`, `age`.
	   * Combines text embeddings with metadata (`followers`, `duration`, `hour`).
	4. Model Training: Trains and compares three regression algorithms:
	   * Linear Regression
	   * Random Forest
	   * XGBoost (Winner): Selected for having the lowest RMSE.
	5. Artifact Generation: Saves the trained model locally (`viral_model.pkl`) and generates performance plots (`project_plots/`).
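
As a rough illustration of the feature engineering step, the sketch below one-hot encodes the categorical columns and stacks them next to the text embeddings and numeric metadata. The column names come from the list above; everything else is an assumption made for this example: the toy rows, the use of one-hot encoding, the bucketed `age` values, the random 4-dimensional vectors standing in for real sentence embeddings, and the helper name `build_feature_matrix`. `model-prep.py` may encode its inputs differently.

```python
import numpy as np
import pandas as pd

CATEGORICAL = ["category", "gender", "day_of_week", "age"]
NUMERIC = ["followers", "duration", "hour"]


def build_feature_matrix(df: pd.DataFrame, embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper: stack [embeddings | numeric metadata | one-hot categoricals]."""
    # One-hot encoding is an assumption; the real script may use another encoder.
    one_hot = pd.get_dummies(df[CATEGORICAL].astype(str)).to_numpy(dtype=np.float32)
    numeric = df[NUMERIC].to_numpy(dtype=np.float32)
    return np.hstack([embeddings.astype(np.float32), numeric, one_hot])


# Toy rows, made up for illustration only.
df = pd.DataFrame({
    "category": ["Gaming", "Cooking", "Gaming"],
    "gender": ["F", "M", "F"],
    "day_of_week": ["Mon", "Sat", "Sat"],
    "age": ["18-24", "25-34", "18-24"],
    "followers": [1200, 56000, 300],
    "duration": [15, 42, 60],
    "hour": [9, 18, 21],
})

# Random vectors stand in for sentence embeddings of the video descriptions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(df), 4))

X = build_feature_matrix(df, embeddings)
print(X.shape)  # 4 embedding dims + 3 numeric + 8 one-hot columns
```

Keeping the embeddings and metadata in one dense matrix means any of the three regressors can consume the same input without model-specific preprocessing.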

	### 2. Inference Application (`app.py`)
	The `app.py` script runs a Gradio web interface that pulls artifacts from the cloud at startup:

	1. Initialization:
	   * Downloads the trained `viral_model.pkl` from Hugging Face (`MatanKriel/social-assitent-viral-predictor`).
	   * Downloads the dataset to build a Knowledge Base.
	   * Generates embeddings on the fly for the Knowledge Base.
	2. Core Features:
	   * Virality Prediction: Predicts raw view counts from your draft description and stats.
	   * AI Optimization: Uses Google Gemini to rewrite your description with viral hooks and hashtags, using the three most similar videos from the dataset as context.
	   * Semantic Search: Finds similar successful videos in the Knowledge Base using cosine similarity.
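
The semantic search feature can be pictured as a top-k lookup by cosine similarity. The following is a minimal sketch of that idea, not the code in `app.py`: the function name `top_k_similar` and the tiny 3-dimensional vectors are invented for illustration, and in the real app the vectors would come from the selected embedding model.

```python
import numpy as np


def top_k_similar(query_vec: np.ndarray, kb_vecs: np.ndarray, k: int = 3):
    """Hypothetical helper: indices and cosine similarities of the k closest rows."""
    # Normalise both sides so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    kb = kb_vecs / np.linalg.norm(kb_vecs, axis=1, keepdims=True)
    sims = kb @ q
    order = np.argsort(-sims)[:k]  # highest similarity first
    return order, sims[order]


# Toy knowledge base: each row stands in for one video's embedding.
kb_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
])
query_vec = np.array([1.0, 0.05, 0.0])

idx, scores = top_k_similar(query_vec, kb_vecs, k=3)
print(idx, scores)
```

The three rows returned here would be the ones passed to Gemini as context for the rewrite.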

	---

	## 📊 Model Performance

	The training script (`model-prep.py`) automatically generates these benchmarks:

	### Embedding Model Comparison
	The benchmark ranks the candidate embedding models by Silhouette Score on the composite `Category_ViralClass` labels and selects the top-scoring one.
	![Embedding Benchmark](project_plots/embedding_benchmark.png)
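
To make the benchmark idea concrete, here is a small sketch of scoring candidate embeddings with the Silhouette Score against composite labels. It is an illustration under stated assumptions rather than the logic in `model-prep.py`: the two "candidates" are synthetic arrays instead of real model outputs, the cosine metric is a guess, and `benchmark_embeddings` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def benchmark_embeddings(candidates: dict, labels: list):
    """Hypothetical helper: score each candidate and return the best name."""
    # Map string labels such as "Gaming_Viral" to integer ids.
    label_ids = np.unique(labels, return_inverse=True)[1]
    scores = {
        name: float(silhouette_score(emb, label_ids, metric="cosine"))  # metric is an assumption
        for name, emb in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Composite labels: category joined with viral class.
categories = ["Gaming", "Gaming", "Cooking", "Cooking"] * 10
viral = ["Viral", "Average", "Viral", "Average"] * 10
labels = [f"{c}_{v}" for c, v in zip(categories, viral)]

rng = np.random.default_rng(0)
unique_labels = sorted(set(labels))
centers = {lab: 5.0 * np.eye(8)[i] for i, lab in enumerate(unique_labels)}

# One stand-in separates the composite classes well, the other is pure noise.
separated = np.array([centers[lab] for lab in labels]) + rng.normal(scale=0.1, size=(len(labels), 8))
noisy = rng.normal(size=(len(labels), 8))

best, scores = benchmark_embeddings(
    {"separated_stand_in": separated, "noisy_stand_in": noisy}, labels
)
print(best, scores)
```

A higher score means videos sharing a composite label sit closer to each other than to videos with other labels, which is the property the benchmark rewards.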

	### Regression Model Comparison
	We chose the regressor with the lowest error (RMSE) and highest explained variance (R²).
	![Model Comparison](project_plots/regression_comparison.png)
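
The selection rule can be sketched as follows: fit each candidate on the same split, compute RMSE and R² on held-out data, and keep the model with the lowest RMSE. This is a hedged example on synthetic data, not the project's training code. The helper `compare_regressors` is hypothetical, and XGBoost is left out only to keep the example dependency-light; an `xgboost.XGBRegressor` could be added to the `models` dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_regressors(models: dict, X: np.ndarray, y: np.ndarray, seed: int = 42):
    """Hypothetical helper: evaluate each model on one held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    results = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        results[name] = {
            "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
            "r2": float(r2_score(y_te, pred)),
        }
    # Lowest RMSE wins, mirroring the selection rule described above.
    winner = min(results, key=lambda n: results[n]["rmse"])
    return winner, results


# Synthetic regression data, made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + rng.normal(scale=0.1, size=300)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
winner, results = compare_regressors(models, X, y)
print(winner, results)
```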

	---

	## 🛠️ Tech Stack
	This project is built using:
	* App: `gradio`, `google-generativeai`
	* ML: `xgboost`, `scikit-learn`, `sentence-transformers`
	* Data: `pandas`, `numpy`
	* Cloud: `huggingface_hub`, `datasets`

	---