IMDB Sentiment Analysis with TF-IDF + Neural Network
This project trains a simple neural network for binary sentiment analysis on an IMDB review dataset. It is designed for a GitHub CI/CD workflow that automatically trains the model and uploads the artifacts to Hugging Face Hub.
Project idea
The model uses:
- TF-IDF text vectorization for movie reviews.
- A feedforward neural network implemented with PyTorch.
- Binary Cross Entropy with logits for positive/negative classification.
- GitHub Actions for automatic training and deployment.
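As an illustration of the loss named above, here is a minimal plain-Python sketch of what binary cross entropy with logits computes for a single example. In PyTorch this is normally provided by torch.nn.BCEWithLogitsLoss; whether train.py uses exactly that class is an assumption, and the helper names below are hypothetical.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw network output (logit) to a probability in (0, 1)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)  # avoids overflow for very negative logits
    return e / (1.0 + e)


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross entropy computed directly on the logit.

    Mathematically equal to
        -[target * log(sigmoid(logit)) + (1 - target) * log(1 - sigmoid(logit))]
    but written in the numerically stable form
        max(x, 0) - x * y + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# A logit of 0 means "undecided" (probability 0.5), so the loss is log(2).
undecided_loss = bce_with_logits(0.0, 1.0)

# A confident correct prediction gives a loss near 0.
confident_loss = bce_with_logits(8.0, 1.0)
```

Working on logits rather than probabilities is why the network's last layer has no sigmoid during training; the sigmoid is applied only when a probability is needed at prediction time.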
Repository structure
.
├── data/
│   └── imdb_balanced_10k.csv or imdb_top_500.csv
├── model/
│   ├── model.pt
│   ├── vectorizer.pkl
│   ├── config.json
│   └── metrics.json
├── train.py
├── predict.py
├── requirements.txt
└── .github/workflows/train-and-upload.yml
Dataset
Put one of these files inside the data/ folder:
- imdb_balanced_10k.csv
- imdb_top_500.csv
The code automatically detects common text columns such as review or text, and label columns such as sentiment or label. Labels can be positive/negative or 1/0.
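The column detection described above could look roughly like the following sketch. It uses only the standard library and is not the actual code from train.py: the function names, the case-insensitive matching, and the error handling are all assumptions made for illustration.

```python
import csv
import io

# Candidate column names, taken from the description above.
TEXT_COLUMNS = ("review", "text")
LABEL_COLUMNS = ("sentiment", "label")

# Accepted label spellings mapped to the binary target.
LABEL_MAP = {"positive": 1, "negative": 0, "1": 1, "0": 0}


def find_column(fieldnames, candidates):
    """Return the first header that matches one of the candidates.

    Matching is case-insensitive (an assumption, not confirmed by the project).
    """
    lowered = {name.strip().lower(): name for name in (fieldnames or [])}
    for candidate in candidates:
        if candidate in lowered:
            return lowered[candidate]
    raise ValueError(f"none of {candidates} found in header {fieldnames}")


def load_reviews(handle):
    """Read a CSV file object and return (texts, labels) with labels as 0/1."""
    reader = csv.DictReader(handle)
    text_col = find_column(reader.fieldnames, TEXT_COLUMNS)
    label_col = find_column(reader.fieldnames, LABEL_COLUMNS)
    texts, labels = [], []
    for row in reader:
        key = str(row[label_col]).strip().lower()
        if key not in LABEL_MAP:
            raise ValueError(f"unrecognised label: {row[label_col]!r}")
        texts.append(row[text_col])
        labels.append(LABEL_MAP[key])
    return texts, labels


# Usage with an in-memory CSV that mixes both label styles.
sample = io.StringIO("Review,Sentiment\nGreat film,positive\nDull,0\n")
texts, labels = load_reviews(sample)
```

Failing loudly on an unknown header or label is a design choice in this sketch; it keeps a mislabeled dataset from silently training a wrong model.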
Train locally
pip install -r requirements.txt
python train.py
The training script saves:
- model/model.pt
- model/vectorizer.pkl
- model/config.json
- model/metrics.json
Predict locally
python predict.py "This movie is wonderful and exciting."
CI/CD deployment
On every push to main or master, GitHub Actions will:
- Install dependencies.
- Run python train.py.
- Run a small prediction test.
- Upload the trained model artifacts to Hugging Face Hub.
Before pushing, edit this line in .github/workflows/train-and-upload.yml:
repo_id = "YOUR_HF_USERNAME/imdb-tfidf-neural-net"
Replace YOUR_HF_USERNAME with your Hugging Face username.
Also add a GitHub Actions secret:
HF_TOKEN = your Hugging Face write token
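For orientation, the upload step could be sketched in Python as below. This is not the script shipped in the workflow file: the function name upload_model and the optional api parameter are hypothetical, and the sketch assumes the huggingface_hub package is installed and that the workflow exposes the HF_TOKEN secret as an environment variable.

```python
import os


def upload_model(repo_id, folder="model", api=None):
    """Push the contents of `folder` to a Hugging Face model repository.

    `api` can be injected for testing; by default a real HfApi client is
    created from the HF_TOKEN environment variable.
    """
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a GitHub Actions secret")

    if api is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import HfApi

        api = HfApi(token=token)

    # Create the repository on first run; do nothing if it already exists.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

    # Upload model.pt, vectorizer.pkl, config.json and metrics.json in one commit.
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")


# Example call (performs network requests, so it is left commented out):
# upload_model("YOUR_HF_USERNAME/imdb-tfidf-neural-net")
```

Reading the token from the environment keeps it out of the repository; the secret only needs write access to the target model repository.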
Final submission
Submit:
- GitHub repository URL
- Hugging Face model repository URL