CS410 Final Project -- Amazon Review Summarization and Sentiment Analysis
Dataset
The dataset is available here. We use four of its categories: All_Beauty, Digital_Music, Handmade_Product, and Health_and_Personal_Care.
Workflow
Data preprocessing -> Sentiment classification (reviews are grouped into positive and negative for the next steps) -> Fine-tuning the summarization model on training data -> Evaluating the summarization model on test data.
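The source does not say which metric the evaluation step uses. ROUGE is a common choice for summarization, so as a rough illustration only, here is a minimal pure-Python sketch of ROUGE-1 F1 (unigram overlap). The function name `rouge1_f1` and the whitespace tokenization are assumptions made for this example, not the project's evaluation code.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and a reference summary."""
    # Simplified tokenization (assumption): lowercase and split on whitespace.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A score near 1 means the generated summary shares most of its words with the reference; a full ROUGE implementation would also handle stemming and longer n-grams.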
Models
- Sentiment classification uses pre-trained DistilBERT.
- Review summarization uses facebook/bart-large-cnn, fine-tuned separately on each category of the review dataset.
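The sentiment step that groups reviews before summarization could look roughly like the sketch below. It assumes the Hugging Face `transformers` pipeline API and the public `distilbert-base-uncased-finetuned-sst-2-english` checkpoint; the source only says "pre-trained DistilBERT", so the checkpoint and both function names are assumptions.

```python
from collections import defaultdict


def load_sentiment_classifier():
    """Load a DistilBERT sentiment pipeline (the checkpoint name is an assumption)."""
    # Imported lazily so the grouping logic below can be used without transformers.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )


def group_by_sentiment(reviews, classifier):
    """Split reviews into lists keyed by the predicted label."""
    groups = defaultdict(list)
    # truncation=True keeps long reviews within the model's input limit.
    predictions = classifier(reviews, truncation=True)
    for review, prediction in zip(reviews, predictions):
        groups[prediction["label"]].append(review)
    return dict(groups)
```

Calling `group_by_sentiment(reviews, load_sentiment_classifier())` would return the reviews split by label, and each group could then be passed to the summarization model.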
Layout
- The checkpoints folder contains the fine-tuned model for each dataset category.
- The src folder contains the source code.
- The docs folder records experiment results.
Usage
Run sentiment classification
python src/classification.py [category]
Run fine-tuning
python src/finetune.py [category]
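Both scripts above take a `[category]` argument. One way such an argument might be validated is sketched here with `argparse`. This is a hypothetical illustration, not the code in `src/`; the `CATEGORIES` list copies the names from the Dataset section, and `build_parser` is an invented helper.

```python
import argparse

# Category names as listed in the Dataset section.
CATEGORIES = [
    "All_Beauty",
    "Digital_Music",
    "Handmade_Product",
    "Health_and_Personal_Care",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one known review category."""
    parser = argparse.ArgumentParser(
        description="Run a pipeline step for one review category."
    )
    parser.add_argument(
        "category", choices=CATEGORIES, help="review category to process"
    )
    return parser
```

With `choices` set, an unknown category makes `argparse` print the allowed values and exit instead of failing later in the run.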
To run summarization, first obtain an Anthropic Claude API key and export it:
export ANTHROPIC_API_KEY='your-api-key-here'
then
python src/summarization.py
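Since the summarization step depends on `ANTHROPIC_API_KEY`, a script can check for the variable up front and fail with a clear message. The helper below is a minimal sketch; `require_api_key` is a hypothetical name and is not taken from `src/summarization.py`.

```python
import os


def require_api_key(env=None) -> str:
    """Return the Anthropic API key from the environment, or raise if it is missing."""
    # An explicit mapping can be passed in for testing; defaults to os.environ.
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before running summarization."
        )
    return key
```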
Model tree for yunqili4/cs410-final-project
Base model
facebook/bart-large-cnn