---
license: mit
---

# Sentiment Analysis Model (Vibescribe)

Vibescribe is a sentiment analysis model built with Hugging Face Transformers and fine-tuned on IMDB reviews.

## Setup

1. Clone the repository:
```bash
git clone https://github.com/your-username/sentiment-analysis
cd sentiment-analysis
```

2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

4. Log in to Hugging Face:
```bash
huggingface-cli login
```

## Project Structure
```
sentiment-analysis/
├── requirements.txt
├── train.py
├── inference.py
├── utils.py
└── README.md
```

## Files to Create

### requirements.txt
```
transformers==4.37.2
datasets==2.16.1
torch==2.1.2
scikit-learn==1.4.0
```

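After installing, it can help to confirm that the environment actually matches the pins above. The following is a minimal sketch using only the standard library; `parse_pins` and `check_pins` are hypothetical helper names and are not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIREMENTS = """\
transformers==4.37.2
datasets==2.16.1
torch==2.1.2
scikit-learn==1.4.0
"""


def parse_pins(text):
    """Return {package: pinned_version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Compare pinned versions with what is installed in this environment."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # torch wheels may carry a local suffix such as "+cu121"; ignore it.
        installed = installed.split("+")[0]
        report[name] = "ok" if installed == pinned else f"mismatch ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_pins(parse_pins(REQUIREMENTS)).items():
        print(f"{name}: {status}")
```

Each package is reported as `ok`, `missing`, or `mismatch` together with the installed version.
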
### utils.py
```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(pred):
    """Compute evaluation metrics from a Trainer prediction object."""
    labels = pred.label_ids
    # Pick the class with the highest logit for each example.
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='binary'
    )
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
```

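To make the returned values concrete, here is a small pure-Python sketch of what `average='binary'` computes, assuming label 1 is the positive class (scikit-learn's default `pos_label`). `binary_metrics` is a hypothetical helper for illustration only, and the labels and predictions are toy data; `utils.py` itself relies on scikit-learn.

```python
def binary_metrics(labels, preds):
    """Accuracy, precision, recall and F1 for binary labels (positive class = 1)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    correct = sum(1 for y, p in pairs if y == p)

    # Report 0.0 when a denominator is empty instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": correct / len(labels),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Toy data: 4 positive and 4 negative reviews.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 0, 0, 1, 0, 0, 0]
metrics = binary_metrics(labels, preds)
print(metrics)
```

With this toy data there are 2 true positives, 1 false positive, and 2 false negatives, so precision is 2/3, recall is 1/2, F1 is 4/7, and accuracy is 5/8.
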
### inference.py
```python
from transformers import pipeline


def load_model(model_path):
    """Load a sentiment-analysis pipeline from a local path or Hub model ID."""
    return pipeline("sentiment-analysis", model=model_path)


def predict(classifier, text):
    """Run the classifier on a string (or a list of strings)."""
    return classifier(text)


if __name__ == "__main__":
    model_path = "your-username/sentiment-analysis-model"
    classifier = load_model(model_path)

    # Example prediction
    text = "This movie was really great!"
    result = predict(classifier, text)
    print(f"Text: {text}\nSentiment: {result}")
```

## Training

1. Update the model configuration in `train.py`:
```python
training_args = TrainingArguments(
    output_dir="sentiment-analysis-model",
    hub_model_id="your-username/sentiment-analysis-model",  # Change this
    ...
)
```

2. Start training:
```bash
python train.py
```

## Making Predictions

```python
from inference import load_model, predict

classifier = load_model("your-username/sentiment-analysis-model")
result = predict(classifier, "Your text here")
print(result)
```

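The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. A fine-tuned model whose config does not define `id2label` reports generic names such as `LABEL_0` and `LABEL_1`. The sketch below assumes that case and the IMDB convention (0 = negative, 1 = positive). `LABEL_NAMES` and `describe` are hypothetical names, and the sample `result` is made up for illustration rather than taken from the model.

```python
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}  # assumed mapping


def describe(result):
    """Turn pipeline output into readable 'sentiment (confidence)' strings."""
    lines = []
    for item in result:
        # Fall back to the raw label if the model already uses readable names.
        name = LABEL_NAMES.get(item["label"], item["label"])
        lines.append(f"{name} ({item['score']:.1%})")
    return lines


# Made-up example in the shape the pipeline returns; not real model output.
result = [{"label": "LABEL_1", "score": 0.9871}]
print(describe(result))
```

If the label names should be readable at the source, they can instead be set in the model config at training time, which makes a mapping like this unnecessary.
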
## Model Details

- Base model: DistilBERT
- Dataset: IMDB Reviews
- Task: Binary sentiment classification (positive/negative)
- Training time: ~2-3 hours on GPU
- Model size: ~260 MB

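The size figure is consistent with a back-of-the-envelope estimate. The sketch below assumes the base checkpoint is `distilbert-base-uncased` (this README only says DistilBERT), which has roughly 66 million parameters, and assumes the weights are stored as 32-bit floats. Both values are assumptions for illustration, not measurements of this model.

```python
# Assumptions: ~66M parameters (distilbert-base-uncased), 4 bytes per float32 weight.
NUM_PARAMETERS = 66_000_000
BYTES_PER_PARAMETER = 4


def estimated_size_mb(num_parameters, bytes_per_parameter):
    """Approximate checkpoint size in megabytes (10**6 bytes), weights only."""
    return num_parameters * bytes_per_parameter / 1_000_000


size_mb = estimated_size_mb(NUM_PARAMETERS, BYTES_PER_PARAMETER)
print(f"Estimated size: ~{size_mb:.0f} MB")
```

The classification head adds a small number of parameters on top of this, so the file on disk may differ slightly from the estimate.
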
## Performance Metrics

- Accuracy: ~91-93%
- F1 Score: ~91-92%
- Precision: ~90-91%
- Recall: ~91-92%

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a pull request

## License

MIT License