---
license: mit
---
# Sentiment Analysis Model (Vibescribe)

Vibescribe is a sentiment analysis model built with Hugging Face Transformers and fine-tuned on IMDB reviews.

## Setup

1. Clone the repository:
```bash
git clone https://github.com/your-username/sentiment-analysis
cd sentiment-analysis
```

2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

4. Log in to Hugging Face:
```bash
huggingface-cli login
```

## Project Structure
```
sentiment-analysis/
├── requirements.txt
├── train.py
├── inference.py
├── utils.py
└── README.md
```

## Files to Create

### requirements.txt
```
transformers==4.37.2
datasets==2.16.1
torch==2.1.2
scikit-learn==1.4.0
```

### utils.py
```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

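def compute_metrics(pred):
    """Compute evaluation metrics from a Trainer `EvalPrediction`."""
    labels = pred.label_ids
    # Logits have shape (n_samples, n_classes); pick the highest-scoring class.
    preds = pred.predictions.argmax(-1)
    # average='binary' scores the positive class (label 1) only.
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')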
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
```
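
To make the four metrics concrete, the sketch below recomputes them by hand in plain Python for a toy set of labels. It is an illustration only, not part of `utils.py`: the `binary_metrics` name and the sample data are hypothetical, and it assumes label `1` is the positive class, as in the Hugging Face `imdb` dataset.

```python
def binary_metrics(labels, preds, positive=1):
    """Hand-computed accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)  # true positives
    fp = sum(1 for y, p in pairs if y != positive and p == positive)  # false positives
    fn = sum(1 for y, p in pairs if y == positive and p != positive)  # false negatives
    correct = sum(1 for y, p in pairs if y == p)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Toy data (hypothetical): 3 true positives, 1 false positive, 1 false negative.
labels = [1, 0, 1, 1, 0, 1]
preds = [1, 0, 0, 1, 1, 1]
metrics = binary_metrics(labels, preds)
print(metrics)
```

Here precision, recall and F1 all come out to 0.75, while accuracy is 4/6, which shows why accuracy and F1 can differ even on a small sample.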

### inference.py
```python
from transformers import pipeline

def load_model(model_path):
    return pipeline("sentiment-analysis", model=model_path)

def predict(classifier, text):
    return classifier(text)

if __name__ == "__main__":
    model_path = "your-username/sentiment-analysis-model"
    classifier = load_model(model_path)
    
    # Example prediction
    text = "This movie was really great!"
    result = predict(classifier, text)
    print(f"Text: {text}\nSentiment: {result}")
```

## Training

1. Update the model configuration in `train.py`:
```python
training_args = TrainingArguments(
    output_dir="sentiment-analysis-model",
    hub_model_id="your-username/sentiment-analysis-model",  # Change this
    ...
)
```

2. Start training:
```bash
python train.py
```
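
The contents of `train.py` are not listed under Files to Create. The following is a minimal sketch of what it might contain, not the script used to train this model. The `distilbert-base-uncased` checkpoint, the Hugging Face `imdb` dataset name, every hyperparameter value, and the `training_kwargs` and `build_trainer` helper names are assumptions; the argument names match the pinned `transformers==4.37.2`.

```python
# Hypothetical sketch of train.py; checkpoint and hyperparameters are assumptions.
BASE_CHECKPOINT = "distilbert-base-uncased"  # assumed DistilBERT variant
HUB_MODEL_ID = "your-username/sentiment-analysis-model"  # Change this


def training_kwargs(hub_model_id=HUB_MODEL_ID):
    """Keyword arguments for TrainingArguments (illustrative values only)."""
    return dict(
        output_dir="sentiment-analysis-model",
        hub_model_id=hub_model_id,
        evaluation_strategy="epoch",  # must match save_strategy for best-model loading
        save_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=2,
        weight_decay=0.01,
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        push_to_hub=True,
    )


def build_trainer(hub_model_id=HUB_MODEL_ID):
    """Load data and model, then return a configured Trainer."""
    # Heavy imports stay inside the function so importing this module is cheap.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )
    from utils import compute_metrics

    tokenizer = AutoTokenizer.from_pretrained(BASE_CHECKPOINT)
    dataset = load_dataset("imdb")
    # Truncate long reviews to the model's maximum input length.
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True), batched=True
    )
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_CHECKPOINT, num_labels=2
    )
    return Trainer(
        model=model,
        args=TrainingArguments(**training_kwargs(hub_model_id)),
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        tokenizer=tokenizer,  # lets Trainer pad each batch dynamically
        compute_metrics=compute_metrics,
    )
```

With this layout, `python train.py` would still need an entry point that calls `build_trainer()`, then `train()` and `push_to_hub()` on the returned trainer. Keeping the hyperparameters in a plain dict makes them easy to inspect or override before any model is downloaded.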

## Making Predictions

```python
from inference import load_model, predict

classifier = load_model("your-username/sentiment-analysis-model")
result = predict(classifier, "Your text here")
```
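
The pipeline returns a list of dicts with `label` and `score` keys. If the fine-tuned model's config does not define readable label names, the labels appear as `LABEL_0` and `LABEL_1`. The sketch below shows one way to turn that output into readable pairs. The `summarize` helper and `LABEL_NAMES` mapping are hypothetical, and the mapping assumes index 1 is the positive class, as in the Hugging Face `imdb` dataset.

```python
# Assumed mapping from default Transformers label names to sentiments.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def summarize(result):
    """Convert pipeline output into (sentiment, rounded score) tuples."""
    return [
        # Fall back to the lowercased label if the model already uses readable names.
        (LABEL_NAMES.get(item["label"], item["label"].lower()), round(item["score"], 3))
        for item in result
    ]


# Hand-written sample in the pipeline's output format (not real model output).
sample = [
    {"label": "LABEL_1", "score": 0.9876},
    {"label": "NEGATIVE", "score": 0.6123},
]
print(summarize(sample))
```

The same helper works for batches, since the pipeline also accepts a list of strings and returns one dict per input.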

## Model Details

- Base model: DistilBERT
- Dataset: IMDB Reviews
- Task: Binary sentiment classification (positive/negative)
- Training time: ~2-3 hours on GPU
- Model size: ~260MB

## Performance Metrics

- Accuracy: ~91-93%
- F1 Score: ~91-92%
- Precision: ~90-91%
- Recall: ~91-92%

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a pull request

## License

MIT License