---
license: mit
---
# Sentiment Analysis Model (Vibescribe)
Vibescribe is a sentiment analysis model built with Hugging Face Transformers and fine-tuned on IMDB reviews.
## Setup
1. Clone the repository:
```bash
git clone https://github.com/your-username/sentiment-analysis
cd sentiment-analysis
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Log in to Hugging Face:
```bash
huggingface-cli login
```
## Project Structure
```
sentiment-analysis/
├── requirements.txt
├── train.py
├── inference.py
├── utils.py
└── README.md
```
## Files to Create
### requirements.txt
```
transformers==4.37.2
datasets==2.16.1
torch==2.1.2
scikit-learn==1.4.0
```
### utils.py
```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(pred):
    # Expects an object with `label_ids` and `predictions`
    # (e.g. the EvalPrediction that Trainer passes to its metrics callback).
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)  # highest-scoring class per example
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
```
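To make the four returned values concrete, here is a minimal sketch that computes the same metrics by hand in plain Python. It assumes label `1` is the positive class, which is the default `pos_label` scikit-learn uses with `average='binary'`. The helper name `binary_metrics` and the toy labels are illustrative only and are not part of the project.

```python
def binary_metrics(labels, preds):
    """Hand-rolled accuracy/precision/recall/F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    correct = sum(1 for y, p in pairs if y == p)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Toy data: 6 reviews, 4 classified correctly.
labels = [1, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(labels, preds)
print(metrics)
```

With this toy data there are 2 true positives, 1 false positive, and 1 false negative, so accuracy, precision, recall, and F1 all come out to 2/3.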
### inference.py
```python
from transformers import pipeline


def load_model(model_path):
    # model_path can be a Hub model ID or a local directory
    return pipeline("sentiment-analysis", model=model_path)


def predict(classifier, text):
    # Returns a list of dicts, each with a 'label' and a 'score'
    return classifier(text)


if __name__ == "__main__":
    model_path = "your-username/sentiment-analysis-model"
    classifier = load_model(model_path)

    # Example prediction
    text = "This movie was really great!"
    result = predict(classifier, text)
    print(f"Text: {text}\nSentiment: {result}")
```
## Training
1. Update model configuration in `train.py`:
```python
training_args = TrainingArguments(
    output_dir="sentiment-analysis-model",
    hub_model_id="your-username/sentiment-analysis-model",  # Change this
    ...
)
```
2. Start training:
```bash
python train.py
```
## Making Predictions
```python
from inference import load_model, predict
classifier = load_model("your-username/sentiment-analysis-model")
result = predict(classifier, "Your text here")
```
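The pipeline returns a list with one dict per input, each holding a `label` and a `score`. If the fine-tuned model's config does not define `id2label`, the labels show up as `LABEL_0` and `LABEL_1`. Below is a minimal sketch of a helper that makes the output readable, assuming the IMDB convention (0 = negative, 1 = positive). `LABEL_NAMES` and `format_result` are hypothetical names, not part of `inference.py`, and the sample result is made up for illustration.

```python
# Assumed mapping: IMDB uses 0 for negative and 1 for positive reviews.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def format_result(result):
    """Turn pipeline output into strings such as 'positive (98.8%)'."""
    formatted = []
    for item in result:
        # Fall back to the lowercased raw label if it is not in the mapping.
        name = LABEL_NAMES.get(item["label"], item["label"].lower())
        formatted.append(f"{name} ({item['score']:.1%})")
    return formatted


# Illustrative pipeline-style output, not a real model prediction.
sample = [{"label": "LABEL_1", "score": 0.9876}]
print(format_result(sample))
```

The fallback keeps the helper usable with models that already expose names such as `POSITIVE` and `NEGATIVE`.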
## Model Details
- Base model: DistilBERT
- Dataset: IMDB Reviews
- Task: Binary sentiment classification (positive/negative)
- Training time: ~2-3 hours on GPU
- Model size: ~260MB
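
The model size is roughly what the parameter count predicts. A back-of-the-envelope check, assuming about 66 million parameters for DistilBERT base (an approximate figure; the classification head adds slightly more) stored as 32-bit floats:

```python
PARAMS = 66_000_000   # assumed approximate parameter count for DistilBERT base
BYTES_PER_PARAM = 4   # float32 weights

size_mb = PARAMS * BYTES_PER_PARAM / 1_000_000
print(f"~{size_mb:.0f} MB")  # prints "~264 MB"
```

Saving the weights in half precision would cut this estimate roughly in half.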
## Performance Metrics
- Accuracy: ~91-93%
- F1 Score: ~91-92%
- Precision: ~90-91%
- Recall: ~91-92%
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a pull request
## License
MIT License