---
license: mit
---

# Sentiment Analysis Model (Vibescribe)

Vibescribe is a binary sentiment analysis model built with Hugging Face Transformers and fine-tuned on IMDB reviews.

## Setup

1. Clone the repository:
   ```bash
   git clone https://github.com/your-username/sentiment-analysis
   cd sentiment-analysis
   ```

2. Create a virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Log in to Hugging Face:
   ```bash
   huggingface-cli login
   ```

## Project Structure

```
sentiment-analysis/
├── requirements.txt
├── train.py
├── inference.py
├── utils.py
└── README.md
```

## Files to Create

### requirements.txt

```
transformers==4.37.2
datasets==2.16.1
torch==2.1.2
scikit-learn==1.4.0
accelerate>=0.21.0  # required by the Trainer API when using PyTorch
```

### utils.py

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(pred):
    """Compute evaluation metrics from a Trainer prediction object."""
    labels = pred.label_ids
    # Pick the highest-scoring class for each example.
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='binary'
    )
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
```

### inference.py

```python
from transformers import pipeline


def load_model(model_path):
    """Load a sentiment-analysis pipeline from a local path or Hub model ID."""
    return pipeline("sentiment-analysis", model=model_path)


def predict(classifier, text):
    """Run the classifier on a string (or list of strings)."""
    return classifier(text)


if __name__ == "__main__":
    model_path = "your-username/sentiment-analysis-model"
    classifier = load_model(model_path)

    # Example prediction
    text = "This movie was really great!"
    result = predict(classifier, text)
    print(f"Text: {text}\nSentiment: {result}")
```

## Training

1. Update the model configuration in `train.py`:
   ```python
   training_args = TrainingArguments(
       output_dir="sentiment-analysis-model",
       hub_model_id="your-username/sentiment-analysis-model",  # Change this
       ...
   )
   ```

2. Start training:
   ```bash
   python train.py
   ```

## Making Predictions

```python
from inference import load_model, predict

classifier = load_model("your-username/sentiment-analysis-model")
result = predict(classifier, "Your text here")
```

## Model Details

- Base model: DistilBERT
- Dataset: IMDB Reviews
- Task: Binary sentiment classification (positive/negative)
- Training time: ~2-3 hours on GPU
- Model size: ~260MB

## Performance Metrics

- Accuracy: ~91-93%
- F1 Score: ~91-92%
- Precision: ~90-91%
- Recall: ~91-92%

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a pull request

## License

MIT License
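
## Worked Example: Performance Metrics

The figures listed under Performance Metrics are the same four values that `compute_metrics` in `utils.py` returns. As a rough illustration of how they relate to each other, the sketch below computes them by hand in plain Python for a small made-up set of labels. The `binary_metrics` helper and the toy data are hypothetical and not part of this repository. The sketch assumes label `1` means positive, which matches scikit-learn's default for `average='binary'`.

```python
def binary_metrics(labels, preds, positive=1):
    """Hypothetical helper: accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(labels, preds))
    if not pairs:
        raise ValueError("labels and preds must not be empty")

    # Count the confusion-matrix cells that the metrics are built from.
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": correct / len(pairs),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Toy data, made up for illustration: 1 = positive review, 0 = negative review.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

metrics = binary_metrics(labels, preds)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy set there are 3 true positives, 2 false positives and 1 false negative, so precision is 3/5, recall is 3/4, and accuracy is 7/10. F1 is the harmonic mean of precision and recall, which is why it sits between the two.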