---
language: en
tags:
- sentiment-analysis
- imdb
- distilbert
- transformers
license: apache-2.0
datasets:
- imdb
---

# DistilBERT Sentiment Analysis Model

This model is a fine-tuned version of `distilbert-base-uncased` for binary sentiment classification on the IMDB movie reviews dataset.

## Model Details

### Model Description

- **Model type**: DistilBERT (transformer-based)
- **Task**: Binary sentiment classification (positive/negative)
- **Base Model**: `distilbert-base-uncased`
- **Language**: English

### Training Details

#### Training Data

- **Dataset**: IMDB Movie Reviews
- **Training Samples**: 16,000
- **Validation Samples**: 4,000
- **Test Samples**: 5,000
- **Class Distribution**: 50% positive, 50% negative

#### Training Procedure

- **Epochs**: 3
- **Batch Size**: 16
- **Learning Rate**: 2e-05
- **Max Sequence Length**: 512
- **Optimizer**: AdamW with weight decay (0.01)
- **Scheduler**: Linear with 10% warmup

#### Evaluation Results

- **Test Accuracy**: 0.9460
- **Test F1 Score**: 0.9723
- **Best Validation Accuracy**: 0.9300
- **Training Time**: ~6 minutes on Google Colab T4 GPU

## How to Use

### Direct Inference

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Hums003/distilbert-imdb-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Prepare text
text = "This movie was absolutely fantastic! I loved every minute of it."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Interpret results (label index 1 = positive, index 0 = negative)
sentiment = "positive" if predictions[0][1] > 0.5 else "negative"
confidence = predictions[0].max().item()
print(f"Sentiment: {sentiment} (confidence: {confidence:.2%})")
```
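### How the Confidence Is Derived

The confidence reported by the inference snippet above is just the larger of the two softmax probabilities. The mapping from raw logits to that number can be reproduced with plain Python; the logit values below are hypothetical, chosen only to illustrate the arithmetic:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw logits from the model head: [negative, positive]
logits = [-1.2, 2.3]
probs = softmax(logits)

sentiment = "positive" if probs[1] > 0.5 else "negative"
confidence = max(probs)
print(f"Sentiment: {sentiment} (confidence: {confidence:.2%})")
```

A gap of 3.5 between the logits already yields a confidence above 97%, which is why well-separated reviews tend to score near 100%.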
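### Learning-Rate Schedule (Illustration)

The linear scheduler with 10% warmup listed under Training Procedure can be sketched in pure Python. The step counts are derived from the card's own numbers (16,000 training samples / batch size 16 × 3 epochs = 3,000 optimizer steps, so 300 warmup steps); the function itself is an illustrative sketch, not the exact `transformers` scheduler implementation:

```python
def lr_at_step(step, base_lr=2e-5, total_steps=3000, warmup_steps=300):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

print(lr_at_step(150))   # mid-warmup: half of the base learning rate
print(lr_at_step(300))   # end of warmup: peak learning rate 2e-5
print(lr_at_step(3000))  # final step: decayed to 0.0
```

The ramp-up prevents large, destabilizing updates while the classification head is still randomly initialized; the decay afterwards lets training settle.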