---
language: en
tags:
- sentiment-analysis
- imdb
- distilbert
- transformers
license: apache-2.0
datasets:
- imdb
---

# DistilBERT Sentiment Analysis Model

This model is a fine-tuned version of `distilbert-base-uncased` for binary sentiment classification on the IMDB movie reviews dataset.

## Model Details

### Model Description

- **Model type**: DistilBERT (transformer-based)
- **Task**: Binary sentiment classification (positive/negative)
- **Base Model**: `distilbert-base-uncased`
- **Language**: English

### Training Details

#### Training Data

- **Dataset**: IMDB Movie Reviews
- **Training Samples**: 16,000
- **Validation Samples**: 4,000
- **Test Samples**: 5,000
- **Class Distribution**: 50% positive, 50% negative

#### Training Procedure

- **Epochs**: 3
- **Batch Size**: 16
- **Learning Rate**: 2e-05
- **Max Sequence Length**: 512
- **Optimizer**: AdamW with weight decay (0.01)
- **Scheduler**: Linear with 10% warmup

#### Evaluation Results

- **Test Accuracy**: 0.9460
- **Test F1 Score**: 0.9723
- **Best Validation Accuracy**: 0.9300
- **Training Time**: ~6 minutes on Google Colab T4 GPU

## How to Use

### Direct Inference

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Hums003/distilbert-imdb-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Prepare text
text = "This movie was absolutely fantastic! I loved every minute of it."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Interpret results (label index 1 = positive, index 0 = negative)
sentiment = "positive" if predictions[0][1] > 0.5 else "negative"
confidence = predictions[0].max().item()
print(f"Sentiment: {sentiment} (confidence: {confidence:.2%})")
```
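### How the Confidence Is Derived

The confidence reported by the inference snippet above is just the larger of the two softmax probabilities. The mapping from raw logits to that number can be reproduced with plain Python; the logit values below are hypothetical, chosen only to illustrate the arithmetic:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw logits from the model head: [negative, positive]
logits = [-1.2, 2.3]
probs = softmax(logits)

sentiment = "positive" if probs[1] > 0.5 else "negative"
confidence = max(probs)
print(f"Sentiment: {sentiment} (confidence: {confidence:.2%})")
```

A gap of 3.5 between the logits already yields a confidence above 97%, which is why well-separated reviews tend to score near 100%.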
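### Learning-Rate Schedule (Illustration)

The linear scheduler with 10% warmup listed under Training Procedure can be sketched in pure Python. The step counts are derived from the card's own numbers (16,000 training samples / batch size 16 × 3 epochs = 3,000 optimizer steps, so 300 warmup steps); the function itself is an illustrative sketch, not the exact `transformers` scheduler implementation:

```python
def lr_at_step(step, base_lr=2e-5, total_steps=3000, warmup_steps=300):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

print(lr_at_step(150))   # mid-warmup: half of the base learning rate
print(lr_at_step(300))   # end of warmup: peak learning rate 2e-5
print(lr_at_step(3000))  # final step: decayed to 0.0
```

The ramp-up prevents large, destabilizing updates while the classification head is still randomly initialized; the decay afterwards lets training settle.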