---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- text-classification
- sentiment-analysis
- distilbert
- imdb
- pytorch
pipeline_tag: text-classification
datasets:
- imdb
metrics:
- accuracy
- f1
model-index:
- name: ohanvi-sentiment-analysis
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      name: IMDb
      type: imdb
      split: test
    metrics:
    - type: accuracy
      value: 0.932
      name: Accuracy
    - type: f1
      value: 0.931
      name: F1
---

# 🎬 Ohanvi Sentiment Analysis

A fine-tuned **DistilBERT** model for binary sentiment analysis on movie reviews. Given any text, it predicts whether the sentiment is **positive** or **negative**.

## Model Details

| Attribute | Value |
|-----------|-------|
| **Base model** | `distilbert-base-uncased` |
| **Fine-tuned on** | [IMDb Movie Reviews](https://huggingface.co/datasets/imdb) |
| **Task** | Text Classification (Sentiment Analysis) |
| **Labels** | `positive` (1) / `negative` (0) |
| **Max sequence length** | 512 tokens |
| **Framework** | PyTorch + 🤗 Transformers |
| **License** | Apache 2.0 |

## Performance

Evaluated on the IMDb test split (25 000 samples):

| Metric | Score |
|--------|-------|
| Accuracy | ~93.2% |
| F1 (binary) | ~93.1% |

## Quick Start

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ohanvi/ohanvi-sentiment-analysis",
)

result = classifier("This movie was absolutely fantastic!")
# → [{'label': 'positive', 'score': 0.9978}]

result = classifier("Terrible film, complete waste of time.")
# → [{'label': 'negative', 'score': 0.9965}]
```

## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 3 |
| Batch size (train) | 16 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Optimiser | AdamW |
| LR scheduler | Linear with warmup |

### Training Data

The model was fine-tuned on the full [IMDb](https://huggingface.co/datasets/imdb) dataset:

- **Train**: 25 000 reviews (12 500 positive, 12 500 negative)
- **Test**: 25 000 reviews (12 500 positive, 12 500 negative)

### Training Environment

- Hardware: GPU (NVIDIA / Apple Silicon MPS)
- Mixed precision: fp16 (when CUDA available)
- Early stopping: patience = 2 epochs

## How to Use (Advanced)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "ohanvi/ohanvi-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "An outstanding film with incredible performances."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
label_id = probs.argmax().item()
label = model.config.id2label[label_id]
confidence = probs[0][label_id].item()

print(f"Label: {label} ({confidence:.1%})")
```

## Limitations

- Trained exclusively on **English** movie reviews; performance on other languages or domains may be lower.
- Very short texts (< 5 words) may produce less reliable results.
- The model inherits any biases present in the IMDb dataset.

## Citation

If you use this model, please cite:

```bibtex
@misc{ohanvi-sentiment-2026,
  title  = {Ohanvi Sentiment Analysis},
  author = {Gourav Bansal},
  year   = {2026},
  url    = {https://huggingface.co/ohanvi/ohanvi-sentiment-analysis},
}
```

## Acknowledgements

Built with 🤗 [Transformers](https://github.com/huggingface/transformers), 🤗 [Datasets](https://github.com/huggingface/datasets), and [Gradio](https://gradio.app/).
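
## Appendix: LR Schedule Sketch

The "linear with warmup" scheduler in the hyperparameter table pairs a 10% warmup ratio with a peak learning rate of 2e-5. As an illustration of the resulting shape (this is a hypothetical `lr_at_step` helper, not the actual Transformers `get_linear_schedule_with_warmup` implementation used in training):

```python
def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 2e-5, warmup_ratio: float = 0.10) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Example: 3 epochs over 25 000 reviews at batch size 16.
total = (25_000 // 16 + 1) * 3
print(f"{lr_at_step(0, total):.2e}")      # → 0.00e+00 (start of warmup)
print(f"{lr_at_step(468, total):.2e}")    # → 2.00e-05 (peak, at ~10% of steps)
print(f"{lr_at_step(total, total):.2e}")  # → 0.00e+00 (end of training)
```

The warmup phase avoids large, destabilising updates while the classification head is still randomly initialised; the linear decay then anneals the fine-tuning toward the end of the 3 epochs.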
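
## Appendix: Metric Definitions

The accuracy and binary-F1 figures in the Performance table follow the standard definitions. For reference, a minimal pure-Python sketch (illustrative only; the reported scores were presumably computed with a standard library such as `scikit-learn` or 🤗 Evaluate):

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))            # → 0.6
print(round(f1_binary(y_true, y_pred), 3)) # → 0.667
```

Because the IMDb test split is perfectly balanced (12 500 per class), accuracy and binary F1 land close together, as the ~93.2% / ~93.1% scores above show.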