---
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - text-classification
  - sentiment-analysis
  - distilbert
  - imdb
  - pytorch
pipeline_tag: text-classification
datasets:
  - imdb
metrics:
  - accuracy
  - f1
model-index:
  - name: ohanvi-sentiment-analysis
    results:
      - task:
          type: text-classification
          name: Sentiment Analysis
        dataset:
          name: IMDb
          type: imdb
          split: test
        metrics:
          - type: accuracy
            value: 0.932
            name: Accuracy
          - type: f1
            value: 0.931
            name: F1
---

# 🎬 Ohanvi Sentiment Analysis

A fine-tuned **DistilBERT** model for binary sentiment analysis of movie reviews.
Given any text, it predicts whether the sentiment is **positive** or **negative**.

## Model Details

| Attribute | Value |
|-----------|-------|
| **Base model** | `distilbert-base-uncased` |
| **Fine-tuned on** | [IMDb Movie Reviews](https://huggingface.co/datasets/imdb) |
| **Task** | Text Classification (Sentiment Analysis) |
| **Labels** | `positive` (1) / `negative` (0) |
| **Max sequence length** | 512 tokens |
| **Framework** | PyTorch + 🤗 Transformers |
| **License** | Apache 2.0 |

## Performance

Evaluated on the IMDb test split (25 000 samples):

| Metric | Score |
|--------|-------|
| Accuracy | ~93.2% |
| F1 (binary) | ~93.1% |

## Quick Start

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ohanvi/ohanvi-sentiment-analysis",
)

result = classifier("This movie was absolutely fantastic!")
# → [{'label': 'positive', 'score': 0.9978}]

result = classifier("Terrible film, complete waste of time.")
# → [{'label': 'negative', 'score': 0.9965}]
```

## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 3 |
| Batch size (train) | 16 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Optimiser | AdamW |
| LR scheduler | Linear with warmup |
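
The table above maps fairly directly onto 🤗 `TrainingArguments`. A minimal sketch — the `output_dir` is a hypothetical path, not a confirmed setting from the actual run:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="./results",          # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,                # 10% warmup
    optim="adamw_torch",             # AdamW
    lr_scheduler_type="linear",      # linear decay with warmup
)
```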

### Training Data

The model was fine-tuned on the full [IMDb](https://huggingface.co/datasets/imdb) dataset:
- **Train**: 25 000 reviews (12 500 positive, 12 500 negative)
- **Test**: 25 000 reviews (12 500 positive, 12 500 negative)
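
Both splits can be loaded directly with 🤗 Datasets (requires network access; this is a loading sketch, not the exact preprocessing pipeline used for training):

```python
from datasets import load_dataset

# Download the IMDb dataset used for fine-tuning.
imdb = load_dataset("imdb")
train_ds = imdb["train"]  # 25 000 labeled reviews
test_ds  = imdb["test"]   # 25 000 labeled reviews
```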

### Training Environment

- Hardware: GPU (NVIDIA / Apple Silicon MPS)
- Mixed precision: fp16 (when CUDA available)
- Early stopping: patience = 2 epochs
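
Early stopping with a patience of 2 can be wired up through the built-in `EarlyStoppingCallback` — a sketch under the assumption that evaluation runs once per epoch, so "2 evaluations" equals "2 epochs":

```python
from transformers import EarlyStoppingCallback

# Stop training when the monitored metric fails to improve for
# 2 consecutive evaluations. Requires load_best_model_at_end=True
# and a per-epoch eval strategy in TrainingArguments.
callbacks = [EarlyStoppingCallback(early_stopping_patience=2)]
# Pass via: Trainer(model=model, args=training_args, callbacks=callbacks)
```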

## How to Use (Advanced)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "ohanvi/ohanvi-sentiment-analysis"
tokenizer  = AutoTokenizer.from_pretrained(model_name)
model      = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text   = "An outstanding film with incredible performances."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

probs      = torch.softmax(logits, dim=-1)
label_id   = probs.argmax().item()
label      = model.config.id2label[label_id]
confidence = probs[0][label_id].item()

print(f"Label: {label}  ({confidence:.1%})")
```
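
The softmax step is the only math in the snippet above; a plain-Python illustration of how logits become a label and a confidence (the logit values here are made up, not real model outputs):

```python
import math

def softmax(logits):
    """Numerically stable softmax, mirroring torch.softmax in the snippet above."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from the two-class head, ordered [negative, positive]
logits = [-1.8, 2.3]
probs = softmax(logits)
label_id = max(range(len(probs)), key=probs.__getitem__)
id2label = {0: "negative", 1: "positive"}
print(f"Label: {id2label[label_id]}  ({probs[label_id]:.1%})")
# → Label: positive  (98.4%)
```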

## Limitations

- Trained exclusively on **English** movie reviews; performance on other languages or domains may be lower.
- Very short texts (< 5 words) may produce less reliable results.
- The model inherits any biases present in the IMDb dataset.

## Citation

If you use this model, please cite:

```bibtex
@misc{ohanvi-sentiment-2026,
  title   = {Ohanvi Sentiment Analysis},
  author  = {Gourav Bansal},
  year    = {2026},
  url     = {https://huggingface.co/ohanvi/ohanvi-sentiment-analysis},
}
```

## Acknowledgements

Built with 🤗 [Transformers](https://github.com/huggingface/transformers),
🤗 [Datasets](https://github.com/huggingface/datasets), and
[Gradio](https://gradio.app/).