---
language: bn
tags:
- hate-speech-detection
- bangla
- bert
- binary-classification
license: mit
---

# Bangla Hate Speech Detection Model

This model is a fine-tuned version of `sagorsarker/bangla-bert-base` that classifies Bangla text as hate speech or non-hate speech.

## Model Description

- **Base Model**: sagorsarker/bangla-bert-base
- **Task**: Binary Classification (Hate Speech vs Non-Hate Speech)
- **Language**: Bangla (Bengali)
- **Training Method**: Baseline training only (original behavior)

## Training Details

### Training Hyperparameters

- **Batch Size**: 64
- **Learning Rate**: 3e-05
- **Epochs**: 30
- **Max Sequence Length**: 128
- **Dropout**: 0.1
- **Weight Decay**: 0.01
- **Warmup Ratio**: 0.1
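
The warmup ratio is a fraction of the total number of optimizer steps, so the actual warmup length depends on the dataset size. The following is a minimal sketch of that arithmetic, not the training script itself. The training-set size of 8,000 examples is a hypothetical value, and the linear warmup followed by linear decay is an assumed schedule shape, since the card does not state which scheduler was used.

```python
import math


def schedule_lengths(num_train_examples, batch_size=64, epochs=30, warmup_ratio=0.1):
    """Derive step counts from the hyperparameters listed above."""
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def lr_at(step, total_steps, warmup_steps, peak_lr=3e-5):
    """Learning rate at a given step, ASSUMING linear warmup then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical dataset size; replace with the real number of training examples.
steps_per_epoch, total_steps, warmup_steps = schedule_lengths(num_train_examples=8000)
print(steps_per_epoch, total_steps, warmup_steps)  # 125 3750 375
print(lr_at(warmup_steps, total_steps, warmup_steps))  # peak learning rate, 3e-05
```

Under these assumptions, the learning rate climbs from 0 to 3e-05 over the first 10% of steps and then decays back to 0 by the final step.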

### Training Data

- **K-Fold Cross-Validation**: 5 folds
- **Stratification**: By the binary label, so each fold keeps the same hate speech to non-hate speech ratio
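
To illustrate what stratifying by the binary label means, here is a small, dependency-free sketch of a stratified 5-fold split. It is an illustration only and not the code used to train this model. The function name `stratified_kfold`, the seed, and the toy labels are all hypothetical. In practice, scikit-learn's `StratifiedKFold` is the usual choice for this.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs that preserve per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share of it.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, val_idx


# Hypothetical toy labels: 1 = hate speech, 0 = non-hate speech.
labels = [1] * 20 + [0] * 80
for fold, (train_idx, val_idx) in enumerate(stratified_kfold(labels)):
    positives = sum(labels[i] for i in val_idx)
    print(f"fold {fold}: {len(val_idx)} validation examples, {positives} hate speech")
```

With these toy labels, every validation fold holds 20 examples, 4 of which are hate speech, matching the 20% rate of the full set.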

## Performance

Evaluation metrics have not been reported yet.

## Usage

```python
import json

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_DIR = "path/to/model"

# Load the fine-tuned encoder and its tokenizer
encoder = AutoModel.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

# Rebuild the classification head from its saved config
with open(f"{MODEL_DIR}/classifier_config.json", "r") as f:
    c_config = json.load(f)

classifier = nn.Sequential(
    nn.Linear(c_config["hidden_size"], 256),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(256, c_config["num_labels"]),
)
classifier.load_state_dict(torch.load(f"{MODEL_DIR}/classifier.pt", map_location="cpu"))

# Switch to inference mode so dropout is disabled and predictions are deterministic
encoder.eval()
classifier.eval()


def predict(text):
    """Return the hate speech probability for a single text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = encoder(**inputs)
        cls_embedding = outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding
        logits = classifier(cls_embedding)
        # .item() needs a single logit, i.e. num_labels == 1 in classifier_config.json
        prob = torch.sigmoid(logits).item()
    return prob


text = "আপনার বাংলা টেক্সট এখানে"  # "Your Bangla text here"
prob = predict(text)
print(f"Hate Speech Probability: {prob:.4f}")
```
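
The usage snippet returns a probability, not a class. One way to turn that probability into a label is sketched below. The 0.5 threshold and the label strings are assumptions, not values documented for this model; a threshold tuned on validation data may work better.

```python
def to_label(prob, threshold=0.5):
    """Map a hate speech probability to a class name (threshold is an assumption)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    return "hate speech" if prob >= threshold else "non-hate speech"


print(to_label(0.91))                 # hate speech
print(to_label(0.12))                 # non-hate speech
print(to_label(0.55, threshold=0.7))  # non-hate speech
```

Raising the threshold trades recall for precision, which may matter when false positives are costly for moderation.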

## Citation

If you use this model, please cite:

```bibtex
@misc{bangla-hate-speech-model,
  author = {Nabil},
  title = {Bangla Hate Speech Detection Model},
  year = {2026},
  publisher = {Hugging Face},
}
```

## License

MIT License