---
language: bn
tags:
- hate-speech-detection
- bangla
- bert
- binary-classification
license: mit
---

# Bangla Hate Speech Detection Model

This model is fine-tuned for binary hate speech detection in Bangla text.

## Model Description

- **Base Model**: microsoft/xtremedistil-l12-h384-uncased
- **Task**: Binary Classification (Hate Speech vs Non-Hate Speech)
- **Language**: Bangla (Bengali)
- **Training Method**: Baseline training only (original behavior)

## Training Details

### Training Hyperparameters

- **Batch Size**: 32
- **Learning Rate**: 3e-05
- **Epochs**: 30
- **Max Sequence Length**: 128
- **Dropout**: 0.1
- **Weight Decay**: 0.01
- **Warmup Ratio**: 0.1

### Training Data

- **K-Fold Cross-Validation**: 5 folds
- **Stratification**: binary

## Performance

*Add your metrics here after training*

## Usage

```python
import json

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Load model components
encoder = AutoModel.from_pretrained("path/to/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/model")

with open("path/to/model/classifier_config.json", "r") as f:
    c_config = json.load(f)

# Rebuild the classification head and load its trained weights
classifier = nn.Sequential(
    nn.Linear(c_config["hidden_size"], 256),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(256, c_config["num_labels"]),
)
classifier.load_state_dict(
    torch.load("path/to/model/classifier.pt", map_location="cpu")
)

# Switch to inference mode so dropout is disabled
encoder.eval()
classifier.eval()


# Predict
def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = encoder(**inputs)
        cls_embedding = outputs.last_hidden_state[:, 0, :]  # embedding of the first ([CLS]) token
        logits = classifier(cls_embedding)
    # .item() needs a single logit, i.e. num_labels == 1 in classifier_config.json
    prob = torch.sigmoid(logits).item()
    return prob


text = "আপনার বাংলা টেক্সট এখানে"  # "Your Bangla text here"
prob = predict(text)
print(f"Hate Speech Probability: {prob:.4f}")
```

## Citation

If you use this model, please cite:

```bibtex
@misc{bangla-hate-speech-model,
  author = {Nabil},
  title = {Bangla Hate Speech Detection Model},
  year = {2026},
  publisher = {HuggingFace},
}
```

## License

MIT License