🔥 ALFAAZ — Punjabi Hate Speech & Sarcasm Detection
ALFAAZ is an AI-powered Punjabi NLP system designed to classify Punjabi text into:
- ✅ Non-Toxic Content
- ⚠️ Toxic / Hate Speech
- 🎭 Sarcastic Content
The system uses a fine-tuned MuRIL transformer model for contextual Punjabi text understanding and real-time inference.
🌐 Live Demo
Frontend Website
🧠 Model Overview
| Parameter | Value |
|---|---|
| Base Model | google/muril-base-cased |
| Architecture | BERT-style Transformer encoder (12 layers) |
| Classes | Non-Toxic, Toxic, Sarcastic |
| Dataset Size | 2,820 Punjabi Entries |
| Max Sequence Length | 128 tokens |
| Optimizer | AdamW |
| Accuracy | 88.4% |
| Weighted F1 Score | 0.883 |
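
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how accuracy and a support-weighted F1 score are usually computed for a three-class problem. The function names and the toy labels are illustrative assumptions; they are not the project's evaluation code or data.

```python
from collections import Counter

LABELS = ["Non-Toxic", "Toxic", "Sarcastic"]


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Average of per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(per_class_f1(y_true, y_pred, lab) * support[lab] / total for lab in labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for illustration only (not the project's evaluation data).
y_true = ["Toxic", "Toxic", "Non-Toxic", "Non-Toxic", "Sarcastic", "Sarcastic"]
y_pred = ["Toxic", "Non-Toxic", "Non-Toxic", "Non-Toxic", "Sarcastic", "Toxic"]
print(f"accuracy={accuracy(y_true, y_pred):.3f}, weighted_f1={weighted_f1(y_true, y_pred):.3f}")
```

Weighting by support means that a class with more examples contributes more to the final score, which matters when the three classes are not evenly represented.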
⚙️ Pipeline
Punjabi Input Text
↓
Language Validation
↓
Preprocessing Pipeline
↓
MuRIL Tokenization
↓
Fine-Tuned MuRIL Model
↓
Softmax Classification
↓
Prediction Response
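
The last two stages of the pipeline can be sketched in plain Python: a softmax turns the model's raw scores (logits) into probabilities, and the highest-probability class becomes the predicted label. The class order, the helper names, and the logits below are assumptions made for illustration; the real order depends on how the model was trained.

```python
import math

# Assumed class order; the actual index-to-label mapping comes from the trained model.
LABELS = ["Non-Toxic", "Toxic", "Sarcastic"]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_response(logits, labels=LABELS):
    """Turn raw logits into a dict shaped like the API's example response."""
    probs = softmax(logits)
    # Percentages rounded to one decimal place, as in the example response.
    percentages = {label: round(p * 100, 1) for label, p in zip(labels, probs)}
    best = max(percentages, key=percentages.get)
    return {"label": best, "confidence": percentages[best], "probabilities": percentages}


# Made-up logits, for illustration only.
print(build_response([-1.0, 3.5, 0.2]))
```

Subtracting the maximum logit before exponentiating does not change the result but avoids overflow for large scores.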
🧹 Preprocessing Steps
The preprocessing pipeline includes:
- Unicode normalization
- URL removal
- Mention removal
- Noise cleaning
- Stopword filtering
- Token preparation
The system also offers limited support for Punjabi-English code-mixed text.
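
A minimal sketch of how the cleaning steps listed above might be chained, using only the Python standard library. The regular expressions, the `preprocess` function, and the small stopword set are assumptions for illustration; the project's actual rules and stopword list are not shown in this README.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Keep Gurmukhi (U+0A00-U+0A7F), ASCII letters and digits, and whitespace; drop the rest.
NOISE_RE = re.compile(r"[^\u0A00-\u0A7FA-Za-z0-9\s]")

# Tiny illustrative stopword set (hypothetical), normalized the same way as the input.
STOPWORDS = {unicodedata.normalize("NFC", w) for w in ("ਦਾ", "ਦੀ", "ਦੇ", "ਨੂੰ", "ਤੇ", "ਹੈ")}


def preprocess(text, remove_stopwords=True):
    """Apply the listed cleaning steps in order and return a cleaned string."""
    text = unicodedata.normalize("NFC", text)  # Unicode normalization
    text = URL_RE.sub(" ", text)               # URL removal
    text = MENTION_RE.sub(" ", text)           # mention removal
    text = NOISE_RE.sub(" ", text)             # noise cleaning (emoji, punctuation)
    tokens = text.split()                      # whitespace tokens
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]  # stopword filtering
    return " ".join(tokens)                    # ready for the model tokenizer


print(preprocess("@user123 ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ!!! https://example.com 😡"))
```

Because ASCII letters are kept, Latin-script words in code-mixed input survive the noise-cleaning step in this sketch.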
🚀 API Usage
Endpoint
POST /predict
Example Request
{
  "text": "ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ"
}
Example Response
{
  "label": "Toxic",
  "confidence": 96.4,
  "probabilities": {
    "Non-Toxic": 1.2,
    "Toxic": 96.4,
    "Sarcastic": 2.4
  }
}
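
As a rough illustration of calling the endpoint from Python, the sketch below builds the `POST /predict` request with the standard library. The base URL (`http://localhost:5000`) and the helper names are assumptions; substitute the address where the API is actually served.

```python
import json
import urllib.request

# Hypothetical base URL; replace with wherever the API is actually hosted.
BASE_URL = "http://localhost:5000"


def build_predict_request(text, base_url=BASE_URL):
    """Build a POST /predict request carrying {"text": ...} as UTF-8 JSON."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=10):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Inspect the request without sending it.
req = build_predict_request("ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With a server running, `predict("ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ")` would return a dictionary with the `label`, `confidence`, and `probabilities` keys shown in the example response.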
💻 Tech Stack
Machine Learning
- PyTorch
- HuggingFace Transformers
- MuRIL
Backend
- Flask API
- Docker
NLP Libraries
- IndicNLP
- NLTK
- NumPy
- Pandas
⚠️ Important Notes
- The model is optimized for Punjabi Gurmukhi text.
- Text in unsupported scripts, such as Devanagari (Hindi) and Perso-Arabic (Arabic, Urdu), is rejected.
- Predictions may vary for highly ambiguous sarcastic expressions.
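
One way a script check of this kind could work is by looking at Unicode code point ranges. The sketch below is an assumption about the approach, not the project's validator: the function names and the 0.5 Gurmukhi-ratio threshold are hypothetical, and Latin letters are tolerated to allow for code-mixed input.

```python
from collections import Counter


def script_of(ch):
    """Coarsely classify one character by Unicode block."""
    cp = ord(ch)
    if cp in (0x0964, 0x0965):
        # Danda and double danda sit in the Devanagari block but are shared punctuation.
        return "other"
    if 0x0A00 <= cp <= 0x0A7F:
        return "gurmukhi"
    if 0x0900 <= cp <= 0x097F:
        return "devanagari"
    if 0x0600 <= cp <= 0x06FF or 0x0750 <= cp <= 0x077F:
        return "arabic"  # Arabic and Arabic Supplement blocks
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"


def is_supported_punjabi(text, min_gurmukhi_ratio=0.5):
    """Accept text that is mostly Gurmukhi and contains no Devanagari or Arabic-script characters."""
    counts = Counter(script_of(ch) for ch in text)
    if counts["devanagari"] or counts["arabic"]:
        return False
    scripted = counts["gurmukhi"] + counts["latin"]
    if scripted == 0:
        return False
    return counts["gurmukhi"] / scripted >= min_gurmukhi_ratio


for sample in ["ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ", "तुम बेकार हो", "تم بیکار ہو", "ਤੂੰ bilkul ਬੇਕਾਰ ਹੈਂ"]:
    print(sample, "->", is_supported_punjabi(sample))
```

The ratio threshold controls how much Latin-script text is tolerated before an input is treated as unsupported.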
📌 Research Objective
The project aims to address the lack of advanced moderation systems for Punjabi-language content and to contribute toward safer multilingual social media environments.
👨‍💻 Author
Amandeep Singh
AI/ML Developer • NLP Enthusiast • Full Stack Developer
⭐ Support
If you found this project useful:
- ⭐ Star the GitHub repository
- 🚀 Share the project
- 🧠 Contribute improvements
🔗 GitHub Repository