🔥 ALFAAZ — Punjabi Hate Speech & Sarcasm Detection

ALFAAZ is a Punjabi NLP system that classifies Punjabi text into three classes:

✅ Non-Toxic Content
⚠️ Toxic / Hate Speech
🎭 Sarcastic Content

The system uses a fine-tuned MuRIL transformer model to interpret Punjabi text in context and return predictions in real time.

🌐 Live Demo

Frontend Website

https://alfaaz-ai.vercel.app/

🧠 Model Overview

Parameter	Value
Base Model	google/muril-base-cased
Architecture	Transformer (12 Layers)
Classes	Non-Toxic, Toxic, Sarcastic
Dataset Size	2,820 Punjabi Entries
Max Sequence Length	128
Optimizer	AdamW
Accuracy	88.4%
Weighted F1 Score	0.883

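For readers unfamiliar with the weighted F1 metric reported above: it is the per-class F1 score averaged with weights proportional to each class's number of true examples. The sketch below shows that calculation in plain Python. It is an illustration of the metric only, not the project's evaluation script, and the labels in the usage note are toy values.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its true-example count."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted `label` where the truth is also `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score
```

With the toy labels `["Toxic", "Toxic", "Non-Toxic", "Sarcastic"]` as truth and `["Toxic", "Non-Toxic", "Non-Toxic", "Sarcastic"]` as predictions, the per-class F1 scores are 2/3, 2/3, and 1, and the weighted average is 0.75.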
⚙️ Pipeline

Punjabi Input Text
        ↓
Language Validation
        ↓
Preprocessing Pipeline
        ↓
MuRIL Tokenization
        ↓
Fine-Tuned MuRIL Model
        ↓
Softmax Classification
        ↓
Prediction Response
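The last two stages of the pipeline can be sketched as follows: raw model scores (logits) are turned into probabilities with softmax, and the highest-probability class becomes the predicted label. This is a minimal illustration under assumptions. The label order in `LABELS` and the helper name `to_response` are hypothetical; the real index-to-label mapping is defined by the fine-tuned model's configuration.

```python
import math

# Assumed label order, for illustration only.
LABELS = ["Non-Toxic", "Toxic", "Sarcastic"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_response(logits, labels=LABELS):
    """Build a response with the label, its confidence, and all class percentages."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    percentages = {label: round(p * 100, 1) for label, p in zip(labels, probs)}
    return {
        "label": labels[best],
        "confidence": percentages[labels[best]],
        "probabilities": percentages,
    }
```

Calling `to_response([0.0, 4.0, 1.0])` selects `"Toxic"`, because the second logit is the largest.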

🧹 Preprocessing Steps

The preprocessing pipeline includes:

Unicode normalization
URL removal
Mention removal
Noise cleaning
Stopword filtering
Token preparation

The system also offers limited support for Punjabi-English code-mixed text.
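The steps listed above could look roughly like the sketch below, written with the Python standard library only. It is not the project's actual code. The stopword list, the regular expressions, and the choice of NFC normalization are all assumptions made for illustration, and Latin letters are kept so that code-mixed input survives cleaning.

```python
import re
import unicodedata

# Hypothetical stopword list; the project's actual list is not given here.
STOPWORDS = {unicodedata.normalize("NFC", w) for w in ("ਦਾ", "ਦੀ", "ਦੇ", "ਨੂੰ", "ਤੇ")}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Keep Gurmukhi (U+0A00-U+0A7F), ASCII letters and digits, and whitespace.
NOISE_RE = re.compile(r"[^\u0A00-\u0A7FA-Za-z0-9\s]")


def preprocess(text):
    """Normalize and clean one input string before tokenization."""
    text = unicodedata.normalize("NFC", text)  # Unicode normalization
    text = URL_RE.sub(" ", text)               # URL removal
    text = MENTION_RE.sub(" ", text)           # mention removal
    text = NOISE_RE.sub(" ", text)             # noise cleaning (punctuation, emoji)
    tokens = [t for t in text.split() if t not in STOPWORDS]  # stopword filtering
    return " ".join(tokens)                    # token preparation: single-spaced text
```

URLs and mentions are removed before the noise step on purpose: once `@`, `:`, and `/` are stripped, the remaining fragments would no longer be recognizable as mentions or links.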

🚀 API Usage

Endpoint

POST /predict

Example Request

{
  "text": "ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ"
}

Example Response

{
  "label": "Toxic",
  "confidence": 96.4,
  "probabilities": {
    "Non-Toxic": 1.2,
    "Toxic": 96.4,
    "Sarcastic": 2.4
  }
}
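A client for this endpoint might look like the sketch below, which uses only the Python standard library. The base URL is a placeholder, since the backend host is not stated here, and the JSON `Content-Type` header is an assumption about what the API expects. The function names are hypothetical.

```python
import json
import urllib.request

# Placeholder base URL; replace with the actual backend address.
BASE_URL = "http://localhost:5000"


def build_predict_request(text, base_url=BASE_URL):
    """Create (but do not send) a POST /predict request carrying the text."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def parse_prediction(raw_body):
    """Extract the predicted label and its confidence from a response body."""
    result = json.loads(raw_body)
    return result["label"], result["confidence"]


def predict(text, base_url=BASE_URL, timeout=10):
    """Send the request and return (label, confidence)."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_prediction(response.read().decode("utf-8"))
```

Given a response shaped like the example above, `parse_prediction` returns `("Toxic", 96.4)`.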

💻 Tech Stack

Machine Learning

PyTorch
HuggingFace Transformers
MuRIL

Backend

Flask API
Docker

NLP Libraries

IndicNLP
NLTK
NumPy
Pandas

⚠️ Important Notes

The model is optimized for Punjabi text written in the Gurmukhi script.
Text in unsupported scripts, such as Devanagari (Hindi) and Perso-Arabic (Arabic, Urdu), is rejected.
Predictions may be less reliable for highly ambiguous sarcastic expressions.
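One way such a script check could work is to compare each letter's code point against Unicode block ranges. The sketch below is an assumption about the approach, not the project's validator: it rejects any input containing Devanagari or Arabic-block letters and requires at least one Gurmukhi letter, while allowing Latin letters so that code-mixed text passes. The function name and the exact acceptance rule are hypothetical.

```python
GURMUKHI = range(0x0A00, 0x0A80)    # Unicode Gurmukhi block
DEVANAGARI = range(0x0900, 0x0980)  # Unicode Devanagari block
ARABIC = range(0x0600, 0x0700)      # Unicode Arabic block (also covers Urdu letters)


def is_supported_script(text):
    """Return True if the text has Gurmukhi letters and no rejected-script letters."""
    # Only alphabetic characters are inspected, so digits, spaces, and
    # punctuation do not affect the decision.
    letters = [ord(ch) for ch in text if ch.isalpha()]
    if any(cp in DEVANAGARI or cp in ARABIC for cp in letters):
        return False
    return any(cp in GURMUKHI for cp in letters)
```

Under this rule, `is_supported_script("ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ")` is `True`, while Devanagari input such as `"तुम बेकार हो"` and purely Latin input such as `"hello"` are both rejected.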

📌 Research Objective

The project aims to address the lack of advanced moderation systems for Punjabi-language content and to contribute toward safer multilingual social media environments.

👨‍💻 Author

Amandeep Singh

AI/ML Developer • NLP Enthusiast • Full Stack Developer

⭐ Support

If you found this project useful:

⭐ Star the GitHub repository
🚀 Share the project
🧠 Contribute improvements

🔗 GitHub Repository

https://github.com/DevxAman/Alfaaz-AI
