🔥 ALFAAZ — Punjabi Hate Speech & Sarcasm Detection

ALFAAZ is an AI-powered Punjabi NLP system that classifies Punjabi text into one of three classes:

  • ✅ Non-Toxic Content
  • ⚠️ Toxic / Hate Speech
  • 🎭 Sarcastic Content

The system uses a fine-tuned MuRIL transformer model for contextual Punjabi text understanding and real-time inference.


🌐 Live Demo

Frontend Website

https://alfaaz-ai.vercel.app/


🧠 Model Overview

Parameter              Value
Base Model             google/muril-base-cased
Architecture           Transformer (12 layers)
Classes                Non-Toxic, Toxic, Sarcastic
Dataset Size           2,820 Punjabi entries
Max Sequence Length    128
Optimizer              AdamW
Accuracy               88.4%
Weighted F1 Score      0.883

⚙️ Pipeline

Punjabi Input Text
        ↓
Language Validation
        ↓
Preprocessing Pipeline
        ↓
MuRIL Tokenization
        ↓
Fine-Tuned MuRIL Model
        ↓
Softmax Classification
        ↓
Prediction Response
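
The last two pipeline stages can be sketched in plain Python. This is a minimal illustration, not the project's actual code: it assumes the model emits one raw logit per class and that the class order is Non-Toxic, Toxic, Sarcastic. The `LABELS` list and the `build_response` helper are hypothetical names introduced here.

```python
import math

# Assumed class order; the real label mapping lives in the model config.
LABELS = ["Non-Toxic", "Toxic", "Sarcastic"]


def build_response(logits):
    """Turn raw class logits into a response with percentage scores."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The predicted class is the one with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": LABELS[best],
        "confidence": round(probs[best] * 100, 1),
        "probabilities": {
            name: round(p * 100, 1) for name, p in zip(LABELS, probs)
        },
    }


if __name__ == "__main__":
    print(build_response([0.0, 4.0, 1.0]))
```

Scores are rounded to one decimal place, so the three percentages may not sum to exactly 100.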

🧹 Preprocessing Steps

The preprocessing pipeline includes:

  • Unicode normalization
  • URL removal
  • Mention removal
  • Noise cleaning
  • Stopword filtering
  • Token preparation

The system also offers limited support for Punjabi-English code-mixed text.
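
The steps listed above could look roughly like the following sketch. It is an assumption-laden illustration rather than the project's pipeline: the regular expressions, the choice of NFC normalization, the `preprocess` function name, and the tiny `STOPWORDS` set are all hypothetical, and the real system may rely on IndicNLP or NLTK instead.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@[A-Za-z0-9_]+")
# Keep Gurmukhi (U+0A00-U+0A7F), ASCII letters and digits, and whitespace.
# Everything else (punctuation, emoji, other scripts) is treated as noise.
NOISE_RE = re.compile(r"[^\u0A00-\u0A7FA-Za-z0-9\s]")

# Tiny illustrative stopword list; a real list would be much longer.
STOPWORDS = {"ਦਾ", "ਦੀ", "ਦੇ", "ਹੈ", "ਤੇ", "ਨੂੰ"}


def preprocess(text: str) -> str:
    """Normalize, strip URLs/mentions/noise, and drop stopwords."""
    text = unicodedata.normalize("NFC", text)  # Unicode normalization
    text = URL_RE.sub(" ", text)               # URL removal
    text = MENTION_RE.sub(" ", text)           # mention removal
    text = NOISE_RE.sub(" ", text)             # noise cleaning
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)                    # ready for the tokenizer


if __name__ == "__main__":
    print(preprocess("@user ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ!!! https://example.com"))
```

ASCII letters are kept so that code-mixed Punjabi-English input survives cleaning.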


🚀 API Usage

Endpoint

POST /predict

Example Request

{
  "text": "ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ"
}

Example Response

{
  "label": "Toxic",
  "confidence": 96.4,
  "probabilities": {
    "Non-Toxic": 1.2,
    "Toxic": 96.4,
    "Sarcastic": 2.4
  }
}
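
A client call might be sketched as follows, using only the Python standard library. The base URL `http://localhost:5000` is an assumption (the default port for a local Flask server), as is the JSON content type; `build_predict_request` and `predict` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

# Assumed address of a locally running instance; replace with the real host.
BASE_URL = "http://localhost:5000"


def build_predict_request(text, base_url=BASE_URL):
    """Build (but do not send) a POST /predict request for the given text."""
    # ensure_ascii=False keeps Gurmukhi characters readable in the payload.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=10.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running at the assumed address, `predict("ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ")` would return a dictionary shaped like the example response above.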

💻 Tech Stack

Machine Learning

  • PyTorch
  • HuggingFace Transformers
  • MuRIL

Backend

  • Flask API
  • Docker

NLP Libraries

  • IndicNLP
  • NLTK
  • NumPy
  • Pandas

⚠️ Important Notes

  • The model is optimized for Punjabi Gurmukhi text.
  • Text in unsupported scripts, such as Devanagari (Hindi) and Perso-Arabic (Arabic, Urdu), is rejected.
  • Predictions are less reliable for highly ambiguous sarcastic expressions.
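
One way to implement the script check described in these notes is to measure how many letters fall in the Unicode Gurmukhi block (U+0A00 to U+0A7F). The sketch below is an assumption about how such validation could work, not the project's actual rule; the function names and the 0.5 threshold are hypothetical.

```python
def gurmukhi_ratio(text: str) -> float:
    """Return the share of letters that belong to the Gurmukhi block."""
    # str.isalpha() skips digits, punctuation, spaces, and combining marks,
    # so only base letters of any script are counted.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    gurmukhi = sum(1 for ch in letters if "\u0A00" <= ch <= "\u0A7F")
    return gurmukhi / len(letters)


def is_supported(text: str, threshold: float = 0.5) -> bool:
    """Accept text only if enough of its letters are Gurmukhi.

    The threshold is an illustrative value; a threshold below 1.0 lets
    some Punjabi-English code-mixed input through.
    """
    return gurmukhi_ratio(text) >= threshold


if __name__ == "__main__":
    print(is_supported("ਤੂੰ ਬਿਲਕੁਲ ਬੇਕਾਰ ਹੈਂ"))  # Gurmukhi input
    print(is_supported("तुम बेकार हो"))          # Devanagari input
```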

📌 Research Objective

The project aims to address the lack of content moderation tools for Punjabi-language text and to contribute toward safer multilingual social media environments.


👨‍💻 Author

Amandeep Singh

AI/ML Developer • NLP Enthusiast • Full Stack Developer


⭐ Support

If you found this project useful:

  • ⭐ Star the GitHub repository
  • 🚀 Share the project
  • 🧠 Contribute improvements

🔗 GitHub Repository

https://github.com/DevxAman/Alfaaz-AI
