Indic Profanity Detector - DistilBERT Multilingual

This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased for profanity detection in Malayalam and other Indic languages. It classifies text as either safe or not safe.

The training process incorporated several best practices to handle class imbalance and improve robustness, including stratified data splitting, class weighting, and early stopping.
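For reference, the minimal sketch below shows one common way to combine these pieces with the Trainer API. The variable names (texts, labels, train_dataset, val_dataset, model) and all hyperparameters are illustrative assumptions, not the exact training script used for this model.

import numpy as np
import torch
from torch.nn import CrossEntropyLoss
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

# Stratified split preserves the safe / not-safe ratio in both partitions
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42)

# Inverse-frequency weights penalize mistakes on the rarer class more heavily
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=train_labels)
class_weights = torch.tensor(weights, dtype=torch.float)

class WeightedTrainer(Trainer):
    # Override the default loss to apply the class weights
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = CrossEntropyLoss(weight=class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

trainer = WeightedTrainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,  # required by EarlyStoppingCallback
        metric_for_best_model="eval_loss",
    ),
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()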

Model Details

  • Model Type: Text Classification (Sequence Classification)
  • Base Model: distilbert/distilbert-base-multilingual-cased
  • Languages: Primarily Malayalam, with multilingual capabilities
  • Labels: safe (LABEL_0), not safe (LABEL_1)
  • Frameworks: Transformers, PyTorch
  • Model Size: ~0.1B parameters (F32, Safetensors)

Intended Uses & Limitations

Intended Uses

This model is intended for content moderation tasks, such as:

  • Filtering profane language in user-generated content (comments, forums, chat).
  • Acting as a guardrail for Large Language Models (LLMs) to prevent the generation of toxic or unsafe content (see the sketch after this list).
  • Automated content review pipelines.
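As a concrete illustration of the guardrail use case, a hypothetical wrapper (guarded_reply, generate_fn, and the refusal message are assumptions, not part of this repository) could screen an LLM's output before it reaches the user:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mangalathkedar/profanity-detector-distilbert-multilingual")

def guarded_reply(generate_fn, prompt):
    # generate_fn is any callable that turns a prompt into the LLM's raw text
    reply = generate_fn(prompt)
    if classifier(reply)[0]["label"] == "not safe":
        return "[response withheld by safety filter]"
    return reply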

Limitations

  • Bias towards "Not Safe": The model shows a tendency to misclassify neutral or uncommon safe phrases in Malayalam as "not safe". This suggests a need for a more diverse "safe" vocabulary in the training data. Users should tune the decision threshold for their specific use case (see the threshold sketch after this list).
  • Primary Language: While built on a multilingual base, the fine-tuning data is primarily Malayalam. Performance on other Indic languages is not guaranteed and should be evaluated before deployment.
  • Nuance and Context: The model may struggle with sarcasm, reclaimed slurs, or other context-dependent forms of profanity.
  • Data Bias: The model's behavior is a reflection of the mangalathkedar/multilingual-indic-profane dataset. Any biases present in the dataset will be inherited by the model.
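To tune the decision threshold, read the raw class probabilities instead of relying on the pipeline's argmax label. A minimal sketch follows; the threshold of 0.7 is purely illustrative and should be chosen on your own validation data:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mangalathkedar/profanity-detector-distilbert-multilingual")

def is_profane(text, threshold=0.7):
    # top_k=None returns a score for every label, not just the winner
    scores = {r["label"]: r["score"] for r in classifier(text, top_k=None)}
    return scores["not safe"] >= threshold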

Evaluation Results

The model was evaluated on a held-out test set of 912 samples. It demonstrated strong performance in identifying both safe and profane content.

Overall Performance Metrics

Metric     Score
Accuracy   0.9419
F1-Score   0.9235
Precision  0.9581
Recall     0.8914

F1-score, precision, and recall are reported for the not safe (positive) class, matching the per-class table below.

Performance Per Class

The model shows high precision for the not safe class, meaning when it flags content as profane, it is highly likely to be correct. The recall for the safe class is excellent, indicating it correctly identifies most safe content.

Class      Precision  Recall  F1-Score  Support
safe       0.9325     0.9747  0.9531    553
not safe   0.9581     0.8914  0.9235    359

Confusion Matrix

The confusion matrix provides a detailed breakdown of the model's predictions versus the actual labels.

                   Predicted: safe  Predicted: not safe
Actual: safe       539 (TN)         14 (FP)
Actual: not safe   39 (FN)          320 (TP)

Key takeaways from the matrix:

  • False Positives (FP): 14 - The model incorrectly flagged 14 safe texts as not safe.
  • False Negatives (FN): 39 - The model missed 39 not safe texts, classifying them as safe. This is the more critical error for content moderation systems.
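All of the headline metrics can be recomputed directly from these four cells (with not safe as the positive class); the snippet below reproduces the tables above:

tn, fp, fn, tp = 539, 14, 39, 320

accuracy  = (tp + tn) / (tp + tn + fp + fn)                # 859 / 912 ≈ 0.9419
precision = tp / (tp + fp)                                 # 320 / 334 ≈ 0.9581
recall    = tp / (tp + fn)                                 # 320 / 359 ≈ 0.8914
f1        = 2 * precision * recall / (precision + recall)  # ≈ 0.9235

print(f"acc={accuracy:.4f} p={precision:.4f} r={recall:.4f} f1={f1:.4f}")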

How to use

You can use this model with the transformers pipeline for easy inference.

from transformers import pipeline

# Use device=0 for GPU
classifier = pipeline("text-classification", model="mangalathkedar/profanity-detector-distilbert-multilingual", device=-1)

# Profane example (Romanized Malayalam, roughly "get lost, you dog")
profane_text = "poda patti"
result_profane = classifier(profane_text)
print(f"Text: '{profane_text}' -> {result_profane}")
# >> Text: 'poda patti' -> [{'label': 'not safe', 'score': 0.9...}]


# Safe example (Malayalam for "good day")
safe_text = "നല്ല ദിവസം"
result_safe = classifier(safe_text)
print(f"Text: '{safe_text}' -> {result_safe}")
# >> Text: 'നല്ല ദിവസം' -> [{'label': 'safe', 'score': 0.8...}]
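If you need the raw probabilities rather than the pipeline's top label (for example, to apply the custom threshold discussed under Limitations), the equivalent lower-level call is:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "mangalathkedar/profanity-detector-distilbert-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("നല്ല ദിവസം", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]
print({model.config.id2label[i]: round(p.item(), 4) for i, p in enumerate(probs)})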