Indic Profanity Detector - DistilBERT Multilingual

This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased for profanity detection in Malayalam and other Indic languages. It classifies text as either safe or not safe.

The training process incorporated several best practices to handle class imbalance and improve robustness, including stratified data splitting, class weighting, and early stopping.
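For reference, the minimal sketch below shows one common way to combine these pieces with the Trainer API. The variable names (texts, labels, train_dataset, val_dataset, model) and all hyperparameters are illustrative assumptions, not the exact training script used for this model.

import numpy as np
import torch
from torch.nn import CrossEntropyLoss
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

# Stratified split preserves the safe / not-safe ratio in both partitions
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42)

# Inverse-frequency weights penalize mistakes on the rarer class more heavily
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=train_labels)
class_weights = torch.tensor(weights, dtype=torch.float)

class WeightedTrainer(Trainer):
    # Override the default loss to apply the class weights
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = CrossEntropyLoss(weight=class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

trainer = WeightedTrainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,  # required by EarlyStoppingCallback
        metric_for_best_model="eval_loss",
    ),
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()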

Model Details

  • Model Type: Text Classification (Sequence Classification)
  • Base Model: distilbert/distilbert-base-multilingual-cased
  • Languages: Primarily Malayalam, with multilingual capabilities
  • Labels: safe (LABEL_0), not safe (LABEL_1)
  • Frameworks: Transformers, PyTorch
  • Model Size: ~0.1B parameters (F32, Safetensors)

Intended Uses & Limitations

Intended Uses

This model is intended for content moderation tasks, such as:

  • Filtering profane language in user-generated content (comments, forums, chat).
  • Acting as a guardrail for Large Language Models (LLMs) to prevent the generation of toxic or unsafe content (see the sketch after this list).
  • Automated content review pipelines.
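As a concrete illustration of the guardrail use case, a hypothetical wrapper (guarded_reply, generate_fn, and the refusal message are assumptions, not part of this repository) could screen an LLM's output before it reaches the user:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mangalathkedar/profanity-detector-distilbert-multilingual")

def guarded_reply(generate_fn, prompt):
    # generate_fn is any callable that turns a prompt into the LLM's raw text
    reply = generate_fn(prompt)
    if classifier(reply)[0]["label"] == "not safe":
        return "[response withheld by safety filter]"
    return reply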

Limitations

  • Bias towards "Not Safe": The model shows a tendency to misclassify neutral or uncommon safe phrases in Malayalam as "not safe". This suggests a need for a more diverse "safe" vocabulary in the training data. Users should tune the decision threshold for their specific use case (see the threshold sketch after this list).
  • Primary Language: While built on a multilingual base, the fine-tuning data is primarily Malayalam. Performance on other Indic languages is not guaranteed and should be evaluated before deployment.
  • Nuance and Context: The model may struggle with sarcasm, reclaimed slurs, or other context-dependent forms of profanity.
  • Data Bias: The model's behavior is a reflection of the mangalathkedar/multilingual-indic-profane dataset. Any biases present in the dataset will be inherited by the model.
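To tune the decision threshold, read the raw class probabilities instead of relying on the pipeline's argmax label. A minimal sketch follows; the threshold of 0.7 is purely illustrative and should be chosen on your own validation data:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mangalathkedar/profanity-detector-distilbert-multilingual")

def is_profane(text, threshold=0.7):
    # top_k=None returns a score for every label, not just the winner
    scores = {r["label"]: r["score"] for r in classifier(text, top_k=None)}
    return scores["not safe"] >= threshold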

Evaluation Results

The model was evaluated on a held-out test set of 912 samples. It demonstrated strong performance in identifying both safe and profane content.

Overall Performance Metrics

Metric     Score
Accuracy   0.9419
F1-Score   0.9235
Precision  0.9581
Recall     0.8914

F1-score, precision, and recall are reported for the not safe (positive) class, matching the per-class table below.

Performance Per Class

The model shows high precision for the not safe class, meaning when it flags content as profane, it is highly likely to be correct. The recall for the safe class is excellent, indicating it correctly identifies most safe content.

Class      Precision  Recall  F1-Score  Support
safe       0.9325     0.9747  0.9531    553
not safe   0.9581     0.8914  0.9235    359

Confusion Matrix

The confusion matrix provides a detailed breakdown of the model's predictions versus the actual labels.

                   Predicted: safe  Predicted: not safe
Actual: safe       539 (TN)         14 (FP)
Actual: not safe   39 (FN)          320 (TP)

Key takeaways from the matrix:

  • False Positives (FP): 14 - The model incorrectly flagged 14 safe texts as not safe.
  • False Negatives (FN): 39 - The model missed 39 not safe texts, classifying them as safe. This is the more critical error for content moderation systems.
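All of the headline metrics can be recomputed directly from these four cells (with not safe as the positive class); the snippet below reproduces the tables above:

tn, fp, fn, tp = 539, 14, 39, 320

accuracy  = (tp + tn) / (tp + tn + fp + fn)                # 859 / 912 ≈ 0.9419
precision = tp / (tp + fp)                                 # 320 / 334 ≈ 0.9581
recall    = tp / (tp + fn)                                 # 320 / 359 ≈ 0.8914
f1        = 2 * precision * recall / (precision + recall)  # ≈ 0.9235

print(f"acc={accuracy:.4f} p={precision:.4f} r={recall:.4f} f1={f1:.4f}")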

How to use

You can use this model with the transformers pipeline for easy inference.

from transformers import pipeline

# Use device=0 for GPU
classifier = pipeline("text-classification", model="mangalathkedar/profanity-detector-distilbert-multilingual", device=-1)

# Profane example (Romanized Malayalam, roughly "get lost, you dog")
profane_text = "poda patti"
result_profane = classifier(profane_text)
print(f"Text: '{profane_text}' -> {result_profane}")
# >> Text: 'poda patti' -> [{'label': 'not safe', 'score': 0.9...}]


# Safe example (Malayalam for "good day")
safe_text = "നല്ല ദിവസം"
result_safe = classifier(safe_text)
print(f"Text: '{safe_text}' -> {result_safe}")
# >> Text: 'നല്ല ദിവസം' -> [{'label': 'safe', 'score': 0.8...}]
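If you need the raw probabilities rather than the pipeline's top label (for example, to apply the custom threshold discussed under Limitations), the equivalent lower-level call is:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "mangalathkedar/profanity-detector-distilbert-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("നല്ല ദിവസം", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]
print({model.config.id2label[i]: round(p.item(), 4) for i, p in enumerate(probs)})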