---
license: apache-2.0
language:
- en
base_model:
- answerdotai/ModernBERT-base
---
# 🚸 NSFK Detection (`yasserrmd/nsfk-detection`)

**NSFK Detection** is a transformer-based text classification model, fine-tuned from `answerdotai/ModernBERT-base`, that identifies content that is **Not Suitable for Kids** (NSFK). It uses a **three-category system**:

- ✅ `suitable_for_kids`
- 🚫 `not_suitable_for_kids`
- ❓ `uncertain` (confidence-based)

> Fine-tuned on 60K examples and evaluated on a 1,000-sample test set (92.91% accuracy on confident predictions), this model is intended for content moderation on educational platforms, video platforms, and chatbot systems.

---

## 🔧 Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "yasserrmd/nsfk-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Raw class names and their ids. Verify how "True"/"False" correspond to the
# categories listed above on a few known examples before relying on them.
label_map = {"True": 0, "False": 1}
id_to_label = {i: label for label, i in label_map.items()}

threshold = 0.7  # Minimum confidence required to commit to a label

def classify(text):
    """Return (label, confidence); label is "uncertain" below the threshold."""
    # Truncate inputs that exceed the model's maximum sequence length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    pred_id = torch.argmax(probs).item()
    confidence = probs[pred_id].item()
    label = id_to_label[pred_id] if confidence >= threshold else "uncertain"
    return label, confidence

text = "The movie contained graphic violence."
label, confidence = classify(text)
print(f"Label: {label}, Confidence: {confidence:.2f}")
```

---

## 📊 Performance Summary

**Evaluation Dataset**: 1,000 samples (500 per class)  
**Confidence Threshold**: `0.7`

| Metric                     | Value    |
|----------------------------|----------|
| Accuracy (excluding uncertain) | **92.91%** |
| Precision (NSFK)           | 99.00%   |
| Recall (NSFK)              | 85.00%   |
| F1 Score (NSFK)            | 92.00%   |
| Uncertain Predictions      | 11.20%   |
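
The card does not spell out how `uncertain` predictions enter each metric. As a minimal sketch, assuming they are simply dropped before accuracy, precision, recall, and F1 are computed, the quantities in the table could be derived as follows. The function `abstention_metrics` and its argument names are hypothetical and not part of the released code.

```python
from typing import Dict, Sequence

NSFK = "not_suitable_for_kids"
UNCERTAIN = "uncertain"

def abstention_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Score predictions while ignoring abstentions (assumed protocol)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Keep only the examples where the model committed to a label.
    decided = [(t, p) for t, p in zip(y_true, y_pred) if p != UNCERTAIN]

    tp = sum(t == NSFK and p == NSFK for t, p in decided)  # unsafe, flagged
    fp = sum(t != NSFK and p == NSFK for t, p in decided)  # safe, flagged
    fn = sum(t == NSFK and p != NSFK for t, p in decided)  # unsafe, missed
    correct = sum(t == p for t, p in decided)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "accuracy_excluding_uncertain": correct / len(decided) if decided else 0.0,
        "precision_nsfk": precision,
        "recall_nsfk": recall,
        "f1_nsfk": f1,
        "uncertain_rate": 1 - len(decided) / len(y_true) if len(y_true) else 0.0,
    }
```

Under this assumption, recall is measured only over examples the model decided on, so unsafe items that land in `uncertain` do not count as misses. A stricter protocol would count them as false negatives and report a lower recall.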

---

## 🔎 Uncertainty Distribution

Among the 112 uncertain cases (11.2% of the test set), the topics break down as follows:

- 🔥 **Conflict/War**: 36%
- ⚖️ **Legal/Crime**: 11%
- 🏛️ **Political**: 6%
- 🧪 **Educational (Borderline)**: 6%
- 🧠 **Other Sensitive/Controversial Topics**: 38%

These cases are good candidates for a **manual review pipeline**.
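
One way to wire the `uncertain` label into a moderation flow is sketched below. It assumes predictions arrive as `(text, label, confidence)` tuples using the three category names from this card. The function `route_predictions` and the queue names `publish`, `block`, and `review` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Prediction = Tuple[str, str, float]  # (text, label, confidence)

def route_predictions(predictions: Iterable[Prediction]) -> Dict[str, List[Prediction]]:
    """Split predictions into publish, block, and human-review queues."""
    queues: Dict[str, List[Prediction]] = {"publish": [], "block": [], "review": []}
    for item in predictions:
        _, label, _ = item
        if label == "suitable_for_kids":
            queues["publish"].append(item)
        elif label == "not_suitable_for_kids":
            queues["block"].append(item)
        else:
            # "uncertain" (or any unexpected label) goes to a human reviewer.
            queues["review"].append(item)
    # Show reviewers the least confident items first.
    queues["review"].sort(key=lambda item: item[2])
    return queues
```

Sending unexpected labels to review rather than raising keeps the pipeline fail-safe: nothing is published without either a confident prediction or a human decision.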

---

## ✅ Key Benefits

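- **Three-label output** marks low-confidence predictions as `uncertain` instead of forcing a guess
- **High precision (99%)** on unsafe content, with 85% recall at the `0.7` threshold
- **Few false alarms**: safe content is rarely flagged as unsafe (about 1 in 100 NSFK predictions on the test set)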
- **Adaptable threshold** based on domain risk (e.g., `0.75` for children-only platforms)
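
To pick a threshold for a given platform, it can help to see how the uncertain rate and the accuracy on confident predictions move together. The sketch below assumes a labeled validation set that has already been scored as `(confidence, predicted_label, true_label)` tuples. The function `sweep_thresholds` and the candidate thresholds are illustrative choices, not part of the released model.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[float, str, str]  # (confidence, predicted_label, true_label)

def sweep_thresholds(
    records: Sequence[Record],
    thresholds: Sequence[float] = (0.6, 0.7, 0.75, 0.9),
) -> List[Dict[str, Optional[float]]]:
    """Report uncertain rate and accuracy on decided examples per threshold."""
    if not records:
        raise ValueError("records must not be empty")
    rows: List[Dict[str, Optional[float]]] = []
    for th in thresholds:
        # Predictions below the threshold would be reported as "uncertain".
        decided = [(pred, true) for conf, pred, true in records if conf >= th]
        accuracy = (
            sum(pred == true for pred, true in decided) / len(decided)
            if decided
            else None  # nothing left to score at this threshold
        )
        rows.append(
            {
                "threshold": th,
                "uncertain_rate": 1 - len(decided) / len(records),
                "accuracy_on_decided": accuracy,
            }
        )
    return rows
```

Raising the threshold typically trades coverage for accuracy: more items fall into `uncertain`, and the remaining predictions tend to be more reliable.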

---

## 🧠 Learn More

See the [Large-Scale Analysis Report (PDF)](./large_scale_analysis_report.pdf) for detailed metrics, sample predictions, and category-wise breakdowns.

---

## 👨‍💻 Author

**Mohamed Yasser**  
 
🔗 [LinkedIn](https://www.linkedin.com/in/moyasser/)  
📣 [WhatsApp Channel](https://whatsapp.com/channel/0029Va4f8B65PO15XY31uP3d)