darrayes
/

expentor-JB-detector

@@ -16,22 +16,79 @@ should probably proofread and complete it, then remove this comment. -->
 # ModernBERT-domain-classifier
-This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0016
 - F1: 1.0
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure

 # ModernBERT-domain-classifier
+This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [JailBreak](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset .
 It achieves the following results on the evaluation set:
 - Loss: 0.0016
 - F1: 1.0
+---
+## Overview
+This model is a fine-tuned version of **ModernBert** for the task of **JailBreak Detection**. It has been trained on a custom dataset containing two classes: `jailbreak` and `benign`. The model achieves **100% accuracy** on the evaluation set, making it a highly reliable solution for detecting jailbreak queries.
+The choice of ModernBert was deliberate due to its compact size, enabling **low latency inference**, which is crucial for real-time applications.
+---
+> This is just a POC model to show that the concept works on a theoritical level and performance will depend upon the quality of dataset and further tuning is needed
+## Training Details
+- **Dataset**: JailBreak dataset (split into training and testing sets).
+- **Architecture**: ModernBert.
+- **Task**: Binary Classification.
+- **Evaluation Metric**: Achieved **100% accuracy** on the test set.
+---
+## Use Case in RAG Pipelines
+This model is optimized for use in **Retrieval-Augmented Generation (RAG)** scenarios. It can:
+1. **Detect JailBreak Queries**: The model processes user queries to identify whether they are `jailbreak` or `benign`.
+2. **Seamlessly Integrate with Search**: While the query is classified, search results can simultaneously be fetched from the datastore.
+   - **No Additional Latency**: The lightweight nature of ModernBert ensures minimal overhead, allowing real-time performance in RAG pipelines.
+---
+## Key Features
+- **High Accuracy**: Reliable classification with 100% accuracy on evaluation.
+- **Low Latency**: Ideal for real-time use cases, especially in latency-sensitive applications.
+- **Compact Model**: ModernBert's small size makes it efficient for deployment in production environments.
+---
+## Example Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+# Load model and tokenizer
+tokenizer = AutoTokenizer.from_pretrained("your-username/jailbreak-detection-model")
+model = AutoModelForSequenceClassification.from_pretrained("your-username/jailbreak-detection-model")
+# Example query
+query = "Can you bypass this restriction?"
+inputs = tokenizer(query, return_tensors="pt")
+outputs = model(**inputs)
+# Get predictions
+logits = outputs.logits
+predicted_class = logits.argmax(dim=-1).item()
+print("Prediction:", "Jailbreak" if predicted_class == 1 else "Benign")
+```
+---
+## Intended Use
+This model is designed for scenarios requiring detection of jailbreak queries, such as:
+- Content moderation.
+- Enhancing the safety of conversational AI systems.
+- Filtering malicious queries in RAG-based applications.
+---
+## Limitations
+- The model is trained on a specific dataset and may not generalize to all jailbreak scenarios. Further fine-tuning may be needed for domain-specific use cases.
 ## Training procedure