---
license: apache-2.0
datasets:
- project-droid/DroidCollection
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
---

# DroidDetect-Base

This is a text classification model based on `answerdotai/ModernBERT-base`, fine-tuned to distinguish between **human-written**, **AI-generated**, **AI-refined**, and **adversarially AI-generated** code.

The model was trained on the `DroidCollection` dataset and is designed as a **4-class classifier** for the core task of AI code detection.

A key feature of this model is its training objective, which combines standard **Cross-Entropy Loss** with a **Batch-Hard Triplet Loss**. This contrastive component encourages the model to learn more discriminative embeddings by pushing representations of human-written and machine-generated code further apart in the embedding space.
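The batch-hard mining idea can be illustrated with a short self-contained PyTorch sketch. This is a conceptual illustration only, not the exact `sentence-transformers` implementation used in training, and the margin value is illustrative:

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """For each anchor, pick the hardest (farthest) positive and the
    hardest (closest) negative in the batch, then apply a margin hinge."""
    dist = torch.cdist(embeddings, embeddings, p=2)      # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)    # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos_mask = same & ~eye                               # positives, excluding self
    neg_mask = ~same                                     # negatives
    hardest_pos = (dist * pos_mask).max(dim=1).values    # farthest positive per anchor
    neg_dist = dist.masked_fill(~neg_mask, float("inf"))
    hardest_neg = neg_dist.min(dim=1).values             # closest negative per anchor
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

emb = torch.randn(8, 128)
lab = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])  # two samples per class
loss = batch_hard_triplet_loss(emb, lab)
```

Minimizing this quantity pulls same-class embeddings together and pushes different-class embeddings apart, which is the effect the contrastive term adds on top of cross-entropy.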
***

## Model Details

* **Base Model:** `answerdotai/ModernBERT-base`
* **Loss Function:** `Total Loss = CrossEntropyLoss + 0.1 * TripletLoss`
* **Dataset:** Filtered training set of the [DroidCollection](https://huggingface.co/datasets/project-droid/DroidCollection).

#### Label Mapping

The model predicts one of four classes. The mapping from ID to label is as follows:

```json
{
  "0": "HUMAN_GENERATED",
  "1": "MACHINE_GENERATED",
  "2": "MACHINE_REFINED",
  "3": "MACHINE_GENERATED_ADVERSARIAL"
}
```
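When post-processing predictions, the mapping can be kept as a plain Python dictionary (a minimal sketch; `ID2LABEL` and `label_for` are illustrative names, not part of the released checkpoint):

```python
# Illustrative helper: convert a predicted class ID into its label name,
# mirroring the id2label mapping above.
ID2LABEL = {
    0: "HUMAN_GENERATED",
    1: "MACHINE_GENERATED",
    2: "MACHINE_REFINED",
    3: "MACHINE_GENERATED_ADVERSARIAL",
}

def label_for(class_id: int) -> str:
    return ID2LABEL[class_id]

print(label_for(2))  # MACHINE_REFINED
```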
## Model Code

The following code defines the model architecture and training objective, and can be used for reproducibility:

```python
import torch.nn as nn
import torch.nn.functional as F
from sentence_transformers import losses

TEXT_EMBEDDING_DIM = 768  # hidden size of ModernBERT-base
NUM_CLASSES = 4           # see the label mapping above


class TLModel(nn.Module):
    def __init__(self, text_encoder, projection_dim=128, num_classes=NUM_CLASSES, class_weights=None):
        super().__init__()
        self.text_encoder = text_encoder
        self.num_classes = num_classes
        text_output_dim = TEXT_EMBEDDING_DIM
        # Batch-hard triplet loss with a soft margin, applied to the projected embeddings
        self.additional_loss = losses.BatchHardSoftMarginTripletLoss(self.text_encoder)

        self.text_projection = nn.Linear(text_output_dim, projection_dim)
        self.classifier = nn.Linear(projection_dim, num_classes)
        self.class_weights = class_weights

    def forward(self, labels=None, input_ids=None, attention_mask=None):
        actual_labels = labels
        # Mean-pool token embeddings into a single sequence embedding
        sentence_embeddings = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        sentence_embeddings = sentence_embeddings.mean(dim=1)
        projected_text = F.relu(self.text_projection(sentence_embeddings))
        logits = self.classifier(projected_text)

        loss = None
        cross_entropy_loss = None
        contrastive_loss = None

        if actual_labels is not None:
            loss_fct_ce = nn.CrossEntropyLoss(
                weight=self.class_weights.to(logits.device) if self.class_weights is not None else None
            )
            cross_entropy_loss = loss_fct_ce(logits.view(-1, self.num_classes), actual_labels.view(-1))
            contrastive_loss = self.additional_loss.batch_hard_triplet_loss(
                embeddings=projected_text, labels=actual_labels
            )
            lambda_contrast = 0.1
            loss = cross_entropy_loss + lambda_contrast * contrastive_loss

        output = {"logits": logits, "fused_embedding": projected_text}
        if loss is not None:
            output["loss"] = loss
        if cross_entropy_loss is not None:
            output["cross_entropy_loss"] = cross_entropy_loss
        if contrastive_loss is not None:
            output["contrastive_loss"] = contrastive_loss

        return output
```