| --- |
| language: en |
| tags: |
| - text-classification |
| - ai-text-detection |
| - pytorch |
| license: mit |
| --- |
| |
| # Moodlerz/bert-detector-eli5 |
|
|
| ## What is this? |
| This model was fine-tuned as part of a research project comparing transformer-based |
| AI-text detectors across two benchmark datasets: **HC3** and **ELI5**. |
|
|
| The task is binary classification: |
| - **Label 0**: Human-written text |
| - **Label 1**: LLM-generated text |
|
|
| ## Model details |
| BERT (`bert-base-uncased`) fine-tuned on ELI5 for AI-text detection. Binary classifier: human (0) vs. LLM-generated (1). Trained for 1 epoch with dropout 0.2 and early stopping on ROC-AUC. |
|
|
| ## Training setup |
| | Setting | Value | |
| |---|---| |
| | Epochs | 1 | |
| | Batch size (train) | 16 | |
| | Learning rate | 2e-5 | |
| | Warmup steps | 500 | |
| | Weight decay | 0.01 | |
| | Dropout | 0.2 | |
| | Max seq length | 512 | |
| | Validation split | 10% | |
| | Best model metric | ROC-AUC | |
|
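The settings in the table map fairly directly onto keyword arguments of the Hugging Face `TrainingArguments` class. The sketch below is a hypothetical reconstruction, not the project's actual training script: the dictionary names are made up for illustration, `output_dir` reuses the local training directory listed under Notes, and applying the dropout through BERT's `hidden_dropout_prob` is an assumption, since the card does not say which dropout was changed.

```python
# Hypothetical reconstruction of the training configuration from the table.
# Keys in training_kwargs match keyword arguments of transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "./models/BERT_eli5",  # local training dir listed under Notes
    "num_train_epochs": 1,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "warmup_steps": 500,
    "weight_decay": 0.01,
}

# Assumption: dropout=0.2 was set via the BERT config field hidden_dropout_prob.
model_config_overrides = {"num_labels": 2, "hidden_dropout_prob": 0.2}

# Settings handled outside the trainer arguments (tokenisation, split, selection).
other_settings = {
    "max_seq_length": 512,
    "validation_fraction": 0.10,
    "best_model_metric": "roc_auc",
}

if __name__ == "__main__":
    for name, value in training_kwargs.items():
        print(f"{name}: {value}")
```

With `transformers` installed, `TrainingArguments(**training_kwargs)` would build the corresponding arguments object. Selecting the best checkpoint by ROC-AUC would additionally need a `compute_metrics` function that reports that metric.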
|
| ## Datasets |
| - **HC3**: Human ChatGPT Comparison Corpus |
| - **ELI5**: Explain Like I'm 5 (Reddit QA dataset) |
|
| Cross-dataset evaluation (e.g. trained on HC3, tested on ELI5) was used to |
| measure the generalisability of each detector. |
|
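As a minimal sketch of how a detector's scores on a held-out dataset could be summarised, the function below computes ROC-AUC from gold labels and P(LLM) scores in plain Python. The function name and the toy numbers are illustrative only and are not results from the study; in practice `sklearn.metrics.roc_auc_score` does the same job.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random LLM text (label 1)
    scores higher than a random human text (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only: 0 = human, 1 = LLM-generated; scores stand in for P(LLM).
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    print(f"ROC-AUC: {roc_auc(toy_labels, toy_scores):.2f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but slow on full test sets.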
|
| ## How to load |
| ```python |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| import torch |
| |
| # The repo is private (see Notes), so authenticate with Hugging Face first. |
| model = AutoModelForSequenceClassification.from_pretrained("Moodlerz/bert-detector-eli5") |
| tokenizer = AutoTokenizer.from_pretrained("Moodlerz/bert-detector-eli5") |
| model.eval()  # make sure dropout is disabled at inference time |
| |
| text = "Your input text here" |
| # Truncate to the 512-token maximum used during training |
| inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512) |
| with torch.no_grad(): |
|     logits = model(**inputs).logits |
| # Index 1 is the LLM-generated class |
| prob_llm = torch.softmax(logits, dim=-1)[0][1].item() |
| print(f"P(LLM-generated): {prob_llm:.4f}") |
| ``` |
|
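The loading snippet stops at a probability. Turning that into one of the two labels needs a decision threshold, which this card does not specify, so the 0.5 default below is an assumption, as are the helper name `decide` and the label strings in `ID2LABEL`.

```python
import math

ID2LABEL = {0: "human", 1: "llm-generated"}  # label ids as defined in this card


def decide(logits, threshold=0.5):
    """Map a pair of raw logits [human, llm] to (label_id, label_name, p_llm)."""
    # Numerically stable two-class softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    p_llm = exps[1] / sum(exps)
    # Assumption: predict "LLM-generated" when P(LLM) reaches the threshold.
    label_id = 1 if p_llm >= threshold else 0
    return label_id, ID2LABEL[label_id], p_llm


if __name__ == "__main__":
    label_id, label_name, p_llm = decide([2.0, -1.0])
    print(label_id, label_name, f"{p_llm:.4f}")
```

With the model output from the loading snippet, `decide(logits[0].tolist())` would apply the same rule. A threshold tuned on the validation split may be a better choice than 0.5.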
|
| ## Notes |
| - Local training dir: `./models/BERT_eli5` |
| - All models in this series are private repos under `Moodlerz`. |
| - Part of a larger study; do not use for production content moderation without further evaluation. |
|
|