---
license: mit
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
datasets:
- shahxeebhassan/human_vs_ai_sentences
pipeline_tag: text-classification
library_name: transformers
---

## Model Description

This model is a BERT model (`google-bert/bert-base-uncased`) fine-tuned for AI content detection: it classifies a sentence as either AI-generated or human-written.

## Training Data

The model was trained on a [<span style="color: blue;">dataset</span>](https://huggingface.co/datasets/shahxeebhassan/human_vs_ai_sentences) of over 100,000 sentences, each labeled as either AI-generated or human-written. Because labels are assigned per sentence, the model classifies each sentence individually, which is particularly useful for highlighting AI-written content within larger texts.

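As a rough illustration of sentence-level highlighting, the sketch below splits a longer text into sentences and marks those that a classifier flags as AI-generated. The helper names (`split_sentences`, `flag_ai_sentences`), the regex-based splitter, and the `classify` callable are assumptions made for illustration and are not part of this model or the `transformers` library. Which label index denotes AI-generated text is not stated here, so `ai_label` is a parameter that should be checked against the model's configuration.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def flag_ai_sentences(
    text: str,
    classify: Callable[[str], int],
    ai_label: int = 1,  # assumption: verify against the model's label mapping
) -> List[Tuple[str, bool]]:
    """Pair each sentence with True when `classify` returns `ai_label`."""
    return [(s, classify(s) == ai_label) for s in split_sentences(text)]
```

In practice, `classify` would wrap the tokenizer and model call from the Usage section and return the argmax label for a single sentence. A dedicated sentence tokenizer would likely handle abbreviations and quotations better than this regex.
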
## Evaluation Metrics

The model achieved an accuracy of 90% on both the validation and test sets.

## Usage

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("shahxeebhassan/bert_base_ai_content_detector")
model = BertForSequenceClassification.from_pretrained("shahxeebhassan/bert_base_ai_content_detector")

# Tokenize a single sentence into PyTorch tensors.
inputs = tokenizer("Distance learning will not benefit students because the students are not able to develop as good of a relationship with their teachers.", return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to class probabilities, then pick the most likely class index.
probabilities = torch.softmax(logits, dim=1).cpu().numpy()
predicted_label = probabilities.argmax(axis=1)

print(f"Predicted label for the input text: {predicted_label[0]}")