license: apache-2.0
language:
  - en
base_model: bert-base-cased
library_name: peft
tags:
  - base_model:adapter:bert-base-cased
  - lora
  - transformers
datasets:
  - gouwsxander/wikipedia-human-ai

Slop Detector

A simple model to detect AI-written content.

Model Details

This model is a PEFT (LoRA) adapter for bert-base-cased.

It was trained on a dataset of scraped paragraphs from Wikipedia, as well as AI rewrites of said paragraphs.

More detail about the training data can be found on the gouwsxander/wikipedia-human-ai dataset page.

Usage

The basic usage is as follows:

from transformers import pipeline

classifier = pipeline("text-classification", model="gouwsxander/slop-detector-bert")

inputs: str | list[str] | dict[str, str] | list[dict[str, str]] = ...
results = classifier(inputs)

Texts with label LABEL_1 are estimated to be produced by an AI.
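
As a minimal sketch of reading the output, assuming the pipeline returns its usual list of dictionaries with "label" and "score" keys, the helper below turns each result into a short readable string. The names describe and AI_LABEL are hypothetical, and treating every label other than LABEL_1 as human-written is an assumption rather than something the model card states.

```python
# Hypothetical helper; not part of the model repository.
# Assumption: any label other than LABEL_1 means "human-written".
AI_LABEL = "LABEL_1"


def describe(result: dict) -> str:
    """Turn one pipeline result into a short readable string."""
    kind = "ai" if result["label"] == AI_LABEL else "human"
    return f"{kind} ({result['score']:.2f})"


# Hand-written results in the pipeline's output format, for illustration only.
sample = [
    {"label": "LABEL_1", "score": 0.93},
    {"label": "LABEL_0", "score": 0.81},
]
for r in sample:
    print(describe(r))
```

In practice, sample would be replaced by the results returned from classifier(inputs).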

You may get a warning about weights being newly initialized; it can be ignored.

Limitations

512-token context window
Trained on one paragraph at a time
Only trained on Wikipedia-style text
Only trained on AI data generated by GPT-5 Nano
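
One way to work within the paragraph-level and 512-token limits is to split longer text on blank lines and score each paragraph separately, truncating anything that is still too long. This is only a sketch, not a method described by the model author. The names split_paragraphs and classify_paragraphs are hypothetical, and classifier is assumed to be the pipeline from the Usage section, which is expected to forward truncation and max_length to the tokenizer.

```python
import re
from typing import Callable


def split_paragraphs(text: str) -> list[str]:
    """Split text on blank lines and drop empty pieces."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]


def classify_paragraphs(classifier: Callable, text: str) -> list[tuple[str, dict]]:
    """Score each paragraph on its own and pair it with its result.

    Assumption: `classifier` accepts a list of strings plus tokenizer
    keyword arguments, as the transformers text-classification pipeline does.
    """
    paragraphs = split_paragraphs(text)
    if not paragraphs:
        return []
    # Truncate each paragraph to the 512-token context window.
    results = classifier(paragraphs, truncation=True, max_length=512)
    return list(zip(paragraphs, results))
```

Truncation silently drops the tail of an over-long paragraph, so results for such paragraphs only reflect their first 512 tokens.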