---
language:
- zh
- en
tags:
- embeddings
- retrieval
- transformer-free
- safetensors
- edge-ai
license: mit
---

# CleanOwl: AI Slop Detector

**I hate AI slop, so I made this.**

![image](https://cdn-uploads.huggingface.co/production/uploads/6956a9b3c5ab5f0adaf9bc69/GsmlsIptaV6n8QBZc6-Fd.png)

CleanOwl is a lightweight **human-likeness scoring engine**.

It estimates how “human” a piece of text feels. It does this not by classification, but by analyzing structural signals such as:

- token distribution irregularity
- semantic continuity
- punctuation behavior

No transformers. No fine-tuning. Just statistical signals.
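
As a rough illustration of what two of these signals could look like in code, here is a hypothetical toy feature extractor. It is **not** CleanOwl's scoring logic, which lives in `ai_score.py` and uses the PipeOwl embeddings. The feature names `punct_density` and `freq_irregularity`, and their definitions, are assumptions chosen only to make the idea concrete. It works on characters rather than tokens, so it also runs on Chinese text without a tokenizer.

```python
from collections import Counter
import statistics
import unicodedata


def toy_signals(text: str) -> dict:
    """Hypothetical, simplified signals; NOT CleanOwl's actual features."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return {"punct_density": 0.0, "freq_irregularity": 0.0}

    # Punctuation behavior: share of non-space characters that are Unicode
    # punctuation (covers both ASCII and full-width marks such as "。").
    punct = sum(1 for c in chars if unicodedata.category(c).startswith("P"))
    punct_density = punct / len(chars)

    # Distribution irregularity: coefficient of variation of character
    # frequencies (characters stand in for tokens in this toy version).
    counts = list(Counter(chars).values())
    freq_irregularity = statistics.pstdev(counts) / statistics.fmean(counts)

    return {"punct_density": punct_density, "freq_irregularity": freq_irregularity}


print(toy_signals("身為專業的肥宅 都會把脂肪放在身上"))
```

Real scoring combines richer signals (such as semantic continuity via embeddings); this sketch only shows that such signals are cheap, linear-time statistics.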

---

## Performance

- ~0.04 ms per token
- ~0.8 ms per sentence (typical)
- ~120 ms startup

Runs entirely on CPU. Scoring time grows linearly with input length: O(n) in the number of tokens.
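
These timings depend on hardware. A minimal, generic harness for measuring average per-call latency on your own machine might look like this. `score_fn` is a placeholder: an importable scoring function for `ai_score.py` is not documented here, so pass in whatever callable you wrap around it. Load the model before timing so the ~120 ms startup cost is excluded.

```python
import time
from typing import Callable, Sequence


def mean_latency_ms(score_fn: Callable[[str], object],
                    texts: Sequence[str], repeats: int = 5) -> float:
    """Average wall-clock milliseconds per call of score_fn over texts."""
    if not texts or repeats < 1:
        raise ValueError("need at least one text and at least one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for t in texts:
            score_fn(t)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(texts))


def dummy_score(text: str) -> float:
    # Hypothetical stand-in scorer so the sketch runs; replace with a CleanOwl call.
    return float(len(text))


if __name__ == "__main__":
    ms = mean_latency_ms(dummy_score, ["hello world"] * 100)
    print(f"{ms:.4f} ms per call")
```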

---

## 🧠 What it actually measures

CleanOwl does **not** directly detect AI.

Instead, it measures how **smooth vs. irregular** a piece of writing is:

- Human writing → irregular, biased, “spiky”
- AI / formal text → smooth, evenly distributed
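
One common way to put a number on “spiky vs. smooth” is the coefficient of variation of sentence lengths, often called burstiness. The sketch below is a hypothetical illustration of that idea, not the measure CleanOwl uses; the naive sentence splitter (a regex on `。！？.!?`) is a simplifying assumption.

```python
import re
import statistics


def sentence_burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths (in characters).

    Higher values mean uneven ("spiky") sentence lengths; values near 0
    mean uniformly sized ("smooth") sentences.
    """
    # Naive split on Chinese and ASCII sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"[。！？.!?]+", text) if s.strip()]
    if len(sentences) < 2:
        return 0.0  # not enough sentences to measure variation
    lengths = [len(s) for s in sentences]
    return statistics.pstdev(lengths) / statistics.fmean(lengths)


smooth = "The cat sat down. The dog ran off. The bird flew by."
spiky = "No. I really do not think that plan is going to work at all. Fine."
print(sentence_burstiness(smooth), sentence_burstiness(spiky))
```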

---

## 📊 Score Interpretation

| Score | Meaning |
|-------|---------|
| < 60 | Likely AI-generated / formal text |
| 60–75 | Mixed / ambiguous |
| > 75 | Likely human-like message |

> This is a heuristic scoring system, not a classifier.
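
The table maps directly onto a small helper. This is a hypothetical sketch: the returned strings are copied from the table, not the exact labels `ai_score.py` prints (its example output shows labels such as `ai_slop_like` and `maybe_human_like`), and placing the boundary values 60 and 75 in the mixed band is an assumption.

```python
def interpret_score(score: float) -> str:
    """Map a CleanOwl human score to the band described in the table.

    Assumption: 60 and 75 themselves fall into the "mixed" band.
    """
    if score < 60:
        return "Likely AI-generated / formal text"
    if score <= 75:
        return "Mixed / ambiguous"
    return "Likely human-like message"


for s in (47.13, 68.0, 76.88):
    print(s, "->", interpret_score(s))
```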

---

## ⚠️ Limitations

- Scores for short sentences may be unstable
- Highly polished human writing (e.g., essays, Wikipedia articles) may look AI-like
- AI-generated text can mimic human irregularity

This is a **lightweight detector**, not a definitive AI classifier.

---

## Quickstart

### 1️⃣ Install

```bash
git clone https://huggingface.co/WangKaiLin/CleanOwl-AI-Slop-Detector
cd CleanOwl-AI-Slop-Detector

pip install numpy safetensors fastapi uvicorn
```

### 2️⃣ Run the local API

```bash
uvicorn app:app --host 127.0.0.1 --port 8000 --reload
```

Then open the interactive API docs in your browser:

http://127.0.0.1:8000/docs

If the `/detect` endpoint is listed there, the API is running correctly.
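
Besides the browser UI at `/docs`, you can call the API from a script. The sketch below uses only the Python standard library, and its request format is an assumption: it supposes `/detect` accepts a POST with a JSON body `{"text": ...}` and returns JSON. Check the actual schema on the `/docs` page and adjust before relying on it.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/detect"  # host/port from the uvicorn command


def build_detect_request(text: str, url: str = API_URL) -> urllib.request.Request:
    # ASSUMPTION: /detect takes POST {"text": "..."}; verify the schema at /docs.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def detect(text: str, timeout: float = 5.0) -> dict:
    """Send text to the local CleanOwl API and return the decoded JSON reply."""
    with urllib.request.urlopen(build_detect_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the uvicorn server running:
# print(detect("身為專業的肥宅 都會把脂肪放在身上"))
```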

### 3️⃣ Chrome Extension Setup

CleanOwl works via a local API plus a Chrome extension. With the API running:

1. Open `chrome://extensions/` in Chrome.
2. Enable **Developer mode** (top right).
3. Click **Load unpacked**.
4. Select `CleanOwl-AI-Slop-Detector/extension/`.
5. Refresh any webpage (Ctrl + R).

👉 CleanOwl will now analyze the page automatically.

### 🔒 Privacy

CleanOwl runs entirely on your local machine.
No data is sent to any external server.

### Usage

```bash
# CLI (interactive scoring)
python ai_score.py

# Embedding demo
python quickstart.py
```
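
## Extension in action

![image](https://cdn-uploads.huggingface.co/production/uploads/6956a9b3c5ab5f0adaf9bc69/ZVG12fXuVVrn5W6iWmaHm.png)

## Example (`ai_score.py`)

```text
請輸入文字:先思考:在 AI 時代,什麼樣的人才不會被取代?我的答案是:具備溝通能力的人、擁有韌性的人,以及始終願意站在第一線的人。

human score: 47.13
label: ai_slop_like

請輸入文字:身為專業的肥宅 都會把脂肪放在身上

human score: 76.88
label: maybe_human_like
```

The prompt `請輸入文字:` means “Please enter text:”. The first input reads: “First, think: in the AI era, what kind of people won't be replaced? My answer: people with communication skills, people with resilience, and people who are always willing to stand on the front line.” It scores 47.13 (AI-slop-like). The second is a casual joke, “As professional fat otakus, we all keep our fat on our bodies,” and scores 76.88 (human-like).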
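
To use the CLI from another script, one option is to feed text on stdin and parse the result lines shown above. This sketch rests on assumptions: it supposes `ai_score.py` reads input from stdin at its prompt and prints `human score:` / `label:` lines exactly as in the example. If the script keeps looping for more input, it may exit with an error once stdin closes. That is why only stdout is parsed and the return code is ignored.

```python
import re
import subprocess
import sys

_SCORE_RE = re.compile(r"human score:\s*([0-9]+(?:\.[0-9]+)?)")
_LABEL_RE = re.compile(r"label:\s*(\S+)")


def parse_ai_score_output(output: str) -> list[tuple[float, str]]:
    """Extract (score, label) pairs from ai_score.py's printed output."""
    scores = [float(s) for s in _SCORE_RE.findall(output)]
    labels = _LABEL_RE.findall(output)
    return list(zip(scores, labels))


def score_via_cli(text: str, script: str = "ai_score.py") -> list[tuple[float, str]]:
    # ASSUMPTION: the script reads one line of text per prompt from stdin.
    proc = subprocess.run(
        [sys.executable, script],
        input=text + "\n",
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    return parse_ai_score_output(proc.stdout)


# Usage, from the repository root:
# print(score_via_cli("身為專業的肥宅 都會把脂肪放在身上"))
```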

## Repository Structure

```text
CleanOwl-AI-Slop-Detector/
├─ ai_score.py          # scoring logic (CleanOwl core)
├─ quickstart.py        # embedding demo CLI
├─ engine.py            # PipeOwl tokenizer + embedding loader
├─ pipeowl.safetensors  # embeddings + delta_field
├─ tokenizer.json
├─ ptt.npy              # style field (PTT-like distribution)
├─ config.json
├─ app.py               # FastAPI server
├─ requirements.txt
├─ extension/
│  ├─ content.js        # Chrome content script
│  └─ manifest.json
├─ example.md
├─ README.md
└─ LICENSE
```
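
To see what the shipped data files contain, you can list the tensors in `pipeowl.safetensors` and the array in `ptt.npy`. Here is a minimal sketch using `safetensors.numpy.load_file` and `numpy.load`. It prints names, shapes, and dtypes instead of assuming them, because the exact tensor keys are not documented here (the tree above only says “embeddings + delta_field”). It also assumes `ptt.npy` holds a plain numeric array.

```python
import numpy as np


def describe_safetensors(path: str = "pipeowl.safetensors") -> dict:
    """Return {tensor_name: (shape, dtype)} for every tensor in a .safetensors file."""
    from safetensors.numpy import load_file  # installed in the Quickstart step
    return {name: (arr.shape, str(arr.dtype)) for name, arr in load_file(path).items()}


def describe_npy(path: str = "ptt.npy") -> tuple:
    """Return (shape, dtype) of the array stored in a .npy file."""
    arr = np.load(path)  # allow_pickle defaults to False: numeric arrays only
    return arr.shape, str(arr.dtype)


# Usage, from the repository root:
# print(describe_safetensors())
# print(describe_npy())
```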

## License

MIT