Update README.md
README.md
CHANGED
@@ -1,93 +1,93 @@
----
-language:
-- zh
-- en
-tags:
-- embeddings
-- retrieval
-- transformer-free
-- safetensors
-- edge-ai
-license: mit
----
-
-# CleanOwl-0.1
-
-**I
-
-CleanOwl is a lightweight human-likeness scoring engine.
-
-It detects whether a sentence feels like a natural human message or AI-generated content, using:
-
-- token distribution irregularity
-- semantic continuity
-- punctuation behavior
-
-No transformer. No fine-tuning. Pure statistical signals.
-
-## Score Interpretation
-
-| Score | Meaning |
-|------|--------|
-| < 60 | Likely AI-generated / formal text |
-| 60–75 | Mixed / ambiguous |
-| > 75 | Likely human-like message |
-
-Note: This is not a classifier, but a heuristic scoring system.
-
-## Limitations
-
-- Short sentences may be misclassified
-- Highly polished human writing (e.g. essays) may look like AI
-- AI can sometimes mimic human irregularity
-
-This is a lightweight detector, not a definitive AI classifier.
-
-## Quickstart
-
-```bash
-git clone https://huggingface.co/WangKaiLin/CleanOwl-0.1
-cd CleanOwl-0.1
-
-pip install numpy safetensors
-
-python ai_score.py
-
-# or embedding entry
-python quickstart.py
-```
-
-## Example:
-
-
-```bash
-請輸入文字:先思考:在 AI 時代,什麼樣的人才不會被取代?我的答案是:具備溝通能力的人、擁有韌性的人,以及始終願意站在第一線的人。
-
-human score: 47.13
-label: ai_slop_like
-
-請輸入文字:身為專業的肥宅 都會把脂肪放在身上
-
-human score: 76.88
-label: maybe_human_like
-```
-
-## Repository Structure
-
-```bash
-CleanOwl-0.1/
-├─ ai_score.py # human score / ai slop score
-├─ quickstart.py # demo CLI
-├─ engine.py # PipeOwl tokenizer + emb loader
-├─ pipeowl.safetensors # embeddings + delta_field
-├─ tokenizer.json
-├─ ptt.npy # style field
-├─ config.json
-├─ README.md
-├─ example.md
-└─ LICENSE
-```
-
-## LICENSE
-
+---
+language:
+- zh
+- en
+tags:
+- embeddings
+- retrieval
+- transformer-free
+- safetensors
+- edge-ai
+license: mit
+---
+
+# CleanOwl-0.1
+
+**I HATE AI-SLOP SO I MADE THIS.**
+
+CleanOwl is a lightweight human-likeness scoring engine.
+
+It detects whether a sentence feels like a natural human message or AI-generated content, using:
+
+- token distribution irregularity
+- semantic continuity
+- punctuation behavior
+
+No transformer. No fine-tuning. Pure statistical signals.
+
+## Score Interpretation
+
+| Score | Meaning |
+|------|--------|
+| < 60 | Likely AI-generated / formal text |
+| 60–75 | Mixed / ambiguous |
+| > 75 | Likely human-like message |
+
+Note: This is not a classifier, but a heuristic scoring system.
+
+## Limitations
+
+- Short sentences may be misclassified
+- Highly polished human writing (e.g. essays) may look like AI
+- AI can sometimes mimic human irregularity
+
+This is a lightweight detector, not a definitive AI classifier.
+
+## Quickstart
+
+```bash
+git clone https://huggingface.co/WangKaiLin/CleanOwl-0.1
+cd CleanOwl-0.1
+
+pip install numpy safetensors
+
+python ai_score.py
+
+# or embedding entry
+python quickstart.py
+```
+
+## Example:
+
+
+```bash
+請輸入文字:先思考:在 AI 時代,什麼樣的人才不會被取代?我的答案是:具備溝通能力的人、擁有韌性的人,以及始終願意站在第一線的人。
+
+human score: 47.13
+label: ai_slop_like
+
+請輸入文字:身為專業的肥宅 都會把脂肪放在身上
+
+human score: 76.88
+label: maybe_human_like
+```
+
+## Repository Structure
+
+```bash
+CleanOwl-0.1/
+├─ ai_score.py # human score / ai slop score
+├─ quickstart.py # demo CLI
+├─ engine.py # PipeOwl tokenizer + emb loader
+├─ pipeowl.safetensors # embeddings + delta_field
+├─ tokenizer.json
+├─ ptt.npy # style field
+├─ config.json
+├─ README.md
+├─ example.md
+└─ LICENSE
+```
+
+## LICENSE
+
 MIT
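The score bands in the README's Score Interpretation table can be read as a simple threshold mapping. The sketch below is illustrative only and is not taken from `ai_score.py`: the function name `interpret_score` and the middle-band label `"mixed"` are assumptions, while `ai_slop_like` and `maybe_human_like` are the labels shown in the README's example output. Treating 60 and 75 as part of the middle band is also an assumption, since the table does not say how exact boundary values are handled.

```python
def interpret_score(score: float) -> str:
    """Map a human-likeness score to a band from the README table.

    Hypothetical helper: the real ai_score.py may label or bound
    the bands differently.
    """
    if score < 60:
        # "< 60: Likely AI-generated / formal text"
        return "ai_slop_like"
    if score <= 75:
        # "60-75: Mixed / ambiguous" ("mixed" is an assumed label)
        return "mixed"
    # "> 75: Likely human-like message"
    return "maybe_human_like"


# The two scores from the README's example output.
for s in (47.13, 76.88):
    print(f"human score: {s:.2f}  label: {interpret_score(s)}")
```

With the two example scores from the README, this mapping gives the same labels that the example output shows, which suggests the table and the example are at least consistent with each other.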