Commit 2b4b6b4 by WangKaiLin · verified · 1 Parent(s): 1591c0c

Update README.md

Files changed (1)
  1. README.md +92 -92
README.md CHANGED
@@ -1,93 +1,93 @@
- ---
- language:
- - zh
- - en
- tags:
- - embeddings
- - retrieval
- - transformer-free
- - safetensors
- - edge-ai
- license: mit
- ---
-
- # CleanOwl-0.1
-
- **I hate AI-SLOP SO I MADE THIS.**
-
- CleanOwl is a lightweight human-likeness scoring engine.
-
- It detects whether a sentence feels like a natural human message or AI-generated content, using:
-
- - token distribution irregularity
- - semantic continuity
- - punctuation behavior
-
- No transformer. No fine-tuning. Pure statistical signals.
-
- ## Score Interpretation
-
- | Score | Meaning |
- |------|--------|
- | < 60 | Likely AI-generated / formal text |
- | 60–75 | Mixed / ambiguous |
- | > 75 | Likely human-like message |
-
- Note: This is not a classifier, but a heuristic scoring system.
-
- ## Limitations
-
- - Short sentences may be misclassified
- - Highly polished human writing (e.g. essays) may look like AI
- - AI can sometimes mimic human irregularity
-
- This is a lightweight detector, not a definitive AI classifier.
-
- ## Quickstart
-
- ```bash
- git clone https://huggingface.co/WangKaiLin/CleanOwl-0.1
- cd CleanOwl-0.1
-
- pip install numpy safetensors
-
- python ai_score.py
-
- # or embedding entry
- python quickstart.py
- ```
-
- ## Example:
-
-
- ```bash
- 請輸入文字:先思考:在 AI 時代,什麼樣的人才不會被取代?我的答案是:具備溝通能力的人、擁有韌性的人,以及始終願意站在第一線的人。
-
- human score: 47.13
- label: ai_slop_like
-
- 請輸入文字:身為專業的肥宅 都會把脂肪放在身上
-
- human score: 76.88
- label: maybe_human_like
- ```
-
- ## Repository Structure
-
- ```bash
- CleanOwl-0.1/
- ├─ ai_score.py # human score / ai slop score
- ├─ quickstart.py # demo CLI
- ├─ engine.py # PipeOwl tokenizer + emb loader
- ├─ pipeowl.safetensors # embeddings + delta_field
- ├─ tokenizer.json
- ├─ ptt.npy # style field
- ├─ config.json
- ├─ README.md
- ├─ example.md
- └─ LICENSE
- ```
-
- ## LICENSE
-
  MIT
 
+ ---
+ language:
+ - zh
+ - en
+ tags:
+ - embeddings
+ - retrieval
+ - transformer-free
+ - safetensors
+ - edge-ai
+ license: mit
+ ---
+
+ # CleanOwl-0.1
+
+ **I HATE AI-SLOP SO I MADE THIS.**
+
+ CleanOwl is a lightweight human-likeness scoring engine.
+
+ It detects whether a sentence feels like a natural human message or AI-generated content, using:
+
+ - token distribution irregularity
+ - semantic continuity
+ - punctuation behavior
+
+ No transformer. No fine-tuning. Pure statistical signals.
+
+ ## Score Interpretation
+
+ | Score | Meaning |
+ |------|--------|
+ | < 60 | Likely AI-generated / formal text |
+ | 60–75 | Mixed / ambiguous |
+ | > 75 | Likely human-like message |
+
+ Note: This is not a classifier, but a heuristic scoring system.
+
+ ## Limitations
+
+ - Short sentences may be misclassified
+ - Highly polished human writing (e.g. essays) may look like AI
+ - AI can sometimes mimic human irregularity
+
+ This is a lightweight detector, not a definitive AI classifier.
+
+ ## Quickstart
+
+ ```bash
+ git clone https://huggingface.co/WangKaiLin/CleanOwl-0.1
+ cd CleanOwl-0.1
+
+ pip install numpy safetensors
+
+ python ai_score.py
+
+ # or embedding entry
+ python quickstart.py
+ ```
+
+ ## Example:
+
+
+ ```bash
+ 請輸入文字:先思考:在 AI 時代,什麼樣的人才不會被取代?我的答案是:具備溝通能力的人、擁有韌性的人,以及始終願意站在第一線的人。
+
+ human score: 47.13
+ label: ai_slop_like
+
+ 請輸入文字:身為專業的肥宅 都會把脂肪放在身上
+
+ human score: 76.88
+ label: maybe_human_like
+ ```
+
+ ## Repository Structure
+
+ ```bash
+ CleanOwl-0.1/
+ ├─ ai_score.py # human score / ai slop score
+ ├─ quickstart.py # demo CLI
+ ├─ engine.py # PipeOwl tokenizer + emb loader
+ ├─ pipeowl.safetensors # embeddings + delta_field
+ ├─ tokenizer.json
+ ├─ ptt.npy # style field
+ ├─ config.json
+ ├─ README.md
+ ├─ example.md
+ └─ LICENSE
+ ```
+
+ ## LICENSE
+
  MIT
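
The Score Interpretation table in the README reads naturally as a threshold function. The sketch below is only an assumption about how the three bands might map to labels; it is not the code in `ai_score.py`. The function name `label_for_score` and the middle label `mixed` are hypothetical. Only `ai_slop_like` and `maybe_human_like` appear in the README's example output.

```python
def label_for_score(score: float) -> str:
    """Map a human-likeness score to a coarse label using the README's bands."""
    if score < 60:
        # "< 60": likely AI-generated / formal text
        return "ai_slop_like"
    if score <= 75:
        # "60-75": mixed / ambiguous. The README does not name a label
        # for this band, so "mixed" is a placeholder.
        return "mixed"
    # "> 75": likely human-like message
    return "maybe_human_like"


if __name__ == "__main__":
    # The two scores shown in the README's Example section.
    for s in (47.13, 76.88):
        print(f"human score: {s:.2f}  label: {label_for_score(s)}")
```

With the two scores from the Example section, this returns `ai_slop_like` for 47.13 and `maybe_human_like` for 76.88, matching the labels shown there. The boundary values 60 and 75 are placed in the middle band here, following the table's `60–75` row.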