Fix gate logic docs: only scientific_paper passes, posters are blocked
README.md
CHANGED
|
@@ -24,7 +24,7 @@ thumbnail: PubGuard.png
|
|
| 24 |
|
| 25 |
## Model Description
|
| 26 |
|
| 27 |
-
PubGuard is a lightweight, CPU-optimized document classifier that screens PDF text to determine whether it represents a genuine scientific publication. It runs as **Step 0** in the PubVerse + 42DeepThought pipeline, rejecting
|
| 28 |
|
| 29 |
Three classification heads provide a multi-dimensional screening verdict:
|
| 30 |
|
|
@@ -85,12 +85,13 @@ The `doc_type` head additionally receives 14 structural features (section headin
|
|
| 85 |
|
| 86 |
## Gate Logic
|
| 87 |
|
| 88 |
-
|
| 89 |
|
| 90 |
-
```
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
|
|
|
| 94 |
```
|
| 95 |
|
| 96 |
AI detection and toxicity are **informational by default**: reported but not blocking.
|
|
|
|
| 24 |
|
| 25 |
## Model Description
|
| 26 |
|
| 27 |
+
PubGuard is a lightweight, CPU-optimized document classifier that screens PDF text to determine whether it represents a genuine scientific publication. It runs as **Step 0** in the PubVerse + 42DeepThought pipeline, rejecting non-publications (posters, abstracts, flyers, invoices) before expensive downstream processing (VLM feature extraction, graph construction, GNN scoring).
|
| 28 |
|
| 29 |
Three classification heads provide a multi-dimensional screening verdict:
|
| 30 |
|
|
|
|
| 85 |
|
| 86 |
## Gate Logic
|
| 87 |
|
| 88 |
+
Only `scientific_paper` passes the gate. Everything else (posters, standalone abstracts, junk) is blocked. The PubVerse pipeline processes **publications only**.
|
| 89 |
|
| 90 |
+
```
|
| 91 |
+
scientific_paper → ✅ PASS
|
| 92 |
+
poster           → ❌ BLOCKED (classified, but not a publication)
|
| 93 |
+
abstract_only    → ❌ BLOCKED
|
| 94 |
+
junk             → ❌ BLOCKED
|
| 95 |
```
|
| 96 |
|
| 97 |
AI detection and toxicity are **informational by default**: reported but not blocking.
|
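The gate rule described in the updated README can be sketched in a few lines of Python. This is a minimal illustration, not PubGuard's actual code: the `Verdict` class, the `passes_gate` function, and the `block_on_ai` / `block_on_toxicity` flags are hypothetical names. Only the four `doc_type` labels and the "informational by default" behaviour come from the text.

```python
from dataclasses import dataclass

# Only this label passes; poster, abstract_only and junk are all blocked.
PASSING_DOC_TYPES = frozenset({"scientific_paper"})


@dataclass(frozen=True)
class Verdict:
    """Hypothetical container for the outputs of the three heads."""
    doc_type: str
    ai_generated: bool = False
    toxic: bool = False


def passes_gate(
    verdict: Verdict,
    block_on_ai: bool = False,        # assumed opt-in switch, off by default
    block_on_toxicity: bool = False,  # assumed opt-in switch, off by default
) -> bool:
    """Return True only if the document should continue down the pipeline."""
    if verdict.doc_type not in PASSING_DOC_TYPES:
        return False
    # AI detection and toxicity are informational unless explicitly enabled.
    if block_on_ai and verdict.ai_generated:
        return False
    if block_on_toxicity and verdict.toxic:
        return False
    return True
```

With the default flags, a `scientific_paper` flagged as AI-generated or toxic still passes, while a `poster` is blocked regardless of the other two heads.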