jimnoneill committed on
Commit 181aa2b · verified · 1 Parent(s): b5810c8

Fix gate logic docs: only scientific_paper passes, posters are blocked

Files changed (1)
  1. README.md +7 -6
README.md CHANGED
````diff
@@ -24,7 +24,7 @@ thumbnail: PubGuard.png
 
 ## Model Description
 
-PubGuard is a lightweight, CPU-optimized document classifier that screens PDF text to determine whether it represents a genuine scientific publication. It runs as **Step 0** in the PubVerse + 42DeepThought pipeline, rejecting junk (flyers, invoices, non-scholarly PDFs) before expensive downstream processing (VLM feature extraction, graph construction, GNN scoring).
+PubGuard is a lightweight, CPU-optimized document classifier that screens PDF text to determine whether it represents a genuine scientific publication. It runs as **Step 0** in the PubVerse + 42DeepThought pipeline, rejecting non-publications (posters, abstracts, flyers, invoices) before expensive downstream processing (VLM feature extraction, graph construction, GNN scoring).
 
 Three classification heads provide a multi-dimensional screening verdict:
 
@@ -85,12 +85,13 @@ The `doc_type` head additionally receives 14 structural features (section headin
 
 ## Gate Logic
 
-Both `scientific_paper` and `poster` classifications **pass** the gate (both are valid scientific content). Only `abstract_only` and `junk` are blocked:
+Only `scientific_paper` passes the gate. Everything else — posters, standalone abstracts, junk — is blocked. The PubVerse pipeline processes **publications only**.
 
-```python
-verdict = guard.screen(text)
-# verdict['pass'] = True if doc_type in ('scientific_paper', 'poster')
-# verdict['pass'] = False if doc_type in ('abstract_only', 'junk')
+```
+scientific_paper → ✅ PASS
+poster → ❌ BLOCKED (classified, but not a publication)
+abstract_only → ❌ BLOCKED
+junk → ❌ BLOCKED
 ```
 
 AI detection and toxicity are **informational by default** — reported but not blocking.
````
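Under the revised rule, the gate reduces to a membership test on `doc_type`. AI detection and toxicity are carried along as flags that do not block by default. The following is a minimal sketch under stated assumptions. `Verdict`, `passes_gate`, and the `block_ai` / `block_toxic` opt-in switches are hypothetical names, not PubGuard's API. Only the four `doc_type` labels and the "informational by default" behavior come from the README.

```python
from typing import NamedTuple

# From the README: only this doc_type passes the gate.
PASSING_DOC_TYPES = frozenset({"scientific_paper"})
KNOWN_DOC_TYPES = frozenset({"scientific_paper", "poster", "abstract_only", "junk"})


class Verdict(NamedTuple):
    """Hypothetical screening result; field names are illustrative, not PubGuard's API."""
    doc_type: str
    ai_generated: bool = False  # informational unless block_ai=True
    toxic: bool = False         # informational unless block_toxic=True


def passes_gate(v: Verdict, block_ai: bool = False, block_toxic: bool = False) -> bool:
    """Return True if the document should continue past Step 0.

    By default only doc_type decides. The AI-detection and toxicity flags
    block only when the hypothetical opt-in switches are set.
    """
    if v.doc_type not in KNOWN_DOC_TYPES:
        raise ValueError(f"unknown doc_type: {v.doc_type!r}")
    if v.doc_type not in PASSING_DOC_TYPES:
        return False  # posters, standalone abstracts, and junk are all blocked
    if block_ai and v.ai_generated:
        return False
    if block_toxic and v.toxic:
        return False
    return True


if __name__ == "__main__":
    for dt in ("scientific_paper", "poster", "abstract_only", "junk"):
        print(f"{dt:<16} -> {'PASS' if passes_gate(Verdict(dt)) else 'BLOCKED'}")
```

Keeping the passing set as a one-element `frozenset` reduces the policy change in this commit (dropping `poster`) to a one-token edit. Raising on unrecognized labels prevents a new or misspelled class from being blocked silently.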