Text Classification
Model2Vec
English
poster-sentry
document-classification
scientific-posters
multimodal
poster-detection
machine-actionable
FAIR-data
posters-science
quality-control
How to use fairdataihub/poster-sentry with Model2Vec:

from model2vec import StaticModel
model = StaticModel.from_pretrained("fairdataihub/poster-sentry")
Commit 9633129 · Parent(s): e95d35a

Soften training data verbiage: posters are repository-labeled, not verified
README.md (changed)

@@ -102,10 +102,10 @@ Trained on **3,606 real documents** — zero synthetic data:
 
 | Class | Count | Source |
 |-------|-------|--------|
-| **Poster** | 1,803 |
-| **Non-poster** | 1,803 |
+| **Poster** | 1,803 | Repository-labeled posters from Zenodo & Figshare |
+| **Non-poster** | 1,803 | Manually confirmed non-posters (papers, proceedings, newsletters, abstract books) |
 
-
+Balanced subset sampled from 30,000+ PDFs scraped from Zenodo and Figshare. Poster samples are drawn from records whose uploaders tagged them as "poster" in repository metadata. Non-poster samples were flagged by a structural classifier and then manually confirmed. When PosterSentry was applied to the full corpus, ~20% of repository-labeled "posters" were reclassified as non-posters, indicating meaningful label noise in the source data.
 
 Training data: [fairdataihub/poster-sentry-training-data](https://huggingface.co/datasets/fairdataihub/poster-sentry-training-data)
 
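The updated README describes two pieces of arithmetic: drawing a class-balanced subset from a larger labeled corpus, and measuring how many repository-labeled "posters" a classifier reclassifies as non-posters. The sketch below illustrates both under stated assumptions. It is not the project's actual pipeline: the `(document_id, label)` record format and the `balanced_sample` and `reclassification_rate` helpers are hypothetical, and the toy corpus and predictions are invented purely to show the calculation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record format: (document_id, label), label is "poster" or "non-poster".
Record = Tuple[str, str]


def balanced_sample(records: List[Record], per_class: int, seed: int = 0) -> List[Record]:
    """Draw exactly `per_class` records from each label (hypothetical helper)."""
    by_label: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_label[rec[1]].append(rec)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sample: List[Record] = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < per_class:
            raise ValueError(f"only {len(pool)} records for {label!r}")
        sample.extend(rng.sample(pool, per_class))  # sampling without replacement
    return sample


def reclassification_rate(repo_labels: Dict[str, str], predictions: Dict[str, str]) -> float:
    """Share of repository-labeled posters that a classifier predicts as non-posters."""
    posters = [doc for doc, label in repo_labels.items() if label == "poster"]
    if not posters:
        return 0.0
    # Documents without a prediction are counted as not reclassified.
    flipped = sum(1 for doc in posters if predictions.get(doc) == "non-poster")
    return flipped / len(posters)


if __name__ == "__main__":
    # Toy corpus: 10 repository-labeled posters, 6 non-posters.
    corpus = [(f"p{i}", "poster") for i in range(10)] + [(f"n{i}", "non-poster") for i in range(6)]
    subset = balanced_sample(corpus, per_class=5)
    print(len(subset))  # 10 (5 per class)

    repo = dict(corpus)
    preds = dict(corpus)
    preds["p0"] = preds["p1"] = "non-poster"  # toy classifier disagrees on two posters
    print(reclassification_rate(repo, preds))  # 0.2 (2 of 10 posters)
```

In the README's terms, `per_class` plays the role of the 1,803 documents per class, and `reclassification_rate` corresponds to the ~20% figure reported for the full corpus; the toy numbers here only demonstrate the arithmetic.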