# Scientific Evidence Relevance Classifier
Lightweight scikit-learn HistGradientBoostingClassifier trained on top of intfloat/e5-small-v2 sentence embeddings to decide whether a scientific paper is relevant evidence for a scientific claim.
This model accompanies the scifact-relevance-classifier project, the Lab 3 / Assignment 1 deliverable for Information Retrieval 5LN712 (Master's in Language Technology, Uppsala University, 2026).
## Headline numbers
| Metric | Value |
|---|---|
| Macro-F1 (test) | 0.872 |
| Accuracy (test) | 0.883 |
| Test set size | 939 pairs |
| Class balance (test) | 64% not_relevant, 36% relevant |
## TL;DR

| | |
|---|---|
| Task | Binary classification: `relevant` / `not_relevant` |
| Embedding | `intfloat/e5-small-v2` (frozen, asymmetric query/passage prefixes) |
| Pair features | `[q, d, abs(q−d), q*d, cos(q,d)]` (1537 dims; InferSent recipe) |
| Classifier | `sklearn.ensemble.HistGradientBoostingClassifier` (`max_iter=400`, `lr=0.05`, `class_weight="balanced"`) |
| Best variant | `title_abstract` (concatenated title + abstract) |
| Training data | `andreiaalexa/scifact-relevance-pairs`, config `title_abstract`, split `train` |
| Code | https://github.com/alexandreia/scifact-relevance-classifier |
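For reference, a minimal training-and-evaluation sketch under the configuration above (not the official training script). The column names `claim`, `document`, and `label` are assumptions about the dataset schema, so check the dataset card; `pair_features` is the helper shown in the Usage section below.

```python
# Sketch only: reproduces the configuration listed in the TL;DR table.
from datasets import load_dataset
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score

train = load_dataset("andreiaalexa/scifact-relevance-pairs", "title_abstract", split="train")
test = load_dataset("andreiaalexa/scifact-relevance-pairs", "title_abstract", split="test")

# Assumed column names: "claim", "document", "label".
X_train, y_train = pair_features(train["claim"], train["document"]), train["label"]
X_test, y_test = pair_features(test["claim"], test["document"]), test["label"]

clf = HistGradientBoostingClassifier(max_iter=400, learning_rate=0.05, class_weight="balanced")
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print("accuracy :", accuracy_score(y_test, pred))
print("macro-F1 :", f1_score(y_test, pred, average="macro"))
```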
## Usage
```python
import pickle

import numpy as np
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

REPO = "andreiaalexa/scifact-relevance-classifier"

# 1) Download the trained classifier
clf_path = hf_hub_download(REPO, "classifier.pkl")
with open(clf_path, "rb") as f:
    clf = pickle.load(f)

# 2) Load the (public) embedding model used at training time
encoder = SentenceTransformer("intfloat/e5-small-v2")

# 3) Build pair features (E5 asymmetric prefixes, then InferSent recipe)
def pair_features(claims, documents):
    q = encoder.encode([f"query: {c}" for c in claims], normalize_embeddings=True)
    d = encoder.encode([f"passage: {p}" for p in documents], normalize_embeddings=True)
    cos = np.sum(q * d, axis=1, keepdims=True)
    return np.hstack([q, d, np.abs(q - d), q * d, cos])

# 4) Predict
claim = "Vitamin D supplementation reduces respiratory infections."
title = "Vitamin D supplementation to prevent acute respiratory tract infections."
abstract = "Randomized trials have evaluated whether vitamin D supplementation prevents acute respiratory tract infections in diverse populations."
document = f"{title}. {abstract}"

X = pair_features([claim], [document])
pred_id = int(clf.predict(X)[0])
proba = clf.predict_proba(X)[0]
label = {0: "not_relevant", 1: "relevant"}[pred_id]
print(f"prediction = {label} | P(relevant) = {proba[1]:.3f}")
```
The helper `scifact_features.pair_features(...)` is also bundled in this repo; you can download and import it directly instead of re-implementing the function above.
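A sketch of that route (the bundled function may construct its own encoder or expect one as an argument; check `scifact_features.py` for the exact signature):

```python
import os
import sys

from huggingface_hub import hf_hub_download

# Download the helper module bundled with the model and make it importable.
helper_path = hf_hub_download("andreiaalexa/scifact-relevance-classifier", "scifact_features.py")
sys.path.append(os.path.dirname(helper_path))

from scifact_features import pair_features  # exact signature: see the file itself

X = pair_features([claim], [document])
```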
## Classifier comparison (full ablation)
Five classifier families compared. Best-in-row in bold.
Macro-F1:
| Variant | LogReg | LinSVM | RF | HGB | MLP |
|---|---|---|---|---|---|
| title | 0.778 | 0.762 | 0.738 | 0.846 | **0.852** |
| abstract | 0.799 | 0.779 | 0.753 | **0.863** | 0.859 |
| title + abstract | 0.804 | 0.785 | 0.744 | **0.872** | 0.844 |
Accuracy:
| Variant | LogReg | LinSVM | RF | HGB | MLP |
|---|---|---|---|---|---|
| title | 0.790 | 0.774 | 0.792 | 0.860 | **0.865** |
| abstract | 0.817 | 0.798 | 0.805 | **0.874** | 0.871 |
| title + abstract | 0.818 | 0.800 | 0.797 | **0.883** | 0.857 |
Full per-class precision/recall is in experiment_results.json in this repo.
Key finding: non-linear models (HGB, MLP) clearly beat linear ones (LogReg, LinearSVM), indicating that useful pairwise interactions between embedding dimensions exist beyond what a linear weighting can capture. Random Forest underperformed on macro-F1 despite balanced class weighting.
## Files in this repo

| File | Purpose |
|---|---|
| `classifier.pkl` | Trained classifier. No `StandardScaler` is needed (HGB handles raw features); the pickle is just the `HistGradientBoostingClassifier`. |
| `metadata.json` | Best (variant, classifier) pair chosen during training; label maps; embedding model id. |
| `experiment_results.json` | Per-classifier, per-variant metrics, including per-class precision/recall. |
| `scifact_features.py` | The exact `pair_features()` function used at training time. Import this for inference. |
| `README.md` | This card. |
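For example, rather than hard-coding the label map as in the Usage snippet, you could read it from `metadata.json` (a sketch; the exact keys inside the file are not documented here, so inspect the output first):

```python
import json

from huggingface_hub import hf_hub_download

# Download and print the training metadata: chosen (variant, classifier),
# label maps, and the embedding model id.
meta_path = hf_hub_download("andreiaalexa/scifact-relevance-classifier", "metadata.json")
with open(meta_path) as f:
    metadata = json.load(f)

print(json.dumps(metadata, indent=2))
```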
## Intended use
- Re-ranking candidate scientific evidence retrieved by an upstream IR system (e.g., BM25 or DPR top-k → this classifier scores each candidate); see the sketch after this list.
- Filtering noisy retrieval results.
- Educational use illustrating embedding-based pairwise classification.
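A minimal re-ranking sketch for the first use case, assuming `clf` and `pair_features` from the Usage section and a list of candidate documents (title + abstract strings) from an upstream retriever:

```python
def rerank(claim, candidate_docs, top_n=10):
    """Score each candidate with P(relevant) and return the top_n highest-scoring ones."""
    X = pair_features([claim] * len(candidate_docs), candidate_docs)
    scores = clf.predict_proba(X)[:, 1]  # column 1 = probability of the "relevant" class
    ranked = sorted(zip(candidate_docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

# candidate_docs would normally be the BM25/DPR top-k; here we reuse the Usage example.
for doc, score in rerank(claim, [document]):
    print(f"{score:.3f}  {doc[:80]}...")
```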
## Out of scope / limitations

- Not a clinical decision tool. The training data is BEIR SciFact (biomedical claims with expert-annotated evidence); predictions on non-biomedical scientific content may be unreliable.
- English only. The training data is English and `e5-small-v2` is trained primarily on English text; cross-lingual performance is untested.
- Frozen-encoder ceiling. Because we don't fine-tune `e5-small-v2`, performance is bounded by what its general-purpose embeddings already capture about scientific text.
- Hard-negative mining is TF-IDF based, so the model learns to reject documents that are lexically similar to a claim but not relevant; it may be overconfident on completely off-topic documents.
## Citations
If you use this model, please cite both the underlying dataset (SciFact / BEIR) and the embedding model:
```bibtex
@inproceedings{wadden-etal-2020-fact,
  title     = "Fact or Fiction: Verifying Scientific Claims",
  author    = "Wadden, David and Lin, Shanchuan and Lo, Kyle and Wang, Lucy Lu and van Zuylen, Madeleine and Cohan, Arman and Hajishirzi, Hannaneh",
  booktitle = "EMNLP",
  year      = "2020",
  url       = "https://aclanthology.org/2020.emnlp-main.609/"
}

@article{wang-etal-2022-text-embeddings,
  title   = "Text Embeddings by Weakly-Supervised Contrastive Pre-training",
  author  = "Wang, Liang and Yang, Nan and Huang, Xiaolong and Jiao, Binxing and Yang, Linjun and Jiang, Daxin and Majumder, Rangan and Wei, Furu",
  journal = "arXiv:2212.03533",
  year    = "2022",
  url     = "https://arxiv.org/abs/2212.03533"
}

@inproceedings{conneau-etal-2017-supervised,
  title     = "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data",
  author    = "Conneau, Alexis and Kiela, Douwe and Schwenk, Holger and Barrault, Lo{\"i}c and Bordes, Antoine",
  booktitle = "EMNLP",
  year      = "2017",
  url       = "https://aclanthology.org/D17-1070/"
}
```
## License
MIT.