Text Classification
Scikit-learn
sentence-transformers
English
information-retrieval
claim-verification
scifact
evidence-relevance
Eval Results (legacy)
How to use andreiaalexa/scifact-relevance-classifier with Scikit-learn:

from huggingface_hub import hf_hub_download
import joblib

model = joblib.load(
    hf_hub_download("andreiaalexa/scifact-relevance-classifier", "sklearn_model.joblib")
)
# only load pickle files from sources you trust
# read more about it here https://skops.readthedocs.io/en/stable/persistence.html

How to use andreiaalexa/scifact-relevance-classifier with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("andreiaalexa/scifact-relevance-classifier")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
"""Embedding feature builders for claim-document relevance classification."""

from __future__ import annotations

import numpy as np


def e5_queries(texts: list[str]) -> list[str]:
    return [f"query: {text}" for text in texts]


def e5_passages(texts: list[str]) -> list[str]:
    return [f"passage: {text}" for text in texts]


def pair_features(model, claims: list[str], documents: list[str], show_progress_bar=False):
    """Build standard sentence-pair features from two embedding vectors.

    q and d alone give the classifier raw semantic position. abs(q-d) exposes
    distance dimensions. q*d exposes alignment dimensions. cosine gives a
    single retrieval-style similarity signal.
    """
    q = model.encode(
        e5_queries(claims),
        normalize_embeddings=True,
        show_progress_bar=show_progress_bar,
    )
    d = model.encode(
        e5_passages(documents),
        normalize_embeddings=True,
        show_progress_bar=show_progress_bar,
    )
    cosine = np.sum(q * d, axis=1, keepdims=True)
    return np.hstack([q, d, np.abs(q - d), q * d, cosine])
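The layout of the feature matrix returned by `pair_features` can be checked without downloading an embedding model. The sketch below is illustrative only: `StubEncoder` is a hypothetical stand-in that returns random vectors through the same `encode` keyword arguments used above, the embedding size of 8 is arbitrary, and the example sentences are made up. It is not the encoder this repository was trained with. The function body is a condensed copy of the one above, repeated so the block runs on its own.

```python
import numpy as np


class StubEncoder:
    """Hypothetical stand-in for an embedding model; not part of the repository."""

    def __init__(self, dim: int = 8, seed: int = 0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)

    def encode(self, texts, normalize_embeddings=False, show_progress_bar=False):
        # Random vectors, one row per input text (the text content is ignored).
        vectors = self.rng.normal(size=(len(texts), self.dim))
        if normalize_embeddings:
            vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
        return vectors


def pair_features(model, claims, documents, show_progress_bar=False):
    # Condensed copy of the function above, with the E5 prefixes inlined.
    q = model.encode([f"query: {c}" for c in claims],
                     normalize_embeddings=True, show_progress_bar=show_progress_bar)
    d = model.encode([f"passage: {p}" for p in documents],
                     normalize_embeddings=True, show_progress_bar=show_progress_bar)
    cosine = np.sum(q * d, axis=1, keepdims=True)
    return np.hstack([q, d, np.abs(q - d), q * d, cosine])


# Made-up example pairs, for shape checking only.
claims = ["Aspirin reduces the risk of stroke.", "Vitamin D prevents fractures."]
documents = ["A trial of aspirin in stroke prevention.", "A cohort study of sleep duration."]

dim = 8
features = pair_features(StubEncoder(dim=dim), claims, documents)

# Four blocks of width `dim` (q, d, |q - d|, q * d) plus one cosine column.
print(features.shape)  # (2, 33)

# Because both embeddings are unit-normalized, the cosine column is just the
# row sum of the q * d block.
product_block = features[:, 3 * dim:4 * dim]
print(np.allclose(features[:, -1], product_block.sum(axis=1)))  # True
```

With a real encoder of embedding size `dim`, the same reasoning gives `4 * dim + 1` columns per claim-document pair. The cosine column is redundant with the `q * d` block in a linear sense, but it hands the classifier the retrieval-style score directly.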