---
license: mit
library_name: sentence-transformers
pipeline_tag: feature-extraction
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- scientific-documents
- modernbert
- citation-context
base_model: answerdotai/ModernBERT-base
language:
- en
---

# SciEmbed-FULL

This is the release's headline model: Stage 1 DAPT plus contrastive training on the ~30M-pair Signal A+B pool (1 epoch). The SciRepEval score reported in the paper is the mean over three seeds; this repository ships the seed-123 weights.

A 149M-parameter ModernBERT-base scientific document embedder trained with citation-context sentences as the primary contrastive signal. Part of the **SciEmbed** release (paper under double-blind review; author information omitted).

## Usage

```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
model = SentenceTransformer("anon-nlp/sciembed-full")

# Encode a batch of texts into L2-normalized embeddings of shape (1, 768)
emb = model.encode(
    ["citation-context supervision for scientific embeddings"],
    normalize_embeddings=True,
)
```

- **Context length:** 512 tokens
- **Pooling:** mean
- **Output dim:** 768 (Matryoshka-truncatable to 512, 256, or 128)
- **License:** MIT

## SciRepEval (4-category macro)

| Classification | Regression | Proximity | Search | Overall |
|---|---|---|---|---|
| 75.6 | 28.2 | 80.9 | 82.7 | **66.85 ± 0.38** |

Overall is the unweighted mean of the four category scores.

## Citation

See the repository README. Paper: *SciEmbed: Citation-Context Supervision for Scientific Document Embeddings* (under review).
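## Matryoshka truncation (sketch)

The 768-dimensional output is described as Matryoshka-truncatable to 512, 256, or 128 dimensions. The following is a minimal sketch of how such embeddings are typically used, assuming standard Matryoshka practice: keep the first *k* coordinates of each vector, then re-normalize so that dot products are again cosine similarities. The helper names `truncate_embeddings` and `cosine_matrix` are hypothetical and not part of the `sentence-transformers` API, and the random vectors only stand in for real `model.encode` output.

```python
import numpy as np

# Dimensions listed on this card; 768 is the full (untruncated) output size.
ALLOWED_DIMS = (768, 512, 256, 128)


def truncate_embeddings(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of each row and L2-renormalize.

    Hypothetical helper: assumes standard Matryoshka training, where the
    leading coordinates form a usable lower-dimensional embedding.
    """
    if dim not in ALLOWED_DIMS:
        raise ValueError(f"dim must be one of {ALLOWED_DIMS}, got {dim}")
    if emb.ndim != 2 or emb.shape[1] < dim:
        raise ValueError(f"expected shape (n, >= {dim}), got {emb.shape}")
    head = emb[:, :dim]
    norms = np.linalg.norm(head, axis=1, keepdims=True)
    # Clip norms to avoid division by zero for all-zero rows.
    return head / np.clip(norms, 1e-12, None)


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities; rows of a and b must already be unit-norm."""
    return a @ b.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for model.encode(..., normalize_embeddings=True) output.
    full = rng.standard_normal((3, 768)).astype(np.float32)
    full /= np.linalg.norm(full, axis=1, keepdims=True)

    small = truncate_embeddings(full, 256)
    print(small.shape)  # (3, 256)
    print(cosine_matrix(small, small))
```

Re-normalizing matters because the first *k* coordinates of a unit-norm 768-d vector are generally shorter than unit length, so raw dot products on the truncated vectors would no longer equal cosine similarity. Restricting `dim` to the listed sizes is a deliberate choice in this sketch, since other cut-offs are not described as supported.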