---
license: mit
library_name: sentence-transformers
pipeline_tag: feature-extraction
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- scientific-documents
- modernbert
- citation-context
base_model: answerdotai/ModernBERT-base
language:
- en
---

# SciEmbed-CTX

Signal A+B on a 7M-pair subsample (3 epochs). Best ablation; the FULL model is this recipe scaled to the full pool.

A 149M-parameter ModernBERT-base scientific document embedder trained with citation-context sentences as the primary contrastive signal. Part of the **SciEmbed** release (paper under double-blind review; author info omitted).

## Usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("anon-nlp/sciembed-ctx")
emb = model.encode(
    ["citation-context supervision for scientific embeddings"],
    normalize_embeddings=True,
)
```

- **Context length:** 512 tokens
- **Pooling:** mean · **Output dim:** 768 (Matryoshka-truncatable to 512/256/128)
- **License:** MIT

## SciRepEval (4-category macro)

| Classification | Regression | Proximity | Search | Overall |
|---|---|---|---|---|
| 75.5 | **28.3** | 80.9 | 82.5 | 66.8 ± 0.02 |

## Citation

See the repository README. Paper: *SciEmbed: Citation-Context Supervision for Scientific Document Embeddings* (under review).
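
## Matryoshka truncation (sketch)

The Usage notes say the 768-dimensional output is Matryoshka-truncatable to 512/256/128. A minimal sketch of what that usually means in practice: keep the leading components, then rescale to unit length so that dot products are again cosine similarities. The helper names `truncate_and_normalize` and `cosine` are hypothetical, and the 8-dimensional toy vectors stand in for real model outputs; this is an illustration under those assumptions, not the release's own code.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


# Toy 8-dimensional stand-ins (assumption) for the 768-dimensional embeddings.
doc_a = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
doc_b = [0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0]

# Truncate to the first 4 dimensions, analogous to 768 -> 512/256/128.
short_a = truncate_and_normalize(doc_a, 4)
short_b = truncate_and_normalize(doc_b, 4)
similarity = cosine(short_a, short_b)
```

With real embeddings, the same function can be applied to each row returned by `model.encode(...)`. Recent `sentence-transformers` releases also accept a `truncate_dim` argument on the `SentenceTransformer` constructor, which may be more convenient if the installed version supports it.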