---
license: mit
library_name: sentence-transformers
pipeline_tag: feature-extraction
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- scientific-documents
- modernbert
- citation-context
base_model: answerdotai/ModernBERT-base
language:
- en
---

# SciEmbed-BASE

Signal A only (citation edges), 7M pairs, 3 epochs. The citation-edge baseline that isolates what Signal B adds.

A 149M-parameter ModernBERT-base scientific document embedder trained with citation-context sentences as the primary contrastive signal. Part of the **SciEmbed** release (paper under double-blind review; author info omitted).

## Usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("anon-nlp/sciembed-base")
emb = model.encode(["citation-context supervision for scientific embeddings"], normalize_embeddings=True)
```

- **Context length:** 512 tokens
- **Pooling:** mean · **Output dim:** 768 (Matryoshka-truncatable to 512/256/128)
- **License:** MIT

## SciRepEval (4-category macro)

| Classif. | Regr. | Prox. | Search | Overall |
|---|---|---|---|---|
| 75.3 | 26.8 | 80.2 | 82.2 | 66.1 ± 0.09 |

## Citation

See the repository README. Paper: *SciEmbed: Citation-Context Supervision for Scientific Document Embeddings* (under review).
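
## Truncating embeddings (sketch)

The Usage notes say the 768-dimensional output is Matryoshka-truncatable to 512/256/128. Below is a minimal sketch of what that truncation usually means in practice: keep the leading components, then re-normalize so that dot products are cosine similarities again. The helper names `truncate_and_normalize` and `dot` are hypothetical, and the short toy vectors stand in for real model output so the snippet runs without downloading the checkpoint. This is an illustration under those assumptions, not the authors' evaluation code.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


# Toy 8-dimensional stand-ins for two document embeddings (hypothetical data).
doc_a = [0.5, 0.5, 0.5, 0.5, 0.1, 0.0, -0.1, 0.0]
doc_b = [0.5, 0.5, -0.5, 0.5, 0.0, 0.1, 0.0, -0.1]

# Compare similarity at full width and after truncating to 4 dimensions.
full = dot(truncate_and_normalize(doc_a, 8), truncate_and_normalize(doc_b, 8))
short = dot(truncate_and_normalize(doc_a, 4), truncate_and_normalize(doc_b, 4))
print(f"cosine @8 dims: {full:.3f}")
print(f"cosine @4 dims: {short:.3f}")
```

With real output from `model.encode(...)`, the same idea applies to each row of the returned array with `dim` set to 512, 256, or 128. Recent sentence-transformers releases also accept a `truncate_dim` argument on the `SentenceTransformer` constructor, which may be more convenient. Check the behavior against the installed version before relying on it.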