EMBO / vicreg_our

@@ -21,7 +21,7 @@ metrics:
 ## Model Description
-SODA-VEC embedding model trained with VICReg Our loss function. This model uses normalized embeddings with covariance, feature, and dot product losses (diagonal-only) to learn biomedical text representations.
 This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Vector Embeddings) project, which focuses on creating high-quality embedding models for biomedical and life sciences text.
@@ -44,6 +44,17 @@ This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Ve
 **Loss Function**: VICReg Our: normalized embeddings with covariance loss, feature loss, and dot product loss (diagonal-only)
 **Coefficients**: cov=1.0, feature=1.0, dot=1.0
 **Base Model**: `answerdotai/ModernBERT-base`
@@ -135,7 +146,7 @@ The model has been evaluated on comprehensive biomedical benchmarks including:
 - **Field-Specific Separability**: Distinguishing between different biological fields
 - **Semantic Search**: Retrieval quality on biomedical text corpora
-For detailed evaluation results, see the [SODA-VEC benchmark notebooks](https://github.com/EMBO/soda-vec).
 ## Intended Use
@@ -143,9 +154,6 @@ This model is designed for:
 - **Biomedical Semantic Search**: Finding relevant papers, abstracts, or text passages
 - **Scientific Text Similarity**: Computing similarity between biomedical texts
-- **Information Retrieval**: Building search systems for scientific literature
-- **Downstream Tasks**: As a base for fine-tuning on specific biomedical tasks
-- **Research Applications**: Academic and research use in life sciences
 ## Limitations
@@ -163,13 +171,13 @@ If you use this model, please cite:
   title = {SODA-VEC: Scientific Open Domain Adaptation for Vector Embeddings},
   author = {EMBO},
   year = {2024},
-  url = {https://github.com/EMBO/soda-vec}
 }
 ```
 ## Model Card Contact
-For questions or issues, please open an issue on the [SODA-VEC GitHub repository](https://github.com/EMBO/soda-vec).
 ---

 ## Model Description
+SODA-VEC embedding model trained with the VICReg Our loss function, a modified version of [VICReg](https://arxiv.org/pdf/2105.04906). This model uses L2-normalized embeddings with covariance, feature, and diagonal-only dot product losses to learn biomedical text representations.
 This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Vector Embeddings) project, which focuses on creating high-quality embedding models for biomedical and life sciences text.
 **Loss Function**: VICReg Our: normalized embeddings with covariance loss, feature loss, and dot product loss (diagonal-only)
+We made several changes to the original VICReg loss described in the [paper from Meta](https://arxiv.org/pdf/2105.04906). The main differences are:
+| Feature | Original VICReg | VICReg Our |
+|---------|----------------|------------|
+| Normalization | No | Yes (L2-normalized) |
+| Invariance (MSE) | Yes | No |
+| Variance (hinge) | Yes | No |
+| Covariance | Yes (unnormalized) | Yes (normalized) |
+| Feature correlation | No | Yes (cross-view) |
+| Sample similarity | No | Yes (dot product) |
 **Coefficients**: cov=1.0, feature=1.0, dot=1.0
 **Base Model**: `answerdotai/ModernBERT-base`
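To make the loss terms more concrete, here is a minimal NumPy sketch of how they could be combined using the coefficients listed above (cov=1.0, feature=1.0, dot=1.0). It is not the SODA-VEC training code. The function names are hypothetical, and the exact form of each term is an assumption based on the short descriptions in this card: the covariance penalty is computed per view on the L2-normalized embeddings and summed, the feature term pushes each dimension's cross-view correlation toward 1, and the dot product term uses only the diagonal of the sample similarity matrix (positive pairs, no negatives).

```python
import numpy as np

# Hypothetical sketch of a "VICReg Our"-style loss; term definitions are assumptions,
# not the actual SODA-VEC implementation.
EPS = 1e-8


def l2_normalize(z: np.ndarray) -> np.ndarray:
    """Scale each row (one embedding per sample) to unit L2 norm."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + EPS)


def covariance_loss(z: np.ndarray) -> float:
    """VICReg-style covariance term: sum of squared off-diagonal covariances divided by d."""
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum() / d)


def feature_loss(za: np.ndarray, zb: np.ndarray) -> float:
    """Assumed cross-view feature term: push each dimension's correlation between views toward 1."""
    za_s = (za - za.mean(axis=0)) / (za.std(axis=0) + EPS)
    zb_s = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + EPS)
    # Diagonal of the d x d cross-correlation matrix (Pearson correlation per dimension).
    per_feature_corr = (za_s * zb_s).mean(axis=0)
    return float(((1.0 - per_feature_corr) ** 2).mean())


def dot_loss(za: np.ndarray, zb: np.ndarray) -> float:
    """Assumed diagonal-only similarity term: reward a high dot product for each pair (i, i)."""
    return float((1.0 - (za * zb).sum(axis=1)).mean())


def vicreg_our_loss(
    ea: np.ndarray,
    eb: np.ndarray,
    cov_w: float = 1.0,
    feature_w: float = 1.0,
    dot_w: float = 1.0,
) -> float:
    """Weighted sum of the three terms on L2-normalized embeddings of two views."""
    za, zb = l2_normalize(ea), l2_normalize(eb)
    cov = covariance_loss(za) + covariance_loss(zb)
    return cov_w * cov + feature_w * feature_loss(za, zb) + dot_w * dot_loss(za, zb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor = rng.normal(size=(8, 16))  # toy batch: 8 samples, 16-dim embeddings
    positive = anchor + 0.1 * rng.normal(size=(8, 16))  # slightly perturbed second view
    print(f"loss = {vicreg_our_loss(anchor, positive):.4f}")
```

With identical views, the feature and dot product terms in this sketch are close to zero, so the remaining loss comes from the covariance penalty, which keeps embedding dimensions decorrelated without the invariance and variance terms of the original VICReg.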
 - **Field-Specific Separability**: Distinguishing between different biological fields
 - **Semantic Search**: Retrieval quality on biomedical text corpora
+For detailed evaluation results, see the [SODA-VEC benchmark notebooks](https://github.com/source-data/soda-vec).
 ## Intended Use
 - **Biomedical Semantic Search**: Finding relevant papers, abstracts, or text passages
 - **Scientific Text Similarity**: Computing similarity between biomedical texts
 ## Limitations
   title = {SODA-VEC: Scientific Open Domain Adaptation for Vector Embeddings},
   author = {EMBO},
   year = {2024},
+  url = {https://github.com/source-data/soda-vec}
 }
 ```
 ## Model Card Contact
+For questions or issues, please open an issue on the [SODA-VEC GitHub repository](https://github.com/source-data/soda-vec).
 ---