drAbreu committed
Commit 1426fb8 · verified · 1 Parent(s): d3712e2

Update README.md

Files changed (1)
  1. README.md +15 -7
README.md CHANGED
````diff
@@ -21,7 +21,7 @@ metrics:
 
 ## Model Description
 
-SODA-VEC embedding model trained with VICReg Our loss function. This model uses normalized embeddings with covariance, feature, and dot product losses (diagonal-only) to learn biomedical text representations.
+SODA-VEC embedding model trained with [VICReg](https://arxiv.org/pdf/2105.04906) Our loss function. This model uses normalized embeddings with covariance, feature, and dot product losses (diagonal-only) to learn biomedical text representations.
 
 This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Vector Embeddings) project, which focuses on creating high-quality embedding models for biomedical and life sciences text.
 
@@ -44,6 +44,17 @@ This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Ve
 
 **Loss Function**: VICReg Our: normalized embeddings with covariance loss, feature loss, and dot product loss (diagonal-only)
 
+We have implemented a series of changes from the original [VICREG in the paper from Meta](https://arxiv.org/pdf/2105.04906). Here we show the main differences:
+
+| Feature | Original VICReg | VICReg Our |
+|---------|----------------|------------|
+| Normalization | No | Yes (L2-normalized) |
+| Invariance (MSE) | Yes | No |
+| Variance (hinge) | Yes | No |
+| Covariance | Yes (unnormalized) | Yes (normalized) |
+| Feature correlation | No | Yes (cross-view) |
+| Sample similarity | No | Yes (dot product) |
+
 **Coefficients**: cov=1.0, feature=1.0, dot=1.0
 **Base Model**: `answerdotai/ModernBERT-base`
 
@@ -135,7 +146,7 @@ The model has been evaluated on comprehensive biomedical benchmarks including:
 - **Field-Specific Separability**: Distinguishing between different biological fields
 - **Semantic Search**: Retrieval quality on biomedical text corpora
 
-For detailed evaluation results, see the [SODA-VEC benchmark notebooks](https://github.com/EMBO/soda-vec).
+For detailed evaluation results, see the [SODA-VEC benchmark notebooks](https://github.com/source-data/soda-vec).
 
 ## Intended Use
 
@@ -143,9 +154,6 @@ This model is designed for:
 
 - **Biomedical Semantic Search**: Finding relevant papers, abstracts, or text passages
 - **Scientific Text Similarity**: Computing similarity between biomedical texts
--- **Information Retrieval**: Building search systems for scientific literature
--- **Downstream Tasks**: As a base for fine-tuning on specific biomedical tasks
--- **Research Applications**: Academic and research use in life sciences
 
 ## Limitations
 
@@ -163,13 +171,13 @@ If you use this model, please cite:
 title = {SODA-VEC: Scientific Open Domain Adaptation for Vector Embeddings},
 author = {EMBO},
 year = {2024},
-url = {https://github.com/EMBO/soda-vec}
+url = {https://github.com/source-data/soda-vec}
 }
 ```
 
 ## Model Card Contact
 
-For questions or issues, please open an issue on the [SODA-VEC GitHub repository](https://github.com/EMBO/soda-vec).
+For questions or issues, please open an issue on the [SODA-VEC GitHub repository](https://github.com/source-data/soda-vec).
 
 ---
 
````
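The updated card names the three terms of the "VICReg Our" loss but does not give their formulas. The pure-Python sketch below is one hypothetical reading of the description and comparison table. It is not the SODA-VEC implementation.

Only the covariance term follows the original VICReg definition: squared off-diagonal entries of the feature covariance matrix, summed and divided by the dimension, then added over both views. Here it is applied to L2-normalized embeddings.

The feature and dot-product terms are assumptions. "Diagonal-only" is read as "only matched entries are penalized": feature k of view A is compared with feature k of view B, and sample i of view A with sample i of view B. All function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples in the batch, columns = embedding features


def l2_normalize(z: Matrix, eps: float = 1e-12) -> Matrix:
    """Scale every embedding (row) to unit Euclidean length."""
    out = []
    for row in z:
        norm = max(math.sqrt(sum(x * x for x in row)), eps)
        out.append([x / norm for x in row])
    return out


def covariance_penalty(z: Matrix) -> float:
    """VICReg covariance term: squared off-diagonal entries of the d x d
    feature covariance matrix, summed and divided by d. Needs n >= 2."""
    n, d = len(z), len(z[0])
    means = [sum(row[k] for row in z) / n for k in range(d)]
    centered = [[row[k] - means[k] for k in range(d)] for row in z]
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                c_ij = sum(r[i] * r[j] for r in centered) / (n - 1)
                total += c_ij * c_ij
    return total / d


def _standardize_columns(z: Matrix, eps: float = 1e-8) -> Matrix:
    """Return features column-major, each with zero mean and unit std over the batch."""
    n, d = len(z), len(z[0])
    cols = []
    for k in range(d):
        col = [row[k] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        cols.append([(x - mean) / (std + eps) for x in col])
    return cols


def feature_penalty(za: Matrix, zb: Matrix) -> float:
    """ASSUMED cross-view feature term, diagonal only: the batch correlation
    between feature k of view A and feature k of view B is pushed toward 1."""
    ca, cb = _standardize_columns(za), _standardize_columns(zb)
    n = len(za)
    per_feature = [(1.0 - sum(a * b for a, b in zip(fa, fb)) / n) ** 2
                   for fa, fb in zip(ca, cb)]
    return sum(per_feature) / len(per_feature)


def dot_penalty(za: Matrix, zb: Matrix) -> float:
    """ASSUMED sample-similarity term, diagonal only: only matched pairs
    (a_i, b_i) are compared; on unit-length rows a_i . b_i is their cosine."""
    sims = [sum(x * y for x, y in zip(a, b)) for a, b in zip(za, zb)]
    return sum(1.0 - s for s in sims) / len(sims)


def vicreg_our_sketch(za: Matrix, zb: Matrix, cov: float = 1.0,
                      feature: float = 1.0, dot: float = 1.0) -> float:
    """Hypothetical weighted sum; defaults mirror the card's cov/feature/dot = 1.0.
    No invariance (MSE) or variance (hinge) term, per the comparison table."""
    za, zb = l2_normalize(za), l2_normalize(zb)
    cov_term = covariance_penalty(za) + covariance_penalty(zb)
    return (cov * cov_term
            + feature * feature_penalty(za, zb)
            + dot * dot_penalty(za, zb))


if __name__ == "__main__":
    # Two toy "views" of a batch of 4 embeddings with 3 features each.
    view_a = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1], [0.5, 0.5, 0.9], [0.3, 0.1, 0.4]]
    view_b = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.2], [0.4, 0.6, 0.8], [0.3, 0.2, 0.5]]
    print(vicreg_our_sketch(view_a, view_b))
```

Running the file prints the combined loss for the two toy views. The covariance term divides by n - 1, so the batch must contain at least two samples.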