mse=0.0234
Browse files

- README.md +53 -49
- config_sentence_transformers.json +1 -1
- eval/similarity_evaluation_val_results.csv +4 -4
- model.safetensors +1 -1
README.md CHANGED

@@ -4,35 +4,35 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
+- dataset_size:3072
 - loss:CosineSimilarityLoss
 base_model: sentence-transformers/all-mpnet-base-v2
 widget:
-- source_sentence:
+- source_sentence: Build type system for programming language from scratch
   sentences:
-  -
-  -
-  -
+  - Uses TypeScript for type-safe JavaScript
+  - Led architecture decision meetings resulting in consensus
+  - Integrated Stripe, PayPal, and custom payment solutions
-- source_sentence:
+- source_sentence: Privacy engineering skills
   sentences:
-  -
-  -
-  -
+  - Implemented differential privacy
+  - Technical implementation without vendor management
+  - Created developer-friendly APIs with Swagger docs
-- source_sentence:
+- source_sentence: Privacy Pass, privacy protocol
   sentences:
-  -
-  -
-  -
+  - Modern development tools only
+  - Excellent at breaking down complex topics for junior developers
+  - Privacy-preserving authentication methods
-- source_sentence:
+- source_sentence: JVM tuning and profiling
   sentences:
-  -
-  -
-  -
+  - Performance monitoring patterns
+  - Optimized GC settings reducing pause times
+  - Senior developer with proven track record debugging distributed system race conditions
-- source_sentence:
+- source_sentence: Knowledge sharing enthusiasm
   sentences:
-  -
-  -
-  -
+  - Regular meetup speaker and blogger
+  - Optimized Spark jobs processing terabytes of data daily
+  - Configured database partitioning
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 metrics:

@@ -49,10 +49,10 @@ model-index:
       type: val
     metrics:
     - type: pearson_cosine
-      value: 0.
+      value: 0.8977247913414342
       name: Pearson Cosine
     - type: spearman_cosine
-      value: 0.
+      value: 0.8052388814564073
       name: Spearman Cosine
 ---

@@ -105,9 +105,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("sentence_transformers_model_id")
 # Run inference
 sentences = [
-    '
-    '
-    '
+    'Knowledge sharing enthusiasm',
+    'Regular meetup speaker and blogger',
+    'Configured database partitioning',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)

@@ -152,10 +152,10 @@ You can finetune this model on your own dataset.
 * Dataset: `val`
 * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

-| Metric | Value
-|
-| pearson_cosine | 0.
-| **spearman_cosine** | **0.
+| Metric              | Value      |
+|:--------------------|:-----------|
+| pearson_cosine      | 0.8977     |
+| **spearman_cosine** | **0.8052** |

 <!--
 ## Bias, Risks and Limitations

@@ -175,19 +175,19 @@ You can finetune this model on your own dataset.

 #### Unnamed Dataset

-* Size:
+* Size: 3,072 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
-  | | sentence_0 | sentence_1
-  |
-  | type | string | string
-  | details | <ul><li>min: 4 tokens</li><li>mean:
+  |         | sentence_0 | sentence_1 | label |
+  |:--------|:-----------|:-----------|:------|
+  | type    | string | string | float |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 9.74 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 11.06 tokens</li><li>max: 24 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.67</li><li>max: 1.0</li></ul> |
 * Samples:
-  | sentence_0
-  |
-  | <code>
-  | <code>
-  | <code>
+  | sentence_0 | sentence_1 | label |
+  |:-----------|:-----------|:------|
+  | <code>Boundary-value testing and equivalence partitioning expertise</code> | <code>QA engineer designing test cases with boundary value analysis</code> | <code>0.9</code> |
+  | <code>Must have strong decision-making skills</code> | <code>Makes timely decisions based on available information</code> | <code>0.7</code> |
+  | <code>8+ years building real-time collaboration tools</code> | <code>Traditional request-response application development</code> | <code>0.2</code> |
 * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
 ```json
 {

@@ -326,20 +326,24 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | val_spearman_cosine |
 |:------:|:----:|:-------------------:|
-| 0.
-| 1.0 |
-| 1.
-|
-| 2.
-|
-|
+| 0.5208 | 50 | 0.6737 |
+| 1.0 | 96 | 0.7384 |
+| 1.0417 | 100 | 0.7431 |
+| 1.5625 | 150 | 0.7703 |
+| 2.0 | 192 | 0.7790 |
+| 2.0833 | 200 | 0.7817 |
+| 2.6042 | 250 | 0.8011 |
+| 3.0 | 288 | 0.7967 |
+| 3.125 | 300 | 0.7963 |
+| 3.6458 | 350 | 0.8046 |
+| 4.0 | 384 | 0.8052 |


 ### Framework Versions
-- Python: 3.12.
+- Python: 3.12.10
 - Sentence Transformers: 4.1.0
 - Transformers: 4.52.4
-- PyTorch: 2.7.1
+- PyTorch: 2.7.1+cu126
 - Accelerate: 1.7.0
 - Datasets: 3.6.0
 - Tokenizers: 0.21.1
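The card's usage snippet stops at printing the embedding shape. As a minimal, self-contained sketch of the cosine score that `CosineSimilarityLoss` fits and that the evaluator compares against labels (toy 3-dimensional vectors stand in for the model's real `model.encode` outputs, since the checkpoint is not loaded here):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins for model.encode(...) outputs (the real vectors are 768-dim).
anchor = [0.2, 0.8, 0.1]
positive = [0.25, 0.75, 0.05]
negative = [0.9, -0.1, 0.4]

print(cosine_similarity(anchor, positive))  # near 1.0: vectors point the same way
print(cosine_similarity(anchor, negative))  # much lower: nearly orthogonal vectors
```

With the actual model, recent sentence-transformers releases expose `model.similarity(embeddings, embeddings)`, which returns this pairwise cosine matrix directly.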
config_sentence_transformers.json CHANGED

@@ -2,7 +2,7 @@
   "__version__": {
     "sentence_transformers": "4.1.0",
     "transformers": "4.52.4",
-    "pytorch": "2.7.1"
+    "pytorch": "2.7.1+cu126"
   },
   "prompts": {},
   "default_prompt_name": null,
eval/similarity_evaluation_val_results.csv CHANGED

@@ -1,5 +1,5 @@
 epoch,steps,cosine_pearson,cosine_spearman
-1.0,
-2.0,
-3.0,
-4.0,
+1.0,96,0.8272483418053012,0.7384040919120075
+2.0,192,0.8806144722889805,0.7789630856263889
+3.0,288,0.8940053252264049,0.7967165513263559
+4.0,384,0.8977247913414342,0.8052388814564073
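The `cosine_spearman` column in this CSV ranks the model's cosine scores against the gold labels. A minimal pure-Python version of Spearman's rank correlation (no tie handling, which the scipy routine used by `EmbeddingSimilarityEvaluator` does provide) looks like this, on illustrative toy scores rather than the real eval data:

```python
def spearman_no_ties(xs, ys):
    """Spearman rank correlation for tie-free data:
    rank both sequences, then apply 1 - 6*sum(d^2)/(n*(n^2-1))."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Toy gold labels vs. model cosine scores (illustrative values only).
gold = [0.9, 0.7, 0.2, 0.5]
pred = [0.95, 0.6, 0.1, 0.4]
print(spearman_no_ties(gold, pred))  # 1.0: identical ordering, perfect rank agreement
```

Because Spearman only sees the ordering, the predicted scores need not match the labels numerically, which is why it can trail the Pearson figure in this log.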
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:628af632e016d61b250d100cdf4a3b0b13f3c1b2802767ceea7fd31e83f3ebfa
 size 437967672