Update README.md
## Hugging Face Model Card Upgrades

Your model is live on Hugging Face! It loads correctly as **MPNet + mean pooling + Dense(768→1024)**, matching your configuration files. Here are **drop-in upgrades** to enhance your model card with widgets, metrics, and better discoverability.

### 1. YAML Front Matter (Required)

Add this to the **very top** of your README.md (before the title) to enable Hugging Face features:

```yaml
---
library_name: sentence-transformers
license: apache-2.0
widget:
- text: "Hello world"
- text: "How are you?"
---
```
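A malformed front-matter block silently disables the widgets, so it is worth parsing it once before pushing. A minimal sketch, assuming PyYAML is installed; the block is embedded as a string here (and the `# sofia-embedding-v1` title line is illustrative), but the same check works on the real `README.md`:

```python
import re
import yaml  # PyYAML

# Front matter from section 1, embedded for a quick self-contained parse check.
readme = """---
library_name: sentence-transformers
license: apache-2.0
widget:
- text: "Hello world"
- text: "How are you?"
---
# sofia-embedding-v1
"""

# Front matter is everything between the first pair of `---` markers.
match = re.match(r"^---\n(.*?)\n---\n", readme, flags=re.S)
meta = yaml.safe_load(match.group(1))
print(meta["license"])  # apache-2.0
```

If the YAML is mis-indented, `safe_load` raises immediately instead of Hugging Face quietly ignoring the header.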

### 2. License File (Required)

Create a `LICENSE` file in your repo root with the full Apache 2.0 text. Hugging Face will auto-detect it.

### 3. MTEB Metrics Block (Recommended)

To display performance metrics on your model card:

**Step A: Run evaluation locally**

```bash
python -c "
from mteb import MTEB
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
tasks = ['STS12', 'STS13', 'STS14', 'STS15', 'STS16', 'STSBenchmark']
MTEB(tasks=tasks).run(model, output_folder='./mteb_results')
"
```

**Step B: Add metrics placeholder to README**

```markdown
<!-- METRICS_START -->
_TBD_
<!-- METRICS_END -->
```

**Step C: Inject results automatically**

```bash
python - <<'PY'
import json, glob, re
from pathlib import Path

results = []
for f in glob.glob('mteb_results/*/*/results.json'):
    data = json.load(open(f))
    task = data['mteb_dataset_name']
    main = data.get('main_score')
    pearson = data.get('test', {}).get('cos_sim', {}).get('pearson')
    spearman = data.get('test', {}).get('cos_sim', {}).get('spearman')
    results.append((task, main, pearson, spearman))

lines = ['model-index:', '- name: sofia-embedding-v1', '  results:']
for task, main, p, s in sorted(results):
    m = f'{main:.4f}' if main is not None else 'null'
    pe = f'{p:.4f}' if p is not None else 'null'
    sp = f'{s:.4f}' if s is not None else 'null'
    lines.extend([
        '  - task: {type: sts, name: STS}',
        f'    dataset: {{name: {task}, type: mteb/{task}}}',
        '    metrics:',
        '    - type: main_score',
        f'      value: {m}',
        '    - type: pearson',
        f'      value: {pe}',
        '    - type: spearman',
        f'      value: {sp}'
    ])

block = '\n'.join(lines)
readme = Path('README.md').read_text()
readme = re.sub(r'<!-- METRICS_START -->.*?<!-- METRICS_END -->',
                f'<!-- METRICS_START -->\n{block}\n<!-- METRICS_END -->',
                readme, flags=re.S)
Path('README.md').write_text(readme)
print('Metrics injected into README!')
PY
```

(The script is fed via a quoted heredoc rather than `python -c "…"` so the shell does not mangle quotes, and a score of exactly `0.0` is kept rather than dropped as `null`.)

### 4. Inference Configuration (Already Correct)

Your model correctly outputs 1024-dimensional embeddings with mean pooling. No changes needed.

### 5. Prompted Retrieval Mode (Optional)

For better zero-shot retrieval, update `config_sentence_transformers.json`:

```json
{
}
```
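For reference, sentence-transformers reads per-use-case prompts from a `prompts` mapping in this file. A hypothetical sketch of the shape such defaults could take; the prompt strings below are illustrative, not the model's actual configuration:

```json
{
  "prompts": {
    "query": "Represent this sentence for searching relevant passages: ",
    "passage": ""
  },
  "default_prompt_name": null
}
```

At encode time, clients can then select a prompt with `model.encode(texts, prompt_name="query")`.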

### 6. Usage Examples

Add these minimal code snippets to your README:

**Python:**

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("MaliosDark/sofia-embedding-v1")
sentences = ["Hello world", "How are you?"]
embeddings = model.encode(sentences, normalize_embeddings=True)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(similarity.item())  # ~0.9
```

**JavaScript/Node.js:**

```javascript
import { SentenceTransformer } from "sentence-transformers";

const model = await SentenceTransformer.from_pretrained("MaliosDark/sofia-embedding-v1");
const embeddings = await model.encode(["hello", "world"], { normalize: true });
console.log(embeddings[0].length); // 1024
```

### Ready-to-Use README Template

Want a complete PR-ready README with all upgrades applied? Let me know and I'll generate it based on your current model card.

[View on Hugging Face](https://huggingface.co/MaliosDark/sofia-embedding-v1)