Commit 8a51e3a (verified) · MaliosDark · 1 parent: 2f2e3fe

Update README.md

Files changed (1): README.md (+76 −109)
 
## Hugging Face Model Card Upgrades

Your model is live on Hugging Face! It loads correctly as **MPNet + mean pooling + Dense(768→1024)**, matching your configuration files (`modules.json`, `1_Pooling/config.json`, `2_Dense/config.json`, `sentence_bert_config.json`). Here are **drop-in upgrades** to enhance your model card with widgets, metrics, and better discoverability.

### 1. YAML Front Matter (Required)

Add this to the **very top** of your README.md (before the title) to enable Hugging Face features:

```yaml
---
library_name: sentence-transformers
license: apache-2.0
# …
widget:
- text: "Hello world"
- text: "How are you?"
---
```

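To catch YAML mistakes before pushing, you can parse the header locally. A minimal sketch (assuming PyYAML is installed; `read_front_matter` is a hypothetical helper, not part of any library):

```python
import yaml  # pip install pyyaml

def read_front_matter(readme_text: str) -> dict:
    """Parse the YAML block between the first two '---' markers of a README."""
    if not readme_text.startswith("---"):
        return {}
    # split("---", 2) -> ["", header, rest-of-document]
    header = readme_text.split("---", 2)[1]
    return yaml.safe_load(header) or {}
```

Feed it the full README text; a well-formed header comes back as a dict with `library_name`, `license`, and the rest, while a README with no front matter returns `{}`.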
### 2. License File (Required)

Create a `LICENSE` file in your repo root with the full Apache 2.0 text. Hugging Face will auto-detect it.

### 3. MTEB Metrics Block (Recommended)

To display performance metrics on your model card:

**Step A: Run evaluation locally**

```bash
python - <<'PY'
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
tasks = ['STS12', 'STS13', 'STS14', 'STS15', 'STS16', 'STSBenchmark']
MTEB(tasks=tasks).run(model, output_folder='./mteb_results')
PY
```

 
**Step B: Add metrics placeholder to README**

```markdown
<!-- METRICS_START -->
_TBD_
<!-- METRICS_END -->
```

**Step C: Inject results automatically**

A quoted heredoc (`<<'PY'`) keeps the shell from touching the backticks inside the script:

```bash
python - <<'PY'
import json, glob, re
from pathlib import Path

results = []
for f in glob.glob('mteb_results/*/*/results.json'):
    data = json.load(open(f))
    task = data['mteb_dataset_name']
    main = data.get('main_score')
    pearson = data.get('test', {}).get('cos_sim', {}).get('pearson')
    spearman = data.get('test', {}).get('cos_sim', {}).get('spearman')
    results.append((task, main, pearson, spearman))

lines = ['model-index:', '- name: sofia-embedding-v1', '  results:']
for task, main, p, s in sorted(results):
    # use explicit None checks so a legitimate 0.0 score is not dropped
    m = f'{main:.4f}' if main is not None else 'null'
    pe = f'{p:.4f}' if p is not None else 'null'
    sp = f'{s:.4f}' if s is not None else 'null'
    lines.extend([
        '  - task: {type: sts, name: STS}',
        f'    dataset: {{name: {task}, type: mteb/{task}}}',
        '    metrics:',
        '    - type: main_score',
        f'      value: {m}',
        '    - type: pearson',
        f'      value: {pe}',
        '    - type: spearman',
        f'      value: {sp}',
    ])

block = '```yaml\n' + '\n'.join(lines) + '\n```'
readme = Path('README.md').read_text(encoding='utf-8')
readme = re.sub(r'<!-- METRICS_START -->.*?<!-- METRICS_END -->',
                f'<!-- METRICS_START -->\n{block}\n<!-- METRICS_END -->',
                readme, flags=re.S)
Path('README.md').write_text(readme, encoding='utf-8')
print('Metrics injected into README!')
PY
```

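To confirm the injection took effect, a small helper (hypothetical, not part of the script above) can pull out whatever now sits between the markers:

```python
import re

def extract_metrics_block(readme_text: str) -> str:
    """Return the content between the METRICS markers ('' if absent)."""
    m = re.search(r"<!-- METRICS_START -->(.*?)<!-- METRICS_END -->",
                  readme_text, flags=re.S)
    return m.group(1).strip() if m else ""
```

After a successful run, the extracted block should start with `model-index:`; an empty string means the placeholder from Step B is missing.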
### 4. Inference Configuration (Already Correct)

Your model correctly outputs 1024-dimensional embeddings with mean pooling. No changes needed.

### 5. Prompted Retrieval Mode (Optional)

Your `config_sentence_transformers.json` currently ships with empty prompts. For better zero-shot retrieval, add sensible defaults:

```json
{
  ...
}
```
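Conceptually, prompted mode just prepends the configured string before encoding (in sentence-transformers you select it with `model.encode(texts, prompt_name="query")`). A dependency-free sketch; the prompt texts below are placeholders, not your actual config values:

```python
# Placeholder prompts -- substitute whatever you put in
# config_sentence_transformers.json; these strings are illustrative only.
PROMPTS = {
    "query": "Represent this sentence for searching relevant passages: ",
    "document": "",
}

def apply_prompt(text: str, prompt_name: str) -> str:
    """Mimic what encode(..., prompt_name=...) does: prepend the prompt."""
    return PROMPTS[prompt_name] + text
```

Queries get the instruction prefix while documents are encoded verbatim, which is the usual asymmetric-retrieval setup.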
 
### 6. Usage Examples

Add these minimal code snippets to your README:

**Python:**

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("MaliosDark/sofia-embedding-v1")
sentences = ["Hello world", "How are you?"]
embeddings = model.encode(sentences, normalize_embeddings=True)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(similarity.item())
```
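Because `normalize_embeddings=True` L2-normalizes the vectors, cosine similarity reduces to a plain dot product. A dependency-free sketch of what `util.cos_sim` computes for a single pair:

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two vectors; for unit-norm inputs this
    reduces to the plain dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors 0.0, so the printed similarity above always lands in [-1, 1].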
 
**JavaScript/Node.js:**

```javascript
import { SentenceTransformer } from "sentence-transformers";

const model = await SentenceTransformer.from_pretrained("MaliosDark/sofia-embedding-v1");
const embeddings = await model.encode(["hello", "world"], { normalize: true });
console.log(embeddings[0].length); // 1024
```
 
### Ready-to-Use README Template

Want a complete PR-ready README with all upgrades applied? Let me know and I'll generate it based on your current model card.

[View on Hugging Face](https://huggingface.co/MaliosDark/sofia-embedding-v1)