MaliosDark committed · Commit 2f2e3fe · verified · 1 Parent(s): 8dfaf9f

Update README.md

Files changed (1):
  1. README.md +178 -1
README.md CHANGED
@@ -337,9 +337,186 @@ We welcome contributions to improve SOFIA:
 
 - **Website**: [zunvra.com](https://zunvra.com)
 - **Email**: contact@zunvra.com
- - **GitHub**: [github.com/zunvra](https://github.com/MaliosDark)
+ - **GitHub**: [github.com/MaliosDark](https://github.com/MaliosDark)
 
 
 ---
 
 *SOFIA: Intelligent embeddings for the future of AI.*

## Hugging Face Model Card Upgrades

It's live and loads as **MPNet + mean pooling + Dense(768→1024)**, matching the files in the repo (`modules.json`, `1_Pooling/config.json`, `2_Dense/config.json`, `sentence_bert_config.json`). ([Hugging Face][1])

Below are **drop-in upgrades**: add these files to the repo and commit.

---

### 1) Add a YAML header to the **top of README.md** (enables widgets, search, and metrics)

```md
---
library_name: sentence-transformers
license: apache-2.0
pipeline_tag: sentence-similarity
tags:
- embeddings
- sentence-transformers
- mpnet
- lora
- triplet-loss
- cosine-similarity
- retrieval
- mteb
language:
- en
datasets:
- sentence-transformers/stsb
- paws
- banking77
- mteb/nq
widget:
- text: "Hello world"
- text: "How are you?"
---
```

> Put that **as the very first lines** of the README, before `# SOFIA`.

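Before pushing, it's worth a quick pre-commit sanity check that the header really opens the file and carries the keys the Hub indexes on. A stdlib-only sketch; `check_front_matter` is a hypothetical helper, not a Hub API:

```python
# Minimal stdlib check: front matter must be the very first line and close
# with a second "---"; we also look for the keys the Hub uses for indexing.
def check_front_matter(readme_text):
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    try:
        end = lines[1:].index("---") + 1   # closing delimiter
    except ValueError:
        return False
    header = "\n".join(lines[1:end])
    return all(k in header for k in ("library_name:", "pipeline_tag:", "license:"))

sample = "---\nlibrary_name: sentence-transformers\nlicense: apache-2.0\npipeline_tag: sentence-similarity\n---\n# SOFIA\n"
print(check_front_matter(sample))  # True
```

This catches the most common mistake: prose or a heading sneaking in above the opening `---`, which makes the Hub ignore the metadata entirely.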
---

### 2) Add a real **license file** (Apache-2.0)

Create `LICENSE`:

```text
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
...
END OF TERMS AND CONDITIONS
```

(Use the standard Apache-2.0 text; HF will detect it automatically.)

---

### 3) Auto-insert **MTEB results** into the README (model-index)

Run these locally: step (a) generates the metrics, and step (c) writes them into the README in place.

**a) Quick eval & cache**

```bash
python - <<'PY'
# Runs the STS tasks with MTEB; passing task names directly works on older
# mteb releases -- newer ones prefer mteb.get_tasks(tasks=[...]).
from mteb import MTEB
from sentence_transformers import SentenceTransformer

mid = "MaliosDark/sofia-embedding-v1"
tasks = ["STS12", "STS13", "STS14", "STS15", "STS16", "STSBenchmark"]
MTEB(tasks=tasks).run(SentenceTransformer(mid), output_folder="./mteb_out")
print("Wrote results under ./mteb_out")
PY
```

**b) Insert a `<!-- METRICS_START --> ... <!-- METRICS_END -->` block in the README**

```md
<!-- METRICS_START -->
_TBD_
<!-- METRICS_END -->
```

**c) Run the injector**

````bash
python - <<'PY'
import glob
import json
import re
from pathlib import Path

res = []
for j in glob.glob("mteb_out/**/*.json", recursive=True):
    R = json.load(open(j))
    task = R.get("mteb_dataset_name")
    if task is None:
        continue  # skip non-result JSON files
    main = R.get("main_score")
    # fallbacks for the classic per-split result layout
    pearson = R.get("test", {}).get("cos_sim", {}).get("pearson")
    spearman = R.get("test", {}).get("cos_sim", {}).get("spearman")
    res.append((task, main, pearson, spearman))

lines = ["model-index:", "- name: sofia-embedding-v1", "  results:"]
for task, main, p, s in sorted(res):
    m = f"{main:.4f}" if isinstance(main, (int, float)) else "null"
    pe = f"{p:.4f}" if isinstance(p, (int, float)) else "null"
    sp = f"{s:.4f}" if isinstance(s, (int, float)) else "null"
    lines += [
        "  - task: {type: sts, name: STS}",
        f"    dataset: {{name: {task}, type: mteb/{task}}}",
        "    metrics:",
        f"    - type: main_score\n      value: {m}",
        f"    - type: pearson\n      value: {pe}",
        f"    - type: spearman\n      value: {sp}",
    ]

block = "```yaml\n" + "\n".join(lines) + "\n```"
readme = Path("README.md").read_text(encoding="utf-8")
readme = re.sub(r"<!-- METRICS_START -->.*?<!-- METRICS_END -->",
                f"<!-- METRICS_START -->\n{block}\n<!-- METRICS_END -->",
                readme, flags=re.S)
Path("README.md").write_text(readme, encoding="utf-8")
print("README updated with model-index block.")
PY
````

This writes a **`model-index`** block into the README body; note that the Hub only parses `model-index` from the YAML front matter, so copy the generated YAML into the header from step 1 once you're happy with the numbers.
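The injector above hinges on a non-greedy `re.sub` over the marker comments; here is the same idea in a minimal standalone form (the `block` string is a placeholder, not real metrics):

```python
import re

# Toy README with the marker block from step (b).
readme = "# SOFIA\n<!-- METRICS_START -->\n_TBD_\n<!-- METRICS_END -->\n"
block = "model-index: ..."  # placeholder payload

# flags=re.S lets "." span newlines; ".*?" stops at the first END marker.
updated = re.sub(r"<!-- METRICS_START -->.*?<!-- METRICS_END -->",
                 f"<!-- METRICS_START -->\n{block}\n<!-- METRICS_END -->",
                 readme, flags=re.S)
print("_TBD_" in updated)   # False: the placeholder was replaced
print(block in updated)     # True
```

Because the markers are left in place after each run, the script is idempotent: you can regenerate metrics and re-run it as often as you like.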

---

### 4) Lock the **inference dimension** in the card (already 1024)

The repo files show Dense `out_features=1024` with mean pooling enabled; keep the README's dimension claim consistent with that. ([Hugging Face][2])

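A tiny guard you could keep alongside the card so the stated dimension never drifts from the weights. A sketch only: the inline JSON mirrors the relevant subset of `2_Dense/config.json` (a 768→1024 linear head) rather than reading the real file:

```python
import json

# Relevant subset of 2_Dense/config.json: a 768 -> 1024 linear head.
dense_cfg = json.loads('{"in_features": 768, "out_features": 1024, "bias": true}')

claimed_dim = 1024  # the number stated in the model card
assert dense_cfg["out_features"] == claimed_dim, "README dimension drifted from the Dense head"
print(f'{dense_cfg["in_features"]} -> {dense_cfg["out_features"]}')  # 768 -> 1024
```

In a real checkout you would `json.load(open("2_Dense/config.json"))` instead of the inline literal.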
---

### 5) Optional: add **prompted mode** (query/document) for retrieval

Your `config_sentence_transformers.json` has empty prompts. Add sensible defaults:

```json
{
  "__version__": { "sentence_transformers": "5.1.0" },
  "model_type": "SentenceTransformer",
  "prompts": { "query": "Query: ", "document": "Document: " },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
```

(Upload this file to the repo to improve zero-shot retrieval.) ([Hugging Face][3])

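With that config in place, clients select a prompt per call via `model.encode(texts, prompt_name="query")`, and the library prepends the named prefix before tokenization. The snippet below only illustrates that prepending step; `apply_prompt` is a stand-in helper, not part of sentence-transformers:

```python
# Sketch of what the prompts config does: the named prompt is prepended
# to each input text before it reaches the tokenizer.
prompts = {"query": "Query: ", "document": "Document: "}

def apply_prompt(texts, prompt_name, prompts=prompts):
    prefix = prompts[prompt_name]
    return [prefix + t for t in texts]

print(apply_prompt(["what is SOFIA?"], "query"))  # ['Query: what is SOFIA?']
```

Asymmetric prefixes like these typically help retrieval because queries and documents occupy different distributions; the model learns to treat them accordingly.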
---

### 6) Minimal client code (Python + Node) for the README

```python
from sentence_transformers import SentenceTransformer, util

m = SentenceTransformer("MaliosDark/sofia-embedding-v1")
a, b = "A quick brown fox", "The fast brown fox"
x = m.encode([a, b], normalize_embeddings=True)
print(util.cos_sim(x[0], x[1]).item())
```

```js
// Runs in Node via Transformers.js; requires ONNX weights in the repo.
// Note: this path applies the transformer + pooling only, so check the
// output dimension yourself -- the Dense head may not be applied.
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline("feature-extraction", "MaliosDark/sofia-embedding-v1");
const emb = await extractor(["hello", "world"], { pooling: "mean", normalize: true });
console.log(emb.dims);
```

---

Want me to auto-generate a **PR-ready README** for your repo (with the YAML header and metrics block inserted)? I can drop the exact Markdown here based on your current page.

[1]: https://huggingface.co/MaliosDark/sofia-embedding-v1/tree/main "MaliosDark/sofia-embedding-v1 at main"
[2]: https://huggingface.co/MaliosDark/sofia-embedding-v1/blob/main/2_Dense/config.json "2_Dense/config.json · MaliosDark/sofia-embedding-v1 at main"
[3]: https://huggingface.co/MaliosDark/sofia-embedding-v1/blob/main/config_sentence_transformers.json "config_sentence_transformers.json · MaliosDark/sofia-embedding-v1 at main"