MaliosDark/sofia-embedding-v1

@@ -337,9 +337,186 @@ We welcome contributions to improve SOFIA:
 - **Website**: [zunvra.com](https://zunvra.com)
 - **Email**: contact@zunvra.com
-- **GitHub**: [github.com/zunvra](https://github.com/MaliosDark)
+- **GitHub**: [github.com/MaliosDark](https://github.com/MaliosDark)
 ---
 *SOFIA: Intelligent embeddings for the future of AI.*
+## Hugging Face Model Card Upgrades
+The model is live and loads as **MPNet + mean pooling + Dense(768→1024)**, which matches the repository files (`modules.json`, `1_Pooling/config.json`, `2_Dense/config.json`, `sentence_bert_config.json`). ([Hugging Face][1])
+Below are **drop-in upgrades**: paste/add these files to your repo and commit.
+---
+### 1) Add YAML header to the **top of README.md** (enables widgets, search, and metrics)
+```md
+---
+library_name: sentence-transformers
+license: apache-2.0
+pipeline_tag: sentence-similarity
+tags:
+  - embeddings
+  - sentence-transformers
+  - mpnet
+  - lora
+  - triplet-loss
+  - cosine-similarity
+  - retrieval
+  - mteb
+language:
+  - en
+datasets:
+  - sentence-transformers/stsb
+  - paws
+  - banking77
+  - mteb/nq
+widget:
+  - source_sentence: "Hello world"
+    sentences:
+      - "How are you?"
+---
+```
+> Put that **as the very first lines** of the README, before `# SOFIA`.
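Because the header only takes effect when it opens the file, a quick local check before committing can help. The following is a minimal sketch using only the standard library; `split_front_matter` is a hypothetical helper name, and the sample text is a shortened stand-in for the real README.

```python
def split_front_matter(readme_text):
    """Return (header_lines, body) if the text opens with a '---' block, else (None, text)."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, readme_text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    return None, readme_text  # opening delimiter without a closing one


# Shortened stand-in for README.md; in practice read the file from disk.
sample = "---\nlicense: apache-2.0\npipeline_tag: sentence-similarity\n---\n# SOFIA\n"
header, body = split_front_matter(sample)
print(header)
print(body)
```

If `header` comes back as `None`, the YAML block is either missing or not at the very top of the file.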
+---
+### 2) Add a real **license file** (Apache-2.0)
+Create `LICENSE`:
+```text
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+...
+END OF TERMS AND CONDITIONS
+```
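+(Use the standard Apache-2.0 text. The license shown on the Hub comes from the `license:` field in the README's YAML header, so keep the file and the header consistent.)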
+---
+### 3) Auto-insert **MTEB results** into README (model-index)
+Run the steps below locally: step (a) generates the metrics, and step (c) writes them into the README in place.
+**a) Quick eval & cache**
+```bash
+python - <<'PY'
+from mteb import MTEB
+from sentence_transformers import SentenceTransformer
+mid = "MaliosDark/sofia-embedding-v1"
+tasks = ["STS12","STS13","STS14","STS15","STS16","STSBenchmark"]
+MTEB(tasks=tasks).run(SentenceTransformer(mid), output_folder="./mteb_out")
+print("Wrote results under ./mteb_out")
+PY
+```
+**b) Insert a `<!-- METRICS_START --> ... <!-- METRICS_END -->` block in README**
+```md
+<!-- METRICS_START -->
+_TBD_
+<!-- METRICS_END -->
+```
+**c) Run the injector**
+````bash
+python - <<'PY'
+import glob, json, re
+from pathlib import Path
+
+def sts_scores(R):
+    """Return (main, pearson, spearman) from either MTEB result layout."""
+    test = R.get("scores", {}).get("test")
+    if test:  # newer layout: "scores" -> "test" is a list of per-subset dicts
+        t = test[0]
+        return t.get("main_score"), t.get("cosine_pearson"), t.get("cosine_spearman")
+    cos = R.get("test", {}).get("cos_sim", {})  # older layout
+    # for STS tasks the main score is the cosine Spearman correlation
+    return cos.get("spearman"), cos.get("pearson"), cos.get("spearman")
+
+def fmt(v):
+    return f"{v:.4f}" if isinstance(v, (int, float)) else "null"
+
+res = []
+# the output folder layout differs between MTEB versions, so search recursively
+for j in glob.glob("mteb_out/**/*.json", recursive=True):
+    with open(j, encoding="utf-8") as f:
+        R = json.load(f)
+    if not isinstance(R, dict):
+        continue
+    task = R.get("task_name") or R.get("mteb_dataset_name")
+    if task is None:  # skip non-result files such as model_meta.json
+        continue
+    res.append((task, *sts_scores(R)))
+
+lines = ["model-index:\n- name: sofia-embedding-v1\n  results:"]
+for task, main, p, s in sorted(res, key=lambda r: r[0]):
+    lines += [
+        "  - task: {type: sts, name: STS}",
+        f"    dataset: {{name: {task}, type: mteb/{task}}}",
+        "    metrics:",
+        f"    - type: main_score\n      value: {fmt(main)}",
+        f"    - type: pearson\n      value: {fmt(p)}",
+        f"    - type: spearman\n      value: {fmt(s)}",
+    ]
+block = "```\n" + "\n".join(lines) + "\n```"
+readme = Path("README.md").read_text(encoding="utf-8")
+# a function replacement keeps re.sub from interpreting backslashes in the block
+readme = re.sub(r"<!-- METRICS_START -->.*?<!-- METRICS_END -->",
+                lambda _: f"<!-- METRICS_START -->\n{block}\n<!-- METRICS_END -->",
+                readme, flags=re.S)
+Path("README.md").write_text(readme, encoding="utf-8")
+print("README updated with model-index block.")
+PY
+````
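+This writes a `model-index`-shaped YAML block into the README body, where it renders as a code block for readers. The Hub parses `model-index` only from the YAML header at the top of the README, so copy the generated lines there (without the code fence) if the scores should appear as model metadata.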
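If the scores should also be picked up as model metadata, the generated lines need to sit inside the README's YAML header rather than in the body. The following is a minimal sketch of that move, using only the standard library. `add_to_front_matter` is a hypothetical helper name, the sample README is a shortened stand-in, and the sketch assumes the header does not already contain a `model-index` key.

```python
def add_to_front_matter(readme_text, yaml_lines):
    """Insert yaml_lines just before the closing '---' of the README's YAML header."""
    lines = readme_text.split("\n")
    if lines[0].strip() != "---":
        raise ValueError("README does not start with a YAML header")
    close = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if close is None:
        raise ValueError("YAML header is never closed")
    return "\n".join(lines[:close] + list(yaml_lines) + lines[close:])


# Shortened stand-ins; real input would be README.md and the injector's generated lines.
readme = "---\nlicense: apache-2.0\n---\n# SOFIA\n"
generated = ["model-index:", "- name: sofia-embedding-v1", "  results: []"]
print(add_to_front_matter(readme, generated))
```

Plain string handling is used here so the rest of the header stays byte-for-byte unchanged; a YAML library would also work but may reorder or reformat existing keys.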
+---
+### 4) Lock the **inference dimension** in the card (already 1024)
+Your files show a Dense layer with `out_features=1024` and mean pooling enabled; keep the 1024-dimension claim consistent throughout the card. ([Hugging Face][2])
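One way to keep the claim honest is to read the dimension from the Dense module's config in a local checkout and compare it with the number stated in the card. This is a sketch under assumptions: `dense_out_features` is a hypothetical helper name, and the demo writes a stand-in config into a temporary directory instead of reading the real repository.

```python
import json
import tempfile
from pathlib import Path


def dense_out_features(model_dir):
    """Read out_features from the Dense module config of a local model checkout."""
    cfg_path = Path(model_dir) / "2_Dense" / "config.json"
    return json.loads(cfg_path.read_text(encoding="utf-8"))["out_features"]


# Demo on a throwaway directory; this JSON is a stand-in, not the repo's actual file.
with tempfile.TemporaryDirectory() as tmp:
    dense_dir = Path(tmp) / "2_Dense"
    dense_dir.mkdir()
    (dense_dir / "config.json").write_text(
        json.dumps({"in_features": 768, "out_features": 1024}), encoding="utf-8"
    )
    claimed = 1024  # the dimension stated in the model card
    print(dense_out_features(tmp) == claimed)
```

Pointing `dense_out_features` at a clone of the model repository gives the value to compare against every place the card mentions the embedding size.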
+---
+### 5) Optional – add **prompted mode** (query/document) for retrieval
+Your `config_sentence_transformers.json` has empty prompts. Add sensible defaults:
+```json
+{
+  "__version__": { "sentence_transformers": "5.1.0" },
+  "model_type": "SentenceTransformer",
+  "prompts": { "query": "Query: ", "document": "Document: " },
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
+```
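+(Upload this file to the repo. Because `default_prompt_name` is `null`, the prompts apply only when the caller passes `prompt_name`. Whether they help retrieval depends on how the model was trained, so evaluate with and without them before recommending them in the card.) ([Hugging Face][3])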
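On the caller's side, the prompts are selected per call through the `prompt_name` argument of `SentenceTransformer.encode`, which prepends the configured prefix to each input. The sketch below shows that calling pattern. `encode_pair` is a hypothetical helper, and `EchoModel` is a stand-in that returns the prompted text instead of vectors, so the example runs without downloading the model.

```python
def encode_pair(model, queries, documents):
    """Encode queries and documents with their respective named prompts."""
    q = model.encode(queries, prompt_name="query", normalize_embeddings=True)
    d = model.encode(documents, prompt_name="document", normalize_embeddings=True)
    return q, d


class EchoModel:
    """Stand-in for SentenceTransformer: shows the text the model would receive."""

    prompts = {"query": "Query: ", "document": "Document: "}

    def encode(self, sentences, prompt_name=None, normalize_embeddings=False):
        prefix = self.prompts[prompt_name] if prompt_name else ""
        return [prefix + s for s in sentences]


q, d = encode_pair(EchoModel(), ["what is mean pooling?"], ["Mean pooling averages token vectors."])
print(q)
print(d)
```

With the real library, `model = SentenceTransformer("MaliosDark/sofia-embedding-v1")` would take the place of `EchoModel()`, and `q` and `d` would be embedding arrays.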
+---
+### 6) Minimal client code (Python + Node) for the README
+```python
+from sentence_transformers import SentenceTransformer, util
+m = SentenceTransformer("MaliosDark/sofia-embedding-v1")
+a, b = "A quick brown fox", "The fast brown fox"
+x = m.encode([a, b], normalize_embeddings=True)
+print(util.cos_sim(x[0], x[1]).item())
+```
+```js
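+// Assumes the repo ships ONNX weights that Transformers.js can load (not confirmed for this model).
+import { pipeline } from "@huggingface/transformers";
+const extract = await pipeline("feature-extraction", "MaliosDark/sofia-embedding-v1");
+const emb = await extract(["hello", "world"], { pooling: "mean", normalize: true });
+console.log(emb.dims); // [2, hidden_size]; 1024 only if the export includes the Dense layer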
+```
+---
+[1]: https://huggingface.co/MaliosDark/sofia-embedding-v1/tree/main "MaliosDark/sofia-embedding-v1 at main"
+[2]: https://huggingface.co/MaliosDark/sofia-embedding-v1/blob/main/2_Dense/config.json "2_Dense/config.json · MaliosDark/sofia-embedding-v1 at main"
+[3]: https://huggingface.co/MaliosDark/sofia-embedding-v1/blob/main/config_sentence_transformers.json "config_sentence_transformers.json · MaliosDark/sofia-embedding-v1 at main"