Update README.md
## Hugging Face Model Card Upgrades

Your model is live on Hugging Face! It loads correctly as **MPNet + mean pooling + Dense(768→1024)**, matching your configuration files. Here are **drop-in upgrades** to enhance your model card with widgets, metrics, and better discoverability.

### 1. YAML Front Matter (Required)

Add this to the **very top** of your README.md (before the title) to enable Hugging Face features:

```yaml
---
library_name: sentence-transformers
license: apache-2.0
widget:
- text: "Hello world"
- text: "How are you?"
---
```
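A malformed front-matter block silently disables the widgets, so it is worth parsing it once before pushing. A minimal sketch, assuming PyYAML is installed; the block is embedded as a string here (and the `# sofia-embedding-v1` title line is illustrative), but the same check works on the real `README.md`:

```python
import re
import yaml  # PyYAML

# Front matter from section 1, embedded for a quick self-contained parse check.
readme = """---
library_name: sentence-transformers
license: apache-2.0
widget:
- text: "Hello world"
- text: "How are you?"
---
# sofia-embedding-v1
"""

# Front matter is everything between the first pair of `---` markers.
match = re.match(r"^---\n(.*?)\n---\n", readme, flags=re.S)
meta = yaml.safe_load(match.group(1))
print(meta["license"])  # apache-2.0
```

If the YAML is mis-indented, `safe_load` raises immediately instead of Hugging Face quietly ignoring the header.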

### 2. License File (Required)

Create a `LICENSE` file in your repo root with the full Apache 2.0 text. Hugging Face will auto-detect it.

### 3. MTEB Metrics Block (Recommended)

To display performance metrics on your model card:

**Step A: Run evaluation locally**

```bash
python -c "
from mteb import MTEB
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
tasks = ['STS12', 'STS13', 'STS14', 'STS15', 'STS16', 'STSBenchmark']
MTEB(tasks=tasks).run(model, output_folder='./mteb_results')
"
```

**Step B: Add metrics placeholder to README**

```markdown
<!-- METRICS_START -->
_TBD_
<!-- METRICS_END -->
```

**Step C: Inject results automatically**

```bash
python - <<'PY'
import json, glob, re
from pathlib import Path

results = []
for f in glob.glob('mteb_results/*/*/results.json'):
    data = json.load(open(f))
    task = data['mteb_dataset_name']
    main = data.get('main_score')
    pearson = data.get('test', {}).get('cos_sim', {}).get('pearson')
    spearman = data.get('test', {}).get('cos_sim', {}).get('spearman')
    results.append((task, main, pearson, spearman))

lines = ['model-index:', '- name: sofia-embedding-v1', '  results:']
for task, main, p, s in sorted(results):
    m = f'{main:.4f}' if main is not None else 'null'
    pe = f'{p:.4f}' if p is not None else 'null'
    sp = f'{s:.4f}' if s is not None else 'null'
    lines.extend([
        '  - task: {type: sts, name: STS}',
        f'    dataset: {{name: {task}, type: mteb/{task}}}',
        '    metrics:',
        '    - type: main_score',
        f'      value: {m}',
        '    - type: pearson',
        f'      value: {pe}',
        '    - type: spearman',
        f'      value: {sp}'
    ])

block = '\n'.join(lines)
readme = Path('README.md').read_text()
readme = re.sub(r'<!-- METRICS_START -->.*?<!-- METRICS_END -->',
                f'<!-- METRICS_START -->\n{block}\n<!-- METRICS_END -->',
                readme, flags=re.S)
Path('README.md').write_text(readme)
print('Metrics injected into README!')
PY
```

(The script is fed via a quoted heredoc rather than `python -c "…"` so the shell does not mangle quotes, and a score of exactly `0.0` is kept rather than dropped as `null`.)

### 4. Inference Configuration (Already Correct)

Your model correctly outputs 1024-dimensional embeddings with mean pooling. No changes needed.

### 5. Prompted Retrieval Mode (Optional)

For better zero-shot retrieval, update `config_sentence_transformers.json`:

```json
{
}
```
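For reference, sentence-transformers reads per-use-case prompts from a `prompts` mapping in this file. A hypothetical sketch of the shape such defaults could take; the prompt strings below are illustrative, not the model's actual configuration:

```json
{
  "prompts": {
    "query": "Represent this sentence for searching relevant passages: ",
    "passage": ""
  },
  "default_prompt_name": null
}
```

At encode time, clients can then select a prompt with `model.encode(texts, prompt_name="query")`.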

### 6. Usage Examples

Add these minimal code snippets to your README:

**Python:**

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("MaliosDark/sofia-embedding-v1")
sentences = ["Hello world", "How are you?"]
embeddings = model.encode(sentences, normalize_embeddings=True)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(similarity.item())  # ~0.9
```

**JavaScript/Node.js:**

```javascript
import { SentenceTransformer } from "sentence-transformers";

const model = await SentenceTransformer.from_pretrained("MaliosDark/sofia-embedding-v1");
const embeddings = await model.encode(["hello", "world"], { normalize: true });
console.log(embeddings[0].length); // 1024
```

### Ready-to-Use README Template

Want a complete PR-ready README with all upgrades applied? Let me know and I'll generate it based on your current model card.

[View on Hugging Face](https://huggingface.co/MaliosDark/sofia-embedding-v1)