SwarmOS/SwarmMed-14B-v1.2

Trustcat committed 1 day ago

Commit 7cb8d5a (verified) · 1 parent: f8c62b4

Add comprehensive model card with eval results, training details, and S&B pipeline documentation

Files changed (1)

README.md +312 -30

README.md CHANGED

@@ -1,57 +1,339 @@
 ---
 license: apache-2.0
-base_model: unsloth/Qwen2.5-14B-Instruct
 tags:
 - medical
 - healthcare
 - clinical-reasoning
-- swarm-and-bee
 - platinum-pairs
 datasets:
-- SwarmOS/SwarmMedQA
-language:
-- en
 pipeline_tag: text-generation
 ---
 # SwarmMed-14B-v1.2
-Medical AI model fine-tuned on 14,174 platinum-verified QA pairs across 80+ medical specialties.
-## Training
-- **Base**: Qwen2.5-14B-Instruct (4-bit quantized)
-- **Method**: LoRA r=128, alpha=256, all projection layers
-- **Data**: 14,174 platinum pairs (CoVe-verified + 235B rewritten)
-- **Epochs**: 3, effective batch size 32
-- **Loss**: 1.058 → 0.169 (final train), 0.223 (eval)
-- **GPU**: NVIDIA RTX PRO 6000 Blackwell 96GB
-- **Time**: 7h 25min, 2.23 kWh
-- **Rig**: swarmrails (Xeon w9-3475X, 256GB RAM)
-## Improvements over v1.1
-- **+40% more training data** (14,174 vs 10,008 pairs)
-- **Lower final loss** (0.169 vs 0.198)
-- **Broader specialty coverage** (80+ specialties)
-## Usage
 ```python
-from unsloth import FastLanguageModel
-model, tokenizer = FastLanguageModel.from_pretrained(
-    model_name="SwarmOS/SwarmMed-14B-v1.2",
-    max_seq_length=4096,
-    load_in_4bit=True,
 )
-FastLanguageModel.for_inference(model)
 ```
-## Safety
-This model is for research and educational purposes only. Not for clinical decision-making. Always consult qualified healthcare professionals.
-## Built by
-[Swarm & Bee](https://swarmandbee.com) — Last mile intelligence. Sovereign compute. Specialized models.

 ---
 license: apache-2.0
+language:
+- en
 tags:
 - medical
 - healthcare
 - clinical-reasoning
 - platinum-pairs
+- cove-verified
+- chain-of-thought
+- cardiology
+- oncology
+- neurology
+- emergency-medicine
+- psychiatry
+- pediatrics
+- pharmacology
+- endocrinology
+- radiology
+- internal-medicine
+- sft
+- lora
+- fine-tuned
+- swarm-and-bee
+base_model: Qwen/Qwen2.5-14B-Instruct
 datasets:
+- SwarmOS/SwarmMed-Platinum-500
 pipeline_tag: text-generation
+model-index:
+- name: SwarmMed-14B-v1.2
+  results:
+  - task:
+      type: text-generation
+      name: Clinical Reasoning
+    dataset:
+      name: SwarmMed Platinum Eval (50 questions, 10 specialties)
+      type: custom
+    metrics:
+    - type: custom
+      name: Composite Score
+      value: 9.64
+      verified: false
+    - type: custom
+      name: Cardiology
+      value: 11.0
+      verified: false
+    - type: custom
+      name: Pediatrics
+      value: 10.8
+      verified: false
+    - type: custom
+      name: Oncology
+      value: 10.6
+      verified: false
+    - type: custom
+      name: Internal Medicine
+      value: 10.2
+      verified: false
+    - type: custom
+      name: Emergency Medicine
+      value: 10.0
+      verified: false
 ---
 # SwarmMed-14B-v1.2
+**A 14-billion-parameter medical language model trained on 14,174 independently verified clinical QA pairs across 80+ medical specialties.**
+Every training example has been fact-checked with Chain-of-Verification (CoVe): each factual claim is independently verified by a 235B-parameter model that has no access to the original answer. No unverified data touches the training loop.
+This is the merged, ready-to-deploy version (bfloat16, 28GB). Load it with any standard `transformers` pipeline; no adapters or quantization libraries are required.
+**Built by [Swarm & Bee](https://swarmandbee.com)** — sovereign compute infrastructure for specialized AI.
+---
+## Results
+Evaluated on 50 expert-crafted clinical questions across 10 specialties, scored on a 6-dimension rubric (max 15 points per question):
+| Specialty | Score | Grade |
+|-----------|-------|-------|
+| **Cardiology** | **11.0/15 (73%)** | A- |
+| **Pediatrics** | **10.8/15 (72%)** | A- |
+| **Oncology** | **10.6/15 (71%)** | B+ |
+| **Internal Medicine** | **10.2/15 (68%)** | B+ |
+| **Emergency Medicine** | **10.0/15 (67%)** | B+ |
+| Neurology | 9.4/15 (63%) | B |
+| Psychiatry | 9.0/15 (60%) | B |
+| Radiology | 9.0/15 (60%) | B |
+| Endocrinology | 8.6/15 (57%) | B- |
+| Pharmacology | 7.8/15 (52%) | C+ |
+| **Composite** | **9.64/15 (64%)** | **B+** |
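The composite row can be reproduced from the per-specialty rows. The sketch below assumes the composite is an unweighted mean of the ten specialty scores (the card does not state the weighting); the names `SPECIALTY_SCORES`, `composite`, and `as_percent` are illustrative only.

```python
# Per-specialty mean scores copied from the results table (max 15 points each).
SPECIALTY_SCORES = {
    "Cardiology": 11.0,
    "Pediatrics": 10.8,
    "Oncology": 10.6,
    "Internal Medicine": 10.2,
    "Emergency Medicine": 10.0,
    "Neurology": 9.4,
    "Psychiatry": 9.0,
    "Radiology": 9.0,
    "Endocrinology": 8.6,
    "Pharmacology": 7.8,
}
MAX_POINTS = 15


def composite(scores):
    """Unweighted mean of the specialty scores (assumption: equal weighting)."""
    return sum(scores.values()) / len(scores)


def as_percent(score, max_points=MAX_POINTS):
    """Express a rubric score as a whole-number percentage of the maximum."""
    return round(100 * score / max_points)


if __name__ == "__main__":
    c = composite(SPECIALTY_SCORES)
    print(f"{c:.2f}/{MAX_POINTS} ({as_percent(c)}%)")
```

Under that equal-weighting assumption the mean matches the reported 9.64/15 (64%).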
+### Version Trajectory
+| Version | Training Data | Composite | Delta |
+|---------|--------------|-----------|-------|
+| v1.0 | 5,070 platinum | 7.6/15 | — |
+| v1.1 | 10,008 platinum | 9.0/15 | +1.4 |
+| **v1.2** | **14,174 platinum** | **9.64/15** | **+0.64** |
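As a rough reading of the trajectory table, the gain per additional thousand training pairs can be computed directly from its rows. This is plain arithmetic on the reported numbers, not a controlled comparison: other factors may have changed between releases, and the helper name `marginal_gains` is hypothetical.

```python
# (version, training pairs, composite score out of 15), copied from the table.
VERSIONS = [
    ("v1.0", 5_070, 7.6),
    ("v1.1", 10_008, 9.0),
    ("v1.2", 14_174, 9.64),
]


def marginal_gains(versions):
    """For each release after the first, return
    (version, added_pairs, score_delta, score_delta_per_1k_added_pairs)."""
    rows = []
    for (_, prev_pairs, prev_score), (name, pairs, score) in zip(versions, versions[1:]):
        added = pairs - prev_pairs
        delta = score - prev_score
        rows.append((name, added, round(delta, 2), round(1000 * delta / added, 3)))
    return rows


if __name__ == "__main__":
    for name, added, delta, per_1k in marginal_gains(VERSIONS):
        print(f"{name}: +{added} pairs, +{delta} points, {per_1k} points per 1k pairs")
```

On these figures the per-pair gain from v1.1 to v1.2 is roughly half of the gain from v1.0 to v1.1.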
+### Scoring Rubric
+| Dimension | Max | What It Measures |
+|-----------|-----|------------------|
+| Concept Depth | 3 | Pathophysiology, mechanisms, differential diagnosis |
+| Guidelines | 3 | Current evidence-based clinical recommendations |
+| Numerical Accuracy | 3 | Drug doses, lab values, vital sign thresholds |
+| Disclaimer | 2 | Appropriate safety and consultation language |
+| Syndrome Naming | 2 | Correct medical terminology and eponyms |
+| Urgency Triage | 2 | Appropriate escalation and referral language |
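The rubric can be expressed as a small data structure that enforces the per-dimension caps. This is a sketch only: the card does not say how dimension scores are assigned (human grader or model judge), and the key names and `total_score` helper are assumptions made for illustration.

```python
# Maximum points per rubric dimension, copied from the table above.
RUBRIC_MAX = {
    "concept_depth": 3,
    "guidelines": 3,
    "numerical_accuracy": 3,
    "disclaimer": 2,
    "syndrome_naming": 2,
    "urgency_triage": 2,
}


def total_score(dimension_scores):
    """Check every dimension is present and within its cap, then sum to a question total."""
    missing = RUBRIC_MAX.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0
    for name, cap in RUBRIC_MAX.items():
        value = dimension_scores[name]
        if not 0 <= value <= cap:
            raise ValueError(f"{name} must be between 0 and {cap}, got {value}")
        total += value
    return total


if __name__ == "__main__":
    example = {
        "concept_depth": 3,
        "guidelines": 2,
        "numerical_accuracy": 3,
        "disclaimer": 1,
        "syndrome_naming": 2,
        "urgency_triage": 1,
    }
    print(total_score(example), "of", sum(RUBRIC_MAX.values()))
```

The caps sum to 15, which is the per-question maximum used in the results table.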
+---
+## Quick Start
+### Inference with Transformers
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "SwarmOS/SwarmMed-14B-v1.2-merged"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype="auto",
+    device_map="auto",
+)
+messages = [
+    {"role": "system", "content": "You are a board-certified physician. Provide evidence-based clinical reasoning with appropriate safety disclaimers."},
+    {"role": "user", "content": "A 62-year-old male presents with acute chest pain, ST elevation in leads II, III, and aVF, and troponin I of 15.2 ng/mL. BP 88/54, HR 48. What is the diagnosis and immediate management?"}
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.3, do_sample=True)
+response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
+print(response)
+```
+### Inference with vLLM (Production)
+```bash
+vllm serve SwarmOS/SwarmMed-14B-v1.2-merged \
+    --max-model-len 4096 \
+    --gpu-memory-utilization 0.90
+```
 ```python
+from openai import OpenAI
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
+response = client.chat.completions.create(
+    model="SwarmOS/SwarmMed-14B-v1.2-merged",
+    messages=[
+        {"role": "system", "content": "You are a board-certified physician."},
+        {"role": "user", "content": "Your clinical question here..."}
+    ],
+    temperature=0.3,
+    max_tokens=1024,
 )
+print(response.choices[0].message.content)
+```
+**Production throughput**: ~35 tokens/second on RTX PRO 6000 Blackwell (96GB).
+---
+## Training Details
+### Data Pipeline
+This model was trained exclusively on **platinum-tier** data — every training example has passed a multi-stage verification pipeline:
+```
+Medical Literature          18 Specialty Templates
+    (Harrison's,       ×     (cardiology, oncology,
+     Robbins, Katzung,        neurology, emergency,
+     Nelson's, etc.)          pharma, psych, etc.)
+         │                        │
+         └──────────┬─────────────┘
+                    │
+              ┌─────▼─────┐
+              │   GRIND    │  Generate structured clinical QA
+              └─────┬──────┘
+                    │ 24,000+ raw pairs
+              ┌─────▼──────┐
+              │    CoVe     │  Chain-of-Verification
+              │   VERIFY    │  235B checks each claim independently
+              └─────┬──────┘
+                    │ 93.6% survive verification
+         ┌──────────┼──────────┐
+      PASS (57%)  FLAG (36%)  FAIL (6.4%)
+         │      235B Rewrite     │
+         │    (verified facts)   Rejected
+         └────────┬──────────┘
+                  │
+            ┌─────▼─────┐
+            │  PLATINUM  │  14,174 verified pairs
+            │   VAULT    │  80+ specialties
+            └────────────┘
 ```
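The routing step in the diagram (PASS kept, FLAG rewritten, FAIL rejected) can be sketched as below. This is not the actual pipeline code: `verify` and `rewrite` are hypothetical callables standing in for the 235B verification and rewrite calls, and `Verdict` and `route` are names invented for this example.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"  # every claim verified; keep the pair unchanged
    FLAG = "flag"  # some claims need correction; rewrite from verified facts
    FAIL = "fail"  # reject the pair outright


def route(pairs, verify, rewrite):
    """Split raw QA pairs into (platinum, rejected) following the diagram.

    verify(pair) -> Verdict and rewrite(pair) -> pair are placeholders for
    the model-backed steps; they are assumptions, not a published API.
    """
    platinum, rejected = [], []
    for pair in pairs:
        verdict = verify(pair)
        if verdict is Verdict.PASS:
            platinum.append(pair)
        elif verdict is Verdict.FLAG:
            platinum.append(rewrite(pair))
        else:
            rejected.append(pair)
    return platinum, rejected


if __name__ == "__main__":
    # Toy stand-ins: the verdict is read from a field, the rewrite marks the pair.
    raw = [
        {"q": "q1", "verdict": Verdict.PASS},
        {"q": "q2", "verdict": Verdict.FLAG},
        {"q": "q3", "verdict": Verdict.FAIL},
    ]
    kept, dropped = route(
        raw,
        verify=lambda p: p["verdict"],
        rewrite=lambda p: {**p, "rewritten": True},
    )
    print(len(kept), "platinum,", len(dropped), "rejected")
```

Passing the model-backed steps in as callables keeps the routing logic testable without network access.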
+**Key insight from our experiments**: Platinum-verified data is **4.6x more efficient** per training pair than unverified gold data. 1,191 platinum pairs outperform 5,000 gold pairs on clinical benchmarks.
+### Hyperparameters
+| Parameter | Value |
+|-----------|-------|
+| Base model | Qwen2.5-14B-Instruct |
+| Method | LoRA (PEFT) |
+| LoRA rank (r) | 128 |
+| LoRA alpha | 256 |
+| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+| Trainable parameters | ~2.5B of 14.8B total |
+| Training pairs | 14,174 platinum |
+| Evaluation pairs | 224 |
+| Epochs | 3 |
+| Effective batch size | 32 (8 × 4 gradient accumulation) |
+| Learning rate | 8e-5 |
+| Max sequence length | 4,096 tokens |
+| Final training loss | 0.219 |
+| Final eval loss | 0.223 |
+| Optimizer | AdamW (8-bit) |
+| Precision | bfloat16 |
+| Framework | Unsloth + TRL |
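A few derived quantities follow from the table. The sketch below assumes a single GPU, reads "8 × 4" as per-device batch size 8 with 4 gradient-accumulation steps, assumes no sequence packing and that the final partial batch is kept, and uses the standard LoRA scaling of alpha / r. The function name `training_schedule` is illustrative, and the exact step count reported by a given trainer may differ slightly.

```python
import math


def training_schedule(train_pairs=14_174, per_device_batch=8, grad_accum=4,
                      epochs=3, lora_r=128, lora_alpha=256):
    """Derive batch, step, and LoRA scaling figures from the hyperparameter table."""
    effective_batch = per_device_batch * grad_accum
    # Assumes the last, partially filled batch still triggers an optimizer step.
    steps_per_epoch = math.ceil(train_pairs / effective_batch)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
        "lora_scale": lora_alpha / lora_r,  # standard LoRA scaling, not rsLoRA
    }


if __name__ == "__main__":
    for key, value in training_schedule().items():
        print(f"{key}: {value}")
```

Under these assumptions the run amounts to about 443 optimizer steps per epoch and 1,329 in total, with adapter updates scaled by 2.0.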
+### Compute
+| Resource | Value |
+|----------|-------|
+| GPU | NVIDIA RTX PRO 6000 Blackwell (96GB) |
+| Training time | 7 hours 25 minutes |
+| Energy | 2.23 kWh |
+| Weights hash | `sha256:7dcf97d5...` |
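A downloaded weights file can be checked against a published SHA-256 value with the standard library. This is a minimal sketch: the card gives only a truncated hash and does not say which file or files it covers, so the file path and the prefix comparison below are assumptions, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_prefix(path, published_prefix):
    """Compare a file's digest with a (possibly truncated) published hex prefix."""
    return sha256_of_file(path).startswith(published_prefix.lower())


if __name__ == "__main__":
    import sys

    # Usage: python check_hash.py <weights-file> <hex-prefix>
    weights_path, prefix = sys.argv[1], sys.argv[2]
    print("match" if matches_published_prefix(weights_path, prefix) else "MISMATCH")
```

A prefix match is weaker evidence than a full-digest match, so the full hash should be used wherever it is available.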
+---
+## The Swarm & Bee Thesis
+### Why Verified Data Matters
+The medical AI space has a quality problem. Thousands of medical QA datasets exist on Hugging Face. Most are LLM-generated, unverified, and contain hallucinations that compound through fine-tuning. A model trained on hallucinated drug doses will confidently generate hallucinated drug doses.
+Our approach inverts this: **verify first, train second.**
+The cost of verification is amortized across every model version. The platinum vault grows daily, so each new version trains on a larger verified dataset, and the composite score has risen with each release so far (7.6, then 9.0, then 9.64 out of 15).
+### How We Build
+Swarm & Bee is a sovereign compute infrastructure firm. We operate our own GPU fleet, run our own inference stack, and control the full pipeline from data harvesting to model deployment.
+**Infrastructure:**
+- Multi-node GPU cluster (RTX 3090 Ti, RTX PRO 6000 Blackwell)
+- vLLM inference serving (~35 tok/s per node)
+- 22 production services running 24/7
+- Together.ai Qwen3-235B for verification (factored CoVe)
+- Proof-of-Pair attestation with Ethereum L1 anchoring
+- On-chain agent identity (ERC-8004 #17493 on Base)
+**Data assets (as of Feb 21, 2026):**
+- 15,025 platinum-verified clinical QA pairs
+- 9,456 gold-tier pairs
+- 80+ medical specialties covered
+- 47 distinct specialty classifiers
+- Growing 24/7 across 4 compute nodes
+### The Roadmap
+| Phase | Status | Description |
+|-------|--------|-------------|
+| Phase 1 | Complete | Anchor models (7B v1-v5, initial datasets) |
+| Phase 2 | **In Progress** | Specialty depth (cardiology, ER, oncology, pharma) |
+| Phase 3 | Planned | Cross-vertical expansion (aviation, legal, finance) |
+| Phase 4 | Planned | Next-gen base models + Blackwell hardware fleet |
+**Target**: 100,000 platinum pairs. 50+ specialized models. Sovereign deployment for every vertical.
+---
+## Limitations
+- **Not a diagnostic tool.** This model is for research and development. It does not constitute medical advice and should not be used for clinical decision-making without professional oversight.
+- **English only.** Training data and clinical guidelines are primarily US/English-language. Performance on non-English queries or jurisdiction-specific guidelines is untested.
+- **Pharmacology is weakest.** The model scores 52% on pharmacology questions — drug interaction and dosing queries should be independently verified.
+- **Point-in-time knowledge.** Clinical guidelines evolve. The model reflects medical knowledge current as of February 2026.
+- **Verification reduces but does not eliminate error.** CoVe significantly reduces hallucination (Meta AI reports a 77% reduction in their paper), but no verification system is perfect.
+---
+## Training Data
+This model was trained on the Swarm & Bee platinum vault — a proprietary collection of 14,174 verified clinical QA pairs.
+A free, open-source sample of 500 pairs is available for inspection and research:
+**[SwarmMed Platinum 500](https://huggingface.co/datasets/SwarmOS/SwarmMed-Platinum-500)** — 500 CoVe-verified pairs across 25 specialties, Apache-2.0 licensed.
+### Verification Reference
+The CoVe methodology is described in:
+> Dhuliawala, S., et al. (2023). "Chain-of-Verification Reduces Hallucination in Large Language Models." *arXiv:2309.11495*. Meta AI.
+---
+## Citation
+```bibtex
+@misc{swarmmed_14b_v1.2,
+  title={SwarmMed-14B-v1.2: Verified Clinical Language Model},
+  author={Swarm and Bee},
+  year={2026},
+  url={https://huggingface.co/SwarmOS/SwarmMed-14B-v1.2-merged},
+  base_model={Qwen/Qwen2.5-14B-Instruct},
+  license={Apache-2.0},
+  note={14,174 CoVe-verified platinum training pairs, 80+ specialties}
+}
+```
+## Related Resources
+| Resource | Link |
+|----------|------|
+| LoRA Adapter (2.2GB) | [SwarmMed-14B-v1.2](https://huggingface.co/SwarmOS/SwarmMed-14B-v1.2) |
+| Training Data Sample | [SwarmMed Platinum 500](https://huggingface.co/datasets/SwarmOS/SwarmMed-Platinum-500) |
+| CoVe Paper | [arXiv:2309.11495](https://arxiv.org/abs/2309.11495) |
+| Swarm & Bee | [swarmandbee.com](https://swarmandbee.com) |
+| All Models & Data | [SwarmOS on HuggingFace](https://huggingface.co/SwarmOS) |
+---
+*Last mile intelligence. Sovereign compute. Your data never leaves your rack.*
+**Swarm & Bee** | [swarmandbee.com](https://swarmandbee.com) | SwarmOS on HuggingFace