thaddickson committed
Commit 45ef17c · verified · 1 Parent(s): 9775331

Update model card for voice-v3

Files changed (1)
  1. README.md +32 -27
README.md CHANGED
@@ -7,6 +7,7 @@ tags:
7
  - mergekit
8
  - model_stock
9
  - slerp
 
10
  - qwen2
11
  - healthcare
12
  - cybersecurity
@@ -26,7 +27,9 @@ base_model:
26
 
27
  ## The Model
28
 
29
- Delphi-7B is a 7.6B parameter reasoning model built on a 6-model merge of Qwen 2.5 7B specialists, trained with mixed-domain SFT on 5090s and H100 GPUs in a weekend. It combines math reasoning, chain-of-thought logic, and instruction following into a single general-purpose model, then seasons it with hand-written expert reasoning pairs from two decades of healthcare cybersecurity work.
 
 
30
 
31
  This is Chapter 1 of a three-part build. The general model proves the pipeline. What comes next is the point.
32
 
@@ -41,53 +44,55 @@ This is Chapter 1 of a three-part build. The general model proves the pipeline.
41
  | Qwen2.5-Math-7B-Instruct | Pure math specialist. |
42
  | Qwen2.5-7B-Instruct-Uncensored | Breadth. Says what it means. |
43
 
44
- Merged with model_stock, normalize: false, int8_mask: true, bfloat16.
45
-
46
  ## Training Pipeline
47
 
48
- **Stage 1: Merge.** model_stock merge of 6 Qwen 2.5 7B specialists using mergekit. Homer base provides instruction following, DeepSeek-R1-Distill brings chain-of-thought reasoning, cybertron and Math-7B cover quantitative tasks, Stratos adds reasoning distillation, Uncensored adds breadth. Logit lens confirmed clean merge with 0% oscillation.
49
 
50
- **Stage 2: SFT.** Full bf16 SFT on 8x NVIDIA H100 80GB. Mixed data: 40% math, 25% instruction, 20% reasoning, 15% general knowledge. 142K samples, lr=1e-5, 3000 steps, effective batch 64. Diagnostic callbacks every 500 steps confirmed zero catastrophic forgetting across all checkpoints.
51
 
52
- **Stage 3: LoRA.** LoRA refinement (rank 32, alpha 64) targeting math reasoning and instruction following recovery. 15K samples including 5K math, 10K MMLU-Pro, and 27 hand-crafted expert reasoning pairs.
53
 
54
- **Stage 4: SLERP.** SLERP merge combining SFT knowledge depth with LoRA instruction discipline. Weight sweep across four t values (0.25, 0.35, 0.45, 0.55), evaluated on divergent benchmarks to find optimal balance. Winner: t=0.55.
55
 
56
- Includes hand-crafted expert reasoning pairs carved from a literary background and a poetic mind, infused with 20 years of cyber and software experience. Diagnostic frameworks. Root cause tracing. Cross-domain problem solving. Art, science, and philosophy, pointed towards the north star of good, fairness, and reasonable cognition.
57
 
58
  ## Scores
59
 
60
- Open LLM Leaderboard v2 benchmarks. All scores from lm-eval-harness using leaderboard_* tasks, full sample, chat template applied.
 
61
 
62
- | Benchmark | Delphi-7B | Falcon3-7B-Instruct (#1 on LB) | Delta |
63
- |---|---|---|---|
64
- | IFEval (strict) | 0.4573 | 0.7612 | -0.304 |
65
- | BBH | ~0.52* | 0.3792 | +0.14 |
66
- | MATH Level 5 | 0.1873 | 0.3187 | -0.131 |
67
- | GPQA Diamond | ~0.31* | 0.0805 | +0.23 |
68
- | MuSR | ~0.43* | 0.2117 | +0.22 |
69
- | MMLU-Pro | 0.4198 | 0.3430 | +0.077 |
70
- | **Average** | **~0.387** | **0.3491** | **+0.038** |
71
 
72
- *BBH, GPQA, MuSR projected from SFT v1 baseline; full eval pending. IFEval, MATH, MMLU-Pro confirmed on SLERP t=0.55 winner.*
73
 
74
- ## Evaluation Methodology
75
 
76
- All scores from [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) using `leaderboard_*` tasks, full sample count, chat template applied via `--apply_chat_template`. IFEval score is the average of `prompt_level_strict_acc` and `inst_level_strict_acc` per leaderboard methodology.
 
77
 
78
  ## The Roadmap
79
 
80
- **Chapter 1: Delphi-7B.** General reasoning model. Open LLM Leaderboard v2. You're looking at it.
81
 
82
- **Chapter 2: Delphi-72B-Cyber.** Healthcare cybersecurity specialist. HIPAA risk assessment. NIST RMF mapping. Pen test analysis. FDA 510(k) submissions. Trained on domain expertise.
83
 
84
- **Chapter 3: Delphi-Health.** Trained on de-identified data from blended sources for targeted analysis.
85
 
86
  ## Who Built This
87
 
88
- Thaddeus Dickson. CEO of Xpio Health, Co-Founder of Pryzmatech, and CTO and CISO across healthcare cybersecurity, compliance, and healthcare domains.
89
-
90
- The Oracle of Delphi is the philosophy: don't give people answers. Teach them how to think about the problem.
91
 
92
  ## License
93
 
 
7
  - mergekit
8
  - model_stock
9
  - slerp
10
+ - lora
11
  - qwen2
12
  - healthcare
13
  - cybersecurity
 
27
 
28
  ## The Model
29
 
30
+ Delphi-7B is a 7.6B parameter reasoning model built for healthcare cybersecurity, clinical operations, and cross-domain problem solving. It combines a 6-model merge of Qwen 2.5 7B specialists with multi-stage training: LoRA refinement, SLERP blending, and voice SFT from expert reasoning pairs.
31
+
32
+ Built by Thaddeus Dickson, CEO of Xpio Health. 20 years of healthcare cybersecurity and compliance expertise baked into the training data.
33
 
34
  This is Chapter 1 of a three-part build. The general model proves the pipeline. What comes next is the point.
35
 
 
44
  | Qwen2.5-Math-7B-Instruct | Pure math specialist. |
45
  | Qwen2.5-7B-Instruct-Uncensored | Breadth. Says what it means. |
46
 
 
 
47
  ## Training Pipeline
48
 
49
+ **Stage 1: Merge.** model_stock merge of 6 specialists using mergekit. Homer base provides instruction following, DeepSeek-R1-Distill brings chain-of-thought, cybertron and Math-7B cover quantitative tasks.
50
 
51
+ **Stage 2: LoRA.** Two rounds of LoRA refinement (rank 32, alpha 64) on 8x NVIDIA H100 80GB. Round 1: 5K math samples. Round 2: 5K math + 10K MMLU-Pro + 27 expert reasoning pairs. Preserved IFEval while improving MATH and MMLU-Pro.
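The rank and alpha quoted for Stage 2 determine how strongly the adapter update is scaled. A rough sketch, assuming the conventional LoRA scaling of alpha / rank (the card does not say whether a variant such as rank-stabilized scaling was used) and a hypothetical 3584 x 3584 projection picked only for illustration:

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Conventional LoRA scaling applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(rank: int, d_in: int, d_out: int) -> int:
    """Trainable parameters for one adapted matrix: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


RANK, ALPHA = 32, 64   # values stated in the model card
D_IN = D_OUT = 3584    # hypothetical layer shape, not taken from the card

scaling = lora_scaling(RANK, ALPHA)
params = lora_param_count(RANK, D_IN, D_OUT)
print(f"scaling={scaling}, trainable params per adapted matrix={params:,}")
```

Under these assumptions the update is applied at twice its raw magnitude, and each adapted matrix adds a small fraction of the parameters of the full weight it modifies.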
52
 
53
+ **Stage 3: SLERP.** Blended the full-SFT knowledge model (142K mixed samples, 3000 steps on H100s) with the LoRA-refined model. Weight sweep across t=0.25, 0.35, 0.45, 0.55. Winner: t=0.55, which gave the best balance of IFEval, MATH, and MMLU-Pro.
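For readers unfamiliar with SLERP, the sketch below shows spherical linear interpolation on two small flat vectors and a sweep over the four t values named above. It is an illustration of the general technique, not the merge code used for this model: real merges apply it tensor by tensor, and which model sits at t=0 versus t=1 is an assumption here since the card does not say.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # Degenerate input: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two vectors, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel vectors: linear interpolation is a good approximation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Toy stand-ins for two models' weights (hypothetical values).
sft_weights = [1.0, 0.0]
lora_weights = [0.0, 1.0]

for t in (0.25, 0.35, 0.45, 0.55):
    blended = slerp(t, sft_weights, lora_weights)
    print(t, [round(w, 4) for w in blended])
```

In a sweep like the one described, each blended checkpoint would then be evaluated on the target benchmarks and the best-balanced t retained.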
54
 
55
+ **Stage 4: Voice SFT.** QLoRA on RTX 5090. 308 hand-crafted domain examples teaching direct, specific, no-hedging responses that name exact standards (45 CFR citations, NIST SP references, CARC codes). Combined with 530 Claude-generated IFEval constraint-following examples. This stage transformed the model from generic Qwen output to domain-expert voice.
56
 
57
+ Expert reasoning pairs carved from a literary background and a poetic mind, infused with 20 years of cyber and software experience. Diagnostic frameworks. Root cause tracing. Cross-domain problem solving.
58
 
59
  ## Scores
60
 
61
+ Open LLM Leaderboard v2 benchmarks (lm-eval-harness, leaderboard_* tasks, chat template applied):
62
+
63
+ | Benchmark | Score |
64
+ |---|---|
65
+ | IFEval (prompt strict) | 0.500 |
66
+ | IFEval (inst strict) | 0.605 |
67
+ | MATH Hard | 0.187 |
68
+ | MMLU-Pro | 0.420 |
69
+ | BBH | ~0.48 |
70
+ | GPQA Diamond | ~0.31 |
71
+ | MuSR | ~0.37 |
72
 
73
+ IFEval, MATH, and MMLU-Pro are from a full eval on the SLERP t=0.55 base. Voice SFT improved IFEval prompt-level strict accuracy from 0.39 to 0.50. BBH, GPQA, and MuSR are approximate values from the LoRA Round 1 eval.
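The previous revision of this card reported IFEval as the average of `prompt_level_strict_acc` and `inst_level_strict_acc`. A minimal sketch of that arithmetic applied to the two IFEval rows in the new table, assuming the same averaging rule is still the intended way to combine them:

```python
# Scores copied from the table in this revision of the card.
prompt_level_strict_acc = 0.500
inst_level_strict_acc = 0.605

# Single IFEval figure, following the averaging rule described in the earlier revision.
ifeval_combined = (prompt_level_strict_acc + inst_level_strict_acc) / 2
print(f"IFEval (averaged strict): {ifeval_combined:.4f}")
```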
 
74
 
75
+ ## What Makes Delphi Different
76
 
77
+ Ask ChatGPT about a HIPAA breach and you get a Wikipedia article. Ask Delphi and you get the specific CFR citations, the exact steps for breach notification, the realistic timeline, and the business impact.
78
 
79
+ Delphi names specific standards (45 CFR 164.312, NIST SP 800-66), specific tools (Mirth Connect, Prowler, Burp Suite), and specific codes (CARC CO-4, ICD-10). It connects technical findings to business impact. It does not hedge when it knows the answer. It says "I don't know" when it doesn't.
80
+
81
+ ## The Oracle Philosophy
82
+
83
+ The ancient Oracle at Delphi did not give people answers. She gave them frames through which to understand their questions. That is the design philosophy: teach people how to think about the problem, not just what the answer is.
84
 
85
  ## The Roadmap
86
 
87
+ **Chapter 1: Delphi-7B.** General reasoning model. You are looking at it.
88
 
89
+ **Chapter 2: Delphi-72B-Cyber.** Healthcare cybersecurity specialist. HIPAA, NIST RMF, pen test analysis, FDA submissions.
90
 
91
+ **Chapter 3: Delphi-Health.** Trained on de-identified clinical data for targeted analysis.
92
 
93
  ## Who Built This
94
 
95
+ Thaddeus Dickson. CEO of Xpio Health, CISO, 20 years in healthcare cybersecurity and compliance.
 
 
96
 
97
  ## License
98