Sovereign-Professional-V1
Multi-domain GRPO (Group Relative Policy Optimization) hardening across three professional verticals (legal, medical, finance), plus structured JSON output, using real benchmarks with verifiable rewards.
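GRPO does not train a separate value model. For each prompt it samples a group of completions, scores each one with the reward, and uses each reward normalized against its own group as that completion's advantage. The following is a minimal sketch of that normalization. The function name is hypothetical and not taken from `professional_hardening.py`, and it uses the population standard deviation (implementations differ on this detail).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against the other completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption; some use sample std)
    # eps keeps the division finite when every completion in the group scores the same
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, each scored by the reward stack
print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))  # roughly [1.41, -1.41, 0.0, 0.0]
```

Completions that beat their group's average get a positive advantage and are reinforced. A group where every completion scores the same yields all-zero advantages and contributes no learning signal.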
Data Sources (All Verified ✅)
| Domain | Dataset | Size | Task Type |
|---|---|---|---|
| Legal | nguha/legalbench | 162 tasks (NeurIPS 2023) | Classification, QA |
| Medical | GBaker/MedQA-USMLE-4-options | 10K+ USMLE questions | 4-choice MCQ |
| Finance | PatronusAI/financebench | 150 QA pairs on SEC filings | Numeric extraction |
| Finance | TheFinAI/flare-headlines | Not stated | Binary classification of market news headlines |
| Structured | Generated | JSON work products | RL-Struct 5-component |
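Each task type in the table can be graded by a cheap programmatic check, which is what makes these rewards verifiable. The following is a minimal sketch of the three correctness checks described in the Domain Correctness row of the reward table. The function names are hypothetical. The sketch assumes the gold answer is a letter, a float, or a label string, that the model states its final answer last, and that the ±2% tolerance is relative to the gold value.

```python
import re

# First char must be a digit; allows thousands separators and a decimal part
NUM_RE = re.compile(r"-?\d[\d,]*\.?\d*")


def mcq_reward(completion: str, gold_letter: str) -> float:
    """1.0 if the last standalone A-D letter in the completion is the gold letter."""
    letters = re.findall(r"\b([A-D])\b", completion)
    return float(bool(letters) and letters[-1] == gold_letter.strip().upper())


def numeric_reward(completion: str, gold: float, rel_tol: float = 0.02) -> float:
    """1.0 if the last number in the completion is within rel_tol of the gold value."""
    nums = NUM_RE.findall(completion)
    if not nums:
        return 0.0
    try:
        pred = float(nums[-1].replace(",", ""))
    except ValueError:
        return 0.0
    if gold == 0:
        return float(pred == 0)
    return float(abs(pred - gold) <= rel_tol * abs(gold))


def classification_reward(completion: str, gold_label: str) -> float:
    """Exact label match after trimming whitespace, trailing periods, and case."""
    normalize = lambda s: s.strip().strip(".").lower()
    return float(normalize(completion) == normalize(gold_label))


print(mcq_reward("The best choice is C", "C"))                # 1.0
print(numeric_reward("Revenue was $1,577.00 million", 1600.0))  # 1.0 (within 2%)
print(classification_reward(" Yes. ", "yes"))                 # 1.0
```

Taking the last match is a simple heuristic that tolerates reasoning text before the final answer. A production grader would more likely parse an explicit answer field.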
5-Signal Reward Stack
| Weight | Reward | What it measures |
|---|---|---|
| 0.45 | Domain Correctness | MCQ letter match, numeric answer within ±2%, exact classification match |
| 0.20 | Structured Output | JSON validity + schema + types + content (RL-Struct) |
| 0.20 | Professional Quality | Domain terminology + evidence citation + structure |
| 0.10 | Reasoning Depth | Think tags + logical connectors |
| 0.05 | Length Penalty | DAPO soft overlong punishment |
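The weights above sum to 1.0. The following is a minimal sketch of how the five signals could be combined, assuming a plain linear weighted sum, which the table implies but does not state. The length term follows DAPO's soft overlong punishment: 0 up to `max_len - cache` tokens, a linear ramp down to -1 inside the cache window, and -1 beyond `max_len`. The defaults `max_len=4096` and `cache=512` are hypothetical. `structured_reward` is a simplified four-check stand-in for the RL-Struct scoring, not its actual formulation, and all function names are illustrative.

```python
import json

# Weights from the reward table; assumed to combine as a linear weighted sum
WEIGHTS = {
    "correctness": 0.45,
    "structured": 0.20,
    "quality": 0.20,
    "reasoning": 0.10,
    "length": 0.05,
}


def soft_overlong_penalty(length: int, max_len: int = 4096, cache: int = 512) -> float:
    """DAPO-style soft overlong punishment in [-1, 0] (hypothetical default lengths)."""
    threshold = max_len - cache
    if length <= threshold:
        return 0.0
    if length <= max_len:
        return (threshold - length) / cache  # linear ramp from 0 down to -1
    return -1.0


def structured_reward(text: str, schema: dict[str, type]) -> float:
    """Simplified stand-in: valid JSON, required keys, value types, non-empty values."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict):
        return 0.25  # parses as JSON, but is not an object
    has_keys = all(k in obj for k in schema)
    types_ok = has_keys and all(isinstance(obj[k], t) for k, t in schema.items())
    content_ok = types_ok and all(obj[k] not in ("", [], {}) for k in schema)
    # Each passed check adds a quarter point; bools count as 0 or 1
    return 0.25 * (1 + has_keys + types_ok + content_ok)


def total_reward(scores: dict[str, float]) -> float:
    """Weighted sum of the five component scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


scores = {
    "correctness": 1.0,
    "structured": structured_reward(
        '{"holding": "affirmed", "citations": ["347 U.S. 483"]}',
        {"holding": str, "citations": list},
    ),
    "quality": 0.5,
    "reasoning": 1.0,
    "length": soft_overlong_penalty(3800),  # inside the cache window: -0.421875
}
print(round(total_reward(scores), 4))  # → 0.8289
```

Because the length penalty is negative and carries only a 0.05 weight, an overlong but correct answer loses at most 0.05, so correctness still dominates the signal.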
Launch
```bash
python professional_hardening.py
```