Sovereign-Professional-V1

Multi-domain GRPO hardening across three professional verticals (legal, medical, and finance) using real benchmarks with verifiable rewards.
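
GRPO (Group Relative Policy Optimization) does not use a learned value model. It samples a group of completions per prompt, scores each one with the reward, and normalises the scores within the group to get advantages. The snippet below is a minimal sketch of that normalisation step only. It assumes population standard deviation and a small `eps` guard; both choices vary between implementations, and this card does not say which variant `professional_hardening.py` uses. `group_relative_advantages` is a hypothetical helper name.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantages for one prompt's group of sampled completions:
    centre each reward on the group mean and scale by the group std."""
    mean = statistics.fmean(rewards)
    # Population std (assumption); some implementations use the sample std.
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt: two scored 1.0, two scored 0.0.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that beat their group's mean reward get positive advantages and are reinforced. A group whose rewards are all identical yields zero advantages, so it contributes no advantage-weighted update.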

Data Sources (All Verified ✅)

| Domain | Dataset | Size / Scope | Task Type |
|---|---|---|---|
| ⚖️ Legal | nguha/legalbench | 162 tasks (NeurIPS 2023) | Classification, QA |
| 💊 Medical | GBaker/MedQA-USMLE-4-options | 10K+ USMLE questions | 4-choice MCQ |
| 💰 Finance | PatronusAI/financebench | 150 SEC 10-K QA | Numeric extraction |
| 💰 Finance | TheFinAI/flare-headlines | Market classification | Binary classification |
| 📊 Structured | Generated | JSON work products | RL-Struct 5-component |
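
The task types in this table need different answer checks. Below is a minimal sketch of what the Domain Correctness signal in the reward stack could look like: MCQ letter match, numeric answers within ±2% of the gold value, and exact classification match. The function names, the regexes, the step that strips a `<think>...</think>` block, and the `"mcq"` / `"numeric"` / `"classification"` task labels are all assumptions for illustration, not the code in `professional_hardening.py`.

```python
import re
from typing import Optional

# Hypothetical parsers; the card does not publish its own.
# Matches "Answer: B", "the answer is (C)" and captures the letter A-D.
_CHOICE_RE = re.compile(r"(?i:answer)\s*(?:is)?\s*:?\s*\(?([A-D])\b")
# Matches numbers such as "-3.5", "1,234.5" or "2023".
_NUMBER_RE = re.compile(r"-?\d[\d,]*\.?\d*")


def final_answer_text(completion: str) -> str:
    """Drop any <think>...</think> reasoning and return the text after it."""
    return completion.split("</think>")[-1].strip()


def extract_choice(completion: str) -> Optional[str]:
    """Return the last option letter introduced by 'answer', if any."""
    matches = _CHOICE_RE.findall(final_answer_text(completion))
    return matches[-1] if matches else None


def extract_number(completion: str) -> Optional[float]:
    """Return the last number in the final answer, ignoring thousands commas."""
    matches = _NUMBER_RE.findall(final_answer_text(completion))
    return float(matches[-1].replace(",", "")) if matches else None


def _normalize(label: str) -> str:
    return label.strip().rstrip(".").lower()


def label_match(completion: str, gold: str) -> bool:
    """Exact match on the last non-empty line, ignoring case and a final period."""
    lines = [ln for ln in final_answer_text(completion).splitlines() if ln.strip()]
    return bool(lines) and _normalize(lines[-1]) == _normalize(gold)


def domain_correctness(completion: str, gold: str, task_type: str,
                       rel_tol: float = 0.02) -> float:
    """1.0 if the completion's final answer matches the gold answer, else 0.0."""
    if task_type == "mcq":
        return float(extract_choice(completion) == gold.strip().upper())
    if task_type == "numeric":
        pred = extract_number(completion)
        target = float(gold.replace(",", ""))
        # Within +/-2% of the gold value (requires an exact match when gold is 0).
        return float(pred is not None and abs(pred - target) <= rel_tol * abs(target))
    if task_type == "classification":
        return float(label_match(completion, gold))
    raise ValueError(f"unknown task_type: {task_type!r}")
```

Returning a strict 0/1 score keeps this signal verifiable; partial credit, if any, is left to the softer signals in the stack.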

5-Signal Reward Stack

| Weight | Reward | What it measures |
|---|---|---|
| 0.45 | Domain Correctness | MCQ letter match, numeric answers within ±2%, exact classification match |
| 0.20 | Structured Output | JSON validity + schema + types + content (RL-Struct) |
| 0.20 | Professional Quality | Domain terminology + evidence citation + structure |
| 0.10 | Reasoning Depth | `<think>` tags + logical connectors |
| 0.05 | Length Penalty | DAPO soft overlong punishment |
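
A minimal sketch of how these signals could combine, assuming each signal is computed separately and then weighted as in the table. The length term follows DAPO's soft overlong punishment: 0 while the completion fits within `max_len - cache_len` tokens, a linear ramp down to -1 across the last `cache_len` tokens, and -1 beyond `max_len`. The `reasoning_depth` and `structured_output` scorers, the connector list, the equal RL-Struct sub-weights, and the choice to add the non-positive penalty directly are hypothetical stand-ins; the card does not publish them, and Professional Quality is left as an input score here.

```python
import json
import re
from typing import Dict, Mapping

# Weights copied from the reward-stack table.
WEIGHTS: Dict[str, float] = {
    "domain_correctness": 0.45,
    "structured_output": 0.20,
    "professional_quality": 0.20,
    "reasoning_depth": 0.10,
    "length_penalty": 0.05,
}

# Hypothetical connector list.
CONNECTORS = ("because", "therefore", "however", "thus", "since", "consequently")


def soft_overlong_penalty(length: int, max_len: int, cache_len: int) -> float:
    """DAPO soft overlong punishment, in [-1, 0]."""
    budget = max_len - cache_len
    if length <= budget:
        return 0.0
    if length <= max_len:
        # Linear ramp from 0 at `budget` to -1 at `max_len`.
        return (budget - length) / cache_len
    return -1.0


def reasoning_depth(completion: str, target_connectors: int = 3) -> float:
    """Hypothetical heuristic: half credit for a <think>...</think> block,
    half for using up to `target_connectors` logical connectors inside it."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return 0.0
    words = re.findall(r"[a-z]+", match.group(1).lower())
    hits = sum(1 for w in words if w in CONNECTORS)
    return 0.5 + 0.5 * min(hits, target_connectors) / target_connectors


def structured_output(text: str, schema: Mapping[str, type]) -> float:
    """Hypothetical RL-Struct-style score with equal 0.25 sub-weights: valid JSON
    object, required keys present, correct value types, non-empty values."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict):
        return 0.0
    present = [k for k in schema if k in obj]
    typed = [k for k in present if isinstance(obj[k], schema[k])]
    filled = [k for k in typed if obj[k] not in ("", [], {}, None)]
    n = len(schema) or 1
    return 0.25 + 0.25 * (len(present) + len(typed) + len(filled)) / n


def composite_reward(scores: Mapping[str, float]) -> float:
    """Weighted sum of the five signals; `length_penalty` is the raw DAPO value."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Under this reading, a completion that is perfect on the first four signals and stays within the length budget scores 0.95, and one that runs past `max_len` scores 0.90. Mapping the penalty onto [0, 1] as `1 + penalty` is an equally plausible interpretation of the table.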

Launch

python professional_hardening.py

Model Tree

Base model: Qwen/Qwen2.5-3B (fine-tuned into moro72842/Sovereign-Professional-V1)