---
base_model:
- allenai/Olmo-3-7B-Instruct
library_name: transformers
model_name: Role-mo-V2-32B
tags:
- dpo
- trl
licence: license
datasets:
- nvidia/HelpSteer3
- ConicCat/Lamp-P-Prompted
- ConicCat/C2-Nemo-Delta-RLCD-Preference
---
|
|
|
|
|
# Model Card for ConicCat/Role-mo-V2-7B |
|
|
|
|
|
This model is a fine-tuned version of [allenai/Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct), trained with humanline DPO to improve its writing, roleplay, and chat capabilities.
|
|
|
|
|
## Sampler Settings |
|
|
* ChatML prompt template.
* Recommended sampling: temperature 0.7 and (optionally) repetition penalty 1.05.
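If you are building prompts by hand rather than through a chat template, the ChatML layout the model expects can be sketched as below. The `format_chatml` helper is purely illustrative (not part of transformers or this repo); in practice, `tokenizer.apply_chat_template` produces the same structure.

```python
# Minimal sketch of the ChatML prompt format. The helper name and message
# structure are illustrative assumptions, not an API from this model's repo.
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Leave the prompt open for the assistant's reply.
        prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful roleplay partner."},
    {"role": "user", "content": "Describe the tavern we just entered."},
]
print(format_chatml(messages))
```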
|
|
|
|
|
## Datasets |
|
|
|
|
|
* nvidia/HelpSteer3 to maintain general capabilities and improve chat performance.
|
|
* ConicCat/Lamp-P-Prompted for improved prose and slop reduction. |
|
|
* ConicCat/C2-Nemo-Delta-RLCD-Preference, a C2-based roleplay preference dataset with human prompts and synthetic responses.
|
|
|
|
|
## Changelog |
|
|
* V0 > V1: Changed preference-data construction to widen the margin between chosen and rejected responses.
|
|
* V1 > V2: Better hyperparameters; lowering the learning rate and increasing the batch size allows stable training with a lower DPO beta.