---
base_model: allenai/Olmo-3.1-32B-Instruct
library_name: transformers
model_name: Role-mo-V2-32B
tags:
- dpo
- trl
licence: license
datasets:
- nvidia/HelpSteer3
- ConicCat/Lamp-P-Prompted
- ConicCat/C2-Nemo-Delta-RLCD-Preference
---
# Model Card for ConicCat/Role-mo-V2-32B

This model is a fine-tuned version of [allenai/Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct), trained with humanline DPO to improve writing, roleplay, and chat capabilities.

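The sketch below shows roughly what a DPO fine-tune of the base model looks like with TRL. It uses TRL's standard DPO loss (the humanline modification is not part of stock TRL and is not shown), and the dataset choice, split name, and every hyperparameter value are illustrative placeholders rather than the actual training recipe.

```python
# Minimal sketch of a DPO fine-tune of the base model with TRL.
# Standard DPO loss only; all values below are placeholders, not the settings
# actually used to train Role-mo-V2-32B.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "allenai/Olmo-3.1-32B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="bfloat16")

# DPOTrainer expects "prompt", "chosen", and "rejected" columns; the datasets on
# this card each have their own schema and would be mapped into that format first.
train_dataset = load_dataset("ConicCat/Lamp-P-Prompted", split="train")

args = DPOConfig(
    output_dir="role-mo-v2-32b-dpo",
    beta=0.05,                       # placeholder; V2 trains stably with a lower beta
    learning_rate=5e-7,              # placeholder; V2 lowered the learning rate
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # placeholder; V2 increased the batch size
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```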
## Sampler Settings
* ChatML template
* I recommend temperature 0.7 and (optionally) repetition penalty 1.05.

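A minimal inference sketch with transformers applying those settings is below. The chat template comes from the model's tokenizer, while the dtype, device placement, and example prompt are assumptions for running a 32B model, not requirements.

```python
# Inference sketch using the recommended sampler settings
# (ChatML template via the tokenizer, temperature 0.7, repetition penalty 1.05).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ConicCat/Role-mo-V2-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a creative roleplay partner."},
    {"role": "user", "content": "Set the opening scene of a noir mystery."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.05,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```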
## Datasets

* nvidia/HelpSteer3 to maintain general capabilities and improve chat performance.
* ConicCat/Lamp-P-Prompted for improved prose and slop reduction.
* A C2-based roleplay preference dataset with human prompts and synthetic responses (ConicCat/C2-Nemo-Delta-RLCD-Preference).

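As a rough illustration of how a preference mixture like this could be assembled (the actual filtering, column mapping, and mixing ratios are not documented here), the sketch below loads the three sources and maps them through a hypothetical `to_preference_format()` helper. Config and split names are assumptions; check each dataset card for its real schema.

```python
# Rough sketch of assembling the preference mixture.
# Split names, the to_preference_format() helper, and the mixing strategy are
# assumptions for illustration only.
from datasets import concatenate_datasets, load_dataset

def to_preference_format(example):
    """Hypothetical mapper: adapt a source-specific row to prompt/chosen/rejected.
    Each of the three datasets has its own schema, so in practice this mapping
    is written per dataset."""
    return {
        "prompt": example.get("prompt", ""),
        "chosen": example.get("chosen", ""),
        "rejected": example.get("rejected", ""),
    }

sources = [
    "nvidia/HelpSteer3",                       # general capability / chat preferences
    "ConicCat/Lamp-P-Prompted",                # prose quality / slop reduction
    "ConicCat/C2-Nemo-Delta-RLCD-Preference",  # C2-based roleplay preferences
]

parts = []
for name in sources:
    ds = load_dataset(name, split="train")
    parts.append(ds.map(to_preference_format, remove_columns=ds.column_names))

mixture = concatenate_datasets(parts).shuffle(seed=42)
print(mixture)
```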
## Changelog
* V0 > V1: Changed the preference data construction to improve the margins between chosen and rejected responses.
* V1 > V2: Improved hyperparameters; lowering the learning rate and increasing the batch size allows stable training with a lower beta.