---
base_model: allenai/Olmo-3.1-32B-Instruct
library_name: transformers
model_name: Role-mo-V2-32B
tags:
- dpo
- trl
licence: license
datasets:
- nvidia/HelpSteer3
- ConicCat/Lamp-P-Prompted
- ConicCat/C2-Nemo-Delta-RLCD-Preference
---
# Model Card for ConicCat/Role-mo-V2-32B

This model is a fine-tuned version of [allenai/Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct), trained with humanline DPO to improve writing, roleplay, and chat capabilities.

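The sketch below shows roughly what a DPO fine-tune of the base model looks like with TRL. It uses TRL's standard DPO loss (the humanline modification is not part of stock TRL and is not shown), and the dataset choice, split name, and every hyperparameter value are illustrative placeholders rather than the actual training recipe.

```python
# Minimal sketch of a DPO fine-tune of the base model with TRL.
# Standard DPO loss only; all values below are placeholders, not the settings
# actually used to train Role-mo-V2-32B.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "allenai/Olmo-3.1-32B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="bfloat16")

# DPOTrainer expects "prompt", "chosen", and "rejected" columns; the datasets on
# this card each have their own schema and would be mapped into that format first.
train_dataset = load_dataset("ConicCat/Lamp-P-Prompted", split="train")

args = DPOConfig(
    output_dir="role-mo-v2-32b-dpo",
    beta=0.05,                       # placeholder; V2 trains stably with a lower beta
    learning_rate=5e-7,              # placeholder; V2 lowered the learning rate
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # placeholder; V2 increased the batch size
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```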
## Sampler Settings
* ChatML template
* I recommend temperature 0.7 and (optionally) repetition penalty 1.05.

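A minimal inference sketch with transformers applying those settings is below. The chat template comes from the model's tokenizer, while the dtype, device placement, and example prompt are assumptions for running a 32B model, not requirements.

```python
# Inference sketch using the recommended sampler settings
# (ChatML template via the tokenizer, temperature 0.7, repetition penalty 1.05).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ConicCat/Role-mo-V2-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a creative roleplay partner."},
    {"role": "user", "content": "Set the opening scene of a noir mystery."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.05,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```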
## Datasets

* nvidia/HelpSteer3 to maintain general capabilities and improve chat performance.
* ConicCat/Lamp-P-Prompted for improved prose and slop reduction.
* A C2-based roleplay preference dataset with human prompts and synthetic responses (ConicCat/C2-Nemo-Delta-RLCD-Preference).

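As a rough illustration of how a preference mixture like this could be assembled (the actual filtering, column mapping, and mixing ratios are not documented here), the sketch below loads the three sources and maps them through a hypothetical `to_preference_format()` helper. Config and split names are assumptions; check each dataset card for its real schema.

```python
# Rough sketch of assembling the preference mixture.
# Split names, the to_preference_format() helper, and the mixing strategy are
# assumptions for illustration only.
from datasets import concatenate_datasets, load_dataset

def to_preference_format(example):
    """Hypothetical mapper: adapt a source-specific row to prompt/chosen/rejected.
    Each of the three datasets has its own schema, so in practice this mapping
    is written per dataset."""
    return {
        "prompt": example.get("prompt", ""),
        "chosen": example.get("chosen", ""),
        "rejected": example.get("rejected", ""),
    }

sources = [
    "nvidia/HelpSteer3",                       # general capability / chat preferences
    "ConicCat/Lamp-P-Prompted",                # prose quality / slop reduction
    "ConicCat/C2-Nemo-Delta-RLCD-Preference",  # C2-based roleplay preferences
]

parts = []
for name in sources:
    ds = load_dataset(name, split="train")
    parts.append(ds.map(to_preference_format, remove_columns=ds.column_names))

mixture = concatenate_datasets(parts).shuffle(seed=42)
print(mixture)
```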
## Changelog
* V0 > V1: Changed the preference data construction to improve the margins between chosen and rejected responses.
* V1 > V2: Improved hyperparameters; lowering the learning rate and increasing the batch size allows stable training with a lower beta.