Model Card for Model ID

Training configuration:

```python
class Config:
    base_model: str = "MattBou00/SmolLM-toxic-finetuned"
    reward_model: str = "s-nlp/roberta_toxicity_classifier"
    dataset_name: str = "skrishna/toy-toxicity-dataset"
    dataset_split: str = "test"
    prompt_words: int = 2            # use the first N words of each sample as the prompt
    max_new_tokens: int = 16
    sample_temperature: float = 1.0
    top_p: float = 0.9
    lr: float = 2e-6                 # very low LR for stability
    epochs: int = 1                  # increase for stronger learning
    entropy_coef: float = 0.005
    save_dir: str = "./smollm_detox_rlhf"
    hub_repo: str = "MattBou00/SmolLM-toxic-detox-rlhf"  # change to your own Hub repo
```
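The card ships only the configuration, not the training loop. As a rough illustration of what the fields control, here is a minimal pure-Python sketch. It assumes prompts are whitespace-split word prefixes, the reward is the classifier's probability of the non-toxic class (assumed to be index 0), and the update is a plain REINFORCE-style policy-gradient loss with an entropy bonus weighted by `entropy_coef`. The helper names (`make_prompt`, `top_p_filter`, `nontoxic_reward`, `pg_loss`) are hypothetical, and the actual run may have used PPO or another algorithm.

```python
import math
from typing import List, Sequence


def make_prompt(text: str, prompt_words: int = 2) -> str:
    # Keep only the first `prompt_words` whitespace-separated words (Config.prompt_words).
    return " ".join(text.split()[:prompt_words])


def top_p_filter(probs: Sequence[float], top_p: float = 0.9) -> List[float]:
    # Nucleus (top-p) filtering: keep the most likely tokens until their cumulative
    # probability reaches top_p, zero out the rest, and renormalise what is kept.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    total = 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / total
    return out


def nontoxic_reward(logits: Sequence[float], nontoxic_index: int = 0) -> float:
    # Softmax over the classifier logits; the reward is P(non-toxic).
    # ASSUMPTION: the non-toxic class sits at `nontoxic_index` (0 here).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[nontoxic_index] / sum(exps)


def pg_loss(
    token_logprobs: Sequence[float],
    token_entropies: Sequence[float],
    reward: float,
    baseline: float = 0.0,
    entropy_coef: float = 0.005,
) -> float:
    # Hypothetical REINFORCE-style loss for one sampled continuation:
    # -(reward - baseline) * sum(log pi(token)), minus an entropy bonus that
    # discourages the policy from collapsing onto a few tokens.
    advantage = reward - baseline
    pg_term = -advantage * sum(token_logprobs)
    entropy_bonus = entropy_coef * (sum(token_entropies) / len(token_entropies))
    return pg_term - entropy_bonus
```

In a real loop the sampling step would normally be delegated to `transformers`' `generate(do_sample=True, top_p=..., temperature=..., max_new_tokens=...)`, and the loss would be computed on tensors so gradients flow back into the policy, with `lr` scaling each optimizer step.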


Format: Safetensors
Model size: 0.1B params
Tensor type: F32
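These two figures give a quick sizing estimate. A rough sketch, assuming "0.1B" is a rounded parameter count: F32 stores each parameter in 4 bytes, so the weights alone take about 0.1e9 × 4 bytes ≈ 0.4 GB, before activations, optimizer state, or the separate reward model. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 4) -> float:
    # Weights-only footprint in decimal gigabytes (1e9 bytes).
    # F32 = 4 bytes per parameter; a 16-bit copy would use 2.
    return num_params * bytes_per_param / 1e9


f32_gb = weight_memory_gb(0.1e9)       # about 0.4 GB at F32
half_gb = weight_memory_gb(0.1e9, 2)   # about 0.2 GB at 16-bit precision
```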