| | --- |
| | license: apache-2.0 |
| | --- |
| | |
| | # Model Card for decruz07/kellemar-DPO-7B-e |
| |
|
| | <!-- Provide a quick summary of what the model is/does. --> |
| | Learning Rate: 5e-5, steps 300 |
| | ## Model Details |
| |
|
| | Created with beta = 0.05 |
| |
|
| | ### Model Description |
| |
|
| | <!-- Provide a longer summary of what this model is. --> |
| |
|
| |
|
| |
|
| | - **Developed by:** @decruz |
| | - **Funded by [optional]:** my full-time job |
| | - **Finetuned from model [optional]:** teknium/OpenHermes-2.5-Mistral-7B |
| |
|
| |
|
| |
|
| | ## Uses |
| |
|
| | You can use this for basic inference. You could probably finetune with this if you want to. |
| |
|
| |
|
| | ## How to Get Started with the Model |
| |
|
| | You can create a space out of this, or use basic python code to call the model directly and make inferences to it. |
| |
|
| | [More Information Needed] |
| |
|
| | ## Training Details |
| |
|
| | The following was used: |
| | `training_args = TrainingArguments( |
| | per_device_train_batch_size=4, |
| | gradient_accumulation_steps=4, |
| | gradient_checkpointing=True, |
| | learning_rate=5e-5, |
| | lr_scheduler_type="cosine", |
| | max_steps=200, |
| | save_strategy="no", |
| | logging_steps=1, |
| | output_dir=new_model, |
| | optim="paged_adamw_32bit", |
| | warmup_steps=100, |
| | bf16=True, |
| | report_to="wandb", |
| | ) |
| | |
| | # Create DPO trainer |
| | dpo_trainer = DPOTrainer( |
| | model, |
| | ref_model, |
| | args=training_args, |
| | train_dataset=dataset, |
| | tokenizer=tokenizer, |
| | peft_config=peft_config, |
| | beta=0.1, |
| | max_prompt_length=1024, |
| | max_length=1536, |
| | )` |
| | |
| | ### Training Data |
| |
|
| | This was trained with https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs |
| |
|
| | ### Training Procedure |
| |
|
| | Trained with Labonne's Google Colab Notebook on Finetuning Mistral 7B with DPO. |
| |
|
| | ## Model Card Authors [optional] |
| |
|
| | @decruz |
| |
|
| | ## Model Card Contact |
| |
|
| | @decruz on X/Twitter |