kinzakhan1/SRD_V6

@@ -1,21 +1,44 @@
 ---
-base_model: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
 tags:
-- text-generation-inference
-- transformers
 - unsloth
-- llama
-license: apache-2.0
-language:
-- en
 ---
-# Uploaded finetuned  model
-- **Developed by:** kinzakhan1
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
+license: llama3.1
 tags:
+- reasoning
+- chain-of-thought
+- fine-tuned
 - unsloth
+- llama-3.1
+base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
+datasets:
+- custom
 ---
+# SRD_V6 - Standard Reasoning Model (Chain-of-Thought)
+## Overview
+Llama 3.1 8B Instruct fine-tuned on the Standard Reasoning Dataset (SRD, chain-of-thought) with adjusted hyperparameters.
+## Training Details
+- **Base Model**: meta-llama/Meta-Llama-3.1-8B-Instruct
+- **Training Framework**: Unsloth
+- **Dataset**: CoT Reasoning Data (CoT_reasoning_unsloth.jsonl)
+- **Examples**: 9340
+- **Training Time**: 0.33 hours
+- **Final Loss**: 1.9127
+## Hyperparameters (Adjusted for SRD)
+- Learning Rate: 2e-05 (2x the CRD learning rate)
+- Max Steps: 500 (more than CRD)
+- LoRA Rank: 8
+- LoRA Alpha: 16
+- LoRA Dropout: 0.05
+- Warmup: 10%
+- Max Sequence Length: 2048
+- Effective Batch Size: 8
+## Notes
+The SRD dataset has longer, more complex reasoning chains, which results in a higher baseline loss.
+The hyperparameters were adjusted accordingly.
+## Part of Experiment
+- **kinzakhan1/CRD_V6** - Clinical reasoning only
+- **kinzakhan1/SRD_V6** - Standard reasoning only (this model)
+- **kinzakhan1/MIXED_V6** - Mixed dataset
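
The hyperparameters in the new card imply a few derived quantities that the card does not state. The sketch below works them out in plain Python. It is not taken from the training script: the `RunConfig` class and its field names are hypothetical, and the arithmetic rests on three assumptions. First, "Warmup: 10%" is read as a fraction of Max Steps. Second, each optimizer step consumes exactly "Effective Batch Size" sequences, with no sequence packing. Third, LoRA uses the standard `alpha / rank` scaling rather than a rank-stabilized variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Values copied from the model card; the field names are illustrative."""

    num_examples: int = 9340          # "Examples"
    max_steps: int = 500              # "Max Steps"
    effective_batch_size: int = 8     # "Effective Batch Size"
    warmup_ratio: float = 0.10        # "Warmup: 10%" (assumed fraction of max_steps)
    lora_rank: int = 8                # "LoRA Rank"
    lora_alpha: int = 16              # "LoRA Alpha"
    training_hours: float = 0.33      # "Training Time"

    @property
    def warmup_steps(self) -> int:
        # Number of steps spent ramping the learning rate up.
        return round(self.max_steps * self.warmup_ratio)

    @property
    def sequences_seen(self) -> int:
        # Assumes one step consumes exactly one effective batch (no packing).
        return self.max_steps * self.effective_batch_size

    @property
    def epochs(self) -> float:
        # Fraction of the dataset covered during the run.
        return self.sequences_seen / self.num_examples

    @property
    def lora_scale(self) -> float:
        # Standard LoRA multiplies the adapter update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def seconds_per_step(self) -> float:
        # Average wall-clock time per optimizer step.
        return self.training_hours * 3600 / self.max_steps


if __name__ == "__main__":
    cfg = RunConfig()
    print(f"warmup steps:     {cfg.warmup_steps}")
    print(f"sequences seen:   {cfg.sequences_seen}")
    print(f"epochs covered:   {cfg.epochs:.2f}")
    print(f"LoRA scale:       {cfg.lora_scale}")
    print(f"seconds per step: {cfg.seconds_per_step:.2f}")
```

Under these assumptions the run has 50 warmup steps and sees 4000 sequences, which is less than half of one pass over the 9340 examples, and the LoRA update is scaled by 2.0. If the actual run used packing or a different warmup convention, the step and epoch figures would differ.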