cd ~/checkpoints/abel_combined_dpo_merged && cat > README.md << 'MODELCARD'
---
license: other
license_name: cc-by-nc-nd-4.0-with-llama2
license_link: LICENSE
base_model: GAIR/Abel-7B-002
tags:
- math
- reasoning
- gsm8k
- dpo
- rlhf
datasets:
- gsm8k
metrics:
- accuracy
---
# DylanDeep-Core-8B-DPO
A math reasoning model achieving 84.84% on GSM8K through preference optimization.
## Model Details
- Base: Abel-7B-002 (LLaMA-2 architecture)
- Method: SFT + DPO with counterfactual reasoning
- Evaluation: 8-shot majority voting
## Performance
| Model | GSM8K Accuracy |
|---|---|
| Abel-7B-002 (base) | 79.08% |
| + SFT | 84.46% |
| + DPO | 84.84% |
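
The 8-shot majority-vote protocol samples several completions per problem and scores the most common final answer. A minimal sketch of the voting step in pure Python; the `extract_answer` helper and its last-number convention are illustrative assumptions, not the exact evaluation harness used here:

```python
import re
from collections import Counter

def extract_answer(completion: str):
    """Pull the last number from a chain-of-thought completion
    (a common GSM8K convention; assumed here, not the exact harness)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None

def majority_vote(completions):
    """Return the most frequent extracted final answer across samples."""
    answers = [a for a in (extract_answer(c) for c in completions) if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None

samples = [
    "Natalia sold 48 + 24 = 72 clips. The answer is 72.",
    "Half of 48 is 24, so 48 + 24 = 72.",
    "The total is 70.",  # one faulty sample is outvoted
]
print(majority_vote(samples))  # "72"
```

Voting over multiple samples smooths out individual reasoning errors, which is why the reported accuracy depends on the number of shots sampled per problem.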
## Training
Fine-tuned with LoRA adapters using a two-stage approach:
- Supervised fine-tuning on GSM8K training set
- DPO on 3,334 preference pairs with counterfactual probing
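
For reference, the DPO stage pushes the policy's log-probability margin on chosen responses above that of a frozen reference model. A minimal pure-Python sketch of the per-pair loss; the value of β and all other hyperparameters below are illustrative defaults, not the settings actually used for this model:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss falls below log(2) ~= 0.693.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

In practice this objective is computed per batch by a trainer (e.g. TRL's `DPOTrainer`) over the LoRA-adapted policy, with the SFT checkpoint serving as the reference model.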
## Training Code
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
tokenizer = AutoTokenizer.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
```
## License

This model is released under CC BY-NC-ND 4.0 with the following conditions:

- Non-commercial use only
- No derivatives without permission
- Attribution required
Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.
MODELCARD