cd ~/checkpoints/abel_combined_dpo_merged && cat > README.md << 'MODELCARD'

---
license: other
license_name: cc-by-nc-nd-4.0-with-llama2
license_link: LICENSE
base_model: GAIR/Abel-7B-002
tags:
- math
- reasoning
- gsm8k
- dpo
- rlhf
datasets:
- gsm8k
metrics:
- accuracy
---

# DylanDeep-Core-8B-DPO

A math reasoning model achieving 84.84% on GSM8K through preference optimization.

## Model Details

- Base: Abel-7B-002 (LLaMA-2 architecture)
- Method: SFT + DPO with counterfactual reasoning
- Evaluation: 8-shot majority voting

## Performance

| Model | GSM8K Accuracy |
|---|---|
| Abel-7B-002 (base) | 79.08% |
| + SFT | 84.46% |
| + DPO | 84.84% |

## Training

Fine-tuned with LoRA adapters using a two-stage approach:

1. Supervised fine-tuning on the GSM8K training set
2. DPO on 3,334 preference pairs with counterfactual probing
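In the DPO stage, each of those preference pairs contributes a loss that pushes the policy to assign a higher likelihood margin (relative to a frozen reference model) to the chosen completion than to the rejected one. A minimal sketch of the per-pair objective, assuming summed token log-likelihoods as inputs (this is the standard DPO loss, not the card's exact training code):

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a completion under the
    policy or the frozen reference model; beta controls the KL penalty strength.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy has not yet moved from the reference, both margins are zero and the loss is log 2; it decreases as the chosen completion's margin grows past the rejected one's.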

## Training Code

DylanDeep-Core-8B-DPO

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
tokenizer = AutoTokenizer.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
```
## License

This model is released under CC BY-NC-ND 4.0 with the following conditions:

- Non-commercial use only
- No derivatives without permission
- Attribution required

Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.
MODELCARD