cd ~/checkpoints/abel_combined_dpo_merged && cat > README.md << 'MODELCARD'
---
license: other
license_name: cc-by-nc-nd-4.0-with-llama2
license_link: LICENSE
base_model: GAIR/Abel-7B-002
tags:
- math
- reasoning
- gsm8k
- dpo
- rlhf
datasets:
- gsm8k
metrics:
- accuracy
---
# DylanDeep-Core-8B-DPO
A math reasoning model achieving 84.84% on GSM8K through preference optimization.
## Model Details
- Base: Abel-7B-002 (LLaMA-2 architecture)
- Method: SFT + DPO with counterfactual reasoning
- Evaluation: 8-shot majority voting
## Performance
| Model | GSM8K Accuracy |
|---|---|
| Abel-7B-002 (base) | 79.08% |
| + SFT | 84.46% |
| + DPO | 84.84% |
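
The 8-shot majority-vote protocol samples several completions per problem and scores the most common final answer. A minimal sketch of the voting step in pure Python; the `extract_answer` helper and its last-number convention are illustrative assumptions, not the exact evaluation harness used here:

```python
import re
from collections import Counter

def extract_answer(completion: str):
    """Pull the last number from a chain-of-thought completion
    (a common GSM8K convention; assumed here, not the exact harness)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None

def majority_vote(completions):
    """Return the most frequent extracted final answer across samples."""
    answers = [a for a in (extract_answer(c) for c in completions) if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None

samples = [
    "Natalia sold 48 + 24 = 72 clips. The answer is 72.",
    "Half of 48 is 24, so 48 + 24 = 72.",
    "The total is 70.",  # one faulty sample is outvoted
]
print(majority_vote(samples))  # "72"
```

Voting over multiple samples smooths out individual reasoning errors, which is why the reported accuracy depends on the number of shots sampled per problem.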
## Training
Fine-tuned with LoRA adapters using a two-stage approach:
- Supervised fine-tuning on GSM8K training set
- DPO on 3,334 preference pairs with counterfactual probing
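
For reference, the DPO stage pushes the policy's log-probability margin on chosen responses above that of a frozen reference model. A minimal pure-Python sketch of the per-pair loss; the value of β and all other hyperparameters below are illustrative defaults, not the settings actually used for this model:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss falls below log(2) ~= 0.693.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

In practice this objective is computed per batch by a trainer (e.g. TRL's `DPOTrainer`) over the LoRA-adapted policy, with the SFT checkpoint serving as the reference model.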
## Training Code
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
tokenizer = AutoTokenizer.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
```
## License

This model is released under CC BY-NC-ND 4.0 with the following conditions:

- Non-commercial use only
- No derivatives without permission
- Attribution required
Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.
MODELCARD