---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - math
  - cycleqd
  - qwen2.5
  - merge
datasets:
  - HLE-Math
base_model:
  - Qwen/Qwen2.5-7B
  - Qwen/Qwen2.5-7B-Instruct
model-index:
  - name: CycleQD-Qwen2.5-7B-Math-Alpha020
    results:
      - task:
          type: mathematical-reasoning
          name: HLE Math
        dataset:
          name: HLE Math
          type: hle-math
        metrics:
          - type: accuracy
            value: 14.29
            name: accuracy
            verified: true
            details: 5 correct out of 35 questions
---

# CycleQD Qwen2.5-7B Math α=0.20

This model is a CycleQD-merged version of Qwen2.5-7B, specifically tuned for mathematical reasoning tasks.

## Model Details

- **Base Models:** Qwen2.5-7B and Qwen2.5-7B-Instruct
- **Merge Method:** Linear interpolation with α = 0.20
- **Formula:** (1 − 0.20) × Qwen2.5-7B + 0.20 × Qwen2.5-7B-Instruct
- **Created:** August 14, 2025
- **Model Size:** 7B parameters (~15 GB)
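The formula above applies per parameter: every weight in the merged checkpoint is the same weighted average of the two source checkpoints. A minimal sketch of that idea, using plain Python lists in place of real weight tensors (`merge_state_dicts` is a hypothetical helper for illustration, not part of any released code):

```python
# Linear-interpolation merge: merged = (1 - alpha) * base + alpha * instruct,
# applied elementwise to every parameter. Plain lists stand in for tensors;
# a real merge would iterate over the two models' state dicts.

def merge_state_dicts(base, instruct, alpha=0.20):
    merged = {}
    for name, base_w in base.items():
        inst_w = instruct[name]
        merged[name] = [(1 - alpha) * b + alpha * i for b, i in zip(base_w, inst_w)]
    return merged

base = {"layer.weight": [1.0, 2.0]}
instruct = {"layer.weight": [3.0, 4.0]}
merged = merge_state_dicts(base, instruct)
print(merged)  # each entry is 0.8 * base + 0.2 * instruct
```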

## Performance

This model shows a marked improvement on the HLE Math evaluation:

- **Accuracy:** 14.29% (5/35 questions correct)
- **Improvement:** 5× over the baseline (1/35 → 5/35)
- **Evaluation:** HLE Math category
- **Judge Model:** Qwen2.5-32B-Instruct

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "You3dimgeo/cycleqd-qwen25-7b-math-alpha020"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Use for mathematical reasoning
prompt = "Solve the equation: 2x^2 + 5x - 3 = 0"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# max_new_tokens bounds only the generated continuation; max_length would also count the prompt
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training/Merge Details

This model was created using the CycleQD (Cycle Quality-Diversity) approach:

1. Start with the Qwen2.5-7B base model.
2. Apply CycleQD optimization for mathematical reasoning.
3. Merge with the instruction-tuned variant using α = 0.20.
4. Evaluate on the HLE Math benchmark.
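CycleQD searches over candidate merges rather than fixing the coefficient by hand. As a loose illustration only (the real procedure is an evolutionary quality-diversity search, not a 1-D grid search), a sweep over α with a hypothetical `evaluate` stub might look like:

```python
# Toy illustration: pick the interpolation coefficient alpha that scores best
# under some evaluation function. `evaluate` is a stand-in stub; in practice it
# would merge the two checkpoints at this alpha and run the HLE Math benchmark.

def evaluate(alpha):
    # Hypothetical score surface that happens to peak at alpha = 0.20.
    return -(alpha - 0.20) ** 2

candidates = [round(0.05 * k, 2) for k in range(0, 11)]  # 0.00, 0.05, ..., 0.50
best_alpha = max(candidates, key=evaluate)
print(best_alpha)  # 0.2
```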

## License

This model is licensed under Apache 2.0, following the original Qwen2.5 license.