---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-1.5B-Instruct
- lora
- transformers
- qlora
- math-reasoning
- safety
---

# Model Card for Qwen-1.5B-Instruct (Simple QLoRA)

This repository contains QLoRA adapter weights trained on the GSM8K dataset in the simple setting. The adapter can be combined with the base model to run inference and evaluation. It was developed to explore the trade-offs between mathematical reasoning capability and safety guardrails.

## Model Details

### Model Description

This adapter was trained as part of a CS396 pilot project exploring "Reasoning and knowledge in LLMs." It uses QLoRA to fine-tune the Qwen 2.5 1.5B parameter instruction-tuned model. The goal is to evaluate how fine-tuning on a reasoning-heavy dataset (GSM8K) impacts the model's performance on both mathematical tasks and safety benchmarks (AILuminate).
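For context, QLoRA freezes the base model in 4-bit precision and trains small low-rank adapter matrices on top of it. The `peft` configuration below is an illustrative sketch only; the rank, scaling, dropout, and target modules are assumptions for exposition, not the recorded training settings.

```python
from peft import LoraConfig

# Illustrative LoRA adapter configuration -- these hyperparameters are
# assumptions for exposition, not the exact settings used for this adapter.
lora_config = LoraConfig(
    r=16,                   # adapter rank (assumption)
    lora_alpha=32,          # scaling factor (assumption)
    lora_dropout=0.05,      # adapter dropout (assumption)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # typical attention projections
    task_type="CAUSAL_LM",
)
```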

- **Developed by:** Otto Xin and Nick Ornstein
- **Finetuned from model:** Qwen/Qwen2.5-1.5B-Instruct
- **License:** Apache 2.0 (Inherited from Qwen)

### Model Sources

- **Repository:** [cs396-pilot-project](https://github.com/ottoxin/cs396-pilot-project)
- **Paper:** *Balancing Mathematical Reasoning and Safety in QLoRA Fine-Tuning*

## Uses

### Direct Use

This adapter is intended to be loaded alongside the `Qwen/Qwen2.5-1.5B-Instruct` base model using the `peft` library. It is designed for researchers and graders evaluating how reasoning-focused fine-tuning affects mathematical capability and safety behavior.
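A minimal usage sketch with `peft` (the adapter ID below is the one used in the evaluation script further down):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct", device_map="auto", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "nbso/simple_pilot_project_model")
model.eval()

# Quick smoke test on a single math prompt
messages = [{"role": "user", "content": "What is 17 * 23?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```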

### Out-of-Scope Use

This is a pilot research model and should not be deployed in production environments for either mathematical problem-solving or safety-critical applications.

## How to Get Started with the Model (For TAs / Graders)

To run this code and evaluate the model, you do not need to download the weights manually. You can dynamically load the adapter directly from the Hugging Face Hub using the `peft` library.

**1. Install dependencies:**
```bash
pip install transformers peft torch accelerate bitsandbytes
```
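(Optional) If GPU memory is limited, the base model can instead be loaded in 4-bit, matching the QLoRA training regime, before attaching the adapter. A minimal sketch:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization, as used by QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "nbso/simple_pilot_project_model")
```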

**2. Run the example evaluation pipeline:**

```python
"""
Evaluation Pipeline: Mathematical Reasoning vs. Safety
Evaluates a QLoRA adapter on GSM8K (Math) and AILuminate (Safety).
"""

import torch
import json
import re
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from datasets import load_dataset

# ==========================================
# 1. CONFIGURATION
# ==========================================
BASE_MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"
ADAPTER_ID = "nbso/simple_pilot_project_model" 

# File paths for saving outputs
GSM8K_OUTPUT_FILE = "gsm8k_predictions.jsonl"
AILUMINATE_OUTPUT_FILE = "ailuminate_predictions.jsonl"
AILUMINATE_INPUT_CSV = "ailuminate_test.csv" # Ensure this file is in the working directory

# ==========================================
# 2. LOAD MODEL & TOKENIZER
# ==========================================
print(f"Loading Base Model: {BASE_MODEL_ID}")
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

print(f"Attaching LoRA Adapter from: {ADAPTER_ID}")
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

# ==========================================
# 3. GSM8K EVALUATION (MATH REASONING)
# ==========================================
print("\n--- Starting GSM8K Evaluation ---")
# Load the official GSM8K test split from Hugging Face
gsm8k_dataset = load_dataset("openai/gsm8k", "main", split="test")

# Downsample to match the 263 examples used in the simple run
gsm8k_subset = gsm8k_dataset.select(range(min(263, len(gsm8k_dataset))))

correct_answers = 0
total_answers = len(gsm8k_subset)

with open(GSM8K_OUTPUT_FILE, "w") as f:
    for item in tqdm(gsm8k_subset, desc="Evaluating GSM8K"):
        prompt = item["question"]
        gold_answer_full = item["answer"]
        
        # Extract the final number from the gold answer
        # (gold answers may contain thousands separators, e.g. "#### 1,000")
        gold_match = re.search(r"####\s*(-?[\d,]+)", gold_answer_full)
        gold_num = gold_match.group(1).replace(",", "") if gold_match else None
        
        # Format for Qwen Chat
        messages = [
            {"role": "system", "content": "You are a helpful assistant. Solve the math problem and end your response with '#### [final answer]'."},
            {"role": "user", "content": prompt}
        ]
        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        
        with torch.no_grad():
            # Greedy decoding: do_sample=False gives deterministic output
            # (temperature=0.0 is not valid for sampling and is ignored otherwise)
            outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
            
        response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
        
        # Extract the model's final answer, normalizing any thousands separators
        model_match = re.search(r"####\s*(-?[\d,]+)", response)
        model_num = model_match.group(1).replace(",", "") if model_match else None
        
        is_correct = (model_num == gold_num)
        if is_correct:
            correct_answers += 1
            
        f.write(json.dumps({
            "question": prompt,
            "gold": gold_num,
            "model_pred": model_num,
            "correct": is_correct,
            "raw_response": response
        }) + "\n")

print(f"GSM8K Accuracy: {correct_answers}/{total_answers} ({(correct_answers/total_answers)*100:.2f}%)")

# ==========================================
# 4. AILUMINATE EVALUATION (SAFETY)
# ==========================================
print("\n--- Starting AILuminate Generation ---")

try:
    ailuminate_dataset = load_dataset("csv", data_files=AILUMINATE_INPUT_CSV, split="train")
except Exception as e:
    print(f"Could not load local AILuminate CSV: {e}")
    print("Please ensure 'ailuminate_test.csv' is in the working directory.")
    ailuminate_dataset = []

if len(ailuminate_dataset) > 0:
    with open(AILUMINATE_OUTPUT_FILE, "w") as f:
        for item in tqdm(ailuminate_dataset, desc="Generating AILuminate Responses"):
            prompt = item.get("Prompt_text", item.get("prompt", ""))
            hazard_category = item.get("Hazard", "Unknown")
            
            messages = [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ]
            text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            inputs = tokenizer(text, return_tensors="pt").to(model.device)
            
            with torch.no_grad():
                # Greedy decoding for reproducible safety outputs
                outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
                
            response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
            
            f.write(json.dumps({
                "prompt": prompt,
                "hazard_category": hazard_category,
                "response": response
            }) + "\n")
            
    print(f"✅ Saved AILuminate responses to {AILUMINATE_OUTPUT_FILE}")
    print("Next Step: Pass these generated responses to the Safeguard Model to calculate the final safety score.")