---
library_name: transformers
tags: [quantization, qwen3, qlora, causal-lm, low-rank-adapters, 4bit, bitsandbytes, peft, efficient-finetuning]
---

# Qwen3-0.6B Quantized with QLoRA for Reasoning Tasks

This is a 4-bit quantized version of `Qwen/Qwen3-0.6B-Base`, fine-tuned with LoRA adapters on several MCQA-style reasoning datasets. Training used QLoRA, a parameter-efficient fine-tuning method that keeps the quantized base model frozen and updates only low-rank adapters, giving a small memory footprint with minimal accuracy loss.

## Model Details

### Model Description

This model is:
- A quantized version of `Qwen/Qwen3-0.6B-Base` using `bitsandbytes` 4-bit NormalFloat (nf4)
- Fine-tuned using Low-Rank Adaptation (LoRA) with rank 8
- Adapted to multiple-choice reasoning datasets like AQuA-RAT and TheoremQA
- Fully compatible with Hugging Face Transformers

- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from model:** `Qwen/Qwen3-0.6B-Base`

### Model Sources

- [Base model repository](https://huggingface.co/Qwen/Qwen3-0.6B-Base)

## Uses

### Direct Use

The model can be used directly for MCQA-style question answering via text generation; see the example under *How to Get Started with the Model* below.

### Out-of-Scope Use

- Not intended for open-ended generation or safety-critical applications
- Not intended for real-time or commercial deployment without evaluation

## Bias, Risks, and Limitations

- Inherits biases from its base model and training data (e.g., reasoning datasets)
- May fail on adversarial or out-of-distribution logic tasks

### Recommendations

Evaluate the model against your specific reasoning task before production use.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "your-username/MNLP_M2_quantized_model"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Question: What is 3 + 5?
Options:
A) 6
B) 8
C) 9
D) 10
Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
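
If the repository ships the LoRA adapters separately rather than a merged checkpoint (the exact repository layout is an assumption here), they can be attached to a 4-bit quantized base model with `peft`. A minimal sketch, assuming a CUDA GPU is available for `bitsandbytes`:

```python
# Sketch only: assumes this repo contains LoRA adapters (not a merged model)
# and that a GPU is available for 4-bit bitsandbytes loading.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    quantization_config=bnb_config,
    device_map="auto",
)
# "your-username/MNLP_M2_quantized_model" is the placeholder repo id used above.
model = PeftModel.from_pretrained(base, "your-username/MNLP_M2_quantized_model")
```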

## Training Details

### Training Data

- Processed versions of AQuA-RAT, TheoremQA, and custom MCQA datasets
- Unified into a single format with rationale-enhanced prompts
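
The exact preprocessing template is not published in this card; as an illustration, a rationale-enhanced MCQA item might be serialized roughly as follows (the helper below and its field layout are assumptions for illustration, not the actual pipeline):

```python
# Illustrative only: the real preprocessing code is not reproduced in this card.
def format_example(question, options, rationale, answer):
    """Serialize one MCQA item into a rationale-enhanced prompt (assumed layout)."""
    option_lines = "\n".join(f"{letter}) {text}" for letter, text in options)
    return (
        f"Question: {question}\n"
        f"Options:\n{option_lines}\n"
        f"Rationale: {rationale}\n"
        f"Answer: {answer}"
    )

print(format_example(
    "What is 3 + 5?",
    [("A", "6"), ("B", "8"), ("C", "9"), ("D", "10")],
    "3 + 5 = 8.",
    "B",
))
```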

### Training Procedure

- **Precision:** fp16
- **Quantization:** 4-bit nf4 + double quant + float16 compute
- **Adapter Type:** LoRA (r=8, α=16, dropout=0.05)
- **Base model frozen**

#### Training Hyperparameters

- **Epochs:** 3
- **Batch size:** 4
- **Grad accum steps:** 2
- **Optimizer:** paged_adamw_8bit
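
Taken together, these settings correspond roughly to the following QLoRA configuration. This is a sketch assembled from the values listed in this card, not the original training script; `target_modules` and `output_dir` in particular are assumptions:

```python
# Sketch of the QLoRA setup implied by the values above; not the original script.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization
    bnb_4bit_quant_type="nf4",             # NormalFloat (nf4)
    bnb_4bit_use_double_quant=True,        # double quantization
    bnb_4bit_compute_dtype=torch.float16,  # float16 compute
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
)

training_args = TrainingArguments(
    output_dir="qwen3-0.6b-qlora",         # assumed
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    optim="paged_adamw_8bit",
    fp16=True,
)
```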

## Evaluation

### Testing Data

A validation set of 1,000 samples held out from the unified dataset.

### Metrics

- Accuracy and F1 (to be reported in the evaluation phase)
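
For reference, MCQA accuracy can be computed by comparing the first option letter the model generates against the gold label, roughly as sketched below (the letter-extraction heuristic is an assumption, not the project's evaluation script):

```python
import re

def extract_choice(generated_text):
    """Return the first standalone option letter (A-D) found in the model output."""
    match = re.search(r"\b([A-D])\b", generated_text)
    return match.group(1) if match else None

def accuracy(predictions, gold_labels):
    """Fraction of examples whose extracted letter matches the gold answer."""
    correct = sum(extract_choice(p) == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```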

## Environmental Impact

- **Hardware:** Google Colab Pro, GPU A100
- **Hours used:** ~6–7 hours
- **Carbon emitted:** not reported; can be estimated with the [ML CO2 Impact calculator](https://mlco2.github.io/impact#compute)

## Technical Specifications

### Architecture

- Qwen3-0.6B base
- 28-layer transformer with rotary positional embeddings (RoPE) and 16 attention heads

### Compute Infrastructure

- **Hardware:** Colab A100 GPU, High RAM
- **Software:** Python 3.10, PyTorch 2.2.2, Transformers 4.51.3

## Contact

- **Author:** Ahmed Abdelmalek
- **Email:** ahmed.abdelmalek@epfl.ch