---
library_name: transformers
tags: [quantization, qwen3, qlora, causal-lm, low-rank-adapters, 4bit, bitsandbytes, peft, efficient-finetuning]
---

# Qwen3-0.6B Quantized with QLoRA for Reasoning Tasks

This is a 4-bit quantized version of `Qwen/Qwen3-0.6B-Base`, fine-tuned with LoRA adapters on several MCQA-style reasoning datasets. Training used QLoRA, a parameter-efficient fine-tuning method that keeps the quantized base model frozen and updates only low-rank adapters, giving a small memory footprint with minimal accuracy loss.

## Model Details

### Model Description

This model is:
- A quantized version of `Qwen/Qwen3-0.6B-Base` using `bitsandbytes` 4-bit NormalFloat (nf4)
- Fine-tuned using Low-Rank Adaptation (LoRA) with rank 8
- Adapted to multiple-choice reasoning datasets like AQuA-RAT and TheoremQA
- Fully compatible with Hugging Face Transformers

- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from model:** `Qwen/Qwen3-0.6B-Base`

### Model Sources

- [Base model repository](https://huggingface.co/Qwen/Qwen3-0.6B-Base)

## Uses

### Direct Use

The model can be used directly for MCQA-style question answering via text generation; see the example under *How to Get Started with the Model* below.

### Out-of-Scope Use

- Not intended for open-ended generation or safety-critical applications
- Not intended for real-time or commercial deployment without evaluation

## Bias, Risks, and Limitations

- Inherits biases from its base model and training data (e.g., reasoning datasets)
- May fail on adversarial or out-of-distribution logic tasks

### Recommendations

Evaluate the model against your specific reasoning task before production use.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "your-username/MNLP_M2_quantized_model"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Question: What is 3 + 5?
Options:
A) 6
B) 8
C) 9
D) 10
Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
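
If the repository ships the LoRA adapters separately rather than a merged checkpoint (the exact repository layout is an assumption here), they can be attached to a 4-bit quantized base model with `peft`. A minimal sketch, assuming a CUDA GPU is available for `bitsandbytes`:

```python
# Sketch only: assumes this repo contains LoRA adapters (not a merged model)
# and that a GPU is available for 4-bit bitsandbytes loading.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    quantization_config=bnb_config,
    device_map="auto",
)
# "your-username/MNLP_M2_quantized_model" is the placeholder repo id used above.
model = PeftModel.from_pretrained(base, "your-username/MNLP_M2_quantized_model")
```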

## Training Details

### Training Data

- Processed versions of AQuA-RAT, TheoremQA, and custom MCQA datasets
- Unified into a single format with rationale-enhanced prompts
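
The exact preprocessing template is not published in this card; as an illustration, a rationale-enhanced MCQA item might be serialized roughly as follows (the helper below and its field layout are assumptions for illustration, not the actual pipeline):

```python
# Illustrative only: the real preprocessing code is not reproduced in this card.
def format_example(question, options, rationale, answer):
    """Serialize one MCQA item into a rationale-enhanced prompt (assumed layout)."""
    option_lines = "\n".join(f"{letter}) {text}" for letter, text in options)
    return (
        f"Question: {question}\n"
        f"Options:\n{option_lines}\n"
        f"Rationale: {rationale}\n"
        f"Answer: {answer}"
    )

print(format_example(
    "What is 3 + 5?",
    [("A", "6"), ("B", "8"), ("C", "9"), ("D", "10")],
    "3 + 5 = 8.",
    "B",
))
```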

### Training Procedure

- **Precision:** fp16
- **Quantization:** 4-bit nf4 + double quant + float16 compute
- **Adapter Type:** LoRA (r=8, α=16, dropout=0.05)
- **Base model frozen**

#### Training Hyperparameters

- **Epochs:** 3
- **Batch size:** 4
- **Grad accum steps:** 2
- **Optimizer:** paged_adamw_8bit
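
Taken together, these settings correspond roughly to the following QLoRA configuration. This is a sketch assembled from the values listed in this card, not the original training script; `target_modules` and `output_dir` in particular are assumptions:

```python
# Sketch of the QLoRA setup implied by the values above; not the original script.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization
    bnb_4bit_quant_type="nf4",             # NormalFloat (nf4)
    bnb_4bit_use_double_quant=True,        # double quantization
    bnb_4bit_compute_dtype=torch.float16,  # float16 compute
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
)

training_args = TrainingArguments(
    output_dir="qwen3-0.6b-qlora",         # assumed
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    optim="paged_adamw_8bit",
    fp16=True,
)
```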

## Evaluation

### Testing Data

A validation set of 1,000 samples held out from the unified dataset.

### Metrics

- Accuracy and F1 (to be reported in the evaluation phase)
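
For reference, MCQA accuracy can be computed by comparing the first option letter the model generates against the gold label, roughly as sketched below (the letter-extraction heuristic is an assumption, not the project's evaluation script):

```python
import re

def extract_choice(generated_text):
    """Return the first standalone option letter (A-D) found in the model output."""
    match = re.search(r"\b([A-D])\b", generated_text)
    return match.group(1) if match else None

def accuracy(predictions, gold_labels):
    """Fraction of examples whose extracted letter matches the gold answer."""
    correct = sum(extract_choice(p) == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```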

## Environmental Impact

- **Hardware:** Google Colab Pro, GPU A100
- **Hours used:** ~6–7 hours
- **Carbon emitted:** not reported; can be estimated with the [ML CO2 Impact calculator](https://mlco2.github.io/impact#compute)

## Technical Specifications

### Architecture

- Qwen3-0.6B base
- 28-layer transformer with rotary positional embeddings (RoPE) and 16 attention heads

### Compute Infrastructure

- **Hardware:** Colab A100 GPU, High RAM
- **Software:** Python 3.10, PyTorch 2.2.2, Transformers 4.51.3

## Contact

- **Author:** Ahmed Abdelmalek
- **Email:** ahmed.abdelmalek@epfl.ch