---
library_name: transformers
tags: [quantization, qwen3, qlora, causal-lm, low-rank-adapters, 4bit, bitsandbytes, peft, efficient-finetuning]
---
# Qwen3-0.6B Quantized with QLoRA for Reasoning Tasks
This is a 4-bit quantized version of `Qwen/Qwen3-0.6B-Base`, fine-tuned with LoRA adapters on multiple MCQA-style reasoning datasets. Training followed the QLoRA recipe, a parameter-efficient fine-tuning method that trains low-rank adapters on top of a frozen, quantized base model, keeping the memory footprint small with minimal accuracy loss.
## Model Details
### Model Description
This model is:
- A quantized version of `Qwen/Qwen3-0.6B-Base` using `bitsandbytes` 4-bit NormalFloat (nf4)
- Fine-tuned using Low-Rank Adaptation (LoRA) with rank 8
- Adapted to multiple-choice reasoning datasets like AQuA-RAT and TheoremQA
- Fully compatible with Hugging Face Transformers
- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from model:** `Qwen/Qwen3-0.6B-Base`
### Model Sources
- [Base model repository](https://huggingface.co/Qwen/Qwen3-0.6B-Base)
## Uses
### Direct Use
You can use this model directly for MCQA-style question answering via text generation, as shown under "How to Get Started" below.
### Out-of-Scope Use
- Not intended for open-ended generation or safety-critical applications
- Not intended for real-time or commercial deployment without evaluation
## Bias, Risks, and Limitations
- Inherits biases from its base model and training data (e.g., reasoning datasets)
- May fail on adversarial or out-of-distribution logic tasks
### Recommendations
Evaluate the model against your specific reasoning task before production use.
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "your-username/MNLP_M2_quantized_model"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# MCQA-style prompt: question, lettered options, then "Answer:" for the model to complete
prompt = (
    "Question: What is 3 + 5?\n"
    "Options:\n"
    "A) 6\n"
    "B) 8\n"
    "C) 9\n"
    "D) 10\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
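If the quantization config is not already embedded in the uploaded checkpoint, the same `bitsandbytes` settings listed under Training Details can be requested explicitly at load time. This is a minimal sketch, assuming a CUDA GPU and the `bitsandbytes` package; the repository id is the same placeholder as above.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: explicit 4-bit nf4 loading, mirroring the settings under Training Details
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-username/MNLP_M2_quantized_model",  # placeholder repo id from the example above
    quantization_config=bnb_config,
    device_map="auto",  # requires a CUDA GPU and the accelerate package
)
```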
## Training Details
### Training Data
- Processed versions of AQuA-RAT, TheoremQA, and custom MCQA datasets
- Unified into a single format with rationale-enhanced prompts
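
The exact unified template is not reproduced here; purely as an illustration, a rationale-enhanced training example might be formatted along these lines (the field names and layout are assumptions, not the verbatim preprocessing format).

```python
# Hypothetical illustration of a unified, rationale-enhanced MCQA training example;
# the actual field layout used during preprocessing may differ.
example = (
    "Question: A train travels 60 km in 1.5 hours. What is its average speed?\n"
    "Options:\nA) 30 km/h\nB) 40 km/h\nC) 45 km/h\nD) 60 km/h\n"
    "Rationale: speed = distance / time = 60 / 1.5 = 40 km/h.\n"
    "Answer: B"
)
```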
### Training Procedure
- **Precision:** fp16
- **Quantization:** 4-bit nf4 + double quant + float16 compute
- **Adapter Type:** LoRA (r=8, α=16, dropout=0.05)
- **Base model frozen**
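
A minimal sketch of this adapter setup with the standard `peft` API, assuming the hyperparameters listed above; `target_modules` is an assumption (attention projections are a common choice for Qwen-style models), and `model` refers to the 4-bit base model loaded as shown earlier.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Sketch of the LoRA configuration described above (r=8, alpha=16, dropout=0.05).
# target_modules is an assumption; attention projections are a common choice for Qwen models.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = prepare_model_for_kbit_training(model)  # freeze and cast the quantized base model
model = get_peft_model(model, lora_config)      # only the adapter weights remain trainable
```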
#### Training Hyperparameters
- **Epochs:** 3
- **Batch size:** 4
- **Grad accum steps:** 2
- **Optimizer:** paged_adamw_8bit
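
Expressed as `transformers.TrainingArguments`, these hyperparameters might look as follows; this is a sketch rather than the exact training script, and the output directory and logging cadence are assumptions.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; paths and logging cadence are assumptions.
training_args = TrainingArguments(
    output_dir="qwen3-0.6b-qlora-mcqa",   # assumed output directory
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    optim="paged_adamw_8bit",
    fp16=True,
    logging_steps=50,                      # assumed logging cadence
)
```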
## Evaluation
### Testing Data
A held-out validation split of 1,000 samples from the unified dataset.
### Metrics
- Accuracy / F1 (to be reported in evaluation phase)
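
MCQA accuracy can be estimated by extracting the option letter the model emits after "Answer:" and comparing it with the gold label. The helper below is a rough sketch, not the project's evaluation script.

```python
import re

def predicted_letter(generated_text: str) -> str | None:
    """Return the last option letter (A-D) emitted after 'Answer:'.

    Taking the last match avoids picking up the 'Answer:' already present in the
    prompt when the decoded output includes the prompt text.
    """
    matches = re.findall(r"Answer:\s*([A-D])", generated_text)
    return matches[-1] if matches else None

def mcqa_accuracy(outputs: list[str], gold_labels: list[str]) -> float:
    """Fraction of examples whose extracted letter matches the gold label."""
    correct = sum(predicted_letter(o) == g for o, g in zip(outputs, gold_labels))
    return correct / len(gold_labels)
```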
## Environmental Impact
- **Hardware:** Google Colab Pro, GPU A100
- **Hours used:** ~6–7 hours
- **Carbon Emitted:** Estimated with [MLCO2](https://mlco2.github.io/impact#compute)
## Technical Specifications
### Architecture
- Qwen3-0.6B base
- 28-layer transformer with rotary positional encoding and 16 heads
### Compute Infrastructure
- **Hardware:** Colab A100 GPU, High RAM
- **Software:** Python 3.10, PyTorch 2.2.2, Transformers 4.51.3
## Contact
- **Author:** Ahmed Abdelmalek
- **Email:** ahmed.abdelmalek@epfl.ch