Model Card: QLoRA Fine‑Tuned Small Language Model

🧩 Model Summary

This model is a QLoRA-fine-tuned variant of an open-source lightweight transformer (Phi-style architecture), trained entirely on the Google Colab free tier using 4-bit (NF4) quantization and PEFT to minimize compute and memory requirements.

The project demonstrates how small LLMs can be efficiently adapted, evaluated, and deployed using a fully reproducible low-compute workflow.


πŸ“ Repository Structure

/README.md            – this model card
/config.json          – base model configuration
/tokenizer.json       – tokenizer files
/adapter_config.json  – LoRA adapter configuration (PEFT)
/adapter_model.bin    – trained LoRA adapter weights
/training_logs/       – training and evaluation logs

🛠️ Training Setup

Base Model

  • Architecture: Phi-style small transformer
  • Precision: 4-bit (NF4)
  • Parameters: ~300M (depending on chosen base)

Method

  • Fine-tuning: QLoRA (PEFT)
  • Optimizer: AdamW
  • LR: 2e-4
  • Batch Size: 8
  • Epochs: 5
  • Eval Strategy: Per epoch
  • Platform: Google Colab Free Tier (T4 GPU; bitsandbytes 4-bit quantization requires a CUDA GPU)
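
These settings map onto transformers + peft roughly as follows. This is a minimal sketch, not the notebook's exact code; the base model name and the LoRA rank/alpha/dropout values are illustrative assumptions.

import torch
from transformers import (AutoModelForCausalLM, BitsAndBytesConfig,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "<base-model-name>"  # hypothetical; any small Phi-style causal LM

# 4-bit NF4 quantization, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter; r/alpha/dropout are illustrative assumptions
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
))

args = TrainingArguments(
    output_dir="qlora-out",
    learning_rate=2e-4,            # matches the table above
    per_device_train_batch_size=8,
    num_train_epochs=5,
    eval_strategy="epoch",         # "evaluation_strategy" on older transformers
    optim="adamw_torch",
)
# ...then train with transformers.Trainer (or trl's SFTTrainer)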

📊 Evaluation

Metric comparison (before → after fine-tuning):

Metric       Before     After
ROUGE-L      0.2726     0.2726
BLEU         19.9785    19.9744
Perplexity   23.67      3.02

Perplexity drops substantially after fine-tuning, while ROUGE-L and BLEU remain essentially unchanged on this small evaluation set.
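
Perplexity here is the exponential of the mean token-level cross-entropy loss over the evaluation texts. A minimal sketch of such a computation (the notebook's exact implementation may differ):

import math
import torch

def perplexity(model, tokenizer, texts):
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt").to(model.device)
            # Causal LMs return the mean cross-entropy when labels are given
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))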

Graphs (Before → After)

The notebook includes:

  • ROUGE-L Bar Chart
  • BLEU Bar Chart
  • Perplexity Bar Chart
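
A minimal sketch of how such before/after charts can be generated (the notebook's plotting code may differ; values are taken from the comparison table above):

import matplotlib.pyplot as plt

metrics = {
    "ROUGE-L":    (0.2726, 0.2726),
    "BLEU":       (19.9785, 19.9744),
    "Perplexity": (23.67, 3.02),
}

for name, (before, after) in metrics.items():
    plt.figure()
    plt.bar(["before", "after"], [before, after])
    plt.title(f"{name}: before vs. after fine-tuning")
    plt.savefig(f"{name.lower()}_comparison.png")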

📜 Dataset

  • Type: Small instruction–response dataset
  • Size: ~20–100 examples
  • Purpose: Rapid QLoRA testing and demonstration
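
A sketch of loading such a dataset with the datasets library; the file name and field names are assumptions for illustration, and the actual schema may differ:

from datasets import load_dataset

# Each line of train.jsonl is a single example, e.g.:
# {"instruction": "Explain QLoRA in one sentence.",
#  "response": "QLoRA trains low-rank adapters on top of a 4-bit quantized model."}
dataset = load_dataset("json", data_files="train.jsonl", split="train")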

📦 Intended Use

Good for:

  • Teaching QLoRA
  • Prototyping domain-specific adapters
  • Low-compute experimentation
  • Research demonstrations

Not intended for:

  • High-risk or production tasks
  • Safety-critical operations
  • Broad general-purpose deployment

⚠️ Limitations

  • Small dataset (limited generalization)
  • No RLHF alignment
  • Domain-specific behavior
  • Not safety-aligned

💻 Usage Example

# Requires transformers with peft installed, so the adapter repo
# (adapter_config.json + adapter_model.bin) is resolved against its base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "<your-hf-username>/<your-model-repo>"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain QLoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
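
If your transformers version does not resolve PEFT adapter repos automatically, the adapter can be attached explicitly. The base model name below is an assumption; use the base_model_name_or_path recorded in adapter_config.json:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "<base-model-name>"  # hypothetical; see adapter_config.json
adapter = "<your-hf-username>/<your-model-repo>"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)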

📤 Hugging Face Hub Workflow

The provided notebook automates:

  • Authentication
  • Repo creation
  • Uploading adapters
  • Uploading evaluation graphs
  • Uploading logs
  • Uploading this README.md
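
Outside the notebook, the same steps can be reproduced with huggingface_hub directly (the repo id and folder path below are assumptions):

from huggingface_hub import login, create_repo, upload_folder

login()  # prompts for a Hugging Face access token
repo_id = "<your-hf-username>/<your-model-repo>"
create_repo(repo_id, exist_ok=True)

# Uploads adapters, graphs, logs and README.md in one call
upload_folder(repo_id=repo_id, folder_path="qlora-out")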

🧪 Reproducibility

Training workflow:

  1. Load base model
  2. Apply 4-bit quantization
  3. Train QLoRA adapters
  4. Evaluate baseline and fine-tuned performance
  5. Compare metrics + graphs
  6. Push results to HF Hub

Runtime on Google Colab Free Tier: ~40–45 minutes.
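
To make reruns comparable, fix the RNG seeds before step 1; a minimal sketch (the seed value is an assumption):

from transformers import set_seed

set_seed(42)  # seeds Python, NumPy and PyTorch RNGs in one call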


📚 Credits

  • Microsoft – Phi inspiration
  • Hugging Face – Transformers, PEFT, Accelerate
  • Google Colab – compute backend
  • Project author – data, workflow, evaluation

📄 License

This adapter inherits the base model's original license (e.g., Apache 2.0, MIT, or OpenRAIL, depending on the chosen base).


🤝 Contributions

Pull requests are welcome; please avoid adding copyrighted or sensitive data.
