# Model Card: QLoRA Fine-Tuned Small Language Model
## 🧩 Model Summary
This model is a QLoRA-fine-tuned variant of an open-source lightweight transformer (Phi-style architecture), trained entirely on the Google Colab Free Tier using 4-bit quantization and PEFT to minimize compute and memory requirements.
The project demonstrates how small LLMs can be efficiently adapted, evaluated, and deployed using a fully reproducible low-compute workflow.
## 📂 Repository Structure
```
/README.md
/config.json
/tokenizer.json
/adapter_config.json
/adapter_model.bin
/training_logs/
```
## 🛠️ Training Setup
### Base Model
- Architecture: Phi-style small transformer
- Precision: 4-bit (NF4)
- Parameters: ~300M (depending on the chosen base model)
### Method
- Fine-tuning: QLoRA (PEFT)
- Optimizer: AdamW
- LR: 2e-4
- Batch Size: 8
- Epochs: 5
- Eval Strategy: Per epoch
- Platform: Google Colab Free Tier (T4 GPU or TPU v2)
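As a point of reference, the sketch below shows how the settings above map onto a bitsandbytes + PEFT configuration. It is a minimal illustration rather than the exact training script: the base model id, LoRA rank/alpha/dropout, and output directory are placeholder assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "<base-model-id>"  # placeholder: any Phi-style small causal LM

# 4-bit NF4 quantization, matching the precision listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter; rank/alpha/dropout here are illustrative defaults
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Hyperparameters from the list above
args = TrainingArguments(
    output_dir="qlora-out",    # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    num_train_epochs=5,
    eval_strategy="epoch",     # `evaluation_strategy` on older transformers
    optim="adamw_torch",       # AdamW, as listed above
    fp16=True,
)
# `model`, `args`, and the tokenized dataset then go to a transformers Trainer.
```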
## 📊 Evaluation
```
=== METRIC COMPARISON ===
ROUGE-L:    before=0.2726   after=0.2726
BLEU:       before=19.9785  after=19.9744
Perplexity: before=23.67    after=3.02
```
### Graphs (Before → After)
Notebook includes:
- ROUGE-L Bar Chart
- BLEU Bar Chart
- Perplexity Bar Chart
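Of the three metrics, perplexity is the easiest to recompute by hand: it is the exponential of the mean token-level cross-entropy. A minimal sketch, assuming `model` and `tokenizer` are loaded as in the Usage Example below and `texts` is a small list of held-out strings:

```python
import math
import torch

def perplexity(model, tokenizer, texts):
    total_loss, total_tokens = 0.0, 0
    model.eval()
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt").to(model.device)
            # labels=input_ids makes the model return the mean LM loss directly
            out = model(**enc, labels=enc["input_ids"])
            n = enc["input_ids"].numel()
            total_loss += out.loss.item() * n  # weight by sequence length
            total_tokens += n
    return math.exp(total_loss / total_tokens)
```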
## 📚 Dataset
- Type: Small instruction–response dataset
- Size: ~20–100 examples
- Purpose: Rapid QLoRA testing and demonstration
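The exact schema is not pinned down by this card; as a hypothetical illustration (the field names `instruction` and `response` are assumptions), records might look like this when loaded with 🤗 Datasets:

```python
from datasets import Dataset

# Hypothetical record format; actual field names may differ.
examples = [
    {
        "instruction": "Explain QLoRA in one sentence.",
        "response": "QLoRA fine-tunes a 4-bit quantized model by training "
                    "small low-rank adapters on top of the frozen base weights.",
    },
]
dataset = Dataset.from_list(examples)
```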
## 📦 Intended Use
Good for:
- Teaching QLoRA
- Prototyping domain-specific adapters
- Low-compute experimentation
- Research demonstrations
Not intended for:
- High-risk or production tasks
- Safety-critical operations
- Broad general-purpose deployment
## ⚠️ Limitations
- Small dataset (limited generalization)
- No RLHF alignment
- Domain-specific behavior
- Not safety-aligned
## 💻 Usage Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "<your-hf-username>/<your-model-repo>"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Loading an adapter repo directly relies on transformers' PEFT integration,
# which requires `peft` to be installed; the base model is resolved from
# adapter_config.json automatically.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain QLoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
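If the direct load above is unavailable, the adapter can also be attached explicitly with PEFT. A sketch, with the base model id left as a placeholder since the actual base is recorded in `adapter_config.json`:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder base id; see adapter_config.json for the real base model.
base = AutoModelForCausalLM.from_pretrained("<base-model-id>", device_map="auto")
model = PeftModel.from_pretrained(base, "<your-hf-username>/<your-model-repo>")
model = model.merge_and_unload()  # optional: fold the adapter into the base weights
```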
## 🤗 Hugging Face Hub Workflow
The provided notebook automates:
- Authentication
- Repo creation
- Uploading adapters
- Uploading evaluation graphs
- Uploading logs
- Uploading this README.md
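The same steps can be reproduced manually with `huggingface_hub`; the repo id and local path below are placeholders:

```python
from huggingface_hub import login, create_repo, upload_folder

login()  # prompts for an access token (or use the HF_TOKEN env variable)
create_repo("<your-hf-username>/<your-model-repo>", exist_ok=True)
upload_folder(
    repo_id="<your-hf-username>/<your-model-repo>",
    folder_path="qlora-out",  # placeholder: adapters, logs, graphs, README.md
    commit_message="Upload QLoRA adapter and evaluation artifacts",
)
```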
## 🧪 Reproducibility
Training workflow:
1. Load the base model
2. Apply 4-bit quantization
3. Train QLoRA adapters
4. Evaluate baseline and fine-tuned performance
5. Compare metrics and graphs
6. Push results to the HF Hub
Runtime on Google Colab Free Tier: ~40–45 minutes.
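For run-to-run comparability, fixing the random seeds up front helps, though exact bitwise reproducibility is not guaranteed with 4-bit GPU kernels. A one-line sketch:

```python
from transformers import set_seed

set_seed(42)  # seeds Python, NumPy, and PyTorch RNGs in one call
```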
## 🙏 Credits
- Microsoft – Phi inspiration
- Hugging Face – Transformers, PEFT, Accelerate
- Google Colab – Compute backend
- Project Author – Data, workflow, evaluation
## 📜 License
Follow the base model's original license (Apache 2.0, MIT, OpenRAIL, etc.).
## 🤝 Contributions
Pull requests are welcome; please avoid adding copyrighted or sensitive data.