Model Card: QLoRA Fine‑Tuned Small Language Model

🧩 Model Summary

This model is a QLoRA-fine-tuned variant of an open-source lightweight transformer (Phi-style architecture), trained entirely on the Google Colab free tier using 4-bit (NF4) quantization and PEFT to minimize compute and memory requirements.

The project demonstrates how small LLMs can be efficiently adapted, evaluated, and deployed using a fully reproducible low-compute workflow.


πŸ“ Repository Structure

/README.md            – this model card
/config.json          – base model configuration
/tokenizer.json       – tokenizer files
/adapter_config.json  – LoRA adapter configuration (PEFT)
/adapter_model.bin    – trained LoRA adapter weights
/training_logs/       – training and evaluation logs

🛠️ Training Setup

Base Model

  • Architecture: Phi-style small transformer
  • Precision: 4-bit (NF4)
  • Parameters: ~300M (depending on chosen base)

Method

  • Fine-tuning: QLoRA (PEFT)
  • Optimizer: AdamW
  • LR: 2e-4
  • Batch Size: 8
  • Epochs: 5
  • Eval Strategy: Per epoch
  • Platform: Google Colab Free Tier (T4 GPU; bitsandbytes 4-bit quantization requires a CUDA GPU)
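
These settings map onto transformers + peft roughly as follows. This is a minimal sketch, not the notebook's exact code; the base model name and the LoRA rank/alpha/dropout values are illustrative assumptions.

import torch
from transformers import (AutoModelForCausalLM, BitsAndBytesConfig,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "<base-model-name>"  # hypothetical; any small Phi-style causal LM

# 4-bit NF4 quantization, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter; r/alpha/dropout are illustrative assumptions
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
))

args = TrainingArguments(
    output_dir="qlora-out",
    learning_rate=2e-4,            # matches the table above
    per_device_train_batch_size=8,
    num_train_epochs=5,
    eval_strategy="epoch",         # "evaluation_strategy" on older transformers
    optim="adamw_torch",
)
# ...then train with transformers.Trainer (or trl's SFTTrainer)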

📊 Evaluation

Metric comparison (before → after fine-tuning):

Metric       Before     After
ROUGE-L      0.2726     0.2726
BLEU         19.9785    19.9744
Perplexity   23.67      3.02

Perplexity drops substantially after fine-tuning, while ROUGE-L and BLEU remain essentially unchanged on this small evaluation set.
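
Perplexity here is the exponential of the mean token-level cross-entropy loss over the evaluation texts. A minimal sketch of such a computation (the notebook's exact implementation may differ):

import math
import torch

def perplexity(model, tokenizer, texts):
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt").to(model.device)
            # Causal LMs return the mean cross-entropy when labels are given
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))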

Graphs (Before → After)

The notebook includes:

  • ROUGE-L Bar Chart
  • BLEU Bar Chart
  • Perplexity Bar Chart
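
A minimal sketch of how such before/after charts can be generated (the notebook's plotting code may differ; values are taken from the comparison table above):

import matplotlib.pyplot as plt

metrics = {
    "ROUGE-L":    (0.2726, 0.2726),
    "BLEU":       (19.9785, 19.9744),
    "Perplexity": (23.67, 3.02),
}

for name, (before, after) in metrics.items():
    plt.figure()
    plt.bar(["before", "after"], [before, after])
    plt.title(f"{name}: before vs. after fine-tuning")
    plt.savefig(f"{name.lower()}_comparison.png")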

📜 Dataset

  • Type: Small instruction–response dataset
  • Size: ~20–100 examples
  • Purpose: Rapid QLoRA testing and demonstration
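
A sketch of loading such a dataset with the datasets library; the file name and field names are assumptions for illustration, and the actual schema may differ:

from datasets import load_dataset

# Each line of train.jsonl is a single example, e.g.:
# {"instruction": "Explain QLoRA in one sentence.",
#  "response": "QLoRA trains low-rank adapters on top of a 4-bit quantized model."}
dataset = load_dataset("json", data_files="train.jsonl", split="train")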

📦 Intended Use

Good for:

  • Teaching QLoRA
  • Prototyping domain-specific adapters
  • Low-compute experimentation
  • Research demonstrations

Not intended for:

  • High-risk or production tasks
  • Safety-critical operations
  • Broad general-purpose deployment

⚠️ Limitations

  • Small dataset (limited generalization)
  • No RLHF alignment
  • Domain-specific behavior
  • Not safety-aligned

💻 Usage Example

# Requires transformers with peft installed, so the adapter repo
# (adapter_config.json + adapter_model.bin) is resolved against its base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "<your-hf-username>/<your-model-repo>"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain QLoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
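
If your transformers version does not resolve PEFT adapter repos automatically, the adapter can be attached explicitly. The base model name below is an assumption; use the base_model_name_or_path recorded in adapter_config.json:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "<base-model-name>"  # hypothetical; see adapter_config.json
adapter = "<your-hf-username>/<your-model-repo>"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)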

📤 Hugging Face Hub Workflow

The provided notebook automates:

  • Authentication
  • Repo creation
  • Uploading adapters
  • Uploading evaluation graphs
  • Uploading logs
  • Uploading this README.md
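
Outside the notebook, the same steps can be reproduced with huggingface_hub directly (the repo id and folder path below are assumptions):

from huggingface_hub import login, create_repo, upload_folder

login()  # prompts for a Hugging Face access token
repo_id = "<your-hf-username>/<your-model-repo>"
create_repo(repo_id, exist_ok=True)

# Uploads adapters, graphs, logs and README.md in one call
upload_folder(repo_id=repo_id, folder_path="qlora-out")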

🧪 Reproducibility

Training workflow:

  1. Load base model
  2. Apply 4-bit quantization
  3. Train QLoRA adapters
  4. Evaluate baseline and fine-tuned performance
  5. Compare metrics + graphs
  6. Push results to HF Hub

Runtime on Google Colab Free Tier: ~40–45 minutes.
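
To make reruns comparable, fix the RNG seeds before step 1; a minimal sketch (the seed value is an assumption):

from transformers import set_seed

set_seed(42)  # seeds Python, NumPy and PyTorch RNGs in one call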


📚 Credits

  • Microsoft – Phi inspiration
  • Hugging Face – Transformers, PEFT, Accelerate
  • Google Colab – compute backend
  • Project author – data, workflow, evaluation

📄 License

This adapter inherits the base model's original license (e.g., Apache 2.0, MIT, or OpenRAIL, depending on the chosen base).


🤝 Contributions

Pull requests are welcome; please avoid adding copyrighted or sensitive data.
