LLaMA 3.2 3B — WikiQA Fine-tuned
A parameter-efficient fine-tune of LLaMA 3.2 3B, trained on the WikiQA dataset for open-domain question answering. The LoRA adapters were trained with Unsloth, which advertises roughly 2× faster training.
Quick Start
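from unsloth import FastLanguageModel

# Load the 4-bit base model together with the LoRA adapter from the Hub.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "bnpatel01/llama-wikiqa-finetuned",
    max_seq_length = 2048,
    dtype = None,          # None lets Unsloth choose bf16 or fp16 automatically
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's faster inference mode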
Run Inference
alpaca_prompt = """### Instruction:
{}
### Input:
{}
### Response:
{}"""
question = "What is the capital of France?"
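inputs = tokenizer(
    [alpaca_prompt.format(question, "", "")],
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
# Drop special tokens (such as the end-of-text marker) before extracting the answer.
decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
answer = decoded.split("### Response:")[1].strip()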
print(answer)
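The `split("### Response:")[1]` step above raises an `IndexError` if the marker is missing from the decoded text. A more defensive parser might look like the sketch below. The helper name `extract_answer` and the list of end-of-text strings are illustrative assumptions, not part of this model's code.

```python
RESPONSE_MARKER = "### Response:"
# Assumed end-of-text strings that can survive decoding; adjust for your tokenizer.
STOP_STRINGS = ("<|end_of_text|>", "<|eot_id|>")


def extract_answer(decoded: str) -> str:
    """Return the text after the last response marker, or "" if it is absent."""
    _, marker, tail = decoded.rpartition(RESPONSE_MARKER)
    if not marker:
        # The marker never appeared, so there is nothing reliable to return.
        return ""
    for stop in STOP_STRINGS:
        # Cut the answer at the first stop string, if one is present.
        tail = tail.split(stop)[0]
    return tail.strip()
```

Calling `extract_answer(tokenizer.batch_decode(outputs)[0])` would then replace the split-and-index expression.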
Model Details
| Property | Value |
|---|---|
| Base Model | unsloth/Llama-3.2-3B-bnb-4bit |
| Fine-tune Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit (load_in_4bit=True) |
| Max Seq Length | 2048 tokens |
| Adapter Size | ~92.8 MB |
| Framework | Unsloth + HuggingFace PEFT |
| Language | English |
| Task | Open-Domain Question Answering |
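The rank, alpha, and target modules in the table determine how many adapter weights are trained. The sketch below does that arithmetic. The layer dimensions are assumptions taken from the published Llama 3.2 3B configuration (hidden size 3072, MLP size 8192, 28 layers, 8 key/value heads of dimension 128), not from this card, and the size estimate assumes the adapter is stored as 32-bit floats.

```python
# Assumed Llama 3.2 3B dimensions (from its published config, not from this card).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 28
KV_DIM = 1024  # 8 key/value heads x head dimension 128

RANK = 16
ALPHA = 16

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, layers):
    """Each adapted matrix adds A (rank x in) and B (out x rank) weights."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers


total = lora_params(RANK, TARGETS, NUM_LAYERS)
scaling = ALPHA / RANK           # standard LoRA scaling factor alpha / r
size_mib = total * 4 / 2**20     # 4 bytes per weight, assuming fp32 storage

print(total, scaling, size_mib)
```

Under these assumptions the adapter has 24,313,856 trainable weights, a scaling factor of 1.0, and a size of 92.75 MiB, which is consistent with the ~92.8 MB adapter size listed above.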
Dataset
Trained on the microsoft/wiki_qa dataset — a benchmark for open-domain QA using Wikipedia passages.
| Split | Total rows | Rows with label == 1 |
|---|---|---|
| Train | 20,360 | 1,040 |
| Validation | 2,733 | 140 |
| Test | 6,165 | 293 |
Only samples with label == 1 (correct answer–question pairs) were used for training.
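The label filter can be sketched on in-memory rows, without downloading anything. The two rows below are made-up examples that only mimic the `question`, `answer`, and `label` fields of `microsoft/wiki_qa`. The helper name `keep_positive` is hypothetical, and the author's actual preprocessing script is not shown in this card.

```python
# Toy rows (hypothetical) shaped like microsoft/wiki_qa records.
ROWS = [
    {
        "question": "What is the capital of France?",
        "answer": "Paris is the capital and most populous city of France.",
        "label": 1,
    },
    {
        "question": "What is the capital of France?",
        "answer": "France is a country in Western Europe.",
        "label": 0,
    },
]


def keep_positive(rows):
    """Keep only candidate sentences marked as correct answers (label == 1)."""
    return [row for row in rows if row["label"] == 1]


positive = keep_positive(ROWS)
pairs = [(row["question"], row["answer"]) for row in positive]
print(len(ROWS), "rows ->", len(pairs), "question-answer pairs")
```

With a `datasets.Dataset`, the same predicate could be passed to its `filter` method.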
Training Configuration
TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 5,
    num_train_epochs = 3,
    learning_rate = 2e-4,
    optim = "adamw_8bit",
)
- Epochs: 3
- Optimizer: AdamW 8-bit
- Precision: bf16 (if supported), else fp16
- Gradient checkpointing: Unsloth optimized
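The batch size and gradient accumulation settings above combine into an effective batch size, which in turn sets the number of optimizer steps. The sketch below works through that arithmetic. The single-GPU default and the 1,000-sample figure are placeholders for illustration, not values from this training run, and the step count is an approximation of how a trainer would round.

```python
import math

PER_DEVICE_BATCH = 2   # per_device_train_batch_size
GRAD_ACCUM = 4         # gradient_accumulation_steps
EPOCHS = 3             # num_train_epochs


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples consumed per optimizer update."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_devices


def optimizer_steps(num_samples: int, num_devices: int = 1) -> int:
    """Approximate number of optimizer updates over all epochs."""
    per_epoch = math.ceil(num_samples / effective_batch_size(num_devices))
    return per_epoch * EPOCHS


print(effective_batch_size())   # 8
print(optimizer_steps(1000))    # 375, for a hypothetical 1,000-sample training set
```

With only 5 warmup steps, the learning rate reaches its peak of 2e-4 almost immediately in a run of this length.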
Prompt Format
This model uses the Alpaca instruction format:
### Instruction:
<your question here>
### Input:
<optional context, leave empty for QA>
### Response:
<model answer>
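A small helper can keep prompts consistent with this layout. The sketch below mirrors the template exactly as shown in this card. The names `ALPACA_TEMPLATE` and `build_prompt` are illustrative and do not come from the model's code.

```python
# Template laid out as in this card: three headers, each followed by one line.
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n"
    "### Input:\n{context}\n"
    "### Response:\n{response}"
)


def build_prompt(question: str, context: str = "", response: str = "") -> str:
    """Fill the template; leave `response` empty so the model completes it."""
    return ALPACA_TEMPLATE.format(
        instruction=question.strip(),
        context=context.strip(),
        response=response,
    )


prompt = build_prompt("What is the capital of France?")
print(prompt)
```

For inference, pass the resulting string to the tokenizer with `response` left empty so that generation starts right after the `### Response:` header.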
Requirements
pip install unsloth
pip install torch transformers peft
Recommended: Google Colab with T4/A100 GPU or any CUDA-capable GPU with 8GB+ VRAM.
Limitations
- Trained only on WikiQA — best suited for factoid, Wikipedia-style questions
- May not perform well on complex reasoning or multi-hop questions
- Knowledge is limited to the base LLaMA 3.2 training data cutoff
- Responses may occasionally be incorrect or hallucinated
License
This model is released under the Apache 2.0 license. The base model follows Meta's LLaMA 3.2 Community License.
Acknowledgements
- Unsloth — for making fine-tuning 2× faster
- Meta AI — for the LLaMA 3.2 base model
- Microsoft Research — for the WikiQA dataset
Made with ❤️ by bnpatel01