LLaMA 3.2 3B — WikiQA Fine-tuned

A parameter-efficient fine-tuned version of LLaMA 3.2 3B, trained on the WikiQA dataset for open-domain question answering. Built using Unsloth for 2× faster training with LoRA adapters.

Quick Start

from unsloth import FastLanguageModel

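model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "bnpatel01/llama-wikiqa-finetuned",
    max_seq_length = 2048,
    dtype = None,          # None lets Unsloth choose a dtype for the GPU
    load_in_4bit = True,   # load 4-bit quantized weights to reduce memory use
)
FastLanguageModel.for_inference(model)  # switch the model to Unsloth's inference mode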

Run Inference

alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

question = "What is the capital of France?"

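# Fill in the instruction slot; the input and response slots stay empty
inputs = tokenizer(
    [alpaca_prompt.format(question, "", "")],
    return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
# Skip special tokens so the end-of-text marker does not leak into the answer
decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
answer = decoded.split("### Response:")[1].strip()
print(answer)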
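
Indexing the result of split("### Response:") fails with an IndexError when the marker is missing, and the model may keep generating a new "###" section after its answer. The following is a minimal sketch of a more defensive extraction step. The helper name extract_answer and the list of special-token strings are assumptions made for illustration, not part of Unsloth or of this repository.

```python
RESPONSE_MARKER = "### Response:"
# Assumed special-token strings; adjust to whatever the tokenizer actually emits.
SPECIAL_TOKENS = ("<|begin_of_text|>", "<|end_of_text|>", "<|eot_id|>")


def extract_answer(decoded: str) -> str:
    """Return the text after the first response marker, or "" if it is absent."""
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    if not found:
        return ""
    # Stop at a following "###" section header if the model kept generating.
    tail = tail.split("\n###", 1)[0]
    # Remove any special-token strings that survived decoding.
    for token in SPECIAL_TOKENS:
        tail = tail.replace(token, "")
    return tail.strip()
```

It can be applied to the decoded string in place of the split call, for example extract_answer(tokenizer.batch_decode(outputs)[0]).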

Model Details

Property	Value
Base Model	unsloth/Llama-3.2-3B-bnb-4bit
Fine-tune Method	LoRA (Low-Rank Adaptation)
LoRA Rank	16
LoRA Alpha	16
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization	4-bit (load_in_4bit=True)
Max Seq Length	2048 tokens
Adapter Size	~92.8 MB
Framework	Unsloth + HuggingFace PEFT
Language	English
Task	Open-Domain Question Answering
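
The adapter size can be sanity-checked from the LoRA rank and the target modules. The sketch below counts LoRA parameters as rank × (in_features + out_features) per adapted projection. The layer dimensions are assumptions taken from the published Llama 3.2 3B configuration (hidden size 3072, MLP size 8192, 28 layers, 8 key/value heads of dimension 128) and are not stated in this card. Storage in 32-bit floats is also an assumption.

```python
# Assumed Llama 3.2 3B dimensions (from the published model config, not this card).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 28
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128

RANK = 16   # LoRA rank from the table
ALPHA = 16  # LoRA alpha from the table

# (in_features, out_features) of each targeted projection in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes, num_layers):
    """Each adapted matrix adds an A (rank x d_in) and a B (d_out x rank) matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(RANK, TARGET_SHAPES, NUM_LAYERS)
size_mib = trainable * 4 / 2**20  # assuming 32-bit floats
scaling = ALPHA / RANK            # factor applied to the low-rank update
print(trainable, round(size_mib, 2), scaling)
```

Under these assumptions the estimate is about 24.3 million trainable parameters, or roughly 92.75 MiB, which is in line with the ~92.8 MB adapter size listed in the table. With alpha equal to the rank, the LoRA scaling factor is 1.0.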

Dataset

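Trained on the microsoft/wiki_qa dataset, a benchmark for open-domain QA that pairs each question with candidate answer sentences drawn from Wikipedia. Each pair is labeled 1 (the sentence answers the question) or 0 (it does not).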

Split	Samples (total, before label == 1 filter)
Train	20,360
Validation	2,733
Test	6,165

Only samples with label == 1 (correct answer–question pairs) were used for training.
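
The filtering step can be sketched as follows. This is an illustration under stated assumptions, not the author's preprocessing script: it assumes the WikiQA columns question, answer, and label, and it assumes the question goes into the Instruction slot and the answer sentence into the Response slot, which the card does not spell out. The two rows are made up for the example.

```python
ALPACA_PROMPT = """### Instruction:
{}

### Input:
{}

### Response:
{}"""


def is_correct_pair(row):
    """Keep only question-answer pairs marked as correct."""
    return row["label"] == 1


def to_training_text(row, eos_token=""):
    """Format one row as an Alpaca-style training example (assumed field mapping)."""
    return ALPACA_PROMPT.format(row["question"], "", row["answer"]) + eos_token


# Made-up rows that mimic the WikiQA columns used here.
rows = [
    {"question": "what is the capital of france",
     "answer": "Paris is the capital of France.", "label": 1},
    {"question": "what is the capital of france",
     "answer": "France is in Western Europe.", "label": 0},
]

kept = [row for row in rows if is_correct_pair(row)]
texts = [to_training_text(row) for row in kept]
print(len(kept))
```

With the Hugging Face datasets library, the same predicate can be passed to Dataset.filter, and an end-of-sequence token is typically appended via the eos_token argument so the model learns where an answer stops.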

Training Configuration

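from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,      # effective batch size: 2 x 4 = 8
    warmup_steps = 5,
    num_train_epochs = 3,
    learning_rate = 2e-4,
    fp16 = not is_bfloat16_supported(),   # fall back to fp16 where bf16 is unavailable
    bf16 = is_bfloat16_supported(),
    optim = "adamw_8bit",
    output_dir = "outputs",               # placeholder; not given in the original configuration
)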

Epochs: 3
Optimizer: AdamW 8-bit
Precision: bf16 (if supported), else fp16
Gradient checkpointing: Unsloth optimized
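
As a rough worked example of what these settings imply, the sketch below computes the effective batch size and an approximate number of optimizer steps. The function names are illustrative, a single GPU is assumed, and the dataset size of 1,000 examples is a hypothetical figure rather than the size of the filtered training set.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def optimizer_steps(num_examples, batch_size, epochs):
    """Approximate optimizer steps, rounding the last partial batch up."""
    return math.ceil(num_examples / batch_size) * epochs


batch = effective_batch_size(per_device=2, grad_accum=4)
steps = optimizer_steps(num_examples=1000, batch_size=batch, epochs=3)  # 1000 is hypothetical
print(batch, steps)
```

With a batch of 8 examples per step, the 5 warmup steps cover only the first 40 examples, so the learning rate reaches 2e-4 almost immediately.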

Prompt Format

This model uses the Alpaca instruction format:

### Instruction:
<your question here>

### Input:
<optional context, leave empty for QA>

### Response:
<model answer>

Requirements

pip install unsloth
pip install torch transformers peft

Recommended: Google Colab with T4/A100 GPU or any CUDA-capable GPU with 8GB+ VRAM.
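
Before loading the model, it can help to confirm that the packages above are importable and that a CUDA device is visible. The following is a small sketch using only the standard library plus torch.cuda.is_available(); the helper names are illustrative.

```python
from importlib.util import find_spec

REQUIRED = ("unsloth", "torch", "transformers", "peft")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


def cuda_available():
    """True only if torch is installed and reports a usable CUDA device."""
    if find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


print("Missing packages:", missing_packages())
```

Calling cuda_available() afterwards shows whether the .to("cuda") call in the inference example can succeed on the current machine.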

Limitations

Trained only on WikiQA — best suited for factoid, Wikipedia-style questions
May not perform well on complex reasoning or multi-hop questions
Knowledge is limited to the base LLaMA 3.2 training data cutoff
Responses may occasionally be incorrect or hallucinated

License

This model is released under the Apache 2.0 license. The base model follows Meta's LLaMA 3.2 Community License.

Acknowledgements

Unsloth — for making fine-tuning 2× faster
Meta AI — for the LLaMA 3.2 base model
Microsoft Research — for the WikiQA dataset

Made with ❤️ by bnpatel01
