LLaMA 3.2 3B — WikiQA Fine-tuned

A parameter-efficient fine-tune of LLaMA 3.2 3B, trained on the WikiQA dataset for open-domain question answering. Training used LoRA adapters with Unsloth, which advertises roughly 2× faster fine-tuning than a standard Hugging Face training setup.


Quick Start

from unsloth import FastLanguageModel

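model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "bnpatel01/llama-wikiqa-finetuned",
    max_seq_length = 2048,
    dtype = None,         # auto-detect (bf16 or fp16, depending on the GPU)
    load_in_4bit = True,  # load the base weights in 4-bit to reduce VRAM use
)
FastLanguageModel.for_inference(model)  # switch the model to Unsloth's inference mode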

Run Inference

alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

question = "What is the capital of France?"

inputs = tokenizer(
    [alpaca_prompt.format(question, "", "")],
    return_tensors="pt"
).to("cuda")

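outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
# Drop special tokens (e.g. end-of-text) so they do not leak into the answer.
decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
answer = decoded.split("### Response:")[1].strip()
print(answer)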
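
The prompt construction and answer extraction above can be wrapped in two small helpers. This is a minimal sketch: `build_prompt` and `extract_answer` are hypothetical names, not part of Unsloth or of this model's repository, and the extraction assumes the decoded text contains the `### Response:` marker from the template.

```python
ALPACA_PROMPT = """### Instruction:
{}

### Input:
{}

### Response:
{}"""


def build_prompt(question: str, context: str = "") -> str:
    """Fill the Alpaca template, leaving the response slot empty for generation."""
    return ALPACA_PROMPT.format(question, context, "")


def extract_answer(decoded: str) -> str:
    """Return the text after the first '### Response:' marker, or '' if it is missing."""
    _, found, tail = decoded.partition("### Response:")
    return tail.strip() if found else ""
```

With these, the inference call becomes `tokenizer([build_prompt(question)], return_tensors="pt")`, and the decoded output is passed to `extract_answer`. Returning an empty string when the marker is missing avoids the `IndexError` that a bare `split(...)[1]` would raise.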

Model Details

Property          Value
Base Model        unsloth/Llama-3.2-3B-bnb-4bit
Fine-tune Method  LoRA (Low-Rank Adaptation)
LoRA Rank         16
LoRA Alpha        16
Target Modules    q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization      4-bit (load_in_4bit=True)
Max Seq Length    2048 tokens
Adapter Size      ~92.8 MB
Framework         Unsloth + HuggingFace PEFT
Language          English
Task              Open-Domain Question Answering
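
As a rough sanity check on the adapter size, the number of LoRA parameters can be estimated from the rank and the target modules listed above. The sketch below assumes the layer shapes of the public Llama 3.2 3B configuration (hidden size 3072, MLP size 8192, 28 layers, 8 key/value heads of dimension 128) and assumes the adapter is stored in 32-bit floats; neither is stated in this card.

```python
# Layer shapes are assumptions taken from the public Llama 3.2 3B config,
# not from this model card.
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 28
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128

RANK = 16  # LoRA rank from the table above

# (in_features, out_features) of each targeted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each wrapped layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
size_mib = total * 4 / 2**20  # 4 bytes per parameter, assuming fp32 storage

print(f"{total:,} trainable parameters, about {size_mib:.2f} MiB in fp32")
# prints: 24,313,856 trainable parameters, about 92.75 MiB in fp32
```

Under these assumptions the estimate is consistent with the ~92.8 MB adapter size in the table. With rank 16 and alpha 16, the LoRA scaling factor alpha / rank is 1.0.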

Dataset

Trained on the microsoft/wiki_qa dataset — a benchmark for open-domain QA using Wikipedia passages.

Split       Samples (full split, before the label == 1 filter)
Train       20,360
Validation  2,733
Test        6,165

Only samples with label == 1 (correct answer–question pairs) were used for training.
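
The filtering step can be sketched as follows. The rows here are made-up examples that mimic the `question`, `answer`, and `label` fields of microsoft/wiki_qa. The helper names, the end-of-sequence string, and the exact way training text was assembled are all assumptions, since the card does not show the preprocessing code.

```python
ALPACA_PROMPT = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Illustrative rows only; they are not taken from the dataset.
rows = [
    {"question": "What is the capital of France?",
     "answer": "Paris is the capital and most populous city of France.",
     "label": 1},
    {"question": "What is the capital of France?",
     "answer": "France is a country in Western Europe.",
     "label": 0},
]


def keep_correct(examples):
    """Keep only question-answer pairs marked as correct (label == 1)."""
    return [ex for ex in examples if ex["label"] == 1]


def to_training_text(example, eos_token="<|end_of_text|>"):
    """Render one pair in the Alpaca format; the EOS string is an assumed value."""
    return ALPACA_PROMPT.format(example["question"], "", example["answer"]) + eos_token


positives = keep_correct(rows)
texts = [to_training_text(ex) for ex in positives]
print(len(positives), "of", len(rows), "rows kept")
```

With the Hugging Face `datasets` library, the same filter is typically written as `dataset.filter(lambda ex: ex["label"] == 1)`.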


Training Configuration

TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 5,
    num_train_epochs = 3,
    learning_rate = 2e-4,
    optim = "adamw_8bit",
)
  • Epochs: 3
  • Optimizer: AdamW 8-bit
  • Precision: bf16 (if supported), else fp16
  • Gradient checkpointing: Unsloth optimized
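
The batch settings above imply an effective batch size of 2 × 4 = 8 sequences per optimizer step on a single GPU. The sketch below works through that arithmetic; `schedule` is a hypothetical helper, the sample count in the usage line is a placeholder, and the step count the Trainer reports may differ slightly because of how it rounds partial batches.

```python
import math

PER_DEVICE_BATCH = 2  # per_device_train_batch_size
GRAD_ACCUM = 4        # gradient_accumulation_steps
EPOCHS = 3            # num_train_epochs


def schedule(num_samples: int, num_devices: int = 1):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * EPOCHS


# 1000 is a placeholder sample count, not the size of the training set.
print(schedule(1000))  # prints (8, 125, 375)
```

With only 5 warmup steps, the learning rate reaches its 2e-4 peak almost immediately in a run of this length.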

Prompt Format

This model uses the Alpaca instruction format:

### Instruction:
<your question here>

### Input:
<optional context, leave empty for QA>

### Response:
<model answer>

Requirements

pip install unsloth
pip install torch transformers peft

Recommended: Google Colab with T4/A100 GPU or any CUDA-capable GPU with 8GB+ VRAM.


Limitations

  • Trained only on WikiQA — best suited for factoid, Wikipedia-style questions
  • May not perform well on complex reasoning or multi-hop questions
  • Knowledge is limited to the base LLaMA 3.2 training data cutoff
  • Responses may occasionally be incorrect or hallucinated

License

This model is released under the Apache 2.0 license. The base model follows Meta's LLaMA 3.2 Community License.


Acknowledgements


Made with ❤️ by bnpatel01
