Llama-3.1-8B-Orca-Structured-LoRA

This is a LoRA adapter for Meta-Llama-3.1-8B-Instruct. It was fine-tuned for step-by-step mathematical reasoning and for answers written in a polite, clearly structured, logically organized style.

Model Highlights

Base Model: Llama-3.1-8B-Instruct
Primary Skills: Complex math word problems, logical reasoning, structured explanations
Output Style: Helpful and well-organized
Training Hardware: Single NVIDIA Tesla T4 GPU, using Unsloth with 4-bit quantization

Training Data

The adapter was trained on a balanced subset of 16,000 examples drawn equally from two datasets:

mlabonne/FineTome-100k: 8,000 examples - Teaches a polite conversational tone and well-formatted reasoning chains.
microsoft/orca-math-word-problems-200k: 8,000 examples - Provides step-by-step mathematical problem-solving capabilities.
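
The card does not say how the 8,000 examples per dataset were selected. As a minimal sketch, assuming a seeded uniform random sample from each source, the balancing step could look like the following. The helper `balanced_subset` and the placeholder records are hypothetical; in practice the lists would hold rows loaded from the two datasets above.

```python
import random

def balanced_subset(sources, per_source, seed=42):
    """Draw the same number of examples from every source, then shuffle.

    sources: dict mapping a dataset name to a list of examples.
    per_source: number of examples to take from each source.
    """
    rng = random.Random(seed)
    mixed = []
    for name, examples in sources.items():
        if len(examples) < per_source:
            raise ValueError(f"{name} has only {len(examples)} examples")
        # Sample without replacement and record where each row came from
        for example in rng.sample(examples, per_source):
            mixed.append({"source": name, "example": example})
    rng.shuffle(mixed)  # interleave the sources
    return mixed

# Placeholder records standing in for the real dataset rows (assumption)
sources = {
    "mlabonne/FineTome-100k": [f"finetome-{i}" for i in range(100_000)],
    "microsoft/orca-math-word-problems-200k": [f"orca-{i}" for i in range(200_000)],
}
subset = balanced_subset(sources, per_source=8_000)
print(len(subset))
```

Fixing the seed keeps the subset reproducible across runs, which helps when comparing adapters trained with different hyperparameters.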

How to Use (Inference)

Run this model easily with the unsloth library. It automatically downloads the base model and applies this LoRA adapter.

1. Install Dependencies

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes

2. Run the Model

from unsloth import FastLanguageModel

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "x0root/Llama-3.1-8B-Orca-Structured-LoRA",
    max_seq_length = 1024,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

# Create prompt
messages = [
    {"role": "system", "content": "You are a highly intelligent, polite AI assistant. Always think step-by-step and structure your answers beautifully."},
    {"role": "user", "content": "A store sells apples for $1.20 each and bananas for $0.50 each. If I buy 4 apples and 6 bananas, and I pay with a $20 bill, how much change should I receive? Please explain your reasoning clearly."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

# Generate response
outputs = model.generate(
    input_ids = inputs,
    max_new_tokens = 512,
    use_cache = True,
    do_sample = True,  # temperature only takes effect when sampling is enabled
    temperature = 0.3,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response.strip())
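
To sanity-check the model's answer to the example prompt, the expected result can be computed directly. This is a small reference calculation, separate from the inference code; `Decimal` is used so the currency amounts stay exact.

```python
from decimal import Decimal

apple_price = Decimal("1.20")
banana_price = Decimal("0.50")
paid = Decimal("20.00")

# 4 apples and 6 bananas: 4.80 + 3.00
total = 4 * apple_price + 6 * banana_price
change = paid - total

print(f"Total cost: ${total}")   # Total cost: $7.80
print(f"Change due: ${change}")  # Change due: $12.20
```

A correct response should therefore report a total of $7.80 and change of $12.20.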

Technical Specifications

Training Hyperparameters

Fine-tuned using Supervised Fine-Tuning (SFT) with the following configuration:

LoRA Rank (r): 16
LoRA Alpha: 16
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Max Sequence Length: 1024 tokens
Batch Size: 1 (per device)
Gradient Accumulation Steps: 8
Effective Batch Size: 8
Learning Rate: 2e-4
Optimizer: paged_adamw_8bit
Max Steps: 2000
Warmup Steps: 50
Weight Decay: 0.01
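
Two quantities follow from the configuration above and can be checked with simple arithmetic: the number of examples seen during training and the number of trainable LoRA parameters. The sketch below assumes no sequence packing (the card does not say) and uses layer dimensions from the public Llama 3.1 8B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128), which are not stated in this card.

```python
# Hyperparameters from the list above
per_device_batch = 1
grad_accum_steps = 8
max_steps = 2000
lora_rank = 16

effective_batch = per_device_batch * grad_accum_steps
examples_seen = effective_batch * max_steps

# Assumed dimensions from the public Llama 3.1 8B config (not stated in this card)
hidden, mlp, layers, kv_dim = 4096, 14336, 32, 8 * 128

# (in_features, out_features) of each targeted projection
target_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}

# A LoRA pair A (r x in) and B (out x r) adds r * (in + out) weights per module
per_layer = sum(lora_rank * (n_in + n_out) for n_in, n_out in target_shapes.values())
lora_params = layers * per_layer

print(effective_batch)  # 8
print(examples_seen)    # 16000
print(lora_params)      # 41943040
```

Under these assumptions, 2,000 steps at an effective batch size of 8 cover 16,000 examples, which corresponds to one pass over the training subset, and the adapter holds roughly 42 million trainable parameters.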

Frameworks

Unsloth (2x faster, memory-efficient training)
Hugging Face Transformers & TRL
PyTorch

Model tree for x0root/Llama-3.1-8B-Orca-Structured-LoRA

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Quantized: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
Adapter: this model
