Llama-3.1-8B-Orca-Structured-LoRA

This is a LoRA adapter for Meta-Llama-3.1-8B-Instruct, fine-tuned for step-by-step mathematical reasoning and for polite, clearly structured, logically organized responses.

Model Highlights

  • Base Model: Llama-3.1-8B-Instruct
  • Primary Skills: Complex math word problems, logical reasoning, structured explanations
  • Output Style: Polite, helpful, and well-organized
  • Training Hardware: Single NVIDIA Tesla T4 GPU, with the base model loaded in 4-bit via Unsloth

Training Data

The adapter was trained on a balanced 16,000-example subset drawn from two public datasets:

  1. mlabonne/FineTome-100k: 8,000 examples - Teaches a polite conversational tone and clearly formatted reasoning chains.
  2. microsoft/orca-math-word-problems-200k: 8,000 examples - Provides step-by-step solutions to math word problems.
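A minimal sketch of how such a balanced mix could be assembled. It assumes the Hugging Face `datasets` library and the public column layouts of the two datasets (`conversations` with ShareGPT-style `from`/`value` turns for FineTome-100k; `question`/`answer` for orca-math). The helper names, the seed value, and the conversion to role/content messages are illustrative assumptions, not the author's actual preprocessing script.

```python
import random

# Assumed mapping from ShareGPT-style speaker tags (FineTome-100k) to chat roles.
SHAREGPT_ROLES = {"system": "system", "human": "user", "gpt": "assistant"}


def finetome_to_messages(row):
    """Hypothetical helper: convert one FineTome-100k row to role/content messages."""
    return [
        # Unknown speaker tags pass through unchanged rather than raising.
        {"role": SHAREGPT_ROLES.get(turn["from"], turn["from"]), "content": turn["value"]}
        for turn in row["conversations"]
    ]


def orca_to_messages(row):
    """Hypothetical helper: turn one orca-math row into a two-turn chat."""
    return [
        {"role": "user", "content": row["question"]},
        {"role": "assistant", "content": row["answer"]},
    ]


def build_mix(finetome_rows, orca_rows, seed=3407):
    """Convert both sources to chat format and shuffle them together."""
    rng = random.Random(seed)  # seed value is arbitrary
    mixed = [finetome_to_messages(r) for r in finetome_rows]
    mixed += [orca_to_messages(r) for r in orca_rows]
    rng.shuffle(mixed)
    return mixed


def load_real_mix(n_per_source=8000, seed=3407):
    """Download both datasets and build the 16,000-example mix (needs network)."""
    from datasets import load_dataset  # imported lazily; only needed here

    finetome = load_dataset("mlabonne/FineTome-100k", split="train")
    orca = load_dataset("microsoft/orca-math-word-problems-200k", split="train")
    # Draw a random, equal-sized subset from each source.
    finetome = finetome.shuffle(seed=seed).select(range(n_per_source))
    orca = orca.shuffle(seed=seed).select(range(n_per_source))
    return build_mix(list(finetome), list(orca), seed=seed)
```

Keeping the conversion functions separate from the download step makes them easy to check on a few hand-written rows before pulling the full datasets.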

How to Use (Inference)

The simplest way to run this model is with the Unsloth library, which automatically downloads the base model and applies this LoRA adapter. The example below requires a CUDA-capable GPU.

1. Install Dependencies

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes

2. Run the Model

from unsloth import FastLanguageModel

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "x0root/Llama-3.1-8B-Orca-Structured-LoRA",
    max_seq_length = 1024,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

# Create prompt
messages = [
    {"role": "system", "content": "You are a highly intelligent, polite AI assistant. Always think step-by-step and structure your answers beautifully."},
    {"role": "user", "content": "A store sells apples for $1.20 each and bananas for $0.50 each. If I buy 4 apples and 6 bananas, and I pay with a $20 bill, how much change should I receive? Please explain your reasoning clearly."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

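# Generate response (do_sample=True so that temperature is actually applied)
outputs = model.generate(
    input_ids = inputs,
    max_new_tokens = 512,
    use_cache = True,
    do_sample = True,
    temperature = 0.3,
)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)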
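To judge the model's answer to the sample prompt, it helps to know the expected result. The quick check below uses Python's standard `decimal` module (chosen here to avoid floating-point rounding on currency amounts) and shows the correct change is $12.20:

```python
from decimal import Decimal

# Prices and quantities from the example prompt.
apple_price, banana_price = Decimal("1.20"), Decimal("0.50")
apples, bananas = 4, 6
paid = Decimal("20.00")

total = apple_price * apples + banana_price * bananas  # 4.80 + 3.00
change = paid - total

print(f"Total: ${total}, change: ${change}")
# prints "Total: $7.80, change: $12.20"
```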

Technical Specifications

Training Hyperparameters

Trained with supervised fine-tuning (SFT) using the following configuration:

  • LoRA Rank (r): 16
  • LoRA Alpha: 16
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Max Sequence Length: 1024 tokens
  • Batch Size: 1 (per device)
  • Gradient Accumulation Steps: 8
  • Effective Batch Size: 8
  • Learning Rate: 2e-4
  • Optimizer: paged_adamw_8bit
  • Max Steps: 2000
  • Warmup Steps: 50
  • Weight Decay: 0.01
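Taken together, the listed values imply a training budget of roughly one pass over the data. The arithmetic below assumes a single device (per the single-T4 setup above) and no sequence packing; if packing was enabled, each optimizer step would cover more than eight examples. It also assumes standard LoRA scaling (alpha / r, not the rsLoRA variant), which here equals 1.

```python
# Hyperparameters as listed above.
per_device_batch = 1
grad_accum_steps = 8
max_steps = 2000
warmup_steps = 50
lora_r, lora_alpha = 16, 16
num_examples = 16_000  # 8,000 FineTome + 8,000 orca-math

effective_batch = per_device_batch * grad_accum_steps  # single GPU assumed
examples_seen = max_steps * effective_batch            # assumes no packing
epochs = examples_seen / num_examples
warmup_fraction = warmup_steps / max_steps
lora_scaling = lora_alpha / lora_r  # standard LoRA scales the update by alpha / r

print(effective_batch, examples_seen, epochs, warmup_fraction, lora_scaling)
# prints: 8 16000 1.0 0.025 1.0
```

In other words, 2,000 steps at an effective batch of 8 visits about 16,000 examples, matching the size of the training mix, and warmup covers the first 2.5% of steps.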

Frameworks

  • Unsloth (memory-efficient LoRA training; the project advertises roughly 2x faster fine-tuning)
  • Hugging Face Transformers & TRL
  • PyTorch