continue_sft_bitd_lora_top26_shallow_k01

LoRA adapter trained on top of tuongvy2603/BITD_baseline as a second-stage continued SFT run on the top26_shallow / k=1 data pool. Loading the base model together with this adapter gives a model that has gone through supervised fine-tuning twice.

This adapter is part of a sweep over the per-prompt sample budget k used to build the continued-SFT training set from the top26 prompts (shallow variant).

Usage

Install

pip install transformers peft torch

Load and run

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Load base model
base = AutoModelForCausalLM.from_pretrained(
    "tuongvy2603/BITD_baseline",
    dtype=torch.bfloat16,
    device_map="auto",
)

# 2. Attach LoRA adapter
model = PeftModel.from_pretrained(base, "tuongvy2603/continue_sft_bitd_lora_top26_shallow_k01")

# 3. Tokenizer (chat template lives here)
tok = AutoTokenizer.from_pretrained("tuongvy2603/continue_sft_bitd_lora_top26_shallow_k01")

# 4. Generate
messages = [{"role": "user", "content": "Pick open-minded or close-minded."}]
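inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))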

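No merging is required: PeftModel applies the LoRA weights at runtime. The forward pass is mathematically equivalent to that of a merged model, although outputs can differ slightly in low precision such as bf16 because of floating-point rounding.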
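The equivalence above can be illustrated with a toy example. This is a minimal sketch in plain Python, not PEFT's implementation: the matrices, the input, and the helper functions are made up for illustration, and it assumes the standard LoRA scaling of alpha / rank (2 here, the same ratio as this adapter's 32 / 16).

```python
# Toy check: adding the LoRA update at runtime matches merging it into W.
# All values are hypothetical; integers keep the arithmetic exact.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

rank, alpha = 2, 4
scaling = alpha / rank                   # assumed standard LoRA scaling

W = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]    # frozen base weight (3 x 3)
B = [[1, 0], [0, 1], [1, 1]]             # LoRA "up" matrix (3 x rank)
A = [[1, 2, 0], [0, 1, 1]]               # LoRA "down" matrix (rank x 3)
x = [[1], [2], [3]]                      # input column vector

# Runtime path: base output plus the scaled low-rank correction.
runtime = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Merged path: fold the update into the weight once, then do a plain matmul.
W_merged = add(W, scale(matmul(B, A), scaling))
merged = matmul(W_merged, x)

print(runtime == merged)
```

With real bf16 weights the two paths round at different points, so small numerical differences are expected even though the algebra is the same.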

Optional: merge for standalone use

If you want a single self-contained checkpoint (e.g. for llama.cpp / GGUF conversion, or serving stacks that don't load adapters):

merged = model.merge_and_unload()
merged.save_pretrained("continue_sft_bitd_lora_top26_shallow_k01_merged")
tok.save_pretrained("continue_sft_bitd_lora_top26_shallow_k01_merged")

Training details

Base model: tuongvy2603/BITD_baseline
Method: LoRA (PEFT)
Data pool: top26_shallow, k = 1 samples / prompt
LoRA rank / alpha / dropout: 16 / 32 / 0.05
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Epochs: 5.0
Batch size: 8 × 2 grad accum = 16 effective
Learning rate: 0.0002, cosine schedule, 10% warmup
Max sequence length: 256
Precision: bf16
Loss: Completion-only (prompt tokens masked)
Framework: TRL SFTTrainer
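To make the completion-only loss entry above concrete, here is a minimal sketch of prompt masking. It is not TRL's code: the token ids, the prompt length, and the mask_prompt helper are hypothetical. The one fixed convention it relies on is that labels equal to -100 are ignored by PyTorch's cross-entropy loss by default.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_prompt(input_ids, prompt_len):
    """Return labels that keep only the completion tokens as training targets."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])

# Hypothetical sequence: 4 prompt tokens followed by 3 completion tokens.
input_ids = [101, 7, 8, 9, 42, 43, 2]
labels = mask_prompt(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 42, 43, 2]
```

Only the last three positions contribute to the loss, so the model is trained to produce the completion and not to reproduce the prompt.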

Full resolved config: see run_config.json in this repo.
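As a rough illustration of the learning-rate settings listed in the training details, the sketch below computes a cosine schedule with linear warmup in plain Python. The total step count is hypothetical (the dataset size is not stated here), the lr_at helper is made up for this example, and the rounding of the warmup step count is an assumption; run_config.json remains the authoritative source.

```python
import math

PEAK_LR = 2e-4          # learning rate from the training details
WARMUP_RATIO = 0.10     # 10% warmup
EFFECTIVE_BATCH = 8 * 2 # per-device batch size x gradient accumulation steps

def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)  # rounding is an assumption
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 100  # hypothetical; depends on dataset size and the 5 epochs
for step in (0, 5, 10, 55, 100):
    print(step, lr_at(step, total_steps))
```

The rate climbs linearly over the first 10 steps, peaks at 0.0002, and decays to zero by the final step.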

Framework versions

  • PEFT 0.19.1
  • TRL 1.4.0
  • Transformers 5.8.1
  • PyTorch 2.12.0
  • Datasets 4.8.5
  • Tokenizers 0.22.2

Citation

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}