continue_sft_bitd_lora_top26_shallow_k01

This is a LoRA adapter trained on top of tuongvy2603/BITD_baseline as a second-stage (continued) SFT run on the top26_shallow, k = 1 data pool. Loading the base model together with this adapter gives a model that has gone through two rounds of SFT.

This adapter is part of a sweep over the per-prompt sample budget k used to build the continued-SFT training set from the top26 prompts (shallow variant).

Usage

Install

pip install transformers peft torch

Load and run

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Load base model
base = AutoModelForCausalLM.from_pretrained(
    "tuongvy2603/BITD_baseline",
    dtype=torch.bfloat16,
    device_map="auto",
)

# 2. Attach LoRA adapter
model = PeftModel.from_pretrained(base, "tuongvy2603/continue_sft_bitd_lora_top26_shallow_k01")

# 3. Tokenizer (chat template lives here)
tok = AutoTokenizer.from_pretrained("tuongvy2603/continue_sft_bitd_lora_top26_shallow_k01")

# 4. Generate
messages = [{"role": "user", "content": "Pick open-minded or close-minded."}]
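inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))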

No merging is required: PeftModel injects the LoRA weights at runtime. Forward results are mathematically equivalent to those of a merged model, up to floating-point rounding.
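To make the equivalence concrete, here is a minimal pure-Python sketch of a single linear layer with a LoRA update. It assumes the standard LoRA formulation, y = W x + (alpha / r) * B (A x), and uses a toy 2 x 2 weight with rank 1. The matrices and the helper names (`matvec`, `matmul`) are made up for illustration and are not taken from this repository.

```python
# Toy check: applying LoRA at runtime vs. merging it into the weight.
# All numbers are illustrative; real layers are far larger.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

r, alpha = 1, 2.0
scale = alpha / r  # 2.0 here; this adapter's alpha=32, r=16 also gives 2.0

W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (out x in)
A = [[0.5, -1.0]]             # LoRA down-projection (r x in)
B = [[2.0], [0.25]]           # LoRA up-projection (out x r)
x = [1.0, 2.0]

# Runtime path: base output plus the scaled low-rank correction.
base_out = matvec(W, x)
lora_out = matvec(B, matvec(A, x))
runtime = [b + scale * l for b, l in zip(base_out, lora_out)]

# Merged path: fold the update into the weight, W' = W + scale * B A.
BA = matmul(B, A)
W_merged = [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]
merged = matvec(W_merged, x)

print(runtime, merged)
```

With real bf16 weights the two paths can differ in the last bits, because the additions happen in a different order.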

Optional: merge for standalone use

If you want a single self-contained checkpoint (e.g. for llama.cpp / GGUF conversion, or serving stacks that don't load adapters):

merged = model.merge_and_unload()
merged.save_pretrained("continue_sft_bitd_lora_top26_shallow_k01_merged")
tok.save_pretrained("continue_sft_bitd_lora_top26_shallow_k01_merged")
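As a sketch of what "standalone" means here, the merged directory can be loaded again with Transformers alone, without PEFT. The helper name `load_merged` is hypothetical, and the default path simply mirrors the directory name used in the save calls above.

```python
def load_merged(path="continue_sft_bitd_lora_top26_shallow_k01_merged"):
    """Load the merged checkpoint saved above; no PEFT import is needed."""
    # Imports are local so that defining this helper does not require
    # torch or transformers to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        path,
        dtype=torch.bfloat16,
        device_map="auto",
    )
    tok = AutoTokenizer.from_pretrained(path)
    return model, tok
```

Calling `model, tok = load_merged()` then gives objects that can be used in the generation step exactly like the base-plus-adapter pair.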

Training details


| Setting | Value |
|---|---|
| Base model | `tuongvy2603/BITD_baseline` |
| Method | LoRA (PEFT) |
| Data pool | `top26_shallow`, `k = 1` samples / prompt |
| LoRA rank / alpha / dropout | 16 / 32 / 0.05 |
| Target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Epochs | 5.0 |
| Batch size | 8 × 2 grad accum = 16 effective |
| Learning rate | 0.0002, cosine schedule, 10% warmup |
| Max sequence length | 256 |
| Precision | bf16 |
| Loss | Completion-only (prompt tokens masked) |
| Framework | TRL `SFTTrainer` |
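The learning-rate entry above (0.0002, cosine schedule, 10% warmup) can be sketched as a plain function of the step index. This is an illustration only: it assumes linear warmup from zero followed by a half-cosine decay to zero, which is the usual shape of such a schedule. The total step count is hypothetical, since the dataset size is not stated here, and the exact rounding of the warmup length is also an assumption.

```python
import math

BASE_LR = 2e-4       # from the table above
WARMUP_RATIO = 0.10  # from the table above

def lr_at(step, total_steps, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # rounding is an assumption
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 100  # hypothetical; the real value depends on the dataset size
for s in (0, 5, 10, 55, 100):
    print(s, lr_at(s, TOTAL_STEPS))
```

The rate peaks at the end of warmup and falls back to zero at the final step.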

Full resolved config: see run_config.json in this repo.
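The "completion-only" loss listed in the training details means that prompt tokens do not contribute to the loss. A minimal sketch of the idea, assuming the common convention of setting masked labels to -100 (the default `ignore_index` of PyTorch's cross-entropy loss), looks like this. The token ids and the helper name `mask_prompt` are made up for illustration; how the trainer builds the mask internally is not shown here.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy by default

def mask_prompt(input_ids, prompt_len):
    """Return labels equal to input_ids, with the first prompt_len positions masked."""
    labels = list(input_ids)
    n = min(prompt_len, len(labels))
    labels[:n] = [IGNORE_INDEX] * n
    return labels

# Made-up example: 4 prompt tokens followed by 3 completion tokens.
input_ids = [101, 7, 8, 9, 42, 43, 2]
labels = mask_prompt(input_ids, prompt_len=4)
print(labels)
```

Only the positions that keep their original token id are scored, so gradients come from the completion alone.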

Framework versions

PEFT 0.19.1
TRL 1.4.0
Transformers 5.8.1
PyTorch 2.12.0
Datasets 4.8.5
Tokenizers 0.22.2

Citation

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}
