open-llama-3b-openthought-mid-4bit

A merged, 4-bit (NF4) quantized build of OpenLLaMA 3B v2, fine-tuned on the OpenThoughts-114k reasoning dataset.

How it was made

  1. Base: openlm-research/open_llama_3b_v2
  2. QLoRA fine-tuned on open-thoughts/OpenThoughts-114k
  3. Merged the LoRA adapter into the base model (16-bit)
  4. Quantized to 4-bit NF4 (steps 3-4 are sketched below)
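
Steps 3-4 can be reproduced with the standard peft and bitsandbytes APIs. The sketch below is illustrative rather than the exact script that was used; the adapter repo name comes from the "LoRA adapter" section of this card, and the local paths are placeholders.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Step 3: load the 16-bit base model and fold the LoRA weights into it.
base = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b_v2", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "ping98k/open-llama-3b-openthought-mid-lora")
merged = merged.merge_and_unload()
merged.save_pretrained("merged-16bit")  # placeholder output path

# Step 4: reload the merged weights in 4-bit NF4 and save the quantized copy
# (serializing 4-bit weights requires a recent bitsandbytes release).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
quantized = AutoModelForCausalLM.from_pretrained("merged-16bit", quantization_config=bnb)
quantized.save_pretrained("open-llama-3b-openthought-mid-4bit")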

Training Data

  • open-thoughts/OpenThoughts-114k (DeepSeek-R1 reasoning traces)
  • 10,582 samples remained after filtering to <= 2024 tokens (see the sketch below)
  • 3 epochs, batch size 128, cosine schedule with peak LR 2e-4
  • Final loss: 0.71
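
The token-length filter can be approximated with datasets and the base tokenizer. The exact preprocessing script is not published, so the serialization helper below is hypothetical; only the <= 2024-token cutoff and the 10,582-sample count come from this card.

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b_v2")
ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

def to_text(example):
    # Hypothetical serialization: how each record was flattened into a
    # single training string is not documented on the card.
    return str(example)

def fits_budget(example):
    return len(tokenizer(to_text(example)).input_ids) <= 2024

filtered = ds.filter(fits_budget)
print(len(filtered))  # the card reports 10,582 surviving samples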

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoint ships with its 4-bit quantization config, so it loads
# quantized automatically (bitsandbytes must be installed).
model = AutoModelForCausalLM.from_pretrained("ping98k/open-llama-3b-openthought-mid-4bit", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("ping98k/open-llama-3b-openthought-mid-4bit")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a linked list?"},
]
# enable_thinking=True asks the chat template to emit the reasoning-trace
# tags the model was fine-tuned with.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# skip_special_tokens=False keeps the thinking markers visible in the output.
print(tokenizer.decode(output[0], skip_special_tokens=False))
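
Note: loading a pre-quantized NF4 checkpoint requires bitsandbytes, and device_map="auto" requires accelerate (pip install bitsandbytes accelerate). from_pretrained should pick up the saved quantization config automatically.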

LoRA adapter

See ping98k/open-llama-3b-openthought-mid-lora
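
If you would rather apply the adapter to the base model yourself instead of using this merged checkpoint, a minimal sketch using standard peft loading:

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("openlm-research/open_llama_3b_v2", device_map="auto")
model = PeftModel.from_pretrained(base, "ping98k/open-llama-3b-openthought-mid-lora")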
