Trellis-506M-SFT

Supervised fine-tuned version of Trellis-506M, a 506M-parameter LLaMA-style language model optimized for structured output tasks: JSON generation, function calling, schema compliance, and structured extraction.

Model Details

Parameter	Value
Base model	mdonigian/trellis-pretraining
Architecture	LLaMA (LlamaForCausalLM)
Parameters	~506M
Hidden size	1,280
Layers	24
Attention heads	20 (10 KV heads, GQA 2:1)
Context length	2,048
Vocab size	50,304 (50,277 base + 6 chat tokens + padding)
Precision	bfloat16
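
The figures in the table can be cross-checked with a little arithmetic. The sketch below derives the per-head dimension, the key/value projection width implied by GQA, and the number of padding rows in the vocabulary. The padding multiple of 64 is an assumption; the card states only the final vocabulary size.

```python
# Values taken from the Model Details table.
hidden_size = 1280
num_attention_heads = 20
num_kv_heads = 10
base_vocab = 50_277
chat_tokens = 6
padded_vocab = 50_304

# Each attention head works on an equal slice of the hidden state.
head_dim = hidden_size // num_attention_heads

# With grouped-query attention, several query heads share one KV head.
queries_per_kv_head = num_attention_heads // num_kv_heads
kv_proj_width = num_kv_heads * head_dim

# Assumption: the vocabulary is padded up to a multiple of 64.
PAD_MULTIPLE = 64
used_vocab = base_vocab + chat_tokens
rounded_vocab = -(-used_vocab // PAD_MULTIPLE) * PAD_MULTIPLE  # ceiling division
padding_rows = padded_vocab - used_vocab

print(head_dim, queries_per_kv_head, kv_proj_width, rounded_vocab, padding_rows)
```

Under that assumption the rounded size matches the 50,304 in the table, and the key/value projections are half the width of the query projection.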

SFT Training

Dataset

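Trained on mdonigian/full-structured-instruction-sft-dataset (~95k examples), assembled from the sources below. The per-source target counts sum to roughly 134k, so the final dataset is smaller than the combined targets.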

Source	Target Count	Description
Glaive Function Calling v2	~70,000	Multi-turn function calling conversations
UltraChat 200k	~30,000	General instruction-following dialogues
Hermes Function Calling v1	~20,000	Single/multi-turn function calling + JSON mode
Synthetic JSON schema compliance	~9,000	Schema → correct JSON (generated with GPT-5-mini)
Synthetic structured extraction	~5,000	Text → structured JSON extraction (generated with GPT-5-mini)

All examples are standardized to a common chat format using custom special tokens (see Chat Format). Filtering is applied per source and includes deduplication, a length cap of 2,048 tokens, and quality validation.
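
The filtering steps can be pictured with a minimal sketch. It is not the pipeline used to build the dataset: the example fields (`prompt`, `response`), the whitespace token counter, and the JSON-parse check standing in for "quality validation" are all assumptions made for illustration.

```python
import hashlib
import json

MAX_TOKENS = 2048  # length cap stated in the card


def count_tokens(text):
    # Placeholder: whitespace splitting stands in for the real tokenizer.
    return len(text.split())


def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def filter_examples(examples, require_json=False):
    """Drop exact duplicates, over-length examples, and (optionally) invalid JSON."""
    seen = set()
    kept = []
    for ex in examples:
        # Hypothetical schema: each example has "prompt" and "response" strings.
        text = ex["prompt"] + "\x00" + ex["response"]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier example
        seen.add(digest)
        if count_tokens(text) > MAX_TOKENS:
            continue  # exceeds the length cap
        if require_json and not is_valid_json(ex["response"]):
            continue  # response is not parseable JSON
        kept.append(ex)
    return kept
```

A real pipeline would count tokens with the model's tokenizer and would likely use near-duplicate detection rather than exact hashing.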

Hyperparameters

Parameter	Value
Epochs	3
Effective batch size	32
Learning rate	2e-5
LR schedule	Cosine decay
Warmup	10% of total steps
Weight decay	0.01
Max gradient norm	1.0
Max sequence length	2,048
Optimizer	AdamW (fused)
Precision	bfloat16
Seed	42
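
To make the schedule concrete, the sketch below computes the learning rate at a given optimizer step. It assumes linear warmup followed by cosine decay to zero, which is the common default but is not confirmed by the card. The step count is derived from the approximate ~95k dataset size, so it is an estimate only.

```python
import math

PEAK_LR = 2e-5          # learning rate from the table
WARMUP_FRACTION = 0.10  # warmup share of total steps from the table


def total_steps(num_examples, epochs=3, effective_batch_size=32):
    """Estimate optimizer steps, assuming one example per sequence (no packing)."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    return steps_per_epoch * epochs


def lr_at(step, total):
    """Learning rate at `step`: linear warmup, then cosine decay to zero (assumed)."""
    warmup_steps = int(total * WARMUP_FRACTION)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


steps = total_steps(95_000)
for s in (0, steps // 10, steps // 2, steps):
    print(s, lr_at(s, steps))
```

If sequences were packed during training, the real step count would be lower than this estimate.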

Training Details

Framework: TRL SFTTrainer
Attention: Flash Attention 2
Compilation: torch.compile enabled
Loss masking: Completion-only. Loss is computed only on assistant response tokens, not on system, user, or tool tokens
Hardware: NVIDIA B200
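
Completion-only masking can be illustrated without any training framework. The sketch below is a hand-rolled approximation, not TRL's implementation: it sets labels to -100 (the default `ignore_index` of PyTorch's cross-entropy loss) everywhere except inside assistant turns. The token IDs are hypothetical, and whether the closing `<|end|>` token counts toward the loss is an assumption.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(input_ids, assistant_id, end_id):
    """Return labels that keep only assistant response tokens."""
    labels = []
    in_assistant = False
    for tok in input_ids:
        if tok == assistant_id:
            # The role marker itself is prompt context, not a target.
            in_assistant = True
            labels.append(IGNORE_INDEX)
        elif in_assistant:
            # Response tokens, including the closing end-of-turn token (assumed).
            labels.append(tok)
            if tok == end_id:
                in_assistant = False
        else:
            labels.append(IGNORE_INDEX)
    return labels


# Hypothetical IDs: 1=<|system|>, 2=<|user|>, 3=<|assistant|>, 4=<|end|>
ids = [1, 10, 4, 2, 11, 12, 4, 3, 20, 21, 4]
print(mask_non_assistant(ids, assistant_id=3, end_id=4))
```

Training on the end-of-turn token is what teaches a model to stop after its answer, which is why the sketch leaves it unmasked.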

Chat Format

All training data uses these special tokens:

<|system|>You are a helpful assistant that generates valid JSON.<|end|>
<|user|>Generate a user profile with name, email, and age.<|end|>
<|assistant|>{"name": "Alice Chen", "email": "alice@example.com", "age": 28}<|end|>

Token	Purpose
<|system|>	System prompt
<|user|>	User message
<|assistant|>	Assistant response
<|tool_call|>	Function/tool call
<|tool_result|>	Tool execution result
<|end|>	End of turn
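
A prompt in this format can be assembled from a list of messages. The helper below is a sketch under two assumptions: turns are separated by a newline, as in the example above, and tool turns follow the same `<|role|>content<|end|>` pattern, which the card does not document. The function name `render_chat` is illustrative.

```python
ROLE_TOKENS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
    "tool_call": "<|tool_call|>",      # layout of tool turns is assumed
    "tool_result": "<|tool_result|>",  # layout of tool turns is assumed
}
END_TOKEN = "<|end|>"


def render_chat(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}] into the special-token chat format."""
    lines = []
    for msg in messages:
        lines.append(f"{ROLE_TOKENS[msg['role']]}{msg['content']}{END_TOKEN}")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the response.
        lines.append(ROLE_TOKENS["assistant"])
    return "\n".join(lines)


print(render_chat([
    {"role": "system", "content": "You are a helpful assistant that generates valid JSON."},
    {"role": "user", "content": "Generate a user profile with name, email, and age."},
]))
```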

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "mdonigian/trellis-sft",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mdonigian/trellis-sft")

prompt = """<|system|>You are a helpful assistant that generates valid JSON.<|end|>
<|user|>Generate a JSON object for a book with title, author, year, and genre.<|end|>
<|assistant|>"""

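# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Sample up to 200 new tokens.
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
# Keep special tokens in the decoded text so the turn markers stay visible.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))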
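
Because the decoded text contains the prompt as well as the response, a small post-processing step is usually needed before the JSON can be used. The helper below is a sketch, not part of the model's tooling: it takes the text after the last `<|assistant|>` marker, cuts it at the first `<|end|>`, and returns `None` if the result is not valid JSON. The function name is hypothetical.

```python
import json

ASSISTANT_TOKEN = "<|assistant|>"
END_TOKEN = "<|end|>"


def extract_assistant_json(decoded):
    """Parse the last assistant turn of a decoded generation as JSON."""
    start = decoded.rfind(ASSISTANT_TOKEN)
    if start == -1:
        return None  # no assistant turn found
    body = decoded[start + len(ASSISTANT_TOKEN):]
    end = body.find(END_TOKEN)
    if end != -1:
        body = body[:end]  # drop anything generated after the end-of-turn token
    try:
        return json.loads(body.strip())
    except json.JSONDecodeError:
        return None  # the model did not produce valid JSON


sample = (
    "<|user|>Generate a JSON object for a book.<|end|>\n"
    '<|assistant|>{"title": "Dune", "year": 1965}<|end|>'
)
print(extract_assistant_json(sample))
```

Returning `None` on a parse failure lets the caller retry generation instead of crashing on malformed output.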

Experimental Design

This model is one component of a controlled experiment comparing curated pretraining vs. standard pretraining for structured output tasks:

Model	Pretraining	Parameters	SFT
Trellis-506M-SFT (this model)	Curated 20B tokens	~506M	Identical
Pythia-410M-deduped-SFT	The Pile (uncurated)	~410M	Identical
Pythia-1B-deduped-SFT	The Pile (uncurated)	~1B	Identical

All three models undergo identical SFT with the same dataset, hyperparameters, and training procedure. Post-SFT evaluation covers:

Tier 1: Custom structured output benchmarks (JSON schema compliance, structured extraction, classification)
Tier 2: General NLP benchmarks via lm_eval (HellaSwag, ARC, PIQA, Winogrande, MMLU)
Tier 3: Code benchmarks (HumanEval, MBPP)
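
The card does not describe how the Tier 1 benchmarks score outputs. As an illustration of what a schema compliance metric might look like, the sketch below checks required keys and primitive types for a flat schema and reports the fraction of compliant outputs. It covers only a small subset of JSON Schema and is not the benchmark used in the experiment.

```python
import json

# Maps JSON Schema type names to Python types (subset only).
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def complies(output_text, schema):
    """Check parseability, required keys, and top-level property types."""
    try:
        obj = json.loads(output_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    if any(key not in obj for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        expected = TYPE_MAP.get(spec.get("type"))
        if key not in obj or expected is None:
            continue
        value = obj[key]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and spec["type"] != "boolean":
            return False
        if not isinstance(value, expected):
            return False
    return True


def compliance_rate(outputs, schema):
    """Fraction of outputs that comply with the schema."""
    if not outputs:
        return 0.0
    return sum(complies(o, schema) for o in outputs) / len(outputs)
```

A full evaluation would use a complete JSON Schema validator, since nested objects, enums, and format constraints are ignored here.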

Limitations

The 506M-parameter size limits general knowledge and complex reasoning
Context length capped at 2,048 tokens
No safety training, RLHF, or DPO alignment
Optimized for structured output; general chat quality is limited

Citation

@misc{trellis-sft-2026,
  title={Trellis-506M-SFT: Supervised Fine-Tuning for Structured Output},
  author={Donigian, Matt},
  year={2026},
  url={https://huggingface.co/mdonigian/trellis-sft}
}
