Trellis-506M-SFT

Supervised fine-tuned version of Trellis-506M, a 506M-parameter LLaMA-style language model optimized for structured output tasks: JSON generation, function calling, schema compliance, and structured extraction.

Model Details

Parameter         Value
Base model        mdonigian/trellis-pretraining
Architecture      LLaMA (LlamaForCausalLM)
Parameters        ~506M
Hidden size       1,280
Layers            24
Attention heads   20 (10 KV heads, GQA 2:1)
Context length    2,048
Vocab size        50,304 (50,277 base + 6 chat tokens + padding)
Precision         bfloat16
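
A few quantities follow directly from the table above. The sketch below only does arithmetic on the stated values; the variable names are illustrative and are not taken from the model's config file. It also happens that 50,304 is a multiple of 64, though the card does not say why that padded size was chosen.

```python
# Values copied from the Model Details table.
hidden_size = 1280
num_attention_heads = 20
num_kv_heads = 10
vocab_size = 50_304
base_vocab = 50_277
chat_tokens = 6

# Per-head dimension: the hidden state is split evenly across query heads.
head_dim = hidden_size // num_attention_heads

# GQA group size: how many query heads share one key/value head.
queries_per_kv_head = num_attention_heads // num_kv_heads

# Embedding rows that are neither base tokens nor chat tokens.
padding_rows = vocab_size - base_vocab - chat_tokens

# Output width of the K (or V) projection under GQA.
kv_proj_width = num_kv_heads * head_dim

print(head_dim, queries_per_kv_head, padding_rows, kv_proj_width)
# prints: 64 2 21 640
```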

SFT Training

Dataset

Trained on mdonigian/full-structured-instruction-sft-dataset (~95k examples), assembled from:

Source                             Target Count   Description
Glaive Function Calling v2         ~70,000        Multi-turn function calling conversations
UltraChat 200k                     ~30,000        General instruction-following dialogues
Hermes Function Calling v1         ~20,000        Single/multi-turn function calling + JSON mode
Synthetic JSON schema compliance   ~9,000         Schema → correct JSON (generated with GPT-5-mini)
Synthetic structured extraction    ~5,000         Text → structured JSON extraction (generated with GPT-5-mini)

All examples are standardized to a common chat format using custom special tokens (see below). Source-specific filtering includes deduplication, a 2,048-token length cap, and quality validation.
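
The filtering steps are not specified in detail, so the following is only a minimal sketch of what deduplication plus a length cap could look like. It assumes exact-match deduplication on whitespace-normalized text and assumes over-length examples are dropped rather than truncated; the card states neither. `filter_examples` and `count_tokens` are hypothetical names.

```python
import hashlib

MAX_TOKENS = 2048  # token length cap stated in the card

def filter_examples(examples, count_tokens, max_tokens=MAX_TOKENS):
    """Drop exact duplicates and examples longer than max_tokens.

    `examples` is a list of formatted chat strings. `count_tokens` is any
    callable returning a token count (hypothetical; in practice the model's
    own tokenizer would be the natural choice).
    """
    seen = set()
    kept = []
    for text in examples:
        # Hash whitespace-normalized text so spacing differences collapse.
        normalized = " ".join(text.split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        # Assumption: over-length examples are dropped, not truncated.
        if count_tokens(text) > max_tokens:
            continue
        kept.append(text)
    return kept

# Toy usage with a whitespace "tokenizer" standing in for the real one.
sample = ["a b c", "a  b c", "x " * 3000]
print(filter_examples(sample, lambda t: len(t.split())))
```

In the toy run, the second string is removed as a duplicate of the first and the third is removed for exceeding the cap.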

Hyperparameters

Parameter               Value
Epochs                  3
Effective batch size    32
Learning rate           2e-5
LR schedule             Cosine decay
Warmup                  10% of total steps
Weight decay            0.01
Max gradient norm       1.0
Max sequence length     2,048
Optimizer               AdamW (fused)
Precision               bfloat16
Seed                    42
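
To make the schedule concrete, here is a small sketch of a cosine schedule with 10% warmup at a peak rate of 2e-5. It assumes linear warmup from zero and decay to zero, which the card does not state, and it estimates the step count from ~95k examples without sequence packing. `lr_at_step` is a hypothetical helper, not part of TRL.

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_frac=0.10):
    """Linear warmup to peak_lr, then cosine decay to zero (both assumed)."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Rough step count: ~95k examples, effective batch 32, 3 epochs, no packing.
steps_per_epoch = math.ceil(95_000 / 32)
total_steps = steps_per_epoch * 3
print("total steps:", total_steps)

# Learning rate at the start, end of warmup, midpoint, and final step.
for s in (0, total_steps // 10, total_steps // 2, total_steps):
    print(s, lr_at_step(s, total_steps))
```

Under these assumptions the run is roughly 8,900 optimizer steps, with the peak rate reached after about 890 of them.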

Training Details

  • Framework: TRL SFTTrainer
  • Attention: Flash Attention 2
  • Compilation: torch.compile enabled
  • Loss masking: Completion-only; loss is computed only on assistant response tokens, not on system, user, or tool tokens
  • Hardware: NVIDIA B200
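
Completion-only masking can be illustrated on plain token-id lists. This is a sketch of the idea, not the TRL implementation: it assumes the assistant marker itself is masked and that the closing end-of-turn token is included in the loss. The token ids are made up for the example.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def completion_only_labels(token_ids, assistant_id, end_id):
    """Return labels that keep only assistant-turn tokens.

    Tokens after an assistant marker, up to and including the next
    end-of-turn token, keep their ids; all others become IGNORE_INDEX.
    """
    labels = []
    in_assistant = False
    for tok in token_ids:
        if in_assistant:
            labels.append(tok)
            if tok == end_id:
                in_assistant = False
        else:
            labels.append(IGNORE_INDEX)
            if tok == assistant_id:
                in_assistant = True
    return labels

# Toy ids (hypothetical): 1=<|user|>, 2=<|assistant|>, 3=<|end|>, rest is text.
ids = [1, 10, 11, 3, 2, 20, 21, 3]
print(completion_only_labels(ids, assistant_id=2, end_id=3))
# prints: [-100, -100, -100, -100, -100, 20, 21, 3]
```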

Chat Format

All training data uses these special tokens:

<|system|>You are a helpful assistant that generates valid JSON.<|end|>
<|user|>Generate a user profile with name, email, and age.<|end|>
<|assistant|>{"name": "Alice Chen", "email": "alice@example.com", "age": 28}<|end|>

Token            Purpose
<|system|>       System prompt
<|user|>         User message
<|assistant|>    Assistant response
<|tool_call|>    Function/tool call
<|tool_result|>  Tool execution result
<|end|>          End of turn
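
A prompt in this format can be assembled from a list of messages. The helper below is a hypothetical sketch that covers only the system, user, and assistant roles, since the card does not show how tool calls and tool results are laid out. Joining turns with a newline follows the example above.

```python
ROLE_TOKENS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}
END = "<|end|>"

def build_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the chat format."""
    lines = [f"{ROLE_TOKENS[m['role']]}{m['content']}{END}" for m in messages]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the response.
        lines.append(ROLE_TOKENS["assistant"])
    return "\n".join(lines)

messages = [
    {"role": "system", "content": "You are a helpful assistant that generates valid JSON."},
    {"role": "user", "content": "Generate a user profile with name, email, and age."},
]
print(build_prompt(messages))
```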

How to Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in bfloat16, the precision used for training.
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    "mdonigian/trellis-sft",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mdonigian/trellis-sft")

# Write the prompt in the chat format and leave the assistant turn open.
prompt = """<|system|>You are a helpful assistant that generates valid JSON.<|end|>
<|user|>Generate a JSON object for a book with title, author, year, and genre.<|end|>
<|assistant|>"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)

# Keep special tokens in the decoded text so turn boundaries stay visible.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
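
Because the decoded text includes the prompt and the special tokens, the JSON payload has to be cut out before parsing. The sketch below shows one way to do that with the standard library. `extract_assistant_json` is a hypothetical helper, and the sample string is hand-written rather than real model output.

```python
import json

def extract_assistant_json(decoded):
    """Return the parsed JSON from the last assistant turn, or None."""
    # Take everything after the final assistant marker.
    _, sep, tail = decoded.rpartition("<|assistant|>")
    if not sep:
        return None
    # Cut at the end-of-turn token if the model emitted one.
    body = tail.split("<|end|>", 1)[0].strip()
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None

# Hand-written stand-in for a decoded generation.
decoded = (
    "<|user|>Make a book.<|end|>\n"
    '<|assistant|>{"title": "Dune", "year": 1965}<|end|>'
)
print(extract_assistant_json(decoded))
```

Returning None on a parse failure makes it easy to count invalid generations when sampling with a nonzero temperature.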

Experimental Design

This model is one component of a controlled experiment comparing curated pretraining vs. standard pretraining for structured output tasks:

Model                           Pretraining            Parameters   SFT
Trellis-506M-SFT (this model)   Curated 20B tokens     ~506M        Identical
Pythia-410M-deduped-SFT         The Pile (uncurated)   ~410M        Identical
Pythia-1B-deduped-SFT           The Pile (uncurated)   ~1B          Identical

All three models undergo identical SFT with the same dataset, hyperparameters, and training procedure. Post-SFT evaluation covers:

  • Tier 1: Custom structured output benchmarks (JSON schema compliance, structured extraction, classification)
  • Tier 2: General NLP benchmarks via lm_eval (HellaSwag, ARC, PIQA, Winogrande, MMLU)
  • Tier 3: Code benchmarks (HumanEval, MBPP)
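
The Tier 1 benchmarks are custom and not described here, so the following is only an illustrative sketch of how a schema compliance rate might be scored. It checks required keys and top-level primitive types with the standard library; it is not the benchmark used in this experiment, and a full JSON Schema validator would cover far more. `complies` and `compliance_rate` are hypothetical names.

```python
import json

# Map JSON Schema primitive type names to Python types.
TYPE_MAP = {"string": str, "integer": int, "number": (int, float),
            "boolean": bool, "object": dict, "array": list}

def complies(output_text, schema):
    """Check required keys and top-level property types only."""
    try:
        obj = json.loads(output_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    if any(key not in obj for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        expected = TYPE_MAP.get(spec.get("type"))
        if key in obj and expected is not None:
            value = obj[key]
            # bool is a subclass of int in Python; only accept it for "boolean".
            if isinstance(value, bool) and expected is not bool:
                return False
            if not isinstance(value, expected):
                return False
    return True

def compliance_rate(outputs, schema):
    """Fraction of outputs that pass the check."""
    return sum(complies(o, schema) for o in outputs) / len(outputs)

schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
outputs = [
    '{"name": "Alice", "age": 28}',   # valid
    '{"name": "Bob"}',                # missing required key
    'not json',                       # unparseable
    '{"name": "Cy", "age": "x"}',     # wrong type
]
print(compliance_rate(outputs, schema))
# prints: 0.25
```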

Limitations

  • The 506M-parameter size limits general knowledge and complex reasoning
  • Context length capped at 2,048 tokens
  • No safety training, RLHF, or DPO alignment
  • Optimized for structured output; general chat quality is limited

Citation

@misc{trellis-sft-2026,
  title={Trellis-506M-SFT: Supervised Fine-Tuning for Structured Output},
  author={Donigian, Matt},
  year={2026},
  url={https://huggingface.co/mdonigian/trellis-sft}
}