Axiom-Dense-380M-Instruct
Axiom-Dense-380M-Instruct is an instruction-following, decoder-only causal language model. It was produced by supervised fine-tuning (SFT) of the base model Axiom-Dense-380M-Base on instruction-response conversational data.
Quickstart
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "user-anto/Axiom-Dense-380M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cpu")

# ChatML prompt: one user turn followed by an open assistant turn for the model to complete
prompt = "<|im_start|>user\nWrite a short email to my team about meeting tomorrow.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cpu")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,  # without this, generate() ignores temperature and top_p
        temperature=0.2,
        top_p=0.85,
        repetition_penalty=1.15,
        no_repeat_ngram_size=3,
    )

print(tokenizer.decode(outputs[0]))
Model Summary
- Model type: decoder-only Transformer (causal LM)
- Parameter count: 385,849,344
- Context length: 1,024 tokens
- Vocabulary: 100,277 (tiktoken cl100k_base with ChatML special tokens patched)
- Training objective: Autoregressive supervised fine-tuning (SFT) using target masking (only computing loss on the assistant's responses)
- Prompt format: ChatML (<|im_start|>, <|im_end|>)
Architecture
This model keeps the dense Transformer stack of the base model unchanged and adds special tokens that delimit speaker turns.
- Hidden size: 1024
- Layers: 24
- Attention heads: 16
- KV heads: 8 (GQA)
- FFN multiplier: 2.6667 (intermediate dimension rounded to 2816)
- Normalization: RMSNorm
- Positional encoding: RoPE (theta=10000)
- Activation: SwiGLU
- Special tokens: <|im_start|> (100264) and <|im_end|> (100265) for ChatML boundaries
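As a sanity check, the dimensions listed above can be turned into a parameter count. The sketch below is not taken from the model's code. It assumes tied input/output embeddings, bias-free linear layers, two RMSNorm weight vectors per layer plus one final norm, and a head dimension of hidden size divided by attention heads. It also guesses that the FFN width is rounded up to a multiple of 256. Under those assumptions the total matches the 385,849,344 parameters reported in the summary.

```python
import math


def ffn_dim(hidden=1024, multiplier=2.6667, multiple_of=256):
    # Assumption: the FFN width is rounded UP to a multiple of 256.
    return math.ceil(hidden * multiplier / multiple_of) * multiple_of


def count_parameters(vocab_size=100_277, hidden=1024, layers=24,
                     heads=16, kv_heads=8, tied_embeddings=True):
    head_dim = hidden // heads            # 64
    kv_dim = kv_heads * head_dim          # 512 with GQA (8 KV heads)
    inter = ffn_dim(hidden)               # 2816

    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, plus K and V
    ffn = 3 * hidden * inter                          # SwiGLU: gate, up, down
    norms = 2 * hidden                                # two RMSNorms per layer
    per_layer = attn + ffn + norms

    total = vocab_size * hidden + layers * per_layer + hidden  # + final norm
    if not tied_embeddings:
        total += vocab_size * hidden      # separate output projection
    return total


print(ffn_dim())            # intermediate dimension
print(count_parameters())   # total parameters under the stated assumptions
```

Calling `count_parameters(tied_embeddings=False)` adds a second vocabulary-sized matrix, which overshoots the reported figure; that is why tied embeddings are the working assumption here.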
Training Data
- Source dataset: HuggingFaceTB/smol-smoltalk
- Local dataset path during training: data/smol-smoltalk
- SFT targets: Computes loss only on assistant response tokens, masking out prompt and user tokens.
- Total training tokens: 204,802,175 (~0.205B tokens)
- Validation tokens: 197,825 tokens
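The target masking described above can be sketched as follows. This is an illustration only, not the training code: the function name `build_labels` and the `(role, token_ids)` segment layout are hypothetical, and the value -100 is used because it is the default `ignore_index` of PyTorch's cross-entropy loss. Whether the actual pipeline also masks the ChatML header tokens of an assistant turn is not stated in the card.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments):
    """Concatenate tokenized turns and mask every non-assistant token.

    `segments` is a list of (role, token_ids) pairs (hypothetical layout).
    Returns (input_ids, labels), where labels hold IGNORE_INDEX wherever
    the loss should not be computed.
    """
    input_ids, labels = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)                        # contributes to loss
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked out
    return input_ids, labels


# Toy token IDs, not real tokenizer output.
ids, labels = build_labels([("user", [11, 12, 13]), ("assistant", [21, 22])])
print(ids)
print(labels)
```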
SFT Training Setup
- Effective tokens per optimizer step: 319,488 (batch_size=1, seq_len=1024, grad_accum=312)
- Total optimizer steps: 641
- Optimizer: AdamW8bit (with bitsandbytes)
- LR schedule: warmup, constant phase, cosine decay
- Warmup steps: 51 steps (8% of training)
- Cosine decay phase: 102 steps (16% of training, starting at step 539)
- LR max/min: 3e-4 / 3e-5 (initial learning rate starts at 1.5e-4 during warmup)
- Weight decay: 0.1
- Precision: bfloat16
- Gradient checkpointing: enabled
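The schedule above can be written out as a small function. This is a sketch under assumptions rather than the actual training code: the card gives the phase boundaries and the start, peak, and floor learning rates, but the linear shape of the warmup ramp and the exact cosine formula are guesses. The function and parameter names are illustrative.

```python
import math

# Effective tokens per optimizer step, from the values listed above.
BATCH_SIZE, SEQ_LEN, GRAD_ACCUM = 1, 1024, 312
TOKENS_PER_STEP = BATCH_SIZE * SEQ_LEN * GRAD_ACCUM


def learning_rate(step, total_steps=641, warmup_steps=51, decay_start=539,
                  lr_start=1.5e-4, lr_max=3e-4, lr_min=3e-5):
    if step < warmup_steps:
        # Assumption: linear ramp from lr_start to lr_max.
        return lr_start + (lr_max - lr_start) * step / warmup_steps
    if step < decay_start:
        return lr_max  # constant phase
    # Cosine decay over the last (total_steps - decay_start) = 102 steps.
    progress = min((step - decay_start) / (total_steps - decay_start), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


print(TOKENS_PER_STEP)
for s in (0, 51, 300, 539, 590, 640):
    print(s, learning_rate(s))
```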
Evaluation Snapshot
- Pretraining base perplexity: 18.1233
- Best observed SFT eval loss: 1.2641 at step 630
- Best observed SFT eval perplexity: 3.5398 at step 630
- Final SFT step (640) eval loss: 1.2868
- Final SFT step (640) eval perplexity: 3.6210
SFT aligned the model to the ChatML prompt format and sharply reduced perplexity on conversational validation targets (3.54 at best, against 18.12 for the pretrained base).
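The loss and perplexity figures above are related by perplexity = exp(loss), assuming the eval loss is a mean cross-entropy in nats (the usual convention, though the card does not say so explicitly). A minimal check:

```python
import math


def perplexity(mean_loss_nats):
    # Perplexity is the exponential of the mean cross-entropy loss.
    return math.exp(mean_loss_nats)


for step, loss in [(630, 1.2641), (640, 1.2868)]:
    print(step, perplexity(loss))
```

The results agree with the reported perplexities to about three decimal places; the small residual difference is expected because the losses shown are themselves rounded.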
Chat Format
This model uses the standard ChatML format. A typical exchange looks like:
<|im_start|>user
Write a short email to my team about meeting tomorrow.<|im_end|>
<|im_start|>assistant
Subject: Meeting Tomorrow...<|im_end|>
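A prompt in this format can be assembled from a list of messages with a small helper. The function name `build_chatml_prompt` and the `add_generation_prompt` flag are illustrative, not part of the model's tooling; the layout simply follows the example above.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}, ...] as a ChatML string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt(
    [{"role": "user",
      "content": "Write a short email to my team about meeting tomorrow."}]
)
print(prompt)
```

For multi-turn chat, earlier assistant replies go into the message list with the role `assistant`, and the total prompt should stay within the 1,024-token context length.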
Intended Use
- Assistant-style task completion
- Multi-turn conversational chat
- Zero-shot and few-shot instruction-following
- Educational use and custom model inference experimentation
Out-of-Scope / Limitations
- Safety-critical domains (medical, legal, financial advice)
- Deployment in production without robust safety classifiers and filters
- Handling long contexts beyond the 1,024-token limit
- Language support beyond English (which dominates the smoltalk dataset)
Tokenization
- Tokenizer: tiktoken with cl100k_base base ranks
- Patched special tokens:
  - <|endoftext|> = 100257 (EOS/PAD)
  - <|im_start|> = 100264
  - <|im_end|> = 100265
  - <|endofprompt|> = 100276
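To illustrate how the patched special tokens behave, the sketch below splits a string on the special-token literals and maps each one to its fixed ID, leaving ordinary text as strings that would then go through the regular BPE ranks. It uses only the standard library and is not how tiktoken is implemented internally; the helper name `split_special` is hypothetical.

```python
import re

# Special-token IDs as listed above.
SPECIAL_TOKENS = {
    "<|endoftext|>": 100257,
    "<|im_start|>": 100264,
    "<|im_end|>": 100265,
    "<|endofprompt|>": 100276,
}

_SPECIAL_RE = re.compile(
    "(" + "|".join(re.escape(tok) for tok in SPECIAL_TOKENS) + ")"
)


def split_special(text):
    """Return a list mixing special-token IDs (int) and plain text (str)."""
    pieces = []
    for chunk in _SPECIAL_RE.split(text):
        if chunk:  # re.split yields empty strings at the boundaries
            pieces.append(SPECIAL_TOKENS.get(chunk, chunk))
    return pieces


print(split_special("<|im_start|>user\nHi<|im_end|>\n"))
```

The highest ID, 100276, is consistent with the vocabulary size of 100,277 given in the model summary.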