---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
base_model: SupraLabs/Supra-1.5-50M-Base-exp
base_model_relation: finetune
datasets:
- nvidia/Nemotron-SFT-Instruction-Following-Chat-v2
- microsoft/orca-math-word-problems-200k
- TIGER-Lab/MathInstruct
- User01110/math-curated-dataset
- Programming-Language/codeagent-python
- Cutecat6152/python-data-basic
- flytech/python-codes-25k
- QuixiAI/open-instruct-uncensored
- openai/gsm8k
- EleutherAI/arithmetic
tags:
- sft
- exact-loss-trainer
- chatml
- python
- math
- code
- instruction-tuned
---
# testing-50M
This is an experimental supervised fine-tuning (SFT) run for instruction following, starting from `SupraLabs/Supra-1.5-50M-Base-exp`.
## Training Setup
| Field | Value |
| --- | --- |
| Base model | `SupraLabs/Supra-1.5-50M-Base-exp` |
| Base revision | `main` |
| Output repo | `User01110/testing-50M` |
| Sequence length | 1024 |
| Max optimizer steps | 10,000 |
| Per-device batch size | 128 |
| Gradient accumulation | 4 |
| Sample presentations per GPU | 5,120,000 |
| Max token slots per GPU | 5,242,880,000 |
| Learning rate | 2.00e-04 |
| Warmup steps | 100 |
| Weight decay | 0.05 |
| Save/push cadence | every 1,000 optimizer steps plus final |
| Loss masking | assistant-span-only from step 0 |
| Loss logging | printed `loss` is normalized by gradient accumulation; `raw_sum` is the Trainer sum over 4 microbatches |
| Gate logging | novelty score if the loaded architecture exposes `last_gate`; otherwise `n/a` |
| Prompt format | ChatML |
| System prompt | `You are a helpful assistant.` |
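
The relationship between the two logged loss values in the table can be illustrated with a small sketch. It assumes that the printed `loss` is simply `raw_sum` divided by the gradient accumulation count (4); the microbatch loss values are made up for illustration and are not taken from the run.

```python
# Hypothetical per-microbatch losses for one optimizer step (illustrative values only).
GRAD_ACCUM = 4
microbatch_losses = [2.0, 1.5, 2.5, 2.0]

# raw_sum: the sum over the 4 microbatches that make up one optimizer step.
raw_sum = sum(microbatch_losses)

# loss: raw_sum normalized by the gradient accumulation count (assumed relationship).
loss = raw_sum / GRAD_ACCUM

print(f"loss={loss:.4f} raw_sum={raw_sum:.4f}")
```

Under this assumption, `loss` is the value to compare against runs that use a different gradient accumulation setting.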
The training stream randomly mixes the selected instruction, math, and coding sources. A source is reopened when it is exhausted and keeps looping until the 10,000-step training cap is reached, except `Cutecat6152/python-data-basic`, which is capped at 3 passes.
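The mixing policy described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the training script: `mixed_stream`, its parameters, the uniform random choice between sources, and the toy in-memory sources are all assumptions, and every source is assumed to be non-empty.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple


def mixed_stream(
    sources: Dict[str, List[str]],
    max_passes: Dict[str, Optional[int]],
    total_samples: int,
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, row) pairs, reopening exhausted sources.

    A source whose cap is None reloops until the budget is spent; a capped
    source is dropped from the mix once it has completed that many passes.
    """
    rng = random.Random(seed)
    iterators = {name: iter(rows) for name, rows in sources.items()}
    passes_done = {name: 0 for name in sources}
    active = list(sources)
    emitted = 0
    while emitted < total_samples and active:
        name = rng.choice(active)  # uniform choice is an assumption
        try:
            row = next(iterators[name])
        except StopIteration:
            passes_done[name] += 1
            cap = max_passes.get(name)
            if cap is not None and passes_done[name] >= cap:
                active.remove(name)  # cap reached: drop this source
            else:
                iterators[name] = iter(sources[name])  # reopen and keep mixing
            continue
        emitted += 1
        yield name, row


# Toy sources standing in for the real datasets.
sources = {"math": ["m1", "m2"], "python-data-basic": ["b1"]}
caps = {"math": None, "python-data-basic": 3}
samples = list(mixed_stream(sources, caps, total_samples=20))
basic_count = sum(1 for name, _ in samples if name == "python-data-basic")
print(len(samples), basic_count)
```

With a one-row capped source and a cap of 3, that source can contribute at most 3 presentations, mirroring the 100 rows and 300 presentations listed for `python-data-basic`.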
The listed sources contain 3,718,915 rows in total before relooping, while the 10,000-step training budget presents 5,120,000 examples per GPU.
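The budget figures follow directly from the values in the Training Setup table. The short calculation below reproduces them; the variable names are illustrative, and the final ratio is only a rough indicator, not a per-source pass count, because the mix is random and one source is capped.

```python
# Values copied from the Training Setup table.
max_steps = 10_000
per_device_batch = 128
grad_accum = 4
seq_len = 1024
listed_rows = 3_718_915

# Each optimizer step consumes per_device_batch * grad_accum samples on one GPU.
samples_per_gpu = max_steps * per_device_batch * grad_accum

# Every sample occupies at most seq_len token slots.
token_slots_per_gpu = samples_per_gpu * seq_len

print(samples_per_gpu)                          # 5120000
print(token_slots_per_gpu)                      # 5242880000
print(round(samples_per_gpu / listed_rows, 2))  # 1.38
```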
## Prompt Template Compatibility
The uploaded tokenizer includes the ChatML special tokens and chat template, so inference and future SFT should not require manually adding `<|im_start|>` or `<|im_end|>`.
ChatML messages are rendered as:
```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{ user_message }<|im_end|>
<|im_start|>assistant
```
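The layout above can be reproduced with a small standalone helper, which is handy for checking prompts by eye. `render_chatml` is a hypothetical function written for illustration; it assumes one newline after the role and after each `<|im_end|>`, and the template shipped with the tokenizer may differ in whitespace details. For real inference, `tokenizer.apply_chat_template` remains the reliable path.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages in the ChatML layout shown above."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
```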
The training script starts from the base checkpoint, adds `<|im_start|>` and `<|im_end|>` once as tokenizer special tokens, resizes the embeddings once, and saves the tokenizer with its `chat_template`. It also disables automatic tokenizer post-processing during pretokenized SFT, and keeps and saves the model context config with `max_position_embeddings >= 1024`.
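A minimal sketch of that preparation step is shown below. It is not the training script itself: `prepare_for_chatml`, `CHATML_TOKENS`, and `CHATML_TEMPLATE` are hypothetical names, the Jinja template is a common ChatML template assumed for illustration, and raising an error on a short context is an assumed way of enforcing the `>= 1024` condition. The function only relies on `add_special_tokens`, `resize_token_embeddings`, `chat_template`, and `config.max_position_embeddings`, which Transformers tokenizers and models provide.

```python
CHATML_TOKENS = ["<|im_start|>", "<|im_end|>"]

# Assumed ChatML Jinja template; the template saved in the repo may differ.
CHATML_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
)


def prepare_for_chatml(tokenizer, model, min_context: int = 1024) -> int:
    """Add the ChatML tokens once, resize embeddings once, attach the template."""
    # add_special_tokens returns how many tokens were actually new.
    num_added = tokenizer.add_special_tokens(
        {"additional_special_tokens": CHATML_TOKENS}
    )
    if num_added > 0:
        # Resize only when the vocabulary really grew.
        model.resize_token_embeddings(len(tokenizer))
    tokenizer.chat_template = CHATML_TEMPLATE
    # Assumed check: refuse to continue with a context shorter than the SFT length.
    if model.config.max_position_embeddings < min_context:
        raise ValueError("max_position_embeddings is smaller than the SFT sequence length")
    return num_added
```

With Transformers, `tokenizer` and `model` would typically come from `AutoTokenizer.from_pretrained(...)` and `AutoModelForCausalLM.from_pretrained(...)`, followed by `save_pretrained` on both after this step. Calling the helper a second time adds no tokens and triggers no further resize.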
The base model is loaded with pinned revision `main` so Transformers will not silently fetch a newer remote modeling file during training.
Complete inference example:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "User01110/testing-50M"

# trust_remote_code=True lets Transformers run custom modeling code from the repo, if any.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a neural network is in simple terms."},
]

# Render the conversation with the bundled ChatML template and open the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# The template already inserts the special tokens, so do not add them again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    # Greedy decoding. To sample instead, set do_sample=True and pass
    # temperature=0.7, top_k=40, top_p=0.95 (they are ignored when do_sample=False).
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, not the prompt.
new_tokens = output[0, inputs["input_ids"].shape[-1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(text)
```
## Dataset Mix
| Dataset | Config | Split | Rows | Schema | Mapping | Pass policy |
| --- | --- | --- | ---: | --- | --- | --- |
| nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content, reasoning_content}] | user/assistant message pairs; reasoning_off only | reloops until max_steps |
| microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | user=question; assistant=answer | reloops until max_steps |
| TIGER-Lab/MathInstruct | default | train | 262,039 | source, instruction, output | user=instruction; assistant=output | reloops until max_steps |
| User01110/math-curated-dataset | default | train | 50,944 | id, source, prompt, index, model, response, chatml | user=prompt; assistant=response; rebuilds clean ChatML | reloops until max_steps |
| Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | user=prompt; assistant=response | reloops until max_steps |
| Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | user=instruction; assistant=response | max 3 passes, 300 presentations max |
| flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | user=instruction plus optional Input block; assistant=output | reloops until max_steps |
| QuixiAI/open-instruct-uncensored | default | train | 1,756,115 | dataset, id, messages[{role, content}] | user/assistant message pairs | reloops until max_steps |
| openai/gsm8k | main | train | 7,473 | question, answer | user=question; assistant=answer | reloops until max_steps |
| openai/gsm8k | socratic | train | 7,473 | question, answer | user=question; assistant=answer | reloops until max_steps |
| EleutherAI/arithmetic | 10 validation subsets | validation raw JSONL | 20,000 | context, completion | user=context with trailing Answer: stripped; assistant=completion | reloops until max_steps |
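
The Mapping column can be read as a set of small row-to-messages functions. The sketch below illustrates three of them under stated assumptions: the function names, the layout of the optional `Input:` block, the stripping of whitespace around the arithmetic completion, and the inclusion of the system prompt in every sample are guesses for illustration, not the training script's actual code. The sample row is also illustrative.

```python
from typing import Dict, List

Message = Dict[str, str]
SYSTEM_PROMPT = "You are a helpful assistant."


def _wrap(user: str, assistant: str) -> List[Message]:
    """Build a single-turn conversation (system prompt inclusion is assumed)."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]


def map_question_answer(row: Dict[str, str]) -> List[Message]:
    # orca-math and gsm8k: user=question; assistant=answer
    return _wrap(row["question"], row["answer"])


def map_python_codes(row: Dict[str, str]) -> List[Message]:
    # python-codes-25k: user=instruction plus optional Input block; assistant=output
    user = row["instruction"]
    extra = (row.get("input") or "").strip()
    if extra:
        user += "\n\nInput:\n" + extra  # block layout is an assumption
    return _wrap(user, row["output"])


def map_arithmetic(row: Dict[str, str]) -> List[Message]:
    # arithmetic: user=context with trailing "Answer:" stripped; assistant=completion
    context = row["context"].rstrip()
    if context.endswith("Answer:"):
        context = context[: -len("Answer:")].rstrip()
    return _wrap(context, row["completion"].strip())


example = map_arithmetic(
    {"context": "Question: What is 98 plus 45?\nAnswer:", "completion": " 143"}
)
print(example[1]["content"])
print(example[2]["content"])
```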
## Notes
- Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
- Multi-turn message datasets pass all assistant spans to the collator, so user and system text remains masked from step 0 while every assistant turn is supervised.
- Streaming source open/read failures are retried and reopened. Normal stream exhaustion reopens that source and continues mixing it until `max_steps`; `python-data-basic` is dropped after 3 completed passes.
- RoPE buffers and tokenizer/model load are verified during final export.
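The assistant-span-only masking mentioned in the notes can be sketched as follows. This is a simplified illustration, not the collator used in the run: `build_labels` is a hypothetical helper, the token IDs are made up, and it assumes the common convention of marking ignored positions with `-100`, which is the default `ignore_index` of PyTorch's cross-entropy loss. Whether the role header tokens of an assistant turn are also masked is not stated in the card and is left out here.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Concatenate role-tagged token spans and supervise only assistant spans."""
    input_ids: List[int] = []
    labels: List[int] = []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # supervised: label equals the token
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked from the loss
    return input_ids, labels


# Two-turn conversation with made-up token IDs.
segments = [
    ("system", [1, 2]),
    ("user", [3, 4, 5]),
    ("assistant", [6, 7]),
    ("user", [8]),
    ("assistant", [9, 10, 11]),
]
input_ids, labels = build_labels(segments)
print(labels)
```

Every assistant turn keeps its labels, so a multi-turn sample contributes loss from all of its assistant spans rather than only the last one.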