Instructions to use User01110/testing-50M with libraries and local apps.

Libraries

How to use User01110/testing-50M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="User01110/testing-50M")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("User01110/testing-50M")
model = AutoModelForCausalLM.from_pretrained("User01110/testing-50M")
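messages = [
    # The model was trained with this system prompt; the chat template does not add it automatically.
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]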
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

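# ChatML turns end with <|im_end|>, which generation_config.json does not list as an EOS token.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
outputs = model.generate(**inputs, max_new_tokens=40, eos_token_id=[tokenizer.eos_token_id, im_end_id])
# Decode only the newly generated tokens, dropping special tokens such as <|im_end|>.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))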

Local Apps

vLLM

How to use User01110/testing-50M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "User01110/testing-50M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "User01110/testing-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
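The same request can be made from Python without extra dependencies. The sketch below uses only the standard library (`json`, `urllib.request`); `build_chat_request` and `chat` are illustrative names, and it assumes a server is already listening at the base URL you pass in (port 8000 for the vLLM command above, 30000 for the SGLang examples below). It passes the training system prompt explicitly, because the chat template does not insert one.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            # System prompt listed in the model card's Training Setup table.
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8000", "User01110/testing-50M", "What is the capital of France?")` should return the reply as a string.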

SGLang

How to use User01110/testing-50M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "User01110/testing-50M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "User01110/testing-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "User01110/testing-50M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "User01110/testing-50M",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use User01110/testing-50M with Docker Model Runner:
```
docker model run hf.co/User01110/testing-50M
```

User01110 committed 5 days ago · Commit ab05cb6 (verified) · Parent: b9b6532

Upload checkpoint step 1,000

Files changed (7)

README.md +158 -0
chat_template.jinja +1 -0
config.json +34 -0
generation_config.json +9 -0
model.safetensors +3 -0
tokenizer.json +0 -0
tokenizer_config.json +16 -0

README.md ADDED

	@@ -0,0 +1,158 @@

+---
+license: apache-2.0
+language:
+- en
+pipeline_tag: text-generation
+library_name: transformers
+base_model: SupraLabs/Supra-1.5-50M-Base-exp
+base_model_relation: finetune
+datasets:
+- nvidia/Nemotron-SFT-Instruction-Following-Chat-v2
+- Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned
+- MBZUAI/LaMini-instruction
+- ketchup123/tulu-gsm8k-openmath-instruct-100k-LF
+- NecroMOnk/khan-math-linear_algebra
+- endurasolution/ron-math-dataset
+- User01110/math-curated-dataset
+- microsoft/orca-math-word-problems-200k
+- TIGER-Lab/MathInstruct
+- openai/gsm8k
+- EleutherAI/arithmetic
+- Programming-Language/codeagent-python
+- jan-hq/multiturn_programming_binarized
+- Cutecat6152/python-data-basic
+- flytech/python-codes-25k
+tags:
+- sft
+- exact-loss-trainer
+- chatml
+- python
+- math
+- code
+- instruction-tuned
+---
+# testing-50M
+This is an experimental supervised instruction fine-tune (SFT) of `SupraLabs/Supra-1.5-50M-Base-exp`.
+## Training Setup
+| Field | Value |
+| --- | --- |
+| Base model | `SupraLabs/Supra-1.5-50M-Base-exp` |
+| Base revision | `main` |
+| Output repo | `User01110/testing-50M` |
+| Sequence length | 1024 |
+| Max optimizer steps | 20,000 |
+| Per-device batch size | 128 |
+| Gradient accumulation | 4 |
+| Sample presentations per GPU | 10,240,000 |
+| Max token slots per GPU | 10,485,760,000 |
+| Learning rate | 2.00e-04 |
+| Warmup steps | 100 |
+| Weight decay | 0.05 |
+| Save/push cadence | every 1,000 optimizer steps plus final |
+| Loss masking | assistant-span-only from step 0 |
+| Loss logging | printed `loss` is normalized by gradient accumulation; `raw_sum` is the Trainer sum over 4 microbatches |
+| Gate logging | novelty score if the loaded architecture exposes `last_gate`; otherwise `n/a` |
+| Prompt format | ChatML |
+| System prompt | `You are a helpful assistant.` |
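The derived rows of this table follow from the others. A quick arithmetic check in Python; `training_budget` and `printed_loss` are illustrative helpers based on the table, not functions from the training script, and `printed_loss` reflects one reading of the Loss logging row (the sum over the 4 microbatches divided by 4).

```python
def training_budget(max_steps: int, per_device_batch: int, grad_accum: int, seq_len: int):
    """Return (sequences per optimizer step, samples per GPU, token slots per GPU)."""
    # grad_accum microbatches of per_device_batch sequences feed each optimizer step.
    seqs_per_step = per_device_batch * grad_accum
    samples_per_gpu = max_steps * seqs_per_step
    # Each sample occupies at most seq_len positions, so this is an upper bound
    # on real (non-padding) tokens.
    token_slots_per_gpu = samples_per_gpu * seq_len
    return seqs_per_step, samples_per_gpu, token_slots_per_gpu


def printed_loss(raw_sum: float, grad_accum: int) -> float:
    """Logged `loss`: the sum over accumulated microbatches divided by their count."""
    return raw_sum / grad_accum


# Values from the Training Setup table.
seqs, samples, slots = training_budget(max_steps=20_000, per_device_batch=128, grad_accum=4, seq_len=1024)
print(f"{seqs:,} sequences per optimizer step per GPU")  # 512 sequences per optimizer step per GPU
print(f"{samples:,} sample presentations per GPU")      # 10,240,000 sample presentations per GPU
print(f"{slots:,} token slots per GPU")                  # 10,485,760,000 token slots per GPU
```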
+The stream randomly mixes math, coding, and conversation-heavy instruction sources. Each source is reopened when it is exhausted and keeps relooping until the 20,000-step training cap is reached.
+The listed sources total 35,728,143 rows before relooping. The 20,000-step budget presents 10,240,000 examples per GPU, about 29% of that total.
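The relooping mix can be sketched with plain Python iterators. This is an illustrative reconstruction rather than the training script: each `openers` entry is a hypothetical zero-argument function that opens a fresh stream for one source, and sources are drawn uniformly at random because the card does not state the mixing weights.

```python
import random
from typing import Callable, Dict, Iterator, List


def relooping_mix(
    openers: Dict[str, Callable[[], Iterator[dict]]],
    num_examples: int,
    seed: int = 0,
) -> Iterator[dict]:
    """Yield num_examples rows, picking a source at random for each one.

    An exhausted source is reopened and mixing continues. Every source is
    assumed to yield at least one row.
    """
    rng = random.Random(seed)
    names: List[str] = list(openers)
    streams = {name: openers[name]() for name in names}
    for _ in range(num_examples):
        name = rng.choice(names)  # uniform choice: an assumption, weights are not documented
        try:
            row = next(streams[name])
        except StopIteration:
            # Normal exhaustion: reopen the source and start from its first row again.
            streams[name] = openers[name]()
            row = next(streams[name])
        yield {"source": name, **row}
```

Because a reopened source starts again from its first row, small sources repeat far more often than large ones under uniform mixing.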
+## Prompt Template Compatibility
+The uploaded tokenizer includes the ChatML special tokens and chat template, so inference and future SFT should not require manually adding `<|im_start|>` or `<|im_end|>`.
+ChatML messages are rendered as:
+```text
+<|im_start|>system
+You are a helpful assistant.<|im_end|>
+<|im_start|>user
+{ user_message }<|im_end|>
+<|im_start|>assistant
+```
+The training script starts from the base checkpoint, adds `<|im_start|>` and `<|im_end|>` once as tokenizer special tokens, resizes the embeddings once, and saves the tokenizer with its `chat_template`. It also disables the tokenizer's automatic post-processing during pretokenized SFT and keeps `max_position_embeddings >= 1024` in the saved model config.
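+The base model is loaded with an explicit revision, `main`, to control which remote modeling file Transformers fetches during training. Because `main` is a branch rather than a fixed commit, pinning a commit hash would be needed to rule out picking up a newer file.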
+Complete inference example:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+repo = "User01110/testing-50M"
+tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    repo,
+    trust_remote_code=True,
+    torch_dtype="auto",
+    device_map="auto",
+)
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Explain what a neural network is in simple terms."},
+]
+prompt = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
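+with torch.no_grad():
+    output = model.generate(
+        **inputs,
+        max_new_tokens=256,
+        # temperature/top_k/top_p only take effect when sampling is enabled;
+        # for greedy decoding, set do_sample=False and drop those three arguments.
+        do_sample=True,
+        temperature=0.7,
+        top_k=40,
+        top_p=0.95,
+        repetition_penalty=1.2,
+        pad_token_id=tokenizer.pad_token_id,
+        # Stop at </s> or at the ChatML end-of-turn token <|im_end|>.
+        eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|im_end|>")],
+    )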
+new_tokens = output[0, inputs["input_ids"].shape[-1]:]
+text = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+print(text)
+```
+## Dataset Mix
+| Dataset | Config | Split | Rows | Schema | Mapping | Pass policy |
+| --- | --- | --- | ---: | --- | --- | --- |
+| nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content}], uuid, license, used_in, reasoning | ChatML conversation turns; reasoning_off split only | reloops until max_steps |
+| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | General-Distillation | train | 187,794 | conversations[{from, value}], input, output, domain, meta | human/gpt turns; assistant <think> blocks stripped | reloops until max_steps |
+| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | General-Math | train | 76,727 | conversations[{from, value}], input, output, domain, meta | human/gpt turns; assistant <think> blocks stripped | reloops until max_steps |
+| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | MultilingualSTEM | train | 89,997 | conversations[{from, value}], input, output, domain, meta | human/gpt turns; assistant <think> blocks stripped | reloops until max_steps |
+| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | PHD-Science | train | 103,307 | conversations[{from, value}], input, output, domain, meta | human/gpt turns; assistant <think> blocks stripped | reloops until max_steps |
+| MBZUAI/LaMini-instruction | default | train | 2,585,615 | instruction, response, instruction_source | instruction -> response | reloops until max_steps |
+| ketchup123/tulu-gsm8k-openmath-instruct-100k-LF | default | train | 100,000 | conversations[{role, content}] | math conversations to ChatML turns | reloops until max_steps |
+| NecroMOnk/khan-math-linear_algebra | default | train | 1,295,000 | messages[{role, content}], topic, subtopic | math tutor messages to ChatML turns | reloops until max_steps |
+| endurasolution/ron-math-dataset | default | train | 29,226,764 | instruction, input, output | instruction + optional input -> output | reloops until max_steps |
+| User01110/math-curated-dataset | default | train | 50,944 | id, source, prompt, index, model, response, chatml | prompt -> response; ignores source ChatML column and rebuilds clean ChatML | reloops until max_steps |
+| microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | question -> answer | reloops until max_steps |
+| TIGER-Lab/MathInstruct | default | train | 262,039 | source, instruction, output | instruction -> output | reloops until max_steps |
+| openai/gsm8k | main | train | 7,473 | question, answer | question -> answer | reloops until max_steps |
+| openai/gsm8k | socratic | train | 7,473 | question, answer | question -> answer | reloops until max_steps |
+| EleutherAI/arithmetic | 10 validation subsets | validation | 20,000 | context, completion | direct parquet URLs to avoid dataset-script loader failure | reloops until max_steps |
+| Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | prompt -> response | reloops until max_steps |
+| jan-hq/multiturn_programming_binarized | default | train | 100,139 | messages[{role, content}] | single/multiturn programming messages; all assistant spans labeled | reloops until max_steps |
+| Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | instruction -> response | reloops until max_steps |
+| flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | instruction + optional input -> output | reloops until max_steps |
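The Mapping column describes a few conversions into ChatML message lists. A minimal sketch of three of them, assuming the field names shown in the Schema column. The function names are hypothetical, joining `instruction` and `input` with a blank line is an assumption (the card does not say how they are combined), and the `<think>` regex assumes well-formed, non-nested blocks.

```python
import re
from typing import Dict, List, Optional

Message = Dict[str, str]

# Non-greedy and DOTALL, so a multi-line reasoning block is removed as one unit.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)
# Kimi `from` values mapped to ChatML roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


def kimi_to_messages(conversations: List[Dict[str, str]]) -> List[Message]:
    """conversations[{from, value}] -> ChatML turns, stripping assistant reasoning."""
    messages = []
    for turn in conversations:
        role = ROLE_MAP[turn["from"]]
        content = turn["value"]
        if role == "assistant":
            content = strip_think(content)
        messages.append({"role": role, "content": content})
    return messages


def instruction_to_messages(instruction: str, output: str, input_text: Optional[str] = None) -> List[Message]:
    """instruction + optional input -> output, as one user/assistant exchange."""
    prompt = f"{instruction}\n\n{input_text}" if input_text else instruction
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": output},
    ]
```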
+## Notes
+- Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
+- Multiturn/message datasets carry all assistant spans into the collator, so user/system text remains masked from step 0 while every assistant turn is supervised.
+- Kimi assistant text has `<think>...</think>` blocks stripped before tokenization.
+- Streaming source open/read failures are retried and reopened. Normal stream exhaustion reopens that source and continues mixing it until `max_steps`.
+- RoPE buffers and tokenizer/model loading are verified during the final export.
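Assistant-span-only masking can be illustrated by building labels segment by segment. A minimal sketch, assuming the `-100` ignore index that PyTorch's `CrossEntropyLoss` and Transformers' causal-LM loss skip by default. `build_labels` is a hypothetical helper, `encode` stands in for a real tokenizer, and supervising the closing `<|im_end|>` is an assumption the card does not confirm. Encoding segments separately can also split boundaries differently from encoding the whole string with a BPE tokenizer.

```python
from typing import Callable, Dict, List, Tuple

IGNORE_INDEX = -100  # positions with this label contribute no loss


def build_labels(
    messages: List[Dict[str, str]],
    encode: Callable[[str], List[int]],
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) for a ChatML conversation.

    Only assistant content and its closing <|im_end|> are supervised; system and
    user turns, role headers, and trailing newlines get IGNORE_INDEX.
    """
    input_ids: List[int] = []
    labels: List[int] = []

    def add(text: str, supervised: bool) -> None:
        ids = encode(text)
        input_ids.extend(ids)
        labels.extend(ids if supervised else [IGNORE_INDEX] * len(ids))

    for message in messages:
        is_assistant = message["role"] == "assistant"
        add(f"<|im_start|>{message['role']}\n", supervised=False)
        add(message["content"].strip() + "<|im_end|>", supervised=is_assistant)
        add("\n", supervised=False)
    return input_ids, labels
```

Transformers causal-LM models shift the labels by one position internally, so `labels` stays aligned with `input_ids` here.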

chat_template.jinja ADDED

	@@ -0,0 +1 @@


+{% for message in messages %}{{ '<|im_start|>' + message['role'] + '\n' + (message['content'] | trim) + '<|im_end|>\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
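For checking prompts by hand, the template above can be mirrored by a short plain-Python function. This is a sketch, not a replacement for `tokenizer.apply_chat_template`, and `render_chatml` is a hypothetical name. Like the template, it adds no default system message, so the system prompt used in training must be passed explicitly.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Mirror chat_template.jinja: one '<|im_start|>role\\ncontent<|im_end|>\\n' block per message."""
    # Jinja's `trim` filter strips leading and trailing whitespace, like str.strip().
    parts = [f"<|im_start|>{m['role']}\n{m['content'].strip()}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a neural network is in simple terms."},
    ],
    add_generation_prompt=True,
)
print(prompt)
```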

config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dtype": "float32",
+  "eos_token_id": 2,
+  "head_dim": 64,
+  "hidden_act": "silu",
+  "hidden_size": 512,
+  "initializer_range": 0.02,
+  "intermediate_size": 1408,
+  "max_position_embeddings": 5120,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 8,
+  "num_hidden_layers": 12,
+  "num_key_value_heads": 4,
+  "pad_token_id": 1,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "factor": 1.0,
+    "rope_theta": 10000.0,
+    "rope_type": "linear",
+    "type": "linear"
+  },
+  "tie_word_embeddings": true,
+  "transformers_version": "5.10.2",
+  "use_cache": false,
+  "vocab_size": 32002
+}

generation_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": [
+    2
+  ],
+  "pad_token_id": 1,
+  "transformers_version": "5.10.2"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:777493dc618f20fa153dc09ca84f0fb151e4f59a0593660c639200d807c20747
+size 207161232
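The config and the LFS pointer can be cross-checked. A back-of-the-envelope sketch, assuming the standard Llama layout in Transformers: no biases, two RMSNorm weights per layer plus a final norm, and the tied `lm_head` stored only once. `llama_param_count` is a hypothetical helper, not part of the repository. With 4-byte float32 weights, the count leaves 12,176 bytes of the 207,161,232-byte file, which is consistent with the safetensors JSON header.

```python
def llama_param_count(cfg: dict) -> int:
    """Weights in a Llama-style decoder without biases (attention_bias and mlp_bias are false)."""
    h = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]   # 8 * 64 = 512 here
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]  # 4 * 64 = 256 here (grouped-query attention)
    attn = h * q_out + 2 * h * kv_out + q_out * h          # q, k, v and o projections
    mlp = 3 * h * cfg["intermediate_size"]                 # gate, up and down projections
    norms = 2 * h                                          # input and post-attention RMSNorm weights
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed   # a tied head reuses the embedding matrix
    final_norm = h
    return embed + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


config = {
    "hidden_size": 512, "head_dim": 64, "num_attention_heads": 8,
    "num_key_value_heads": 4, "intermediate_size": 1408,
    "num_hidden_layers": 12, "vocab_size": 32002, "tie_word_embeddings": True,
}
params = llama_param_count(config)
weight_bytes = params * 4  # float32, per "dtype" in config.json
print(f"{params:,} parameters")                                     # 51,787,264 parameters
print(f"{weight_bytes:,} bytes as float32")                         # 207,149,056 bytes as float32
print(f"{207_161_232 - weight_bytes:,} bytes remaining in the file")  # 12,176 bytes remaining in the file
```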

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 1000000000,
+  "pad_token": "<pad>",
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "<unk>"
+}