---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
base_model: SupraLabs/Supra-1.5-50M-Base-exp
base_model_relation: finetune
datasets:
- nvidia/Nemotron-SFT-Instruction-Following-Chat-v2
- microsoft/orca-math-word-problems-200k
- TIGER-Lab/MathInstruct
- User01110/math-curated-dataset
- Programming-Language/codeagent-python
- Cutecat6152/python-data-basic
- flytech/python-codes-25k
- QuixiAI/open-instruct-uncensored
- openai/gsm8k
- EleutherAI/arithmetic
tags:
- sft
- exact-loss-trainer
- chatml
- python
- math
- code
- instruction-tuned
---
# testing-50M

This is an experimental instruction SFT run from `SupraLabs/Supra-1.5-50M-Base-exp`.

## Training Setup

| Field | Value |
| --- | --- |
| Base model | `SupraLabs/Supra-1.5-50M-Base-exp` |
| Base revision | `main` |
| Output repo | `User01110/testing-50M` |
| Sequence length | 1024 |
| Max optimizer steps | 10,000 |
| Per-device batch size | 128 |
| Gradient accumulation | 4 |
| Sample presentations per GPU | 5,120,000 |
| Max token slots per GPU | 5,242,880,000 |
| Learning rate | 2.00e-04 |
| Warmup steps | 100 |
| Weight decay | 0.05 |
| Save/push cadence | every 1,000 optimizer steps plus final |
| Loss masking | assistant-span-only from step 0 |
| Loss logging | printed `loss` is normalized by gradient accumulation; `raw_sum` is the Trainer sum over 4 microbatches |
| Gate logging | novelty score if the loaded architecture exposes `last_gate`; otherwise `n/a` |
| Prompt format | ChatML |
| System prompt | `You are a helpful assistant.` |
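
The two budget rows in the table follow from the step count, batch size, accumulation factor, and sequence length. A minimal sketch of that arithmetic, using only values from the table (the variable names are illustrative, not taken from the training script):

```python
# Values copied from the Training Setup table; variable names are illustrative.
max_steps = 10_000
per_device_batch_size = 128
grad_accum_steps = 4
seq_len = 1024

# One optimizer step consumes batch_size * grad_accum sequences on each GPU.
samples_per_step = per_device_batch_size * grad_accum_steps
samples_per_gpu = max_steps * samples_per_step

# Upper bound on tokens: every presented sequence filled to the full length.
max_token_slots_per_gpu = samples_per_gpu * seq_len

print(f"{samples_per_step:,} sequences per optimizer step")
print(f"{samples_per_gpu:,} sample presentations per GPU")
print(f"{max_token_slots_per_gpu:,} max token slots per GPU")
```

The token figure is a ceiling rather than a count of supervised tokens, since shorter examples leave slots unused and only assistant spans contribute to the loss.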

The training stream randomly mixes the selected instruction, math, and coding sources. A source is reopened when it is exhausted and keeps relooping until the 10,000-step training cap is reached, except `Cutecat6152/python-data-basic`, which is capped at 3 passes.

The listed sources total 3,718,915 rows before relooping. The 10,000-step training budget presents 5,120,000 examples per GPU, which exceeds that row total, so relooping is expected during the run.
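
The relooping policy can be sketched as below. This is an illustration under assumptions, not the training script: `mix_sources` is a hypothetical helper, the uniform choice between sources is assumed because the card does not state mixing weights, and the two toy sources only stand in for real datasets.

```python
import random
from typing import Callable, Dict, Iterable, Iterator, Optional


def mix_sources(
    openers: Dict[str, Callable[[], Iterable]],
    max_passes: Dict[str, Optional[int]],
    budget: int,
    seed: int = 0,
) -> Iterator[tuple]:
    """Yield (source_name, example) pairs, reopening exhausted sources.

    Hypothetical helper. Sources are picked uniformly at random (assumption).
    Every uncapped source is expected to be non-empty.
    """
    rng = random.Random(seed)
    streams = {name: iter(opener()) for name, opener in openers.items()}
    passes_done = {name: 0 for name in openers}
    active = list(openers)
    emitted = 0
    while emitted < budget and active:
        name = rng.choice(active)
        try:
            example = next(streams[name])
        except StopIteration:
            passes_done[name] += 1
            cap = max_passes.get(name)
            if cap is not None and passes_done[name] >= cap:
                active.remove(name)  # pass cap reached: drop this source
            else:
                streams[name] = iter(openers[name]())  # reopen and reloop
            continue
        emitted += 1
        yield name, example


# Toy usage: "tiny" plays the role of the pass-capped source.
openers = {
    "big": lambda: range(5),
    "tiny": lambda: ["a", "b"],
}
counts = {"big": 0, "tiny": 0}
for name, _ in mix_sources(openers, {"big": None, "tiny": 3}, budget=1000):
    counts[name] += 1
print(counts)
```

In the toy run the capped source contributes at most 2 rows times 3 passes, and the uncapped source fills the rest of the budget, which mirrors the 100-row, 300-presentation cap described for `python-data-basic`.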

## Prompt Template Compatibility

The uploaded tokenizer includes the ChatML special tokens and chat template, so inference and future SFT should not require manually adding `<|im_start|>` or `<|im_end|>`.

ChatML messages are rendered as:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{ user_message }<|im_end|>
<|im_start|>assistant
```
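
For readers who want to see the layout without loading the tokenizer, the rendering above can be reproduced in plain Python. This is a sketch only: `render_chatml` is a hypothetical helper, and the exact whitespace of the uploaded template is assumed to match the block above. `tokenizer.apply_chat_template` remains the reliable way to build prompts.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages in the ChatML layout shown above."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


rendered = render_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
    ]
)
print(rendered)
```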

The training script prepares the checkpoint as follows:

- starts from the base checkpoint;
- adds `<|im_start|>` and `<|im_end|>` once as tokenizer special tokens;
- resizes the embeddings once;
- saves the tokenizer with `chat_template`;
- disables automatic post-processing during pretokenized SFT;
- keeps and saves the model context config with `max_position_embeddings >= 1024`.

The base model is loaded with the revision set explicitly to `main`, with the aim of keeping Transformers from silently fetching a newer remote modeling file during training. Because `main` is a branch name rather than a commit, a commit hash gives a stricter pin.
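
The setup steps above might look roughly like the following with the Transformers API. This is a sketch under assumptions, not the author's script: `prepare_for_chatml` and its parameters are hypothetical, the chat template string is supplied by the caller, and the post-processing step is left out because the card does not say how it is disabled.

```python
CHATML_TOKENS = ["<|im_start|>", "<|im_end|>"]
MIN_CONTEXT = 1024  # sequence length from the Training Setup table


def prepare_for_chatml(base_repo: str, revision: str, chat_template: str, out_dir: str):
    """Hypothetical helper: add ChatML tokens, resize embeddings, save both."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        base_repo, revision=revision, trust_remote_code=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_repo, revision=revision, trust_remote_code=True
    )

    # add_special_tokens returns how many tokens were actually new,
    # so the embedding matrix is resized only when needed.
    added = tokenizer.add_special_tokens({"additional_special_tokens": CHATML_TOKENS})
    if added:
        model.resize_token_embeddings(len(tokenizer))

    tokenizer.chat_template = chat_template

    # Refuse to export a config whose context window is shorter than training.
    if model.config.max_position_embeddings < MIN_CONTEXT:
        raise ValueError("max_position_embeddings is below the training length")

    tokenizer.save_pretrained(out_dir)
    model.save_pretrained(out_dir)
    return tokenizer, model
```

Passing a commit hash as `revision` instead of a branch name is the stricter option when the remote modeling code must not change between runs.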

Complete inference example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "User01110/testing-50M"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a neural network is in simple terms."},
]

# Render the conversation with the bundled ChatML template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# The template already contains the special tokens, so do not add them again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        # Sampling must be enabled for temperature/top_k/top_p to take effect.
        # For greedy decoding, set do_sample=False and drop those three arguments.
        do_sample=True,
        temperature=0.7,
        top_k=40,
        top_p=0.95,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, not the prompt.
new_tokens = output[0, inputs["input_ids"].shape[-1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(text)
```

## Dataset Mix

| Dataset | Config | Split | Rows | Schema | Mapping | Pass policy |
| --- | --- | --- | ---: | --- | --- | --- |
| nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content, reasoning_content}] | user/assistant message pairs; reasoning_off only | reloops until max_steps |
| microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | user=question; assistant=answer | reloops until max_steps |
| TIGER-Lab/MathInstruct | default | train | 262,039 | source, instruction, output | user=instruction; assistant=output | reloops until max_steps |
| User01110/math-curated-dataset | default | train | 50,944 | id, source, prompt, index, model, response, chatml | user=prompt; assistant=response; rebuilds clean ChatML | reloops until max_steps |
| Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | user=prompt; assistant=response | reloops until max_steps |
| Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | user=instruction; assistant=response | max 3 passes, 300 presentations max |
| flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | user=instruction plus optional Input block; assistant=output | reloops until max_steps |
| QuixiAI/open-instruct-uncensored | default | train | 1,756,115 | dataset, id, messages[{role, content}] | user/assistant message pairs | reloops until max_steps |
| openai/gsm8k | main | train | 7,473 | question, answer | user=question; assistant=answer | reloops until max_steps |
| openai/gsm8k | socratic | train | 7,473 | question, answer | user=question; assistant=answer | reloops until max_steps |
| EleutherAI/arithmetic | 10 validation subsets | validation raw JSONL | 20,000 | context, completion | user=context with trailing Answer: stripped; assistant=completion | reloops until max_steps |
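
Two of the less obvious mappings in the table can be sketched as plain functions. The field names come from the Schema column; the function names, the whitespace stripping, and the exact layout of the Input block are assumptions made for illustration.

```python
from typing import Dict, Tuple


def map_flytech(row: Dict[str, str]) -> Tuple[str, str]:
    """user = instruction plus optional Input block; assistant = output."""
    user = row["instruction"].strip()
    extra = (row.get("input") or "").strip()
    if extra:
        # The layout of the Input block is an assumption.
        user = f"{user}\n\nInput:\n{extra}"
    return user, row["output"].strip()


def map_arithmetic(row: Dict[str, str]) -> Tuple[str, str]:
    """user = context with a trailing 'Answer:' stripped; assistant = completion."""
    user = row["context"].strip()
    if user.endswith("Answer:"):
        user = user[: -len("Answer:")].rstrip()
    return user, row["completion"].strip()


# Made-up rows that follow the listed schemas.
pair_a = map_flytech(
    {"instruction": "Sort the list.", "input": "[3, 1, 2]", "output": "sorted([3, 1, 2])"}
)
pair_b = map_arithmetic(
    {"context": "Question: What is 12 plus 30?\nAnswer:", "completion": " 42"}
)
print(pair_a)
print(pair_b)
```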

## Notes

- Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
- Multiturn/message datasets carry all assistant spans into the collator, so user/system text remains masked from step 0 while every assistant turn is supervised.
- Streaming source open/read failures are retried and reopened. Normal stream exhaustion reopens that source and continues mixing it until `max_steps`; `python-data-basic` is dropped after 3 completed passes.
- RoPE buffers and tokenizer/model load are verified during final export.
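
The assistant-span masking mentioned in the notes can be illustrated with a small label-building function. This is a minimal sketch, assuming spans are given as half-open token index ranges; `mask_labels` and the toy token ids are hypothetical, and the real collator may compute spans differently.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_labels(input_ids: List[int], assistant_spans: List[Tuple[int, int]]) -> List[int]:
    """Copy token ids into labels only inside [start, end) assistant spans."""
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels


# Toy two-turn conversation: positions 2-3 and 6-7 are assistant tokens.
input_ids = [10, 11, 12, 13, 14, 15, 16, 17]
labels = mask_labels(input_ids, [(2, 4), (6, 8)])
print(labels)
```

Every assistant turn keeps its token ids as labels, while system and user positions carry the ignore value and contribute nothing to the loss.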