Text Generation
Transformers
Safetensors
llama
sft
chatml
trl
python
math
instruction-tuned
conversational
text-generation-inference
Instructions to use User01110/supralabs-50M-testing with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use User01110/supralabs-50M-testing with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="User01110/supralabs-50M-testing")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("User01110/supralabs-50M-testing")
model = AutoModelForCausalLM.from_pretrained("User01110/supralabs-50M-testing")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use User01110/supralabs-50M-testing with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "User01110/supralabs-50M-testing"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "User01110/supralabs-50M-testing",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/User01110/supralabs-50M-testing
```
- SGLang
How to use User01110/supralabs-50M-testing with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "User01110/supralabs-50M-testing" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "User01110/supralabs-50M-testing",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "User01110/supralabs-50M-testing" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "User01110/supralabs-50M-testing",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use User01110/supralabs-50M-testing with Docker Model Runner:
```bash
docker model run hf.co/User01110/supralabs-50M-testing
```
base_model: SupraLabs/Supra-1.5-50M-Base-exp
library_name: transformers
tags:
- sft
- chatml
- trl
- python
- math
- instruction-tuned
# supralabs-50M-testing

This is an experimental ChatML SFT run from `SupraLabs/Supra-1.5-50M-Base-exp`.
## Training Setup

| Field | Value |
| --- | --- |
| Base model | `SupraLabs/Supra-1.5-50M-Base-exp` |
| Output repo | `User01110/supralabs-50M-testing` |
| Sequence length | 1024 |
| Max optimizer steps | 10,000 |
| Per-device batch size | 128 |
| Gradient accumulation | 4 |
| Sample presentations per GPU | 5,120,000 |
| Max token slots per GPU | 5,242,880,000 |
| Learning rate | 2.00e-04 |
| Warmup steps | 100 |
| Weight decay | 0.05 |
| Save/push cadence | every 1,000 optimizer steps plus final |
| Loss mask | assistant response only |
| Chat format | ChatML |
| System prompt | `You are a helpful assistant.` |
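
The two per-GPU totals in the table follow from the step budget, batch size, gradient accumulation, and sequence length. The short check below reproduces them; the variable names are illustrative and do not come from the training script.

```python
# Values copied from the Training Setup table; names are illustrative.
max_optimizer_steps = 10_000
per_device_batch_size = 128
gradient_accumulation = 4
sequence_length = 1024

# One optimizer step on one GPU consumes batch_size * accumulation samples.
samples_per_step = per_device_batch_size * gradient_accumulation

# Total samples shown to one GPU over the whole step budget.
sample_presentations = max_optimizer_steps * samples_per_step

# "Token slots" is an upper bound: every sample is counted at the full
# sequence length, whether or not it actually fills all 1024 positions.
token_slots = sample_presentations * sequence_length

print(f"{samples_per_step:,} samples per optimizer step")   # 512
print(f"{sample_presentations:,} sample presentations")     # 5,120,000
print(f"{token_slots:,} max token slots")                   # 5,242,880,000
```

Because the token-slot figure assumes full-length sequences, the number of real (non-padding) tokens seen in training is likely lower.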
The stream reloops datasets as needed to reach the fixed step budget. `Cutecat6152/python-data-basic` is capped at three passes because it has only 100 rows.

Unique one-pass source rows in the Dataset Mix table: 3,667,971. First-cycle source presentations with the `python-data-basic` cap included: 3,668,171. The 10,000-step training budget presents 5,120,000 examples per GPU (10,000 steps × 128 samples × 4 accumulation steps), so larger sources are expected to reloop during training.
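
The row totals above can be re-derived from the Dataset Mix table, and the pass policy can be pictured as a generator that replays a source until an optional cap is reached. This is a sketch under assumptions: `reloop` is a hypothetical helper, not the actual data loader, and the card does not say how sources are interleaved.

```python
from itertools import count, islice

# Row counts copied from the Dataset Mix table.
ROWS = {
    "nvidia/Nemotron-SFT-Instruction-Following-Chat-v2": 1_068_273,
    "microsoft/orca-math-word-problems-200k": 200_035,
    "TIGER-Lab/MathInstruct": 262_039,
    "Programming-Language/codeagent-python": 296_837,
    "Cutecat6152/python-data-basic": 100,
    "flytech/python-codes-25k": 49_626,
    "QuixiAI/open-instruct-uncensored": 1_756_115,
    "openai/gsm8k (main)": 7_473,
    "openai/gsm8k (socratic)": 7_473,
    "EleutherAI/arithmetic": 20_000,
}
# Only one source has a pass cap; all others reloop without limit.
MAX_PASSES = {"Cutecat6152/python-data-basic": 3}

unique_rows = sum(ROWS.values())
# Counting the capped source at its 300-presentation maximum
# reproduces the "first-cycle" figure quoted in the card.
first_cycle = sum(n * MAX_PASSES.get(name, 1) for name, n in ROWS.items())


def reloop(n_rows, max_passes=None):
    """Yield row indices pass after pass; stop after max_passes if given."""
    passes = count() if max_passes is None else range(max_passes)
    for _ in passes:
        yield from range(n_rows)


print(f"{unique_rows:,}")                 # 3,667,971
print(f"{first_cycle:,}")                 # 3,668,171
print(list(islice(reloop(2), 5)))         # [0, 1, 0, 1, 0]
print(sum(1 for _ in reloop(100, 3)))     # 300
```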
## ChatML Compatibility

The tokenizer is saved with:

| Token | Purpose |
| --- | --- |
| `<\|im_start\|>` | ChatML message start |
| `<\|im_end\|>` | ChatML message end |

The uploaded tokenizer includes the ChatML template, so inference and future SFT should not require manually adding these tokens again.
Example prompt:

```python
from transformers import AutoTokenizer

# The uploaded tokenizer ships with the ChatML chat template.
tokenizer = AutoTokenizer.from_pretrained("User01110/supralabs-50M-testing")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a neural network is in simple terms."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```
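
For orientation, ChatML conventionally wraps each message as `<|im_start|>{role}`, a newline, the content, and `<|im_end|>`. The plain-Python sketch below renders a conversation in that layout and marks which segments an assistant-only loss mask would cover. It illustrates the convention only: `render_chatml` is a hypothetical helper, the template stored in the uploaded tokenizer may differ in whitespace or defaults, and counting `<|im_end|>` as a training target is an assumption.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def render_chatml(messages, add_generation_prompt=False):
    """Return (text, trainable) segments in the conventional ChatML layout."""
    segments = []
    for message in messages:
        header = f"{IM_START}{message['role']}\n"
        body = f"{message['content']}{IM_END}\n"
        # The role header is never a target; only assistant bodies are.
        segments.append((header, False))
        segments.append((body, message["role"] == "assistant"))
    if add_generation_prompt:
        # Open an assistant turn for the model to complete at inference time.
        segments.append((f"{IM_START}assistant\n", False))
    return segments


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a neural network is in simple terms."},
]

# Inference: no segment is trainable, and the prompt ends with the assistant header.
prompt = "".join(text for text, _ in render_chatml(messages, add_generation_prompt=True))
print(prompt)

# Training: append an (illustrative) assistant reply and keep only the masked-in text.
training = messages + [{"role": "assistant", "content": "It is a function built from layers."}]
loss_text = "".join(text for text, trainable in render_chatml(training) if trainable)
print(repr(loss_text))
```

In a real pipeline the mask would be applied to token ids rather than strings, typically by setting the labels of non-assistant positions to an ignore index.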
## Dataset Mix

| Dataset | Config | Split | Rows | Schema | Mapping | Pass policy |
| --- | --- | --- | ---: | --- | --- | --- |
| nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content, reasoning_content}] | user/assistant message pairs; reasoning_off only | reloops as needed |
| microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | user=question; assistant=answer | reloops as needed |
| TIGER-Lab/MathInstruct | default | train | 262,039 | instruction, output | user=instruction; assistant=output | reloops as needed |
| Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | user=prompt; assistant=response | reloops as needed |
| Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | user=instruction; assistant=response | max 3 passes, 300 presentations max |
| flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | user=instruction plus optional Input block; assistant=output | reloops as needed |
| QuixiAI/open-instruct-uncensored | default | train | 1,756,115 | dataset, id, messages[{role, content}] | user/assistant message pairs | reloops as needed |
| openai/gsm8k | main | train | 7,473 | question, answer | user=question; assistant=answer | reloops as needed |
| openai/gsm8k | socratic | train | 7,473 | question, answer | user=question; assistant=answer | reloops as needed |
| EleutherAI/arithmetic | 10 selected subsets | validation raw JSONL | 20,000 | context, completion | user=context with trailing Answer: stripped; assistant=completion | reloops as needed |
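
The Mapping column can be read as a set of small row-to-messages converters. A minimal sketch for three of the schemas follows. The function names, the exact layout of the optional Input block, and the whitespace handling around `Answer:` are assumptions, since the card does not show the preprocessing code, and the sample rows are made up for illustration.

```python
def pair(user, assistant):
    """Build a single user/assistant exchange in chat-message form."""
    return [
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]


def map_question_answer(row):
    """question, answer schema (orca-math, gsm8k)."""
    return pair(row["question"], row["answer"])


def map_instruction_input_output(row):
    """instruction, input, output schema (flytech/python-codes-25k)."""
    user = row["instruction"]
    extra = (row.get("input") or "").strip()
    if extra:
        # Assumed layout for the optional Input block.
        user = f"{user}\n\nInput:\n{extra}"
    return pair(user, row["output"])


def map_context_completion(row):
    """context, completion schema (EleutherAI/arithmetic)."""
    context = row["context"].rstrip()
    if context.endswith("Answer:"):
        # Drop the trailing cue so the user turn holds only the question.
        context = context[: -len("Answer:")].rstrip()
    return pair(context, row["completion"])


# Made-up rows, shaped like the schemas in the table.
print(map_question_answer({"question": "What is 2 + 3?", "answer": "5"}))
print(map_instruction_input_output(
    {"instruction": "Sort the list.", "input": "[3, 1, 2]", "output": "sorted([3, 1, 2])"}
))
print(map_context_completion(
    {"context": "Question: What is 98 plus 45?\nAnswer:", "completion": " 143"}
))
```

The sketch does not add the system prompt; where it is prepended is left to the chat template or the training script.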
## Notes

- Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
- Nemotron is loaded from the direct `reasoning_off.jsonl` file to avoid mixing in reasoning-on schema fields.
- EleutherAI arithmetic is loaded from raw JSONL files to avoid old dataset-script loading issues.
- RoPE buffers and tokenizer/model load are verified during final export.