---
base_model: SupraLabs/Supra-1.5-50M-Base-exp
library_name: transformers
tags:
- sft
- chatml
- trl
- python
- math
- instruction-tuned
---

# supralabs-50M-testing

An experimental ChatML supervised fine-tuning (SFT) run on top of `SupraLabs/Supra-1.5-50M-Base-exp`.

## Training Setup

| Field | Value |
| --- | --- |
| Base model | `SupraLabs/Supra-1.5-50M-Base-exp` |
| Output repo | `User01110/supralabs-50M-testing` |
| Sequence length | 1024 |
| Max optimizer steps | 10,000 |
| Per-device batch size | 128 |
| Gradient accumulation | 4 |
| Sample presentations per GPU | 5,120,000 |
| Max token slots per GPU | 5,242,880,000 |
| Learning rate | 2.00e-04 |
| Warmup steps | 100 |
| Weight decay | 0.05 |
| Save/push cadence | every 1,000 optimizer steps plus final |
| Loss mask | assistant response only |
| Chat format | ChatML |
| System prompt | `You are a helpful assistant.` |
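
The "assistant response only" loss mask means prompt tokens do not contribute to the training loss. The sketch below illustrates the idea, assuming the common PyTorch convention of using `-100` as the ignored label. The helper `mask_labels` and the toy token ids are hypothetical, and the actual run may build its mask differently.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy ignores by default


def mask_labels(input_ids, assistant_spans):
    """Keep labels only for tokens inside assistant spans.

    assistant_spans is a list of (start, end) index pairs, end exclusive.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels


# Toy example: positions 4..6 hold the assistant response.
input_ids = [11, 12, 13, 14, 21, 22, 23]
labels = mask_labels(input_ids, [(4, 7)])
print(labels)
```

Only the positions that keep their original token id are scored; system and user tokens are skipped.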

The data stream reloops datasets as needed to reach the fixed step budget. `Cutecat6152/python-data-basic` is capped at three passes (300 presentations) because it has only 100 rows.
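
The relooping behavior with a per-source pass cap can be sketched as follows. This is a minimal illustration that assumes a simple round-robin interleave; the function `reloop_stream` and its arguments are hypothetical, and the mixing order used in the actual run is not documented here.

```python
def reloop_stream(sources, max_passes, budget):
    """Yield (name, row) pairs until `budget` rows have been emitted.

    sources:    dict mapping source name -> list of rows
    max_passes: dict mapping source name -> pass cap (missing = unlimited)
    """
    active = [name for name, rows in sources.items() if rows]  # skip empty sources
    iters = {name: iter(sources[name]) for name in active}
    passes = {name: 1 for name in active}
    emitted = 0
    while emitted < budget and active:
        for name in list(active):
            if emitted >= budget:
                break
            try:
                row = next(iters[name])
            except StopIteration:
                cap = max_passes.get(name)
                if cap is not None and passes[name] >= cap:
                    active.remove(name)  # cap reached: drop this source
                    continue
                passes[name] += 1  # start another pass over the source
                iters[name] = iter(sources[name])
                row = next(iters[name])
            emitted += 1
            yield name, row


# Toy data: the tiny source is capped at three passes, the big one reloops freely.
stream = list(reloop_stream({"big": [1, 2, 3], "tiny": ["a"]}, {"tiny": 3}, budget=10))
tiny_count = sum(1 for name, _ in stream if name == "tiny")
print(len(stream), tiny_count)
```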

The sources listed below total 3,667,971 unique rows in a single pass. Counting the three capped passes of `python-data-basic`, the first cycle comes to 3,668,171 source presentations. The 10,000-step training budget presents 5,120,000 examples per GPU (10,000 steps × 128 per-device batch × 4 accumulation steps), so larger sources are expected to reloop during training.
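
The budget figures can be reproduced from the values in the Training Setup and Dataset Mix tables. The snippet below is only an arithmetic check; the variable names are illustrative.

```python
max_steps = 10_000
per_device_batch = 128
grad_accum = 4
seq_len = 1024

# Examples and token slots seen by one GPU over the whole run.
samples_per_gpu = max_steps * per_device_batch * grad_accum
token_slots_per_gpu = samples_per_gpu * seq_len

# Row counts copied from the Dataset Mix table.
rows = {
    "nvidia/Nemotron-SFT-Instruction-Following-Chat-v2": 1_068_273,
    "microsoft/orca-math-word-problems-200k": 200_035,
    "TIGER-Lab/MathInstruct": 262_039,
    "Programming-Language/codeagent-python": 296_837,
    "Cutecat6152/python-data-basic": 100,
    "flytech/python-codes-25k": 49_626,
    "QuixiAI/open-instruct-uncensored": 1_756_115,
    "openai/gsm8k (main)": 7_473,
    "openai/gsm8k (socratic)": 7_473,
    "EleutherAI/arithmetic": 20_000,
}
unique_rows = sum(rows.values())

# python-data-basic is presented three times, so it adds two extra passes.
first_cycle = unique_rows + (3 - 1) * rows["Cutecat6152/python-data-basic"]

print(samples_per_gpu, token_slots_per_gpu, unique_rows, first_cycle)
```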

## ChatML Compatibility

The tokenizer is saved with the following ChatML special tokens:

| Token | Purpose |
| --- | --- |
| `<\|im_start\|>` | ChatML message start |
| `<\|im_end\|>` | ChatML message end |

The uploaded tokenizer includes the ChatML template, so inference and future SFT should not require manually adding these tokens again.

Example prompt:

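```python
from transformers import AutoTokenizer

# Load the tokenizer, including its ChatML template, from the output repo.
tokenizer = AutoTokenizer.from_pretrained("User01110/supralabs-50M-testing")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a neural network is in simple terms."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```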
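
For reference, the conventional ChatML layout can be reproduced without the tokenizer. The function below is a hand-written sketch of that layout, not the template shipped in the repo; the uploaded template may differ in whitespace or in how it handles a missing system message.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the conventional ChatML layout (illustrative only)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = render_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ]
)
print(example)
```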

## Dataset Mix

| Dataset | Config | Split | Rows | Schema | Mapping | Pass policy |
| --- | --- | --- | ---: | --- | --- | --- |
| nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content, reasoning_content}] | user/assistant message pairs; reasoning_off only | reloops as needed |
| microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | user=question; assistant=answer | reloops as needed |
| TIGER-Lab/MathInstruct | default | train | 262,039 | instruction, output | user=instruction; assistant=output | reloops as needed |
| Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | user=prompt; assistant=response | reloops as needed |
| Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | user=instruction; assistant=response | max 3 passes, 300 presentations max |
| flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | user=instruction plus optional Input block; assistant=output | reloops as needed |
| QuixiAI/open-instruct-uncensored | default | train | 1,756,115 | dataset, id, messages[{role, content}] | user/assistant message pairs | reloops as needed |
| openai/gsm8k | main | train | 7,473 | question, answer | user=question; assistant=answer | reloops as needed |
| openai/gsm8k | socratic | train | 7,473 | question, answer | user=question; assistant=answer | reloops as needed |
| EleutherAI/arithmetic | 10 selected subsets | validation raw JSONL | 20,000 | context, completion | user=context with trailing Answer: stripped; assistant=completion | reloops as needed |
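
The Mapping column can be sketched as small row-to-messages converters. The function names, the exact layout of the optional Input block, and the whitespace stripping on the arithmetic completion are assumptions for illustration; the training script may format these differently.

```python
SYSTEM_PROMPT = "You are a helpful assistant."


def to_messages(user, assistant):
    """Wrap one user/assistant pair with the system prompt from the setup table."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]


def map_pair(row, user_key, assistant_key):
    """Generic mapping, e.g. question/answer or instruction/output."""
    return to_messages(row[user_key], row[assistant_key])


def map_flytech(row):
    """instruction plus optional Input block (block layout is an assumption)."""
    user = row["instruction"]
    if row.get("input"):
        user = f"{user}\n\nInput:\n{row['input']}"
    return to_messages(user, row["output"])


def map_arithmetic(row):
    """Strip the trailing 'Answer:' cue from the context."""
    context = row["context"].rstrip()
    if context.endswith("Answer:"):
        context = context[: -len("Answer:")].rstrip()
    # Stripping whitespace from the completion is an assumption.
    return to_messages(context, row["completion"].strip())


gsm = map_pair({"question": "What is 2 + 3?", "answer": "5"}, "question", "answer")
fly = map_flytech({"instruction": "Sort the list.", "input": "[3, 1, 2]", "output": "sorted([3, 1, 2])"})
ari = map_arithmetic({"context": "Question: What is 2 plus 3?\nAnswer:", "completion": " 5"})
print(ari[1]["content"])
```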

## Notes

- Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
- Nemotron is loaded from the direct `reasoning_off.jsonl` file to avoid mixing in reasoning-on schema fields.
- EleutherAI arithmetic is loaded from raw JSONL files to avoid old dataset-script loading issues.
- RoPE buffers and tokenizer/model load are verified during final export.
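
Loading from raw JSONL, as described in the notes above, can be sketched with the standard library alone. The helper `read_jsonl_messages` and the toy record are hypothetical; the actual run may use the `datasets` library or a streaming reader instead. The sketch keeps only `role` and `content`, so any `reasoning_content` field is dropped.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl_messages(path, keep_keys=("role", "content")):
    """Read one JSON record per line and keep only the listed message keys."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            messages = [
                {k: m[k] for k in keep_keys if k in m} for m in record["messages"]
            ]
            rows.append({"messages": messages})
    return rows


# Toy file standing in for a downloaded reasoning_off.jsonl.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reasoning_off.jsonl"
    record = {
        "messages": [
            {"role": "user", "content": "Hi", "reasoning_content": ""},
            {"role": "assistant", "content": "Hello!", "reasoning_content": ""},
        ]
    }
    path.write_text(json.dumps(record) + "\n", encoding="utf-8")
    rows = read_jsonl_messages(path)

print(rows)
```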