---
base_model: Qwen/Qwen2.5-0.5B-Instruct
datasets: HuggingFaceH4/Bespoke-Stratos-17k
library_name: transformers
model_name: Qwen2.5-0.5B-Open-R1-Distill
tags:
- generated_from_trainer
- open-r1
- trl
- sft
licence: license
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# Model Card for Qwen2.5-0.5B-Open-R1-Distill

This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) on the [HuggingFaceH4/Bespoke-Stratos-17k](https://huggingface.co/datasets/HuggingFaceH4/Bespoke-Stratos-17k) dataset.
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

# Example question, sent below as a single user turn.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# device="cuda" assumes a CUDA-capable GPU; pass device="cpu" to run without one.
generator = pipeline("text-generation", model="herman66/Qwen2.5-0.5B-Open-R1-Distill", device="cuda")
# return_full_text=False returns only the newly generated reply, not the prompt.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
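
The quick start hard-codes `device="cuda"`. The sketch below wraps the same pipeline call in small helpers that fall back to CPU when no GPU is detected. The helper names (`pick_device`, `build_messages`, `generate`) are illustrative and not part of any library; the heavy imports sit inside `generate` so the helpers can be used without loading the model.

```python
MODEL_ID = "herman66/Qwen2.5-0.5B-Open-R1-Distill"  # model ID from this card


def pick_device(cuda_available: bool) -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    return "cuda" if cuda_available else "cpu"


def build_messages(question: str) -> list[dict]:
    """Wrap a question as a single user turn in chat-message format."""
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 128) -> str:
    """Run the card's quick-start pipeline on whichever device is available."""
    # Imported lazily so the helpers above work without torch/transformers.
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        device=pick_device(torch.cuda.is_available()),
    )
    output = generator(
        build_messages(question),
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the reply, not the prompt
    )[0]
    return output["generated_text"]
```

Calling `generate("What is 12 * 7?")` downloads the model on first use and returns the reply text. Reasoning-style answers can be long, so a larger `max_new_tokens` may be needed to avoid truncated output.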

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/le6400-sgpjbg-com/huggingface/runs/kg70uger)

This model was trained with supervised fine-tuning (SFT).
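
The card does not record the training script or hyperparameters. As a rough illustration only, an SFT run with TRL's `SFTTrainer` on the same base model and dataset might look like the sketch below. Every value in `sft_arguments` is a hypothetical placeholder, not the setting used for this model, and the sketch assumes the dataset's chat-format column is picked up by the trainer without extra preprocessing.

```python
BASE_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"      # base model named in this card
DATASET = "HuggingFaceH4/Bespoke-Stratos-17k"  # dataset named in this card


def sft_arguments(output_dir: str = "Qwen2.5-0.5B-Open-R1-Distill") -> dict:
    """Hypothetical training arguments; none of these values come from the card."""
    return {
        "output_dir": output_dir,
        "num_train_epochs": 1,
        "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8,
        "learning_rate": 2e-5,
        "logging_steps": 10,
        # The card links a Weights & Biases run; use "none" to disable logging.
        "report_to": "wandb",
    }


def train() -> None:
    """Fine-tune the base model on the dataset with TRL's SFTTrainer."""
    # Imported lazily so sft_arguments() can be inspected without TRL installed.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset(DATASET, split="train")
    trainer = SFTTrainer(
        model=BASE_MODEL,  # a model ID string is loaded by the trainer
        args=SFTConfig(**sft_arguments()),
        train_dataset=dataset,
    )
    trainer.train()
    trainer.save_model()  # writes the final weights to output_dir
```

Calling `train()` starts the run. Keeping the arguments in a plain dictionary makes it easy to print or adjust them before committing GPU time.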

### Framework versions

- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
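
To compare a local environment against the versions listed above, a small standard-library check can help. This is a sketch: the mapping from the names in the list to PyPI distribution names (for example PyTorch to `torch`) is an assumption, and the helper names are illustrative.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "trl": "0.15.0.dev0",
    "transformers": "4.49.0.dev0",
    "torch": "2.5.1",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Map each package to "match", "missing", or a note on the difference."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        # Ignore local build suffixes such as "+cu121" when comparing.
        elif found.split("+")[0] == wanted:
            report[package] = "match"
        else:
            report[package] = f"differs (installed {found})"
    return report
```

Running `compare_versions(CARD_VERSIONS)` returns one status per package. A difference does not necessarily mean the model will fail to load; the `.dev0` versions in the list were development builds, so released versions will usually differ.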

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```