NovatasticRoScript/Atomight-V2.1-0.5B-Inference

Text Generation · reinforcement-learning · text-generation-inference · lm-evaluation-harness · Eval Results (legacy)


NovatasticRoScript committed 3 days ago

Commit 91cde09 (verified) · 1 parent: 4e76bc1

Update README.md

Files changed (1)

README.md +61 -9

README.md CHANGED

@@ -1,21 +1,73 @@
 ---
 base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
 tags:
-- text-generation-inference
-- transformers
 - unsloth
-- qwen2
 license: mit
 language:
 - en
 ---
-# Uploaded finetuned model
-- **Developed by:** NovatasticRoScript
-- **License:** mit
-- **Finetuned from model :** unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
-This model was trained 2x faster with [Unsloth] and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
 base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
+library_name: transformers
+model_name: results
 tags:
+- generated_from_trainer
+- trl
+- grpo
 - unsloth
+licence: license
 license: mit
+datasets:
+- bespokelabs/Bespoke-Stratos-17k
 language:
 - en
 ---
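The added front matter is YAML metadata that the Hub reads for the model page. As a minimal, stdlib-only sketch (a real workflow would more likely use PyYAML or `huggingface_hub.ModelCard`; the helper `parse_simple_front_matter` is hypothetical and handles only the flat `key: value` / `- item` subset used here), the snippet parses the new block and makes visible that `licence: license` is a separate key from `license: mit`. The former looks like leftover template text; the license the Hub displays comes from the `license` key.

```python
# Front matter copied from the new side of the diff (without the --- delimiters).
FRONT_MATTER = """\
base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
library_name: transformers
model_name: results
tags:
- generated_from_trainer
- trl
- grpo
- unsloth
licence: license
license: mit
datasets:
- bespokelabs/Bespoke-Stratos-17k
language:
- en
"""


def parse_simple_front_matter(text: str) -> dict:
    """Hypothetical helper: parse flat 'key: value' pairs and '- item' lists.

    Not a general YAML parser; nested mappings, quoting and multi-line
    scalars are not supported.
    """
    data: dict = {}
    list_key = None  # key whose list items are currently being collected
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            if list_key is None:
                raise ValueError(f"list item without a parent key: {raw!r}")
            data[list_key].append(line[2:].strip())
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            data[key] = value  # scalar value on the same line
            list_key = None
        else:
            data[key] = []  # empty value: list items follow
            list_key = key
    return data


meta = parse_simple_front_matter(FRONT_MATTER)
print(meta["license"], meta["licence"])  # prints: mit license
print(meta["tags"])
```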
+# Model Card for results
+This model is a fine-tuned version of [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="NovatasticRoScript/results", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
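The quick start loads `NovatasticRoScript/results`, which matches the card's `model_name: results` (likely the trainer's output directory name) rather than the repository named in this page's header. A minimal variant, assuming the weights live at `NovatasticRoScript/Atomight-V2.1-0.5B-Inference` and load directly with `pipeline` (that is, they are full weights rather than a LoRA adapter), wraps the same call in a function so the repo ID and device are explicit. `REPO_ID`, `build_messages` and `generate` are hypothetical names.

```python
# Assumption: the repo ID from this page's header, not "NovatasticRoScript/results".
REPO_ID = "NovatasticRoScript/Atomight-V2.1-0.5B-Inference"


def build_messages(question: str) -> list:
    """Wrap one user turn in the chat-message format the pipeline accepts."""
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 128, device: str = "cuda") -> str:
    """Generate a reply; pass device="cpu" if no GPU is available."""
    # Imported lazily so build_messages can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=REPO_ID, device=device)
    output = generator(
        build_messages(question),
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the newly generated text
    )[0]
    return output["generated_text"]
```

Calling `print(generate("If you had a time machine, which way would you go?", device="cpu"))` downloads the model on first use and prints the reply.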
+## Training procedure
+This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
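In GRPO as described in the DeepSeekMath paper, several completions are sampled per prompt and each completion's advantage is its reward normalized by the group's mean and standard deviation, which removes the need for a learned value model; the policy is then updated with a PPO-style clipped probability ratio plus a KL penalty toward a reference model, averaged over tokens and over the group. The sketch shows those two pieces for one group and one token. It is illustrative, not the TRL implementation: the helper names, the population standard deviation, the `eps` term, and the `clip_eps`/`beta` defaults are assumptions.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-4) -> list:
    """Outcome-reward advantages within one group: (r_i - mean(r)) / (std(r) + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_token_objective(
    logp_new: float,
    logp_old: float,
    logp_ref: float,
    advantage: float,
    clip_eps: float = 0.2,
    beta: float = 0.04,
) -> float:
    """Per-token term to maximize: clipped surrogate minus beta * KL estimate."""
    ratio = math.exp(logp_new - logp_old)  # pi_theta / pi_theta_old
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    # KL estimator from the paper: pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, always >= 0
    log_ref_ratio = logp_ref - logp_new
    kl = math.exp(log_ref_ratio) - log_ref_ratio - 1
    return surrogate - beta * kl


# Four sampled answers to one prompt, rewarded 1 if correct and 0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers get a positive advantage and incorrect ones a negative one, and a group where every answer scores the same contributes zero advantage, so it produces no policy-gradient signal.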
+### Framework versions
+- TRL: 1.5.0
+- Transformers: 5.9.0
+- Pytorch: 2.10.0
+- Datasets: 4.8.5
+- Tokenizers: 0.22.2
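To see how a local environment compares with the versions listed above, a small stdlib sketch using `importlib.metadata` can report what is installed. It assumes PyTorch is installed under its usual distribution name, `torch`; `LISTED` and `compare_versions` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the README's "Framework versions" list.
LISTED = {
    "trl": "1.5.0",
    "transformers": "5.9.0",
    "torch": "2.10.0",
    "datasets": "4.8.5",
    "tokenizers": "0.22.2",
}


def compare_versions(listed: Dict[str, str] = LISTED) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each distribution to (listed_version, installed_version or None)."""
    report = {}
    for dist, wanted in listed.items():
        try:
            installed: Optional[str] = version(dist)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        report[dist] = (wanted, installed)
    return report


for dist, (wanted, installed) in compare_versions().items():
    # Exact string comparison: a local tag such as "+cu128" counts as a difference.
    if installed is None:
        status = "missing"
    else:
        status = "match" if installed == wanted else "differs"
    print(f"{dist:<13} listed {wanted:<8} installed {installed or '-':<14} {status}")
```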
+## Citations
+Cite GRPO as:
+```bibtex
+@article{shao2024deepseekmath,
+    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
+    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
+    year         = 2024,
+    eprint       = {arXiv:2402.03300},
+}
+```
+Cite TRL as:
+```bibtex
+@software{vonwerra2020trl,
+  title   = {{TRL: Transformers Reinforcement Learning}},
+  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
+  license = {Apache-2.0},
+  url     = {https://github.com/huggingface/trl},
+  year    = {2020}
+}
+```