Update README.md
README.md
CHANGED
@@ -6,5 +6,26 @@ colorTo: purple
 sdk: static
 pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.
+# Welcome to Open-R1 🐳🤗
+Open-R1 is an open initiative to replicate and extend the techniques behind DeepSeek-R1, a state-of-the-art reasoning model, in a fully transparent and collaborative way. This organization is dedicated to:
+- Sharing datasets and models built on the path to replicating DeepSeek-R1.
+- Fostering meaningful discussions and collaboration in the Discussion tab.
+By working together, we aim to create a robust foundation for reasoning models that the entire research and industry community can leverage.
+
+# Plan of attack
+We are using the DeepSeek-R1 tech report as a guide to recreate their pipeline. The work can be broken down into three main steps:
+
+- Replicate R1-Distill:
+Distill a high-quality reasoning corpus from DeepSeek-R1 to create the R1-Distill models.
+- Recreate the pure RL pipeline:
+Reproduce the reinforcement learning process that DeepSeek used to train R1-Zero. This will likely require curating new, large-scale datasets for math, reasoning, and code.
+- Demonstrate end-to-end training:
+Show that we can go from a base model to RL-tuned reasoning capabilities through a multi-stage training approach, combining supervised fine-tuning (SFT) and reinforcement learning (RL).
+
+# How to contribute
+This project thrives on community participation! Here are some ways you can contribute:
+- Join the Discussion: Share ideas, ask questions, and collaborate with others in the Discussion tab.
+- Contribute Code or Datasets: Submit pull requests with datasets, models, or improvements to the pipeline.
+- Experiment and Share Results: Try out different approaches and share your findings with the community.
+Let’s build something impactful together. 🚀