Jon Hall committed on
Commit c2b23fd · 1 Parent(s): 0019963

Initial commit

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -53,7 +53,7 @@ We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSee
  DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
  With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
  However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
- we introduce DeepSeek-R1, which incorporates cold-start data before RL.
+ we introduce DeepSeek-R1, which incorporates cold-start data before RL.
  DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
  To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
 