DedeProGames committed on
Commit 40a5695 · verified · 1 Parent(s): eaf1738

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -20,7 +20,7 @@ tags:
  </p>

  ## 1. Introduction
- **Nebula** is a **321M-parameter** generalist Small Reasoning Model trained on **200B+ tokens**.
+ **Nebula** is a **320M-parameter** generalist Small Reasoning Model trained on **200B+ tokens**.

  Nebula is designed to deliver an unusually strong balance of **memory**, **general reasoning**, **math**, and **retrieval-friendly behavior** for its size class, aiming to outperform many small models of a similar parameter range on non-code, industry-style benchmarks.

@@ -61,8 +61,7 @@ Traces use the following stenographic notation integrated into special tokens:
  This reasoning format is designed to remain expressive while being lightweight enough for a small model.

  ## 3. Fine-Tuning/RL
- Nebula has been successfully fine-tuned for a variety of tasks including text classification and
- <a href="https://x.com/darrenangle/status/1990259914602856831">poetry writing</a>.
+ Nebula has been successfully fine-tuned for a variety of tasks

  Because Nebula is a reasoning-oriented model, it is expected to train well with reinforcement learning methods such as **GRPO**, both for **verifiable tasks** (with objective rewards) and for subjective tasks using an **LLM-as-a-judge**.

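The README context above says Nebula is expected to train well with GRPO on verifiable tasks. As an illustration only, here is a minimal sketch of the two pieces such a setup needs: an objective reward and GRPO's group-relative advantage, where each sampled completion's reward is normalized against the other completions drawn for the same prompt. The function names `verifiable_reward` and `group_relative_advantages` are hypothetical and not part of Nebula or any library; the exact-match reward on the last line and the use of the population standard deviation are assumptions (implementations differ on both).

```python
import statistics


def verifiable_reward(completion: str, reference: str) -> float:
    """Hypothetical objective reward (assumption): 1.0 if the last line of
    the completion exactly matches the reference answer, else 0.0."""
    lines = completion.strip().splitlines()
    answer = lines[-1].strip() if lines else ""
    return 1.0 if answer == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward by the mean and standard
    deviation of its own sampling group. The population std is an assumed
    choice; eps avoids division by zero when all rewards are equal."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical completions sampled for the same prompt.
    group = ["... so the answer is\n42", "I think\n41", "42", "no idea"]
    rewards = [verifiable_reward(c, "42") for c in group]
    advantages = group_relative_advantages(rewards)
    print(rewards)
    print([round(a, 3) for a in advantages])
```

Using the group mean as the baseline is what lets GRPO work without a separate learned value model. For the subjective tasks the README mentions, the exact-match reward would be swapped for a score produced by an LLM judge, while the advantage computation stays the same.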