Text Generation · Transformers · Safetensors · English · qwen2 · conversational · text-generation-inference

Improve language tag

#5 by lbourdois - opened
Files changed (1)
README.md +73 -61
README.md CHANGED
@@ -1,62 +1,74 @@
- ---
- library_name: transformers
- datasets:
- - codeparrot/apps
- - BAAI/TACO
- - AI-MO/NuminaMath-CoT
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-32B-Instruct
- license: apache-2.0
- ---
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is a 32B reasoning model fine-tuned from Qwen2.5-32B-Instruct on 17K training examples. Its performance is on par with the o1-preview model on both math and coding.
- Please see our [blog post](https://novasky-ai.github.io/posts/sky-t1/) for more details.
-
- - **Developed by:** NovaSky Team from Sky Computing Lab at UC Berkeley.
-
- ## Training Details
-
- ### Training Data
-
- 17K verified correct responses from Qwen/QwQ-32B-Preview on coding and math problems. In addition, we add the science portion from the [Still-2 paper](https://arxiv.org/pdf/2412.09413).
-
- ### Training Procedure
- We perform supervised fine-tuning on this data with a batch size of 96.
-
- #### Speeds
-
- We use Llama-Factory for training. On 8 H100 GPUs, training takes 19 hours with DeepSpeed ZeRO-3 Offload.
-
- ## Evaluation
- | Benchmark            | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ  | o1-preview |
- |----------------------|--------------------|-----------------------|------|------------|
- | Math500              | 82.4               | 76.2                  | 85.4 | 81.4       |
- | AIME2024             | 43.3               | 16.7                  | 50.0 | 40.0       |
- | LiveCodeBench-Easy   | 86.3               | 84.6                  | 90.7 | 92.9       |
- | LiveCodeBench-Medium | 56.8               | 40.8                  | 56.3 | 54.9       |
- | LiveCodeBench-Hard   | 17.9               | 9.8                   | 17.1 | 16.3       |
- | GPQA-Diamond         | 56.8               | 45.5                  | 52.5 | 75.2       |
-
- ## Acknowledgement
- We would like to thank [Lambda Labs](https://lambdalabs.com/service/gpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5) and [AnyScale](https://www.anyscale.com/) for the compute resources, and the [Still-2 Team](https://arxiv.org/pdf/2412.09413) and [Junyang Lin](https://justinlin610.github.io/) from the [Qwen Team](https://qwenlm.github.io/) for their academic feedback and support.
-
- ## Citation
- Please consider citing our blog post if you find it useful for your research. Thank you!
-
- ```bibtex
- @misc{sky_t1_2025,
-   author = {NovaSky Team},
-   title = {Sky-T1: Fully open-source reasoning model with o1-preview performance in \$450 budget},
-   howpublished = {https://novasky-ai.github.io/posts/sky-t1},
-   note = {Accessed: 2025-01-09},
-   year = {2025}
- }
- ```
1
+ ---
+ library_name: transformers
+ datasets:
+ - codeparrot/apps
+ - BAAI/TACO
+ - AI-MO/NuminaMath-CoT
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-32B-Instruct
+ license: apache-2.0
+ ---
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is a 32B reasoning model fine-tuned from Qwen2.5-32B-Instruct on 17K training examples. Its performance is on par with the o1-preview model on both math and coding.
+ Please see our [blog post](https://novasky-ai.github.io/posts/sky-t1/) for more details.
+
+ - **Developed by:** NovaSky Team from Sky Computing Lab at UC Berkeley.
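+
+ A minimal inference sketch with the Transformers library is shown below. The repo id `NovaSky-AI/Sky-T1-32B-Preview`, the prompt, and the generation settings are illustrative assumptions, not details stated in this card:
+
+ ```python
+ # Minimal inference sketch; the repo id below is an assumption.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "NovaSky-AI/Sky-T1-32B-Preview"  # assumed repo id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ messages = [{"role": "user", "content": "How many primes are below 100?"}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ # Reasoning models emit long chains of thought, so leave generous headroom.
+ outputs = model.generate(inputs, max_new_tokens=2048)
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```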
+
+ ## Training Details
+
+ ### Training Data
+
+ 17K verified correct responses from Qwen/QwQ-32B-Preview on coding and math problems. In addition, we add the science portion from the [Still-2 paper](https://arxiv.org/pdf/2412.09413).
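+
+ As a toy illustration of the "verified correct" filter (the actual rejection-sampling and reformatting pipeline is described in the blog post; the helpers below are hypothetical), a generated response is kept only when its extracted final answer matches the reference answer:
+
+ ```python
+ # Toy sketch of rejection sampling by answer matching; hypothetical helpers,
+ # not the actual Sky-T1 pipeline (see the blog post for the real procedure).
+ def extract_final_answer(response: str) -> str:
+     """Return the contents of the last \\boxed{...} span, or '' if absent."""
+     marker = r"\boxed{"
+     idx = response.rfind(marker)
+     return response[idx + len(marker):].split("}")[0].strip() if idx != -1 else ""
+
+ def keep_verified(samples: list[dict]) -> list[dict]:
+     """Keep only responses whose final answer matches the reference."""
+     return [
+         s for s in samples
+         if extract_final_answer(s["response"]) == s["reference_answer"].strip()
+     ]
+
+ data = [{"response": r"... so the answer is \boxed{25}", "reference_answer": "25"}]
+ print(len(keep_verified(data)))  # -> 1
+ ```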
+
+ ### Training Procedure
+ We perform supervised fine-tuning on this data with a batch size of 96.
+
+ #### Speeds
+
+ We use Llama-Factory for training. On 8 H100 GPUs, training takes 19 hours with DeepSpeed ZeRO-3 Offload.
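+
+ For intuition, the global batch size of 96 decomposes as GPUs × per-device micro-batch × gradient-accumulation steps. A hedged sketch with Hugging Face `TrainingArguments` follows; the actual run used Llama-Factory, and the 4 × 3 split below is an assumption:
+
+ ```python
+ # Hedged sketch of the batch-size arithmetic; the real run used Llama-Factory
+ # with DeepSpeed ZeRO-3 Offload, and this 4 x 3 split is an assumption.
+ from transformers import TrainingArguments
+
+ num_gpus = 8          # 8x H100, per this card
+ micro_batch = 4       # assumed per-device batch size
+ grad_accum = 3        # assumed gradient-accumulation steps
+
+ args = TrainingArguments(
+     output_dir="sky-t1-sft",
+     per_device_train_batch_size=micro_batch,
+     gradient_accumulation_steps=grad_accum,
+     bf16=True,
+     # deepspeed="ds_zero3_offload.json",  # hypothetical ZeRO-3 Offload config
+ )
+
+ assert num_gpus * micro_batch * grad_accum == 96  # the card's global batch size
+ ```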
+
+ ## Evaluation
+ | Benchmark            | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ  | o1-preview |
+ |----------------------|--------------------|-----------------------|------|------------|
+ | Math500              | 82.4               | 76.2                  | 85.4 | 81.4       |
+ | AIME2024             | 43.3               | 16.7                  | 50.0 | 40.0       |
+ | LiveCodeBench-Easy   | 86.3               | 84.6                  | 90.7 | 92.9       |
+ | LiveCodeBench-Medium | 56.8               | 40.8                  | 56.3 | 54.9       |
+ | LiveCodeBench-Hard   | 17.9               | 9.8                   | 17.1 | 16.3       |
+ | GPQA-Diamond         | 56.8               | 45.5                  | 52.5 | 75.2       |
+
+ ## Acknowledgement
+ We would like to thank [Lambda Labs](https://lambdalabs.com/service/gpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5) and [AnyScale](https://www.anyscale.com/) for the compute resources, and the [Still-2 Team](https://arxiv.org/pdf/2412.09413) and [Junyang Lin](https://justinlin610.github.io/) from the [Qwen Team](https://qwenlm.github.io/) for their academic feedback and support.
+
+ ## Citation
+ Please consider citing our blog post if you find it useful for your research. Thank you!
+
+ ```bibtex
+ @misc{sky_t1_2025,
+   author = {NovaSky Team},
+   title = {Sky-T1: Fully open-source reasoning model with o1-preview performance in \$450 budget},
+   howpublished = {https://novasky-ai.github.io/posts/sky-t1},
+   note = {Accessed: 2025-01-09},
+   year = {2025}
+ }
+ ```