Add pipeline tag, license, paper link and GitHub link
#1 by nielsr (HF Staff) - opened

Files changed (1): README.md (+17 -18)

README.md CHANGED
@@ -1,13 +1,15 @@
  ---
  library_name: transformers
+ license: apache-2.0
+ pipeline_tag: text-generation
  tags: []
  ---

  # Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->
-
+ This is the Sky-T1-32B-Preview model, as described in [LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!](https://hf.co/papers/2502.07374). The code is available at [https://github.com/NovaSky-AI/SkyThought](https://github.com/NovaSky-AI/SkyThought).

+ <!-- Provide a quick summary of what the model is/does. -->

  ## Model Details

@@ -17,20 +19,20 @@ tags: []

  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
+ - **Developed by:** NovaSky AI
+ - **Funded by [optional]:** Berkeley Sky Computing Lab, Lambda Labs, Anyscale, and Databricks
+ - **Shared by [optional]:** NovaSky AI
+ - **Model type:** Qwen2ForCausalLM
+ - **Language(s) (NLP):** English
+ - **License:** Apache 2.0
+ - **Finetuned from model [optional]:** Qwen2

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
+ - **Repository:** [https://github.com/NovaSky-AI/SkyThought](https://github.com/NovaSky-AI/SkyThought)
+ - **Paper [optional]:** [LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!](https://hf.co/papers/2502.07374)
  - **Demo [optional]:** [More Information Needed]

  ## Uses

@@ -41,19 +43,19 @@ This is the model card of a 🤗 transformers model that has been pushed on the

  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]
+ This model is intended for research purposes, specifically for exploring long chain-of-thought reasoning in large language models. It can be used for math and coding benchmarks.

  ### Downstream Use [optional]

  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]
+ This model can be fine-tuned for specific reasoning tasks or integrated into larger applications that require complex reasoning capabilities.

  ### Out-of-Scope Use

  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- [More Information Needed]
+ This model should not be used for generating malicious content or for tasks that could cause harm.

  ## Bias, Risks, and Limitations

@@ -79,7 +81,7 @@ Use the code below to get started with the model.

  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]
+ Trained with 17k long CoT training samples, the Qwen2.5-32B-Instruct model achieves significant improvements on a wide range of math and coding benchmarks.

  ### Training Procedure

@@ -89,7 +91,6 @@ Use the code below to get started with the model.

  [More Information Needed]

-
  #### Training Hyperparameters

  - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

@@ -130,8 +131,6 @@ Use the code below to get started with the model.

  #### Summary

-
-
  ## Model Examination [optional]

  <!-- Relevant interpretability work for the model goes here -->
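
The card's "Use the code below to get started with the model" section is left empty by this PR. A minimal sketch of what it could contain, given the new `pipeline_tag: text-generation` and the `Qwen2ForCausalLM` model type; note the Hub repo ID `NovaSky-AI/Sky-T1-32B-Preview` is an assumption inferred from the model name and GitHub org, and is not stated in the diff:

```python
def generate(prompt: str,
             model_id: str = "NovaSky-AI/Sky-T1-32B-Preview",
             max_new_tokens: int = 512) -> str:
    """Generate a completion with the (assumed) Sky-T1 checkpoint.

    The model ID is a guess based on the NovaSky-AI org name; check the
    actual Hub repo before use. A 32B model in bf16 needs roughly 64 GB
    of accelerator memory.
    """
    # Lazy imports so the sketch can be read/parsed without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Qwen2-style instruct checkpoints expect the chat template.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:],
                            skip_special_tokens=True)
```

For long chain-of-thought reasoning (the model's stated purpose), `max_new_tokens` typically needs to be much larger than 512.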