Update README.md
README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
 - HuggingFaceTB/smoltalk
 - HuggingFaceH4/ultrafeedback_binarized
 base_model:
--
+- SmallDoge/Doge-20M
 language:
 - en
 pipeline_tag: question-answering
@@ -26,8 +26,8 @@ In addition, Doge uses Dynamic Mask Attention as sequence transformation and can
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer
 
-tokenizer = AutoTokenizer.from_pretrained("
-model = AutoModelForCausalLM.from_pretrained("
+tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-Instruct")
+model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-Instruct", trust_remote_code=True)
 
 generation_config = GenerationConfig(
     max_new_tokens=100,
@@ -70,14 +70,14 @@ We build the Doge-Instruct by first SFT on [SmolTalk](https://huggingface.co/dat
 **SFT**:
 | Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
 |---|---|---|---|---|---|---|
-| [Doge-20M-Instruct-SFT](https://huggingface.co/
-| [Doge-60M-Instruct](https://huggingface.co/
+| [Doge-20M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-20M-Instruct-SFT) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 8e-4 | 0.25M | bfloat16 |
+| [Doge-60M-Instruct](https://huggingface.co/SmallDoge/Doge-60M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 6e-4 | 0.25M | bfloat16 |
 
 **DPO**:
 | Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
 |---|---|---|---|---|---|---|
-| [Doge-20M-Instruct](https://huggingface.co/
-| [Doge-60M-Instruct](https://huggingface.co/
+| [Doge-20M-Instruct](https://huggingface.co/SmallDoge/Doge-20M-Instruct) | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 2 | 1024 | 8e-5 | 0.125M | bfloat16 |
+| [Doge-60M-Instruct](https://huggingface.co/SmallDoge/Doge-60M-Instruct) | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 2 | 1024 | 6e-5 | 0.125M | bfloat16 |
 
 
 **Procedure**:
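The usage hunk cuts off inside the `GenerationConfig(...)` call, so for readers of the diff, here is a self-contained sketch of how the updated snippet is typically driven end to end with the imports it declares. Only `max_new_tokens=100` and the two `from_pretrained` calls are visible in the diff; the other sampling flags, the chat-template arguments, and the `chat` helper are assumptions for illustration, not part of the card.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

# Only max_new_tokens=100 appears in the diff; use_cache/do_sample are assumed.
generation_config = GenerationConfig(
    max_new_tokens=100,
    use_cache=True,
    do_sample=True,
)

def chat(prompt: str, repo: str = "SmallDoge/Doge-20M-Instruct") -> str:
    """Stream a reply to a single user prompt and return the decoded text."""
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
    # Render the one-turn conversation with the model's chat template.
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
    )
    streamer = TextStreamer(tokenizer, skip_prompt=True)  # print tokens as they arrive
    output_ids = model.generate(
        input_ids, generation_config=generation_config, streamer=streamer
    )
    # Strip the prompt tokens and return only the newly generated continuation.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

`trust_remote_code=True` is needed because, as the model card notes, Doge ships a custom architecture (Dynamic Mask Attention) rather than a stock `transformers` class.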
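As a quick sanity check on the hyperparameter tables: if "Batch Size" is denominated in tokens (an assumption; the card does not state the unit) and 0.25M/0.125M are read as 2^18/2^17, then both the SFT and DPO settings work out to the same number of sequences per optimizer step:

```python
# Hypothetical unit check: tokens per batch divided by context length gives
# sequences per optimizer step. Reading "0.25M" as 2**18 is an assumption.
def sequences_per_step(batch_tokens: int, context_length: int) -> int:
    return batch_tokens // context_length

sft_seqs = sequences_per_step(2**18, 2048)   # SFT: 0.25M tokens, 2048 context
dpo_seqs = sequences_per_step(2**17, 1024)   # DPO: 0.125M tokens, 1024 context
print(sft_seqs, dpo_seqs)  # → 128 128
```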