Add metadata and link to paper
#12
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,9 +1,19 @@
 ---
+library_name: transformers
+pipeline_tag: text-generation
+license: mit
+---
+
+---
+library_name: transformers
+pipeline_tag: text-generation
 license: mit
 ---
 
 This is a reproduction of the <a href="https://arxiv.org/abs/2402.17764">BitNet b1.58</a> paper. The models are trained with <a href="https://github.com/togethercomputer/RedPajama-Data">RedPajama dataset</a> for 100B tokens. The hypers, as well as two-stage LR and weight decay, are implemented as suggested in their following <a href="https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf">paper</a>. All models are open-source in the <a href="https://huggingface.co/1bitLLM">repo</a>. We will train larger models and/or more tokens when resource is available.
 
+It was described in [T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge](https://huggingface.co/papers/2407.00088) with code available at https://github.com/microsoft/T-MAC.
+
 ## Results
 PPL and zero-shot accuracy:
 | Models | PPL| ARCe| ARCc| HS | BQ | OQ | PQ | WGe | Avg
@@ -35,4 +45,4 @@ python eval_task.py --hf_path 1bitLLM/bitnet_b1_58-3B \
 --output_path result.json \
 --num_fewshot 0 \
 --ctx_size 2048
-```
+```
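Note that, as written, the diff leaves two consecutive YAML front-matter blocks at the top of the README (two pairs of `---` markers, with `library_name` and `pipeline_tag` repeated). The conventional model-card form is a single front-matter block; a merged sketch carrying the same three fields would be:

```yaml
---
# Single YAML front-matter block with the metadata this PR adds
library_name: transformers
pipeline_tag: text-generation
license: mit
---
```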