Add metadata and link to paper

#12
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +11 -1
README.md CHANGED
@@ -1,9 +1,19 @@
 ---
+library_name: transformers
+pipeline_tag: text-generation
+license: mit
+---
+
+---
+library_name: transformers
+pipeline_tag: text-generation
 license: mit
 ---
 
 This is a reproduction of the <a href="https://arxiv.org/abs/2402.17764"> BitNet b1.58</a> paper. The models are trained with <a href="https://github.com/togethercomputer/RedPajama-Data">RedPajama dataset</a> for 100B tokens. The hypers, as well as two-stage LR and weight decay, are implemented as suggested in their following <a href="https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf">paper</a>. All models are open-source in the <a href="https://huggingface.co/1bitLLM">repo</a>. We will train larger models and/or more tokens when resource is available.
 
+It was described in [T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge](https://huggingface.co/papers/2407.00088) with code available at https://github.com/microsoft/T-MAC.
+
 ## Results
 PPL and zero-shot accuracy:
 | Models | PPL| ARCe| ARCc| HS | BQ | OQ | PQ | WGe | Avg
@@ -35,4 +45,4 @@ python eval_task.py --hf_path 1bitLLM/bitnet_b1_58-3B \
   --output_path result.json \
   --num_fewshot 0 \
   --ctx_size 2048
-```
+```
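
Review note: the new side of this diff appears to open README.md with two consecutive `---` blocks that carry the same three keys. The Hub reads model-card metadata from the YAML block at the top of the file, so the second block would most likely be treated as body text rather than metadata. Below is a minimal sketch for spotting this before merging. `front_matter_blocks` is a hypothetical helper, not part of any Hugging Face tooling, and it only handles flat `key: value` lines rather than full YAML.

```python
def front_matter_blocks(text: str) -> list[dict[str, str]]:
    """Collect consecutive '---'-delimited blocks at the top of a README.

    Assumption: each block holds only flat `key: value` lines (no nested YAML).
    """
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        # Skip blank lines before (or between) blocks.
        while i < len(lines) and not lines[i].strip():
            i += 1
        # Stop at the first non-blank line that does not open a block.
        if i >= len(lines) or lines[i].strip() != "---":
            break
        # Find the closing delimiter; give up if the block is never closed.
        end = next(
            (j for j in range(i + 1, len(lines)) if lines[j].strip() == "---"),
            None,
        )
        if end is None:
            break
        block = {}
        for line in lines[i + 1:end]:
            key, sep, value = line.partition(":")
            if sep:
                block[key.strip()] = value.strip()
        blocks.append(block)
        i = end + 1
    return blocks


# Head of README.md as it appears on the new side of the diff.
sample = """---
library_name: transformers
pipeline_tag: text-generation
license: mit
---

---
library_name: transformers
pipeline_tag: text-generation
license: mit
---

This is a reproduction of the BitNet b1.58 paper.
"""

blocks = front_matter_blocks(sample)
if len(blocks) > 1:
    print(f"found {len(blocks)} front-matter blocks; only the first is metadata")
```

If the check fires, keeping a single block with `library_name`, `pipeline_tag`, and `license` would carry the same metadata.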