1bitLLM/bitnet_b1_58-3B

Add metadata and link to paper #12

by nielsr (HF Staff), opened Mar 26, 2025

base: refs/heads/main ← from: refs/pr/12

README.md +11 -1

@@ -1,9 +1,19 @@
 ---
+library_name: transformers
+pipeline_tag: text-generation
+license: mit
+---
+---
+library_name: transformers
+pipeline_tag: text-generation
 license: mit
 ---
 This is a reproduction of the <a href="https://arxiv.org/abs/2402.17764"> BitNet b1.58</a> paper. The models are trained with <a href="https://github.com/togethercomputer/RedPajama-Data">RedPajama dataset</a> for 100B tokens. The hypers, as well as two-stage LR and weight decay, are implemented as suggested in their following <a href="https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf">paper</a>. All models are open-source in the <a href="https://huggingface.co/1bitLLM">repo</a>. We will train larger models and/or more tokens when resource is available.
+It was described in [T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge](https://huggingface.co/papers/2407.00088) with code available at https://github.com/microsoft/T-MAC.
 ## Results
 PPL and zero-shot accuracy:
 | Models | PPL| ARCe| ARCc| HS | BQ | OQ | PQ | WGe | Avg
@@ -35,4 +45,4 @@ python eval_task.py --hf_path 1bitLLM/bitnet_b1_58-3B \
     --output_path result.json \
     --num_fewshot 0 \
     --ctx_size 2048
-```
+```
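
The added `library_name` and `pipeline_tag` keys live in the README's front matter, the block between the opening `---` and the next `---`. As a rough illustration of how such a block can be read, here is a minimal sketch using only the Python standard library. It assumes flat `key: value` pairs (real model cards are YAML and may nest), and `parse_front_matter` is a hypothetical helper written for this example, not part of any library.

```python
def parse_front_matter(text):
    """Return (metadata, body) for a README whose first line is '---'.

    Assumption: the front matter holds only flat 'key: value' lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all

    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter: everything after it is the card body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()

    return {}, text  # opening '---' was never closed


# Sample README using the three keys that appear in the diff.
README = """---
library_name: transformers
pipeline_tag: text-generation
license: mit
---
This is a reproduction of the BitNet b1.58 paper.
"""

metadata, body = parse_front_matter(README)
print(metadata)
print(body)
```

In this sketch only the first `---` ... `---` block is treated as metadata; a second block directly after it would be returned as part of the body.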