Add `library_name: transformers` and enhance model card with detailed usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +152 -35
README.md CHANGED
@@ -1,18 +1,18 @@
1
  ---
2
- license: llama2
3
  base_model: meta-llama/Llama-2-7b-hf
4
- tags:
5
- - llama-2
6
- - quantization
7
- - qat
8
- - complex-valued
9
- - 2-bit
10
- - text-generation
11
- - recursive
12
- - safetensors
13
  language:
14
- - en
 
15
  pipeline_tag: text-generation
16
  ---
17
 
18
  # Fairy2i-W2
@@ -67,25 +67,41 @@ To further reduce quantization error, we recursively quantize the residual error
67
  - **Fairy2i-W2** achieves 62.00% average accuracy on zero-shot tasks, highly competitive with FP16 (64.72%)
68
  - **Fairy2i-W1 (1-bit)** outperforms real-valued binary and ternary baselines at the same or lower bit budgets
69
 
70
- ## Quick Start
71
 
72
  **Fairy2i-W2** is based on the LLaMA-2 7B architecture, with only the linear layers replaced by complex-valued QAT layers. The model structure is otherwise identical to LLaMA-2.
73
 
74
- ### Installation
75
 
76
  ```bash
77
- pip install torch transformers safetensors huggingface_hub
78
  ```
79
 
80
- ### Loading the Model
81
 
82
- Please refer to `load_model.py` for detailed implementation. Basic usage:
83
 
84
  ```python
85
- from load_model import load_model
86
-
87
- # Load Fairy2i-W2 model
88
- model, tokenizer = load_model()
89
 
90
  # The model is ready to use!
91
  prompt = "Hello, how are you?"
@@ -103,7 +119,94 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
103
  print(response)
104
  ```
105
 
106
- ### Model Details
107
 
108
  - **Base Model**: LLaMA-2 7B
109
  - **Quantization Method**: Complex-Phase V2 (2-step recursive residual quantization)
@@ -111,34 +214,48 @@ print(response)
111
  - **Codebook**: {±1, ±i} (fourth roots of unity)
112
  - **Training**: QAT (Quantization-Aware Training) on 30B tokens from RedPajama dataset
113
 
114
- ### Files in Repository
115
 
116
- - `load_model.py`: Model loading script
117
- - `qat_modules.py`: QAT linear layer implementations
118
- - `quantization.py`: Quantization functions (PhaseQuant, BitNet, etc.)
119
- - `config.json`: Model configuration (identical to LLaMA-2 7B)
120
- - `model.safetensors.index.json`: Weight file index
121
- - `model-0000X-of-00003.safetensors`: Sharded model weights
122
- - Tokenizer files: `tokenizer.json`, `tokenizer_config.json`, etc.
123
 
124
- ### Citation
125
 
126
  If you use Fairy2i-W2 in your research, please cite:
127
 
128
  ```bibtex
129
  @article{wang2025fairy2i,
130
- title={Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {±1, ±i}},
131
  author={Wang, Feiyu and Tan, Xinyu and Huang, Bokai and Zhang, Yihao and Wang, Guoan and Cong, Peizhuang and Yang, Tong},
132
  journal={arXiv preprint},
133
  year={2025}
134
  }
135
  ```
136
 
137
- ### License
138
 
139
  This model follows the same license as LLaMA-2. Please refer to the original LLaMA-2 license for details.
140
 
141
- ### Contact
142
-
143
- For questions or issues, please contact: tanxinyu330@gmail.com
144
 
1
  ---
2
  base_model: meta-llama/Llama-2-7b-hf
3
  language:
4
+ - en
5
+ license: llama2
6
  pipeline_tag: text-generation
7
+ library_name: transformers
8
+ tags:
9
+ - llama-2
10
+ - quantization
11
+ - qat
12
+ - complex-valued
13
+ - 2-bit
14
+ - recursive
15
+ - safetensors
16
  ---
17
 
18
  # Fairy2i-W2
 
67
  - **Fairy2i-W2** achieves 62.00% average accuracy on zero-shot tasks, highly competitive with FP16 (64.72%)
68
  - **Fairy2i-W1 (1-bit)** outperforms real-valued binary and ternary baselines at the same or lower bit budgets
69
 
70
+ ## 🚀 Quick Start
71
 
72
  **Fairy2i-W2** is based on the LLaMA-2 7B architecture, with only the linear layers replaced by complex-valued QAT layers. The model structure is otherwise identical to LLaMA-2.
73
 
74
+ ### 📦 Installation
75
 
76
  ```bash
77
+ pip install torch transformers safetensors huggingface_hub accelerate datasets lm-eval
78
  ```
79
 
80
+ ### 🔄 Loading the Model
81
 
82
+ The model can be loaded using the `model_module` package. Here's a basic example:
83
 
84
  ```python
85
+ from transformers import AutoModelForCausalLM, AutoTokenizer
86
+ from model_module.qat_modules import replace_modules_for_qat, convert_to_inference_mode
87
+ import torch
88
+
89
+ # Load base model
90
+ model_path = "meta-llama/Llama-2-7b-hf" # or your local path
91
+ model = AutoModelForCausalLM.from_pretrained(
92
+     model_path,
93
+     attn_implementation="flash_attention_2",
94
+     torch_dtype=torch.bfloat16,
95
+     device_map="auto",
96
+     trust_remote_code=True,
97
+ )
98
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
99
+
100
+ # Replace linear layers with QAT modules
101
+ replace_modules_for_qat(model, "complex_phase_v2", skip_lm_head=False)
102
+
103
+ # Convert to inference mode for faster inference
104
+ convert_to_inference_mode(model)
105
 
106
  # The model is ready to use!
107
  prompt = "Hello, how are you?"
 
119
  print(response)
120
  ```
121
 
122
+ ### 📊 Data Processing
123
+
124
+ The training data is processed from RedPajama-Data-1T using two sequential steps:
125
+
126
+ #### Step 1: Sample 100B tokens from RedPajama-Data-1T
127
+
128
+ Use `dataset/sample.py` to sample 100B tokens from the RedPajama-Data-1T dataset:
129
+
130
+ ```bash
131
+ cd dataset
132
+ python sample.py
133
+ ```
134
+
135
+ This script:
136
+ - Loads the RedPajama-Data-1T dataset from Hugging Face
137
+ - Samples approximately 100B tokens using 10 parallel processes
138
+ - Saves the sampled data to `new_dataset_100B_redpajama_final_dataset{0-9}` directories
139
+
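The subsampling idea can be pictured as keeping each document with a fixed probability; here is a toy sketch (hypothetical helper name and keep probability, not the actual `sample.py` logic):

```python
import random

# Toy sketch: keep each document independently with probability ~0.1, which
# turns a ~1T-token corpus into roughly 100B tokens. The real sample.py
# streams RedPajama-Data-1T and splits the work across 10 processes.
def sample_stream(stream, keep_prob, seed=0):
    rng = random.Random(seed)
    return [doc for doc in stream if rng.random() < keep_prob]

docs = [f"doc{i}" for i in range(1000)]
subset = sample_stream(docs, keep_prob=0.1)
print(len(subset))  # roughly 100 with this seed
```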
140
+ #### Step 2: Process into 2048-token aligned blocks
141
+
142
+ Use `dataset/padding_and_cut.py` to chunk the sampled data into 2048-token aligned blocks:
143
+
144
+ ```bash
145
+ cd dataset
146
+ python padding_and_cut.py
147
+ ```
148
+
149
+ This script:
150
+ - Loads the sampled datasets from Step 1
151
+ - Processes data into 2048-token aligned blocks using `group_and_chunk` function
152
+ - Saves the processed data to `dataset_100B_redpajama_2048_aligned/` directory
153
+
154
+ **Note:** Make sure to update the input paths in `padding_and_cut.py` to point to your sampled dataset directories.
155
+
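The chunking step can be sketched as follows (illustrative only; the actual `group_and_chunk` in `dataset/padding_and_cut.py` may differ in detail):

```python
# Illustrative sketch of grouping tokenized examples into fixed-size blocks:
# concatenate all token ids, then slice into block_size-aligned chunks,
# dropping the ragged tail so every block is exactly block_size tokens.
def group_and_chunk(examples, block_size=2048):
    concatenated = [tok for ids in examples["input_ids"] for tok in ids]
    total = (len(concatenated) // block_size) * block_size
    return {
        "input_ids": [concatenated[i:i + block_size] for i in range(0, total, block_size)]
    }

blocks = group_and_chunk({"input_ids": [[1, 2, 3], [4, 5, 6, 7, 8]]}, block_size=4)
print(blocks["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```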
156
+ #### Custom DataCollator
157
+
158
+ The training uses a custom `MyDataCollatorForLanguageModeling` class defined in `train/mydatacollator.py`. This collator is specifically designed to work with the 2048-token aligned data blocks.
159
+
160
+ **To use the custom DataCollator:**
161
+
162
+ You can copy the contents of `train/mydatacollator.py` into the `transformers.data.data_collator` module (this approach does not depend on the transformers version). The custom collator handles:
163
+ - Proper label masking for aligned 2048-token blocks
164
+ - EOS token position handling for causal language modeling
165
+ - Compatibility with the pre-processed aligned dataset format
166
+
167
+ The custom collator is automatically imported in the training script via:
168
+ ```python
169
+ from transformers.data.data_collator import MyDataCollatorForLanguageModeling
170
+ ```
171
+
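The collator's behavior can be sketched with a minimal stand-in (hypothetical class, not the actual `MyDataCollatorForLanguageModeling`, which may handle EOS positions differently):

```python
import torch

# Illustrative stand-in for a collator over pre-aligned blocks: every example
# is already exactly block-size tokens, so the batch can be stacked without
# padding and the inputs reused directly as labels.
class AlignedBlockCollator:
    def __init__(self, pad_token_id):
        self.pad_token_id = pad_token_id

    def __call__(self, features):
        input_ids = torch.tensor([f["input_ids"] for f in features], dtype=torch.long)
        labels = input_ids.clone()
        labels[labels == self.pad_token_id] = -100  # mask pad positions in the loss
        return {"input_ids": input_ids, "labels": labels}
```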
172
+ ### πŸ‹οΈ Training
173
+
174
+ To train a model with QAT, use the training script:
175
+
176
+ ```bash
177
+ cd train
178
+ bash train.sh
179
+ ```
180
+
181
+ **Note:** For Fairy2i-W2, the training uses fixed parameters:
182
+ - `--quant_method complex_phase_v2` (2-step recursive residual quantization)
183
+ - `--skip_lm_head False` (lm_head will be replaced)
184
+
185
+ The training script supports the following arguments:
186
+ - `--quant_method`: QAT quantization method (choices: `bitnet`, `complex_phase_v1`, `complex_phase_v2`, `complex_phase_v3`, `complex_phase_v4`)
187
+ - `--skip_lm_head`: Whether to skip replacement of lm_head layer (default: False)
188
+
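As a rough illustration of what the layer replacement involves (hypothetical helper; the real complex-valued QAT swap lives in `model_module/qat_modules.py`):

```python
import torch.nn as nn

# Sketch: walk the module tree and swap every nn.Linear via a factory,
# optionally leaving the lm_head in full precision when skip_lm_head is set.
def replace_linear(module, make_layer, skip_lm_head=False):
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            if skip_lm_head and name == "lm_head":
                continue  # leave the output head untouched
            setattr(module, name, make_layer(child))
        else:
            replace_linear(child, make_layer, skip_lm_head)

# Example: swap every nn.Linear for a bias-free stand-in.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
replace_linear(model, lambda lin: nn.Linear(lin.in_features, lin.out_features, bias=False))
print(model[0].bias)  # None
```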
189
+ ### ✅ Evaluation
190
+
191
+ #### 📉 Perplexity Evaluation
192
+
193
+ Evaluate perplexity on Wikitext-2 and C4 datasets:
194
+
195
+ ```bash
196
+ cd eval
197
+ bash eval_ppl.sh
198
+ ```
199
+
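For reference, perplexity is the exponential of the average per-token negative log-likelihood; a toy sketch of the formula (not the `eval_ppl.py` implementation):

```python
import math

# perplexity = exp(mean negative log-likelihood per token),
# shown here on made-up per-token probabilities.
def perplexity(token_probs):
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model assigning uniform probability 1/4 to each token has perplexity 4.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
```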
200
+ #### 🎯 Task Evaluation
201
+
202
+ Evaluate on downstream tasks using lm-eval:
203
+
204
+ ```bash
205
+ cd eval
206
+ bash eval_task.sh
207
+ ```
208
+
209
+ ### ℹ️ Model Details
210
 
211
  - **Base Model**: LLaMA-2 7B
212
  - **Quantization Method**: Complex-Phase V2 (2-step recursive residual quantization)
 
214
  - **Codebook**: {Β±1, Β±i} (fourth roots of unity)
215
  - **Training**: QAT (Quantization-Aware Training) on 30B tokens from RedPajama dataset
216
 
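A small numeric sketch of the idea behind the {±1, ±i} codebook and recursive residual quantization (hypothetical helper names; `model_module/quantization.py` holds the reference implementation):

```python
import numpy as np

# Sketch (not the repo's PhaseQuant code): project each complex weight onto
# the nearest codeword in {+1, -1, +1j, -1j}, then quantize the residual once
# more -- the 2-step recursive residual scheme described for Complex-Phase V2.
CODEBOOK = np.array([1, -1, 1j, -1j], dtype=np.complex64)

def phase_quant(w):
    # Nearest codeword by Euclidean distance in the complex plane.
    idx = np.argmin(np.abs(w[..., None] - CODEBOOK), axis=-1)
    return CODEBOOK[idx]

def recursive_quant(w, steps=2):
    # Approximate w as a scaled sum of codewords: w ~= s1*q1 + s2*q2.
    approx = np.zeros_like(w)
    for _ in range(steps):
        residual = w - approx
        scale = np.mean(np.abs(residual))
        approx = approx + scale * phase_quant(residual / scale)
    return approx

w = np.array([0.9 + 0.1j, -0.2 + 0.7j], dtype=np.complex64)
print(np.abs(w - recursive_quant(w)))  # error after 2 steps is smaller than after 1
```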
217
+ ## 📁 Repository Structure
218
 
219
+ ```
220
+ fairy2i-w2-repo-github/
221
+ ├── README.md
222
+ ├── model_module/
223
+ │   ├── __init__.py
224
+ │   ├── qat_modules.py          # QAT linear layer implementations
225
+ │   └── quantization.py         # Quantization functions (PhaseQuant, BitNet, etc.)
226
+ ├── dataset/
227
+ │   ├── sample.py               # Sample 100B tokens from RedPajama-Data-1T
228
+ │   └── padding_and_cut.py      # Process data into 2048-token aligned blocks
229
+ ├── train/
230
+ │   ├── train.py                # Training script
231
+ │   ├── train.sh                # Training launch script
232
+ │   ├── mydatacollator.py       # Custom DataCollator for aligned data
233
+ │   └── complexnet_config.yaml  # Accelerate configuration
234
+ └── eval/
235
+     ├── eval_ppl.py             # Perplexity evaluation script
236
+     ├── eval_ppl.sh             # Perplexity evaluation launcher
237
+     ├── eval_task.py            # Task evaluation script
238
+     ├── eval_task.sh            # Task evaluation launcher
239
+     └── eval_utils.py           # Evaluation utilities
240
+ ```
241
 
242
+ ## 📚 Citation
243
 
244
  If you use Fairy2i-W2 in your research, please cite:
245
 
246
  ```bibtex
247
  @article{wang2025fairy2i,
248
+ title={Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {$\pm 1, \pm i$}},
249
  author={Wang, Feiyu and Tan, Xinyu and Huang, Bokai and Zhang, Yihao and Wang, Guoan and Cong, Peizhuang and Yang, Tong},
250
  journal={arXiv preprint},
251
  year={2025}
252
  }
253
  ```
254
 
255
+ ## ⚖️ License
256
 
257
  This model follows the same license as LLaMA-2. Please refer to the original LLaMA-2 license for details.
258
 
259
+ ## 📧 Contact
260
 
261
+ For questions or issues, please contact: tanxinyu330@gmail.com