PKU-DS-LAB/Fairy2i-W2

@@ -1,18 +1,18 @@
 ---
-license: llama2
 base_model: meta-llama/Llama-2-7b-hf
-tags:
-  - llama-2
-  - quantization
-  - qat
-  - complex-valued
-  - 2-bit
-  - text-generation
-  - recursive
-  - safetensors
 language:
-  - en
 pipeline_tag: text-generation
 ---
 # Fairy2i-W2
@@ -67,25 +67,41 @@ To further reduce quantization error, we recursively quantize the residual error
 - **Fairy2i-W2** achieves 62.00% average accuracy on zero-shot tasks, highly competitive with FP16 (64.72%)
 - **Fairy2i-W1 (1-bit)** outperforms real-valued binary and ternary baselines at the same or lower bit budgets
-## Quick Start
 **Fairy2i-W2** is based on LLaMA-2 7B architecture, with only the linear layers replaced by complex-valued QAT layers. The model structure is otherwise identical to LLaMA-2.
-### Installation
 ```bash
-pip install torch transformers safetensors huggingface_hub
 ```
-### Loading the Model
-Please refer to `load_model.py` for detailed implementation. Basic usage:
 ```python
-from load_model import load_model
-# Load Fairy2i-W2 model
-model, tokenizer = load_model()
 # The model is ready to use!
 prompt = "Hello, how are you?"
@@ -103,7 +119,94 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(response)
 ```
-### Model Details
 - **Base Model**: LLaMA-2 7B
 - **Quantization Method**: Complex-Phase V2 (2-step recursive residual quantization)
@@ -111,34 +214,48 @@ print(response)
 - **Codebook**: {±1, ±i} (fourth roots of unity)
 - **Training**: QAT (Quantization-Aware Training) on 30B tokens from RedPajama dataset
-### Files in Repository
-- `load_model.py`: Model loading script
-- `qat_modules.py`: QAT linear layer implementations
-- `quantization.py`: Quantization functions (PhaseQuant, BitNet, etc.)
-- `config.json`: Model configuration (identical to LLaMA-2 7B)
-- `model.safetensors.index.json`: Weight file index
-- `model-0000X-of-00003.safetensors`: Sharded model weights
-- Tokenizer files: `tokenizer.json`, `tokenizer_config.json`, etc.
-### Citation
 If you use Fairy2i-W2 in your research, please cite:
 ```bibtex
 @article{wang2025fairy2i,
-  title={Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {±1, ±i}},
   author={Wang, Feiyu and Tan, Xinyu and Huang, Bokai and Zhang, Yihao and Wang, Guoan and Cong, Peizhuang and Yang, Tong},
   journal={arXiv preprint},
   year={2025}
 }
 ```
-### License
 This model follows the same license as LLaMA-2. Please refer to the original LLaMA-2 license for details.
-### Contact
-For questions or issues, please contact: tanxinyu330@gmail.com

 ---
 base_model: meta-llama/Llama-2-7b-hf
 language:
+- en
+license: llama2
 pipeline_tag: text-generation
+library_name: transformers
+tags:
+- llama-2
+- quantization
+- qat
+- complex-valued
+- 2-bit
+- recursive
+- safetensors
 ---
 # Fairy2i-W2
 - **Fairy2i-W2** achieves 62.00% average accuracy on zero-shot tasks, highly competitive with FP16 (64.72%)
 - **Fairy2i-W1 (1-bit)** outperforms real-valued binary and ternary baselines at the same or lower bit budgets
+## 🚀 Quick Start
 **Fairy2i-W2** is based on LLaMA-2 7B architecture, with only the linear layers replaced by complex-valued QAT layers. The model structure is otherwise identical to LLaMA-2.
+### 📦 Installation
 ```bash
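+pip install torch transformers safetensors huggingface_hub accelerate datasets lm-eval
+# Needed for attn_implementation="flash_attention_2" in the loading example (or use attn_implementation="sdpa")
+pip install flash-attn --no-build-isolation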
 ```
+### 🔄 Loading the Model
+The model can be loaded using the `model_module` package. Here's a basic example:
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from model_module.qat_modules import replace_modules_for_qat, convert_to_inference_mode
+import torch
+# Load base model
+model_path = "meta-llama/Llama-2-7b-hf"  # or your local path
+model = AutoModelForCausalLM.from_pretrained(
+    model_path,
+    attn_implementation="flash_attention_2",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+)
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+# Replace linear layers with QAT modules
+replace_modules_for_qat(model, "complex_phase_v2", skip_lm_head=False)
+# Convert to inference mode for faster inference
+convert_to_inference_mode(model)
 # The model is ready to use!
 prompt = "Hello, how are you?"
 print(response)
 ```
+### 📊 Data Processing
+The training data is processed from RedPajama-Data-1T using two sequential steps:
+#### Step 1: Sample 100B tokens from RedPajama-Data-1T
+Use `dataset/sample.py` to sample 100B tokens from the RedPajama-Data-1T dataset:
+```bash
+cd dataset
+python sample.py
+```
+This script:
+- Loads the RedPajama-Data-1T dataset from Hugging Face
+- Samples approximately 100B tokens using 10 parallel processes
+- Saves the sampled data to `new_dataset_100B_redpajama_final_dataset{0-9}` directories
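The sampling step can be pictured with the minimal sketch below. It assumes the 100B-token budget is split evenly across the 10 processes and that each process keeps whole documents until its share is reached, so a worker may overshoot its share by part of a document. `split_budget`, `sample_until_budget`, and `output_dir` are hypothetical helpers, not functions from `dataset/sample.py`; dataset loading, tokenization, and launching the workers (e.g. with `multiprocessing`) are omitted.

```python
from typing import Iterable, List


def split_budget(total_tokens: int, num_workers: int) -> List[int]:
    """Divide total_tokens as evenly as possible across num_workers."""
    base, extra = divmod(total_tokens, num_workers)
    return [base + (1 if i < extra else 0) for i in range(num_workers)]


def sample_until_budget(docs: Iterable[List[int]], budget: int) -> List[List[int]]:
    """Keep whole tokenized documents until at least `budget` tokens are collected."""
    kept: List[List[int]] = []
    count = 0
    for doc in docs:
        if count >= budget:
            break
        kept.append(doc)
        count += len(doc)
    return kept


def output_dir(worker_id: int) -> str:
    """Per-worker output directory, following the naming pattern listed above."""
    return f"new_dataset_100B_redpajama_final_dataset{worker_id}"


if __name__ == "__main__":
    quotas = split_budget(100_000_000_000, 10)
    print(quotas[0], sum(quotas))  # 10000000000 100000000000
    toy_docs = [[1] * 5, [2] * 7, [3] * 4]
    print(len(sample_until_budget(toy_docs, 10)))  # 2 (5 + 7 = 12 tokens >= 10)
    print(output_dir(3))  # new_dataset_100B_redpajama_final_dataset3
```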
+#### Step 2: Process into 2048-token aligned blocks
+Use `dataset/padding_and_cut.py` to chunk the sampled data into 2048-token aligned blocks:
+```bash
+cd dataset
+python padding_and_cut.py
+```
+This script:
+- Loads the sampled datasets from Step 1
+- Processes data into 2048-token aligned blocks using the `group_and_chunk` function
+- Saves the processed data to `dataset_100B_redpajama_2048_aligned/` directory
+**Note:** Make sure to update the input paths in `padding_and_cut.py` to point to your sampled dataset directories.
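A minimal sketch of the chunking idea: concatenate the tokenized documents into one stream and cut it into fixed-size blocks. Right-padding the last partial block with `pad_id` (and recording real versus padded positions in an `attention_mask`) is an assumption suggested by the script name `padding_and_cut.py`; `chunk_into_blocks` is a hypothetical stand-in, not the repo's `group_and_chunk`.

```python
from itertools import chain
from typing import Dict, List


def chunk_into_blocks(
    docs: List[List[int]], block_size: int = 2048, pad_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Concatenate tokenized documents and cut the stream into block_size chunks.

    Hypothetical sketch: the final partial block is right-padded with pad_id so
    every block has exactly block_size tokens; attention_mask marks real (1)
    versus padded (0) positions.
    """
    stream = list(chain.from_iterable(docs))
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for start in range(0, len(stream), block_size):
        block = stream[start : start + block_size]
        n_pad = block_size - len(block)
        input_ids.append(block + [pad_id] * n_pad)
        attention_mask.append([1] * len(block) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Toy usage with block_size=4 (the real pipeline uses 2048); token 2 plays the role of EOS.
out = chunk_into_blocks([[5, 6, 2], [7, 8, 9, 2]], block_size=4, pad_id=0)
print(out["input_ids"])       # [[5, 6, 2, 7], [8, 9, 2, 0]]
print(out["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```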
+#### Custom DataCollator
+The training uses a custom `MyDataCollatorForLanguageModeling` class defined in `train/mydatacollator.py`. This collator is specifically designed to work with the 2048-token aligned data blocks.
+**To use the custom DataCollator:**
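+Copy the contents of `train/mydatacollator.py` into the `transformers.data.data_collator` module (the `transformers/data/data_collator.py` file of your installed `transformers` package); this step is version-independent. The custom collator handles: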
+- Proper label masking for aligned 2048-token blocks
+- EOS token position handling for causal language modeling
+- Compatibility with the pre-processed aligned dataset format
+The custom collator is automatically imported in the training script via:
+```python
+from transformers.data.data_collator import MyDataCollatorForLanguageModeling
+```
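For intuition about the label masking: the stock `DataCollatorForLanguageModeling` with `mlm=False` sets every label equal to `pad_token_id` to `-100`, so if the pad token is the EOS token, real EOS tokens are dropped from the loss as well. The sketch below assumes each aligned block carries an `attention_mask` and masks by that instead. `collate_aligned_blocks` is hypothetical, returns plain lists rather than tensors, and is not guaranteed to match `MyDataCollatorForLanguageModeling`.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss (its default ignore_index)


def collate_aligned_blocks(batch: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Build causal-LM labels for pre-aligned, equal-length blocks.

    Hypothetical sketch: labels copy input_ids (the model shifts them internally
    for next-token prediction), and only padded positions (attention_mask == 0)
    become IGNORE_INDEX, so EOS tokens that share the pad id stay in the loss.
    """
    input_ids = [ex["input_ids"] for ex in batch]
    attention_mask = [ex["attention_mask"] for ex in batch]
    labels = [
        [tok if m == 1 else IGNORE_INDEX for tok, m in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Token 2 is used as both EOS and pad here: the first EOS is real, the last two are padding.
batch = [
    {"input_ids": [5, 2, 2, 2], "attention_mask": [1, 1, 0, 0]},
    {"input_ids": [7, 8, 9, 2], "attention_mask": [1, 1, 1, 1]},
]
print(collate_aligned_blocks(batch)["labels"])  # [[5, 2, -100, -100], [7, 8, 9, 2]]
```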
+### 🏋️ Training
+To train a model with QAT, use the training script:
+```bash
+cd train
+bash train.sh
+```
+**Note:** For Fairy2i-W2, the training uses fixed parameters:
+- `--quant_method complex_phase_v2` (2-step recursive residual quantization)
+- `--skip_lm_head False` (the `lm_head` layer is also replaced with a QAT layer)
+The training script supports the following arguments:
+- `--quant_method`: QAT quantization method (choices: `bitnet`, `complex_phase_v1`, `complex_phase_v2`, `complex_phase_v3`, `complex_phase_v4`)
+- `--skip_lm_head`: Whether to skip replacing the `lm_head` layer (default: `False`)
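Boolean flags such as `--skip_lm_head False` are easy to mis-parse: with argparse, `type=bool` turns any non-empty string, including `"False"`, into `True`. The sketch below parses the two documented arguments, assuming an argparse-based CLI; how `train/train.py` actually parses them is not shown here, and `str2bool` and `build_parser` are hypothetical.

```python
import argparse

# Choices as listed in the README.
QUANT_METHODS = [
    "bitnet",
    "complex_phase_v1",
    "complex_phase_v2",
    "complex_phase_v3",
    "complex_phase_v4",
]


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings (argparse's type=bool would map 'False' to True)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the two documented flags."""
    parser = argparse.ArgumentParser(description="QAT training arguments (sketch)")
    # No default for --quant_method is documented, so it is left as None here.
    parser.add_argument("--quant_method", choices=QUANT_METHODS)
    # Documented default: False, i.e. lm_head is replaced unless told otherwise.
    parser.add_argument("--skip_lm_head", type=str2bool, default=False)
    return parser


# Parse the fixed Fairy2i-W2 settings from an explicit list (instead of sys.argv) for illustration.
args = build_parser().parse_args(["--quant_method", "complex_phase_v2", "--skip_lm_head", "False"])
print(args.quant_method, args.skip_lm_head)  # complex_phase_v2 False
```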
+### ✅ Evaluation
+#### 📉 Perplexity Evaluation
+Evaluate perplexity on the WikiText-2 and C4 datasets:
+```bash
+cd eval
+bash eval_ppl.sh
+```
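Perplexity is the exponential of the mean per-token negative log-likelihood (NLL). The snippet below shows only that final formula; how `eval_ppl.py` splits the WikiText-2 and C4 text into context windows is not specified here, and `perplexity` is a hypothetical helper.

```python
import math
from typing import List


def perplexity(token_nlls: List[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per predicted token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# If every token is predicted with probability 1/4, each NLL is log(4) and perplexity is 4.
print(perplexity([math.log(4)] * 10))  # approximately 4.0
```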
+#### 🎯 Task Evaluation
+Evaluate on downstream tasks using `lm-eval` (EleutherAI's lm-evaluation-harness):
+```bash
+cd eval
+bash eval_task.sh
+```
+### ℹ️ Model Details
 - **Base Model**: LLaMA-2 7B
 - **Quantization Method**: Complex-Phase V2 (2-step recursive residual quantization)
 - **Codebook**: {±1, ±i} (fourth roots of unity)
 - **Training**: QAT (Quantization-Aware Training) on 30B tokens from RedPajama dataset
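To make the codebook and the recursive residual idea concrete, here is a minimal sketch on a handful of complex weights. It assumes nearest-phase assignment to {±1, ±i} and a single least-squares real scale per tensor at each step; `phase_quantize` and `recursive_residual_quantize` are hypothetical names, not the functions in `quantization.py`, whose scaling rule may differ. Each extra step quantizes whatever error the previous steps left, and `steps=2` mirrors the 2-step recursion listed above.

```python
import cmath
import math
from typing import List, Tuple

# Codebook {±1, ±i}: the fourth roots of unity, i**k for k = 0, 1, 2, 3.
CODEBOOK = [1 + 0j, 1j, -1 + 0j, -1j]


def phase_quantize(weights: List[complex]) -> Tuple[float, List[complex]]:
    """Snap each complex weight to the codeword nearest in phase, plus one real scale."""
    codes = []
    for w in weights:
        # cmath.phase lies in [-pi, pi]; dividing by pi/2 and rounding picks the nearest
        # codeword direction, and % 4 maps negative indices onto 0..3.
        k = round(cmath.phase(w) / (math.pi / 2)) % 4
        codes.append(CODEBOOK[k])
    # For fixed codes q with |q| = 1, the real scale minimizing sum |w - s*q|^2
    # is s = mean(Re(conj(q) * w)).
    scale = sum((q.conjugate() * w).real for q, w in zip(codes, weights)) / len(weights)
    return scale, codes


def recursive_residual_quantize(
    weights: List[complex], steps: int = 2
) -> Tuple[List[Tuple[float, List[complex]]], List[complex]]:
    """Quantize the weights, then repeatedly quantize the remaining residual error."""
    approx = [0j] * len(weights)
    terms = []
    for _ in range(steps):
        residual = [w - a for w, a in zip(weights, approx)]
        scale, codes = phase_quantize(residual)
        terms.append((scale, codes))
        approx = [a + scale * q for a, q in zip(approx, codes)]
    return terms, approx


def mse(weights: List[complex], approx: List[complex]) -> float:
    return sum(abs(w - a) ** 2 for w, a in zip(weights, approx)) / len(weights)


w = [0.9 + 0.1j, -0.2 + 1.1j, -1.0 - 0.3j, 0.1 - 0.8j]
for steps in (1, 2):
    _, approx = recursive_residual_quantize(w, steps)
    print(steps, "step(s): MSE =", round(mse(w, approx), 4))
```

Because a zero scale is always an option at each step, the least-squares scale can never make the reconstruction error larger, so adding a residual step never hurts in this sketch.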
+## 📁 Repository Structure
+```
+fairy2i-w2-repo-github/
+├── README.md
+├── model_module/
+│   ├── __init__.py
+│   ├── qat_modules.py          # QAT linear layer implementations
+│   └── quantization.py         # Quantization functions (PhaseQuant, BitNet, etc.)
+├── dataset/
+│   ├── sample.py               # Sample 100B tokens from RedPajama-Data-1T
+│   └── padding_and_cut.py      # Process data into 2048-token aligned blocks
+├── train/
+│   ├── train.py                # Training script
+│   ├── train.sh                # Training launch script
+│   ├── mydatacollator.py       # Custom DataCollator for aligned data
+│   └── complexnet_config.yaml  # Accelerate configuration
+└── eval/
+    ├── eval_ppl.py             # Perplexity evaluation script
+    ├── eval_ppl.sh             # Perplexity evaluation launcher
+    ├── eval_task.py            # Task evaluation script
+    ├── eval_task.sh            # Task evaluation launcher
+    └── eval_utils.py           # Evaluation utilities
+```
+## 📚 Citation
 If you use Fairy2i-W2 in your research, please cite:
 ```bibtex
 @article{wang2025fairy2i,
+  title={Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {$\pm 1, \pm i$}},
   author={Wang, Feiyu and Tan, Xinyu and Huang, Bokai and Zhang, Yihao and Wang, Guoan and Cong, Peizhuang and Yang, Tong},
   journal={arXiv preprint},
   year={2025}
 }
 ```
+## ⚖️ License
 This model follows the same license as LLaMA-2. Please refer to the original LLaMA-2 license for details.
+## 📧 Contact
+For questions or issues, please contact: tanxinyu330@gmail.com