FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Paper: arXiv:2502.21309
Model Card for FANformer-1B
FANformer-1B is a 1.1-billion-parameter autoregressive language model pre-trained from scratch to enhance language modeling through effective periodicity modeling. Its revised architecture (olmo/model.py) introduces the FAN layer, a component designed to capture periodic patterns in the training data, improving learning efficiency and downstream performance.
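To illustrate the idea behind the FAN layer, here is a minimal PyTorch sketch based on the Fourier Analysis Networks paper cited below: part of the output dimension models periodic structure via cos/sin of a learned projection, and the rest is a standard affine transform with a nonlinearity. The class name, split ratio, and choice of activation are illustrative assumptions, not the official `olmo/model.py` implementation.

```python
import torch
import torch.nn as nn

class FANLayer(nn.Module):
    """Sketch of a FAN-style layer (hypothetical, not the released code).

    The output is split into a periodic part, modeled with cos/sin of a
    learned linear projection, and a non-periodic part produced by a
    standard affine transform plus nonlinearity.
    """
    def __init__(self, d_in: int, d_out: int, p_ratio: float = 0.25):
        super().__init__()
        d_p = int(d_out * p_ratio)   # dims devoted to cos and to sin, each
        d_g = d_out - 2 * d_p        # remaining non-periodic dims
        self.proj_p = nn.Linear(d_in, d_p, bias=False)  # periodic projection
        self.proj_g = nn.Linear(d_in, d_g)              # ordinary projection
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.proj_p(x)
        # concatenate periodic (cos, sin) and non-periodic components
        return torch.cat([torch.cos(p), torch.sin(p), self.act(self.proj_g(x))], dim=-1)
```

Because cos and sin are applied to a learned projection, the layer can represent periodic functions exactly rather than approximating them piecewise, which is the property the architecture exploits.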
Inference Example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because FANformer ships a custom architecture
model = AutoModelForCausalLM.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)

input_text = "The concept of periodicity serves as a fundamental organizing principle across the natural world, human societies, and even abstract systems. From the rhythmic cycles of celestial bodies governing seasons and tides to the biological clocks regulating sleep and metabolism in living organisms, recurring patterns create stability amid chaos. In ecosystems, predator-prey population oscillations maintain balance, while the carbon cycle ensures Earth's climate resilience. Culturally, humanity has structured civilizations around agricultural cycles, religious calendars, and economic fluctuations—harvest festivals marking seasonal abundance, financial markets swaying between boom and bust. Even at the quantum level, wave functions reveal inherent periodicity that underpins material reality. This universal recurrence enables prediction, adaptation, and innovation: by recognizing cycles, we"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=512, do_sample=True, temperature=0.6, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
| Standard Benchmarks | Llama-3.2-1B | TinyLLaMA-v1.1 (3T) | MobiLLaMA-1B (1.3T) | OLMo-1B (2T) | OpenELM-1_1B (1.8T) | OLMo-1B-0724 (3T) | AMD-OLMo-1B (1.3T) | FANformer-1B (1T) |
|---|---|---|---|---|---|---|---|---|
| arc_easy | 56.84 | 55.47 | 56.65 | 57.28 | 55.43 | 56.65 | 63.64 | 72.46 |
| arc_challenge | 38.13 | 32.68 | 32.00 | 31.06 | 32.34 | 32.34 | 33.70 | 43.81 |
| hellaswag | 64.00 | 61.47 | 61.80 | 62.92 | 64.81 | 66.12 | 63.61 | 64.76 |
| piqa | 73.80 | 73.56 | 75.30 | 75.14 | 75.57 | 75.08 | 75.57 | 75.55 |
| boolq | 64.30 | 55.99 | 60.83 | 61.74 | 63.58 | 66.18 | 60.58 | 64.92 |
| sciq | 92.30 | 89.30 | 88.20 | 87.00 | 90.60 | 92.70 | 93.20 | 94.80 |
| winogrande | 61.20 | 59.43 | 59.27 | 59.98 | 61.72 | 61.72 | 61.64 | 61.80 |
| openbookqa | 46.00 | 36.80 | 35.40 | 36.20 | 36.20 | 35.60 | 35.80 | 48.20 |
| gsm8k | 6.83 | 1.82 | 0.00 | 2.50 | 2.81 | 8.95 | 2.88 | 15.74 |
| Average | 55.93 | 51.84 | 52.16 | 52.65 | 53.67 | 55.04 | 54.51 | 60.23 |
Citation:

```bibtex
@article{dong2025fanformer,
  title={FANformer: Improving Large Language Models Through Effective Periodicity Modeling},
  author={Dong, Yihong and Li, Ge and Jiang, Xue and Tao, Yongding and Zhang, Kechi and Zhu, Hao and Liu, Huanyu and Ding, Jiazheng and Li, Jia and Deng, Jinliang and Mei, Hong},
  journal={arXiv preprint arXiv:2502.21309},
  year={2025}
}

@article{dong2024fan,
  title={FAN: Fourier Analysis Networks},
  author={Dong, Yihong and Li, Ge and Tao, Yongding and Jiang, Xue and Zhang, Kechi and Li, Jia and Su, Jing and Zhang, Jun and Xu, Jingjing},
  journal={arXiv preprint arXiv:2410.02675},
  year={2024}
}
```