FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Paper: arXiv:2502.21309
Model Card for FANformer-1B
FANformer-1B is a 1.1-billion-parameter autoregressive language model pre-trained from scratch to enhance language modeling through effective periodicity modeling. Its revised architecture (olmo/model.py) introduces the FAN layer, a component designed to capture periodic patterns in the training data, improving learning efficiency and downstream performance.
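To illustrate the idea behind the FAN layer, here is a minimal PyTorch sketch based on the Fourier Analysis Networks paper cited below: part of the output dimension models periodic structure via cos/sin of a learned projection, and the rest is a standard affine transform with a nonlinearity. The class name, split ratio, and choice of activation are illustrative assumptions, not the official `olmo/model.py` implementation.

```python
import torch
import torch.nn as nn

class FANLayer(nn.Module):
    """Sketch of a FAN-style layer (hypothetical, not the released code).

    The output is split into a periodic part, modeled with cos/sin of a
    learned linear projection, and a non-periodic part produced by a
    standard affine transform plus nonlinearity.
    """
    def __init__(self, d_in: int, d_out: int, p_ratio: float = 0.25):
        super().__init__()
        d_p = int(d_out * p_ratio)   # dims devoted to cos and to sin, each
        d_g = d_out - 2 * d_p        # remaining non-periodic dims
        self.proj_p = nn.Linear(d_in, d_p, bias=False)  # periodic projection
        self.proj_g = nn.Linear(d_in, d_g)              # ordinary projection
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.proj_p(x)
        # concatenate periodic (cos, sin) and non-periodic components
        return torch.cat([torch.cos(p), torch.sin(p), self.act(self.proj_g(x))], dim=-1)
```

Because cos and sin are applied to a learned projection, the layer can represent periodic functions exactly rather than approximating them piecewise, which is the property the architecture exploits.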
Inference Example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because FANformer ships a custom architecture
model = AutoModelForCausalLM.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)

input_text = "The concept of periodicity serves as a fundamental organizing principle across the natural world, human societies, and even abstract systems. From the rhythmic cycles of celestial bodies governing seasons and tides to the biological clocks regulating sleep and metabolism in living organisms, recurring patterns create stability amid chaos. In ecosystems, predator-prey population oscillations maintain balance, while the carbon cycle ensures Earth's climate resilience. Culturally, humanity has structured civilizations around agricultural cycles, religious calendars, and economic fluctuations—harvest festivals marking seasonal abundance, financial markets swaying between boom and bust. Even at the quantum level, wave functions reveal inherent periodicity that underpins material reality. This universal recurrence enables prediction, adaptation, and innovation: by recognizing cycles, we"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=512, do_sample=True, temperature=0.6, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
| Standard Benchmarks | Llama-3.2-1B | TinyLLaMA-v1.1 (3T) | MobiLLaMA-1B (1.3T) | OLMo-1B (2T) | OpenELM-1_1B (1.8T) | OLMo-1B-0724 (3T) | AMD-OLMo-1B (1.3T) | FANformer-1B (1T) |
|---|---|---|---|---|---|---|---|---|
| arc_easy | 56.84 | 55.47 | 56.65 | 57.28 | 55.43 | 56.65 | 63.64 | 72.46 |
| arc_challenge | 38.13 | 32.68 | 32.00 | 31.06 | 32.34 | 32.34 | 33.70 | 43.81 |
| hellaswag | 64.00 | 61.47 | 61.80 | 62.92 | 64.81 | 66.12 | 63.61 | 64.76 |
| piqa | 73.80 | 73.56 | 75.30 | 75.14 | 75.57 | 75.08 | 75.57 | 75.55 |
| boolq | 64.30 | 55.99 | 60.83 | 61.74 | 63.58 | 66.18 | 60.58 | 64.92 |
| sciq | 92.30 | 89.30 | 88.20 | 87.00 | 90.60 | 92.70 | 93.20 | 94.80 |
| winogrande | 61.20 | 59.43 | 59.27 | 59.98 | 61.72 | 61.72 | 61.64 | 61.80 |
| openbookqa | 46.00 | 36.80 | 35.40 | 36.20 | 36.20 | 35.60 | 35.80 | 48.20 |
| gsm8k | 6.83 | 1.82 | 0.00 | 2.50 | 2.81 | 8.95 | 2.88 | 15.74 |
| Average | 55.93 | 51.84 | 52.16 | 52.65 | 53.67 | 55.04 | 54.51 | 60.23 |
Citation:

```bibtex
@article{dong2025fanformer,
  title={FANformer: Improving Large Language Models Through Effective Periodicity Modeling},
  author={Dong, Yihong and Li, Ge and Jiang, Xue and Tao, Yongding and Zhang, Kechi and Zhu, Hao and Liu, Huanyu and Ding, Jiazheng and Li, Jia and Deng, Jinliang and Mei, Hong},
  journal={arXiv preprint arXiv:2502.21309},
  year={2025}
}

@article{dong2024fan,
  title={FAN: Fourier Analysis Networks},
  author={Dong, Yihong and Li, Ge and Tao, Yongding and Jiang, Xue and Zhang, Kechi and Li, Jia and Su, Jing and Zhang, Jun and Xu, Jingjing},
  journal={arXiv preprint arXiv:2410.02675},
  year={2024}
}
```