# Sefer 1.2B Base Model
A 1.2-billion-parameter language model that combines CFDRA (Convolutional Frequency-Domain Recurrent Architecture) layers with standard attention layers.
## Model Architecture
- Parameters: 1.2B
- Architecture: Hybrid CFDRA + Attention
- Layers: 24 total (18 CFDRA + 6 Attention)
- Hidden size: 1792
- Attention heads: 14 query heads, 2 KV heads (grouped-query attention)
- Sequence length: 2048
- Vocab size: 151,936 (Qwen tokenizer)
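
These settings imply a head dimension of 1792 / 14 = 128 and 7 query heads per KV head. As a hypothetical sketch of how they fit together (field names are illustrative, not the model's actual config schema):

```python
from dataclasses import dataclass

# Hypothetical config mirroring the numbers listed above.
# Field names are illustrative, not the model's actual schema.
@dataclass
class SeferConfig:
    n_layers: int = 24            # 18 CFDRA + 6 attention
    n_cfdra_layers: int = 18
    hidden_size: int = 1792
    n_heads: int = 14             # head_dim = 1792 / 14 = 128
    n_kv_heads: int = 2           # GQA: 7 query heads share each KV head
    max_seq_len: int = 2048
    vocab_size: int = 151936      # Qwen tokenizer
```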
## Training Status
- Checkpoint: Step 3,000 / 200,000
- Tokens seen: ~197M
- Dataset: FineWeb 100BT sample
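
At ~197M tokens over 3,000 steps, this works out to ≈65.5K tokens per optimizer step (consistent with, e.g., a global batch of 32 sequences × 2,048 tokens), and the full 200,000-step run would cover roughly 13B tokens.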
## CFDRA Features
CFDRA layers use damped oscillator modes and FFT-based convolution for efficient sequence mixing:
- Inherent positional information (no positional encoding needed)
- Linear complexity for long sequences
- Decay parameter regularization for diverse time scales
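
To make the mechanism concrete, here is a minimal PyTorch sketch of the idea described above: a convolution kernel built from damped oscillator modes, applied with FFT-based causal convolution. The mode parameters are random placeholders standing in for learned ones, and none of the names reflect the model's actual code:

```python
import math
import torch

def damped_oscillator_kernel(length: int, num_modes: int = 8, seed: int = 0) -> torch.Tensor:
    """Kernel h[t] = sum_k a_k * exp(-lambda_k * t) * cos(omega_k * t + phi_k).
    Random placeholder parameters; in the model these would be learned."""
    g = torch.Generator().manual_seed(seed)
    t = torch.arange(length, dtype=torch.float32)
    amp = torch.randn(num_modes, generator=g) / num_modes
    decay = torch.rand(num_modes, generator=g) * 0.1           # lambda_k > 0: damping
    freq = torch.rand(num_modes, generator=g) * math.pi        # omega_k: oscillation
    phase = torch.rand(num_modes, generator=g) * 2 * math.pi   # phi_k
    modes = (amp[:, None] * torch.exp(-decay[:, None] * t)
             * torch.cos(freq[:, None] * t + phase[:, None]))
    return modes.sum(dim=0)                                    # shape: (length,)

def fft_causal_conv(x: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Causal convolution of x (..., length) with kernel (length,) via FFT.
    Zero-padding to 2L turns circular convolution into linear (causal) convolution.
    Because the kernel decays with distance, position is encoded implicitly."""
    L = x.shape[-1]
    n = 2 * L
    y = torch.fft.irfft(torch.fft.rfft(x, n=n) * torch.fft.rfft(kernel, n=n), n=n)
    return y[..., :L]

x = torch.randn(2, 2048)                 # (batch, sequence)
y = fft_causal_conv(x, damped_oscillator_kernel(2048))
```

The FFT path costs O(L log L) per channel rather than the O(L²) of attention, which is the scaling advantage the bullets above refer to, and the per-mode decay rates are what set the diverse time scales.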
## Usage
This is a base model checkpoint. For chat capabilities, fine-tuning on instruction data is required.
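
Assuming the checkpoint ships in a standard transformers-compatible layout (not confirmed by this card), loading might look like the sketch below; the repo id is a placeholder, and the custom CFDRA layers would likely require `trust_remote_code=True`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "fractal-agi/sefer-1.2b"  # placeholder id, not confirmed by this card
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tok("Damped oscillators appear in", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```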
## Citation
Part of the Sefer project by Fractal AGI.