Sefer 1.2B Base Model

A 1.2 billion parameter language model combining CFDRA (Convolutional Frequency-Domain Recurrent Architecture) layers with attention layers.

Model Architecture

  • Parameters: 1.2B
  • Architecture: Hybrid CFDRA + Attention
  • Layers: 24 total (18 CFDRA + 6 Attention)
  • Hidden size: 1792
  • Attention heads: 14 (with 2 KV heads for GQA)
  • Sequence length: 2048
  • Vocab size: 151,936 (Qwen tokenizer)
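
For reference, the hyperparameters above can be gathered into a small configuration sketch. The class and field names (e.g. SeferConfig) and the interleaving pattern of attention and CFDRA blocks are illustrative assumptions, not details taken from the released code:

```python
from dataclasses import dataclass

@dataclass
class SeferConfig:
    # Values from the spec above; the class itself is a hypothetical helper.
    n_layers: int = 24
    n_cfdra_layers: int = 18
    n_attn_layers: int = 6
    hidden_size: int = 1792
    n_heads: int = 14
    n_kv_heads: int = 2          # grouped-query attention (GQA)
    max_seq_len: int = 2048
    vocab_size: int = 151_936    # Qwen tokenizer

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.n_heads  # 1792 / 14 = 128


cfg = SeferConfig()
# One plausible interleaving (assumed, not documented): an attention block
# every fourth layer, giving 6 attention and 18 CFDRA layers out of 24.
layout = ["attn" if (i + 1) % 4 == 0 else "cfdra" for i in range(cfg.n_layers)]
assert layout.count("attn") == cfg.n_attn_layers
assert layout.count("cfdra") == cfg.n_cfdra_layers
print(cfg.head_dim, layout)
```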

Training Status

  • Checkpoint: Step 3,000 / 200,000
  • Tokens seen: ~197M
  • Dataset: FineWeb 100BT sample
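
The token count is consistent with the 2048-token sequence length; the implied global batch size below is an inference from these numbers, not a figure stated on this card:

```python
tokens_seen = 197e6
steps = 3_000
seq_len = 2048
# ~32 sequences per optimizer step implied by the card's numbers (assumption).
print(round(tokens_seen / (steps * seq_len)))
```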

CFDRA Features

CFDRA layers use damped oscillator modes and FFT-based convolution for efficient sequence mixing (a toy sketch follows the feature list below):

  • Inherent positional information (no positional encoding needed)
  • Linear complexity for long sequences
  • Decay parameter regularization for diverse time scales
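
As a rough illustration of the mechanism, and not the project's actual implementation, the sketch below builds per-channel kernels from damped oscillator modes (amplitude, decay, frequency) and applies them as a causal convolution via the FFT. Parameter names and shapes are assumptions:

```python
import torch

def cfdra_mix(x, decay, freq, amp):
    """Toy frequency-domain mixing with damped oscillator modes.

    x:     (batch, seq_len, channels) input sequence
    decay: (channels, modes) damping rates
    freq:  (channels, modes) oscillation frequencies
    amp:   (channels, modes) mode amplitudes
    All names and shapes are illustrative assumptions.
    """
    B, L, C = x.shape
    t = torch.arange(L, dtype=x.dtype)                       # (L,)
    # Each channel's kernel is a sum of damped cosines: a * exp(-d t) * cos(w t).
    kernel = (amp[..., None]
              * torch.exp(-decay.abs()[..., None] * t)
              * torch.cos(freq[..., None] * t)).sum(dim=1)   # (C, L)
    # Zero-pad to 2L and convolve via FFT (linear, causal convolution).
    n = 2 * L
    Xf = torch.fft.rfft(x.transpose(1, 2), n=n)              # (B, C, n//2+1)
    Kf = torch.fft.rfft(kernel, n=n)                         # (C, n//2+1)
    y = torch.fft.irfft(Xf * Kf, n=n)[..., :L]               # (B, C, L)
    return y.transpose(1, 2)

# Example: 2 sequences, length 16, 4 channels, 3 oscillator modes per channel.
x = torch.randn(2, 16, 4)
decay = torch.rand(4, 3) * 0.5
freq = torch.rand(4, 3) * 3.14
amp = torch.randn(4, 3) * 0.1
print(cfdra_mix(x, decay, freq, amp).shape)  # torch.Size([2, 16, 4])
```

Because the decay and frequency terms are part of the kernel itself, token position is encoded implicitly by the kernel shape, which is why no separate positional encoding is needed.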

Usage

This is a base model checkpoint. For chat capabilities, fine-tuning on instruction data is required.
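
A minimal text-completion sketch with the transformers library follows. The repository id is hypothetical, and whether the checkpoint ships a transformers-compatible implementation (hence trust_remote_code=True) is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "fractal-agi/sefer-1.2b-base"  # hypothetical repo id, not confirmed

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Base-model usage: plain text completion, no chat template.
inputs = tokenizer("The Fourier transform of a damped oscillator", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```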

Citation

Part of the Sefer project by Fractal AGI.
