Sefer 1.2B Base Model

A 1.2 billion parameter language model combining CFDRA (Convolutional Frequency-Domain Recurrent Architecture) layers with attention layers.

Model Architecture

  • Parameters: 1.2B
  • Architecture: Hybrid CFDRA + Attention
  • Layers: 24 total (18 CFDRA + 6 Attention)
  • Hidden size: 1792
  • Attention heads: 14 (with 2 KV heads for GQA)
  • Sequence length: 2048
  • Vocab size: 151,936 (Qwen tokenizer)
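
For reference, the hyperparameters above can be gathered into a small configuration sketch. The class and field names (e.g. SeferConfig) and the interleaving pattern of attention and CFDRA blocks are illustrative assumptions, not details taken from the released code:

```python
from dataclasses import dataclass

@dataclass
class SeferConfig:
    # Values from the spec above; the class itself is a hypothetical helper.
    n_layers: int = 24
    n_cfdra_layers: int = 18
    n_attn_layers: int = 6
    hidden_size: int = 1792
    n_heads: int = 14
    n_kv_heads: int = 2          # grouped-query attention (GQA)
    max_seq_len: int = 2048
    vocab_size: int = 151_936    # Qwen tokenizer

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.n_heads  # 1792 / 14 = 128


cfg = SeferConfig()
# One plausible interleaving (assumed, not documented): an attention block
# every fourth layer, giving 6 attention and 18 CFDRA layers out of 24.
layout = ["attn" if (i + 1) % 4 == 0 else "cfdra" for i in range(cfg.n_layers)]
assert layout.count("attn") == cfg.n_attn_layers
assert layout.count("cfdra") == cfg.n_cfdra_layers
print(cfg.head_dim, layout)
```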

Training Status

  • Checkpoint: Step 3,000 / 200,000
  • Tokens seen: ~197M
  • Dataset: FineWeb 100BT sample
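
The token count is consistent with the 2048-token sequence length; the implied global batch size below is an inference from these numbers, not a figure stated on this card:

```python
tokens_seen = 197e6
steps = 3_000
seq_len = 2048
# ~32 sequences per optimizer step implied by the card's numbers (assumption).
print(round(tokens_seen / (steps * seq_len)))
```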

CFDRA Features

CFDRA layers use damped oscillator modes and FFT-based convolution for efficient sequence mixing (a toy sketch follows the feature list below):

  • Inherent positional information (no positional encoding needed)
  • Linear complexity for long sequences
  • Decay parameter regularization for diverse time scales
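
As a rough illustration of the mechanism, and not the project's actual implementation, the sketch below builds per-channel kernels from damped oscillator modes (amplitude, decay, frequency) and applies them as a causal convolution via the FFT. Parameter names and shapes are assumptions:

```python
import torch

def cfdra_mix(x, decay, freq, amp):
    """Toy frequency-domain mixing with damped oscillator modes.

    x:     (batch, seq_len, channels) input sequence
    decay: (channels, modes) damping rates
    freq:  (channels, modes) oscillation frequencies
    amp:   (channels, modes) mode amplitudes
    All names and shapes are illustrative assumptions.
    """
    B, L, C = x.shape
    t = torch.arange(L, dtype=x.dtype)                       # (L,)
    # Each channel's kernel is a sum of damped cosines: a * exp(-d t) * cos(w t).
    kernel = (amp[..., None]
              * torch.exp(-decay.abs()[..., None] * t)
              * torch.cos(freq[..., None] * t)).sum(dim=1)   # (C, L)
    # Zero-pad to 2L and convolve via FFT (linear, causal convolution).
    n = 2 * L
    Xf = torch.fft.rfft(x.transpose(1, 2), n=n)              # (B, C, n//2+1)
    Kf = torch.fft.rfft(kernel, n=n)                         # (C, n//2+1)
    y = torch.fft.irfft(Xf * Kf, n=n)[..., :L]               # (B, C, L)
    return y.transpose(1, 2)

# Example: 2 sequences, length 16, 4 channels, 3 oscillator modes per channel.
x = torch.randn(2, 16, 4)
decay = torch.rand(4, 3) * 0.5
freq = torch.rand(4, 3) * 3.14
amp = torch.randn(4, 3) * 0.1
print(cfdra_mix(x, decay, freq, amp).shape)  # torch.Size([2, 16, 4])
```

Because the decay and frequency terms are part of the kernel itself, token position is encoded implicitly by the kernel shape, which is why no separate positional encoding is needed.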

Usage

This is a base model checkpoint. For chat capabilities, fine-tuning on instruction data is required.
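
A minimal text-completion sketch with the transformers library follows. The repository id is hypothetical, and whether the checkpoint ships a transformers-compatible implementation (hence trust_remote_code=True) is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "fractal-agi/sefer-1.2b-base"  # hypothetical repo id, not confirmed

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Base-model usage: plain text completion, no chat template.
inputs = tokenizer("The Fourier transform of a damped oscillator", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```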

Citation

Part of the Sefer project by Fractal AGI.
