# Sefer-270M-Chat

A 270M parameter chat model fine-tuned from Sefer-270M-Base on the UltraChat dataset. This model combines CFDRA (Convolutional Frequency-Domain Recurrent Architecture) layers with Transformer attention.
## Model Description
Sefer-270M-Chat is an instruction-tuned version of the Sefer-270M base model, designed for conversational AI tasks.
### Training Pipeline
- Pre-training (Base Model): 60K steps on FineWeb (~14.7B tokens)
- Fine-tuning (This Model): 11K steps on UltraChat (~720M tokens)
## Architecture
| Component | Value |
|---|---|
| Parameters | ~269M |
| Hidden Size (d_model) | 768 |
| Layers | 20 (15 CFDRA + 5 Attention) |
| Attention Heads | 12 |
| KV Heads (GQA) | 4 |
| FFN Expansion | 4x |
| Vocab Size | 151,936 |
| Context Length | 2,048 |
| Tokenizer | Qwen/Qwen2.5-1.5B |
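As a quick sanity check, the attention dimensions implied by the table can be derived with a few lines of arithmetic. This uses only the values quoted above (hidden size, head counts); nothing here is specific to the Sefer implementation.

```python
# Attention dimensions implied by the architecture table.
d_model = 768
n_heads = 12
n_kv_heads = 4

head_dim = d_model // n_heads       # per-head dimension: 64
kv_dim = n_kv_heads * head_dim      # total KV projection width: 256
group_size = n_heads // n_kv_heads  # query heads sharing each KV head: 3

print(head_dim, kv_dim, group_size)
```

With GQA, each of the 4 KV heads serves a group of 3 query heads, shrinking the KV cache to a third of its multi-head size.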
### Key Innovation: CFDRA Layers
The model uses CFDRA (Convolutional Frequency-Domain Recurrent Architecture) layers for efficient sequence modeling:
- Damped oscillator modes for multi-scale temporal patterns
- FFT-based convolution for O(n log n) complexity
- Frozen decay parameters preserving diverse time scales
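The FFT-based long convolution at the heart of this idea can be sketched in a few lines of NumPy. The damped-oscillator kernel below is purely illustrative (the decay rates and frequencies are made-up values, not the model's actual parameterization), but it shows how zero-padded FFTs give the same result as direct convolution in O(n log n) time.

```python
import numpy as np

def damped_oscillator_kernel(length, decays, freqs):
    """Sum of damped cosines; each (decay, freq) mode captures a
    different time scale, as in the CFDRA description above."""
    t = np.arange(length)
    return sum(np.exp(-d * t) * np.cos(w * t) for d, w in zip(decays, freqs))

def fft_conv(x, kernel):
    """Linear convolution via FFT: zero-pad to avoid circular
    wrap-around, multiply in the frequency domain, truncate to
    the input length. O(n log n) vs O(n^2) for direct convolution."""
    n = len(x) + len(kernel) - 1
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(kernel, n), n)
    return y[: len(x)]

x = np.random.randn(2048)
k = damped_oscillator_kernel(2048, decays=[0.1, 0.01], freqs=[0.5, 0.05])
y = fft_conv(x, k)
```

Freezing the decay parameters, as the model does, keeps the slow and fast modes from collapsing toward a single time scale during training.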
## Training Details
### Fine-tuning Configuration
- Base Model: sefer-270m-base-60k (60K steps pre-training)
- Dataset: UltraChat 200K
- Steps: 11,000
- Batch Size: 32 effective (micro-batch 8 × gradient accumulation 4)
- Learning Rate: 2e-5 (cosine schedule)
- Precision: bfloat16
- Final Eval Loss: 1.83
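The ~720M-token figure above follows directly from the configuration, assuming every sequence is packed to the full 2,048-token context:

```python
# Fine-tuning token budget implied by the configuration above.
steps = 11_000
effective_batch = 8 * 4   # micro-batch x gradient accumulation
context_len = 2_048

tokens = steps * effective_batch * context_len
print(f"{tokens / 1e6:.0f}M tokens")  # ~721M, matching the quoted ~720M
```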
## Chat Format
The model uses the ChatML format:
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you for asking! How can I help you today?<|im_end|>
```
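A minimal helper that renders a message list into this format looks like the following (a sketch; the Chat Example below inlines the same logic):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        prompt += "<|im_start|>assistant\n"
    return prompt

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]
print(to_chatml(msgs))
```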
## Usage
### Installation
```bash
git clone https://github.com/fractal-agi/tcfdra-sefer.git
cd tcfdra-sefer
pip install -r requirements.txt
```
### Loading the Model
```python
import torch
from src.model.tcfdra_moe import TCFDRAConfig, TCFDRAModel
from transformers import AutoTokenizer

# Load config
config = TCFDRAConfig(
    d_model=768,
    vocab_size=151936,
    n_layers=20,
    cfdra_ratio=3,
    use_attention=True,
    R=48,
    M=384,
    kernel_len=2048,
    chunk_size=512,
    n_heads=12,
    n_kv_heads=4,
    ffn_expansion=4,
    dropout=0.0,
    freeze_decay=True,
)

# Create model and load weights
model = TCFDRAModel(config)
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
```
### Chat Example
```python
def chat(model, tokenizer, messages, max_new_tokens=256):
    # Format messages in ChatML
    prompt = ""
    for msg in messages:
        role = msg["role"]
        content = msg["content"]
        prompt += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.9,
        )

    response = tokenizer.decode(output_ids[0], skip_special_tokens=False)
    # Extract assistant response
    response = response.split("<|im_start|>assistant\n")[-1]
    response = response.split("<|im_end|>")[0]
    return response


# Example usage
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
]
response = chat(model, tokenizer, messages)
print(response)
```
## Limitations
- Small model size: 270M parameters limits reasoning capabilities compared to larger models
- Limited pre-training: Base model trained on ~14.7B tokens (vs trillions for frontier models)
- English only: Primarily trained on English text
- Experimental architecture: CFDRA is a novel architecture still being researched
## Demo
Try the model: Sefer-270M Chat Demo
## Citation
```bibtex
@misc{sefer270mchat2025,
  title={Sefer-270M-Chat: A Hybrid CFDRA-Transformer Chat Model},
  author={Fractal AGI},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/fractal-agi/sefer-270m-chat}
}
```
## License
Apache 2.0
## Contact
- Organization: Fractal AGI