
# Sefer-270M-Chat

A 270M parameter chat model fine-tuned from Sefer-270M-Base on the UltraChat dataset. This model combines CFDRA (Convolutional Frequency-Domain Recurrent Architecture) layers with Transformer attention.

## Model Description

Sefer-270M-Chat is an instruction-tuned version of the Sefer-270M base model, designed for conversational AI tasks.

### Training Pipeline

1. **Pre-training (Base Model):** 60K steps on FineWeb (~14.7B tokens)
2. **Fine-tuning (This Model):** 11K steps on UltraChat (~720M tokens)

## Architecture

| Component | Value |
|---|---|
| Parameters | ~269M |
| Hidden Size (d_model) | 768 |
| Layers | 20 (15 CFDRA + 5 Attention) |
| Attention Heads | 12 |
| KV Heads (GQA) | 4 |
| FFN Expansion | 4x |
| Vocab Size | 151,936 |
| Context Length | 2,048 |
| Tokenizer | Qwen/Qwen2.5-1.5B |
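As a rough sanity check on the table, the ~269M total can be decomposed under some assumptions that are not confirmed by the repo: tied input/output embeddings, a plain two-matrix 4x FFN, and standard GQA projections. The numbers are estimates, not the actual layer sizes:

```python
# Rough parameter-count estimate for the table above.
# Assumptions (unconfirmed): tied embeddings, 2-matrix 4x FFN, standard GQA.
d, vocab = 768, 151_936
n_heads, n_kv = 12, 4

embed = vocab * d                                   # ~116.7M (tied with LM head)
head_dim = d // n_heads                             # 64
attn = d * d + 2 * d * (n_kv * head_dim) + d * d    # Q, K, V, O with GQA
ffn = 2 * d * (4 * d)                               # up + down projections
attn_layer = attn + ffn                             # ~6.3M per attention layer

total_attn = 5 * attn_layer
cfdra_budget = 269e6 - embed - total_attn           # what remains for CFDRA
print(f"embeddings: {embed / 1e6:.1f}M")
print(f"5 attention layers: {total_attn / 1e6:.1f}M")
print(f"~{cfdra_budget / 15 / 1e6:.1f}M left per CFDRA layer")
```

Under these assumptions the embedding table dominates, and each CFDRA layer has a budget of roughly 8M parameters.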

### Key Innovation: CFDRA Layers

The model uses CFDRA (Convolutional Frequency-Domain Recurrent Architecture) layers for efficient sequence modeling:

- Damped oscillator modes for multi-scale temporal patterns
- FFT-based convolution for O(n log n) complexity
- Frozen decay parameters preserving diverse time scales
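The first two points can be sketched in a few lines. This is an illustrative toy, not the repo's implementation: a single damped-oscillator kernel (the `decay` and `freq` values are made up) convolved causally with a signal via FFT, which matches the direct O(n²) convolution while costing only O(n log n):

```python
import numpy as np

def damped_oscillator_kernel(length, decay, freq):
    """One damped mode: k[t] = exp(-decay * t) * cos(freq * t)."""
    t = np.arange(length)
    return np.exp(-decay * t) * np.cos(freq * t)

def fft_causal_conv(x, k):
    """Causal convolution of x with kernel k via FFT, O(n log n)."""
    n = len(x) + len(k) - 1
    n_fft = 1 << (n - 1).bit_length()   # pad to next power of two: no wraparound
    y = np.fft.irfft(np.fft.rfft(x, n_fft) * np.fft.rfft(k, n_fft), n_fft)
    return y[: len(x)]                  # keep the causal part

x = np.random.randn(2048)
k = damped_oscillator_kernel(2048, decay=0.01, freq=0.3)
y_fft = fft_causal_conv(x, k)
y_direct = np.convolve(x, k)[: len(x)]  # O(n^2) reference
print(np.allclose(y_fft, y_direct))     # True
```

A real layer would stack many such modes (with different decays and frequencies, the decays frozen) and mix them with learned projections; the FFT trick is what keeps the long kernel affordable at the 2,048-token context length.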

## Training Details

### Fine-tuning Configuration

- **Base Model:** sefer-270m-base-60k (60K steps pre-training)
- **Dataset:** UltraChat 200K
- **Steps:** 11,000
- **Batch Size:** 32 effective (8 × 4 gradient accumulation)
- **Learning Rate:** 2e-5 (cosine schedule)
- **Precision:** bfloat16
- **Final Eval Loss:** 1.83
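A quick arithmetic check that these settings line up with the ~720M tokens quoted in the training pipeline, assuming every sequence is packed to the full 2,048-token context (an assumption; padding would lower the effective count):

```python
# Does batch size x context x steps reproduce the ~720M tokens quoted above?
micro_batch, grad_accum, ctx, steps = 8, 4, 2048, 11_000
effective_batch = micro_batch * grad_accum      # 32 sequences per step
tokens = effective_batch * ctx * steps
print(f"{tokens / 1e6:.0f}M tokens")            # → 721M tokens
```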

## Chat Format

The model uses the ChatML format:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you for asking! How can I help you today?<|im_end|>
```
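Since the format is plain string concatenation, it can be produced with a small helper (a sketch, independent of the model and tokenizer):

```python
# Minimal sketch: render a message list into the ChatML string shown above.
def to_chatml(messages, add_generation_prompt=True):
    s = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:       # leave the assistant turn open for generation
        s += "<|im_start|>assistant\n"
    return s

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])
print(prompt)
```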

## Usage

### Installation

```bash
git clone https://github.com/fractal-agi/tcfdra-sefer.git
cd tcfdra-sefer
pip install -r requirements.txt
```

### Loading the Model

```python
import torch
from src.model.tcfdra_moe import TCFDRAConfig, TCFDRAModel
from transformers import AutoTokenizer

# Build the config matching the released checkpoint
config = TCFDRAConfig(
    d_model=768,
    vocab_size=151936,
    n_layers=20,
    cfdra_ratio=3,
    use_attention=True,
    R=48,
    M=384,
    kernel_len=2048,
    chunk_size=512,
    n_heads=12,
    n_kv_heads=4,
    ffn_expansion=4,
    dropout=0.0,
    freeze_decay=True,
)

# Create model and load weights
model = TCFDRAModel(config)
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
```

### Chat Example

```python
def chat(model, tokenizer, messages, max_new_tokens=256):
    # Format messages in ChatML
    prompt = ""
    for msg in messages:
        role = msg["role"]
        content = msg["content"]
        prompt += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"

    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.9,
        )

    # Extract the assistant response from the full decoded sequence
    response = tokenizer.decode(output_ids[0], skip_special_tokens=False)
    response = response.split("<|im_start|>assistant\n")[-1]
    response = response.split("<|im_end|>")[0]
    return response

# Example usage
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"}
]

response = chat(model, tokenizer, messages)
print(response)
```

## Limitations

- **Small model size:** 270M parameters limits reasoning capabilities compared to larger models
- **Limited pre-training:** Base model trained on ~14.7B tokens (vs trillions for frontier models)
- **English only:** Primarily trained on English text
- **Experimental architecture:** CFDRA is a novel architecture still being researched

## Demo

Try the model: Sefer-270M Chat Demo

## Citation

```bibtex
@misc{sefer270mchat2025,
  title={Sefer-270M-Chat: A Hybrid CFDRA-Transformer Chat Model},
  author={Fractal AGI},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/fractal-agi/sefer-270m-chat}
}
```

## License

Apache 2.0
