Csermely (MLX)

MLX version of Csermely, a 138M-parameter Hungarian language model optimized for Apple Silicon. It is part of the Emese model family.

This is the native MLX bfloat16 checkpoint. For the Hugging Face Transformers version, see emese-tech/csermely.

Model Details

Parameters: 137.8M
Architecture: LLaMA-style (decoder-only transformer)
Context length: 8,192 tokens (YaRN RoPE)
Training context: 2,048 tokens
Precision: bfloat16
Vocabulary: 32,000 (SentencePiece Unigram, Hungarian)
Training data: ~1B tokens of Hungarian text
Framework: MLX (Apple Silicon)
License: MIT
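
The two context figures listed above imply a YaRN scale factor of 8,192 / 2,048 = 4. The sketch below computes that factor together with the attention scaling value recommended in the YaRN paper, 0.1 * ln(s) + 1. The function name is hypothetical, and whether this checkpoint uses exactly that value is an assumption; the model's own configuration is authoritative.

```python
import math


def yarn_scaling(train_ctx: int = 2048, target_ctx: int = 8192):
    """Return (scale factor, attention scale) for a YaRN context extension.

    Hypothetical helper for illustration; not taken from the model's code.
    """
    scale = target_ctx / train_ctx
    # Recommended attention scaling from the YaRN paper: 0.1 * ln(s) + 1.
    # Assumption: the checkpoint may be configured with a different value.
    attn_scale = 0.1 * math.log(scale) + 1.0
    return scale, attn_scale


scale, attn_scale = yarn_scaling()
print(f"scale factor s = {scale}, attention scale = {attn_scale:.4f}")
```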

Architecture

  • 16 transformer layers
  • 768 hidden dimension
  • 12 attention heads
  • 2048 FFN intermediate size
  • RMSNorm pre-layer normalization
  • Rotary positional embeddings (RoPE) with YaRN extension
  • SwiGLU feed-forward activation
  • Tied input/output embeddings
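
As a sanity check, the listed dimensions can be turned into a parameter count. This is a minimal sketch under stated assumptions that the card does not confirm: no bias terms, full multi-head attention (as many key/value heads as query heads), and one final RMSNorm before the output projection. The function name is hypothetical.

```python
def count_parameters(vocab=32_000, d_model=768, n_layers=16, d_ffn=2048):
    """Estimate the parameter count from the architecture figures above."""
    embed = vocab * d_model        # tied: one matrix shared by input and output
    attn = 4 * d_model * d_model   # Q, K, V, O projections (assumed bias-free, no GQA)
    ffn = 3 * d_model * d_ffn      # SwiGLU: gate, up, and down projections
    norms = 2 * d_model            # two RMSNorm weight vectors per layer
    final_norm = d_model           # assumed final RMSNorm
    return embed + n_layers * (attn + ffn + norms) + final_norm


total = count_parameters()
# bfloat16 stores each parameter in 2 bytes.
print(f"{total:,} parameters, about {total * 2 / 1e6:.0f} MB in bfloat16")
```

Under these assumptions the count is 137,847,552, which agrees with the 137.8M figure in Model Details.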

Usage

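import mlx.core as mx
# Assumes the model.py that defines Emese and ModelConfig is on the import path.
from model import Emese, ModelConfig

# Build the model with its default configuration.
config = ModelConfig()
model = Emese(config)
# Load the bfloat16 checkpoint from the current directory.
model.load_weights("model.safetensors")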
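
Before loading, it can be useful to confirm what the checkpoint contains. The sketch below reads only the header of a safetensors file (an 8-byte little-endian length followed by a JSON table of tensors) using the standard library, then counts parameters and collects dtypes. The helper names are hypothetical and are not part of this repository.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def summarize(header):
    """Return (parameter count, set of dtypes), skipping the metadata entry."""
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    n_params = sum(prod(t["shape"]) for t in tensors.values())
    dtypes = {t["dtype"] for t in tensors.values()}
    return n_params, dtypes
```

For example, `summarize(read_safetensors_header("model.safetensors"))` returns the stored parameter count and the dtypes present; for a bfloat16 checkpoint the safetensors dtype string is `BF16`.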