---
language:
- hu
license: mit
tags:
- hungarian
- causal-lm
- llama
- mlx
- apple-silicon
- sentencepiece
library_name: mlx
pipeline_tag: text-generation
model-index:
- name: csermely-mlx
  results: []
---

# Csermely (MLX)

**MLX version of Csermely**, a 138M-parameter Hungarian language model optimized for Apple Silicon. Part of the [Emese](https://emese.tech) model family.

This is the native MLX bfloat16 checkpoint. For the HuggingFace transformers version, see [emese-tech/csermely](https://huggingface.co/emese-tech/csermely).

## Model Details

| | |
|---|---|
| **Parameters** | 137.8M |
| **Architecture** | LLaMA-style (decoder-only transformer) |
| **Context length** | 8,192 tokens (YaRN RoPE) |
| **Training context** | 2,048 tokens |
| **Precision** | bfloat16 |
| **Vocabulary** | 32,000 (SentencePiece Unigram, Hungarian) |
| **Training data** | ~1B tokens of Hungarian text |
| **Framework** | MLX (Apple Silicon) |
| **License** | MIT |

## Architecture

- 16 transformer layers
- 768 hidden dimension
- 12 attention heads
- 2048 FFN intermediate size
- RMSNorm pre-layer normalization
- Rotary positional embeddings (RoPE) with YaRN extension
- SwiGLU feed-forward activation
- Tied input/output embeddings

## Usage

```python
import mlx.core as mx  # MLX array library

# Assumes a local model.py module defining the architecture is importable.
from model import Emese, ModelConfig

config = ModelConfig()  # default configuration
model = Emese(config)   # build the network

# Load the bfloat16 checkpoint from the safetensors file.
model.load_weights("model.safetensors")
```
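
## Parameter Count

As a sanity check, the 137.8M figure in Model Details can be reproduced from the numbers listed under Architecture. The sketch below is not taken from the model's source code. It assumes full multi-head attention (no grouped-query attention), no bias terms, and a single final RMSNorm before the tied output projection. The function name `count_parameters` is illustrative only. Under these assumptions the number of attention heads does not affect the total, since the heads only partition the 768-dimensional projections.

```python
def count_parameters(vocab_size=32_000, hidden=768, layers=16, ffn=2_048):
    """Estimate the weight count of a LLaMA-style decoder with tied embeddings."""
    embedding = vocab_size * hidden   # shared by input and output (tied)
    attention = 4 * hidden * hidden   # Q, K, V, O projections (assumed: no GQA, no biases)
    feed_forward = 3 * hidden * ffn   # SwiGLU: gate, up, and down projections
    norms = 2 * hidden                # two RMSNorm weight vectors per layer
    per_layer = attention + feed_forward + norms
    final_norm = hidden               # assumed: one RMSNorm after the last layer
    return embedding + layers * per_layer + final_norm


total = count_parameters()
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
# prints: 137,847,552 parameters (137.8M)
```

The embedding table accounts for about 24.6M of the total, and each of the 16 layers contributes roughly 7.1M.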