language:
  - hu
license: mit
tags:
  - hungarian
  - causal-lm
  - llama
  - mlx
  - apple-silicon
  - sentencepiece
library_name: mlx
pipeline_tag: text-generation
model-index:
  - name: csermely-mlx
    results: []

Csermely (MLX)

The MLX version of Csermely, a 138M-parameter Hungarian language model, optimized for Apple Silicon. It is part of the Emese model family.

This is the native MLX bfloat16 checkpoint. For the Hugging Face Transformers version, see emese-tech/csermely.

Model Details

| Property | Value |
|---|---|
| Parameters | 137.8M |
| Architecture | LLaMA-style (decoder-only transformer) |
| Context length | 8,192 tokens (YaRN RoPE) |
| Training context | 2,048 tokens |
| Precision | bfloat16 |
| Vocabulary | 32,000 (SentencePiece Unigram, Hungarian) |
| Training data | ~1B tokens of Hungarian text |
| Framework | MLX (Apple Silicon) |
| License | MIT |
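The gap between the 2,048-token training context and the 8,192-token context length corresponds to a YaRN scale factor of 4. The sketch below shows how YaRN's "NTK-by-parts" interpolation would rescale the rotary frequencies for that factor. It is an illustration under stated assumptions, not this checkpoint's confirmed configuration: the RoPE base of 10000 and the ramp bounds `alpha=1`, `beta=32` are common LLaMA-style defaults, and the head dimension of 64 is inferred from 768 hidden / 12 heads in the Architecture list.

```python
import math

def yarn_inv_freq(head_dim=64, base=10000.0, orig_ctx=2048,
                  scale=4.0, alpha=1.0, beta=32.0):
    """Rotary inverse frequencies after YaRN-style rescaling.

    base, alpha and beta are ASSUMED defaults, not values read from
    this checkpoint. scale = 8192 / 2048 comes from the model card.
    """
    inv_freq = []
    for i in range(0, head_dim, 2):
        theta = base ** (-i / head_dim)
        # Number of full rotations this dimension completes over the
        # original training window.
        rotations = orig_ctx * theta / (2 * math.pi)
        # Ramp: 0 -> interpolate fully (divide by scale),
        #       1 -> keep the original frequency.
        ramp = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        inv_freq.append((1 - ramp) * theta / scale + ramp * theta)
    return inv_freq

def yarn_attention_scale(scale=4.0):
    """Scaling factor recommended in the YaRN paper: 0.1 * ln(s) + 1."""
    return 0.1 * math.log(scale) + 1.0

freqs = yarn_inv_freq()
print(len(freqs), freqs[0], freqs[-1], yarn_attention_scale())
```

High-frequency dimensions (many rotations within 2,048 tokens) keep their original frequency, while low-frequency dimensions are slowed down by the full factor of 4, which is what lets positions beyond the training window map onto angles the model has already seen.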

Architecture

  • 16 transformer layers
  • 768 hidden dimension
  • 12 attention heads
  • 2048 FFN intermediate size
  • RMSNorm pre-layer normalization
  • Rotary positional embeddings (RoPE) with YaRN extension
  • SwiGLU feed-forward activation
  • Tied input/output embeddings
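As a sanity check, the listed dimensions can be turned into a parameter count. This is a back-of-the-envelope sketch that assumes standard multi-head attention (no grouped-query attention), no bias terms, two RMSNorm weight vectors per layer, and one final RMSNorm; none of these details are stated in this card.

```python
def count_parameters(vocab=32_000, hidden=768, layers=16, ffn=2048):
    # Token embedding; counted once because input/output embeddings are tied.
    embedding = vocab * hidden
    # Q, K, V and output projections (ASSUMED: full multi-head attention, no biases).
    attention = 4 * hidden * hidden
    # SwiGLU uses three matrices: gate, up and down projections.
    feed_forward = 3 * hidden * ffn
    # ASSUMED: one RMSNorm before attention and one before the FFN.
    norms = 2 * hidden
    per_layer = attention + feed_forward + norms
    # ASSUMED: a final RMSNorm after the last layer (+ hidden).
    return embedding + layers * per_layer + hidden

total = count_parameters()
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
# prints "137,847,552 parameters (137.8M)"
```

Under these assumptions the total matches the 137.8M figure in Model Details, which suggests the assumptions are at least consistent with the published size.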

Usage

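import mlx.core as mx
from model import Emese, ModelConfig  # model.py from this repository

# Build the model with its default configuration, then load the bfloat16 weights.
config = ModelConfig()
model = Emese(config)
model.load_weights("model.safetensors")
mx.eval(model.parameters())  # MLX is lazy; this materializes the loaded weights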
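Once the weights are loaded, text generation comes down to repeatedly picking a next token. The following is a framework-agnostic sketch of greedy decoding. `next_logits` is a hypothetical callable, not part of this repository: it takes the token ids so far and returns one score per vocabulary entry. With the MLX model it would wrap a forward pass, assuming the model returns logits for the last position; that call signature is not documented here.

```python
from typing import Callable, List, Optional, Sequence

def greedy_decode(next_logits: Callable[[List[int]], Sequence[float]],
                  prompt_ids: List[int],
                  max_new_tokens: int = 32,
                  eos_id: Optional[int] = None) -> List[int]:
    """Append the highest-scoring token until the budget or EOS is reached."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)  # hypothetical: scores for the next token
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids

# Toy stand-in for a model: always prefers (last token + 1) mod 5.
def toy_logits(ids: List[int]) -> List[float]:
    return [1.0 if i == (ids[-1] + 1) % 5 else 0.0 for i in range(5)]

print(greedy_decode(toy_logits, [0], max_new_tokens=3))
```

Prompt ids would come from the SentencePiece tokenizer, and the returned ids would be decoded back to text with the same tokenizer. Sampling strategies such as temperature or top-k would replace the argmax line.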