	---
	language:
	- hu
	license: mit
	tags:
	- hungarian
	- causal-lm
	- llama
	- mlx
	- apple-silicon
	- sentencepiece
	library_name: mlx
	pipeline_tag: text-generation
	model-index:
	- name: csermely-mlx
	  results: []
	---

	# Csermely (MLX)

	MLX version of Csermely, a 138M-parameter Hungarian language model. This build targets Apple Silicon. Part of the [Emese](https://emese.tech) model family.

	This is the native MLX bfloat16 checkpoint. For the Hugging Face Transformers version, see [emese-tech/csermely](https://huggingface.co/emese-tech/csermely).

	## Model Details

	| Property | Value |
	|---|---|
	| Parameters | 137.8M |
	| Architecture | LLaMA-style (decoder-only transformer) |
	| Context length | 8,192 tokens (YaRN RoPE) |
	| Training context | 2,048 tokens |
	| Precision | bfloat16 |
	| Vocabulary | 32,000 (SentencePiece Unigram, Hungarian) |
	| Training data | ~1B tokens of Hungarian text |
	| Framework | MLX (Apple Silicon) |
	| License | MIT |
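
Two numbers follow from the table by simple arithmetic: the context extension factor and the rough in-memory size of the weights. The sketch below assumes that the 8,192-token context is obtained by stretching the 2,048-token training context with YaRN, so the scale factor is the ratio of the two. It also assumes every parameter is stored as a 2-byte bfloat16 value. The size of the actual `model.safetensors` file may differ slightly because of file metadata.

```python
# Values taken from the Model Details table.
params = 137.8e6
context_length = 8192
training_context = 2048
bytes_per_param = 2  # bfloat16 is a 16-bit format

# Assumption: the YaRN scale factor is target context / training context.
yarn_scale = context_length / training_context

# Assumption: all weights are stored in bfloat16, with no extra buffers.
weight_mb = params * bytes_per_param / 1e6

print(f"YaRN scale factor: {yarn_scale:g}")            # 4
print(f"Approximate weight size: {weight_mb:.1f} MB")  # 275.6 MB
```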

	## Architecture

	- 16 transformer layers
	- 768 hidden dimension
	- 12 attention heads
	- 2048 FFN intermediate size
	- RMSNorm pre-layer normalization
	- Rotary positional embeddings (RoPE) with YaRN extension
	- SwiGLU feed-forward activation
	- Tied input/output embeddings
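
As a sanity check, the figures above can be tallied into a parameter count. This is a back-of-the-envelope sketch, not the repository's own accounting. It assumes standard multi-head attention with four square projections (no grouped-query attention), no bias terms, two RMSNorm weight vectors per layer, and one final RMSNorm. None of these details are stated in this card. Under those assumptions, the total lands on the 137.8M listed in Model Details.

```python
# Figures from the Architecture list and the Model Details table.
n_layers, d_model, n_heads = 16, 768, 12
d_ffn, vocab = 2048, 32_000

head_dim = d_model // n_heads        # 64 dimensions per attention head

embedding = vocab * d_model          # counted once: input/output embeddings are tied
attention = 4 * d_model * d_model    # Q, K, V, O projections (assumed: no GQA, no bias)
feed_forward = 3 * d_model * d_ffn   # SwiGLU: gate, up, and down projections
norms = 2 * d_model                  # assumed: two RMSNorm weight vectors per layer
per_layer = attention + feed_forward + norms

# The trailing d_model term is the assumed final RMSNorm before the output head.
total = embedding + n_layers * per_layer + d_model
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 137,847,552 parameters (~137.8M)
```

Because the embeddings are tied, the 24.6M embedding parameters are counted only once. An untied output head would add another `vocab * d_model` parameters.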

	## Usage

	```python
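	import mlx.core as mx
	from model import Emese, ModelConfig  # expects model.py to be importable from the working directory

	# Build the model with the default configuration, then load the bfloat16 weights.
	config = ModelConfig()
	model = Emese(config)
	model.load_weights("model.safetensors")

	# MLX evaluates lazily; force the weights to materialize before use.
	mx.eval(model.parameters())
	model.eval()  # switch to inference mode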
	```