Mecellem-Qwen3-4B-TR
Model Description
Mecellem-Qwen3-4B-TR is a Turkish legal language model adapted through Continual Pre-training (CPT) on Turkish legal and official texts. The model is based on the Qwen3-4B decoder architecture (4B parameters) and was trained with a single-phase, large-scale CPT process. Unlike the 1.7B model's four-phase curriculum learning, this model employs a single-phase training strategy on a comprehensive dataset, demonstrating that larger model capacity can benefit from direct, large-scale domain adaptation.
Key Features:
- Single-phase, large-scale continual pre-training on approximately 270.8 billion tokens (270,791,712,595 tokens)
- Dataset includes Turkish legal sources (Yargıtay, Danıştay, YÖKTEZ) and general Turkish web data (FineWeb2, CulturaX)
- Preserves general language capabilities while injecting domain-specific legal knowledge
Model Type: Decoder-only Language Model
Parameters: 4B
Base Model: Qwen/Qwen3-4B
Architecture: Qwen3 decoder with grouped query attention (GQA)
Architecture Details
- Max Position Embeddings: 40,960 tokens
- Number of Layers: 36 transformer layers
- Hidden Size: 2,560
- FFN Hidden Size: 9,728
- Number of Heads: 32
- Number of KV Heads (GQA): 8
- Activation Function: SwiGLU
- Position Encodings: RoPE (Rotary Position Embeddings)
- Layer Norm: RMSNorm
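These architecture details can be verified directly from the published configuration. Below is a minimal sketch using the standard Hugging Face config attribute names for Qwen3-style models; the expected values are the ones listed above.

# Inspect the architecture via the model's config (fetches from the
# Hugging Face Hub on first call).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("newmindai/Mecellem-Qwen3-4B-TR")
print(config.num_hidden_layers)        # 36 transformer layers
print(config.hidden_size)              # 2560
print(config.intermediate_size)        # 9728 (FFN hidden size)
print(config.num_attention_heads)      # 32
print(config.num_key_value_heads)      # 8 (GQA)
print(config.max_position_embeddings)  # 40960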
Training Details
Continual Pre-training (CPT):
- Total Training Tokens: ~270.8 billion tokens (270,791,712,595 tokens)
- Training Method: Single-phase large-scale CPT
- Framework: NVIDIA NeMo with Megatron-Core
- Precision: BF16 mixed precision
- Hardware Infrastructure:
  - System: MareNostrum 5 ACC partition at Barcelona Supercomputing Center (BSC)
  - Compute Nodes: 100 nodes, 400× NVIDIA Hopper H100 64GB (SXM) GPUs in total
  - Node Configuration: 4× H100 GPUs, 80 CPU cores, 512GB DDR5 memory per node
  - GPU Interconnect: NVLink for intra-node communication among the 4 GPUs of each node
  - Inter-node Network: 800 Gb/s InfiniBand, enabling efficient processing of the large-scale token flow and ensuring high scalability and training stability over the long CPT run
- Distributed Training: data-parallel, multi-node, multi-GPU architecture
- Hardware Utilization: 18.7% median MFU, 2.57M tokens/sec throughput
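As a consistency check, the reported throughput and MFU agree under a standard FLOP approximation. The sketch below uses the common 6N + 12·L·h·s per-token FLOP estimate and an assumed 989 TFLOPS dense-BF16 peak per H100 SXM; both constants are assumptions, not figures taken from the training logs.

# Back-of-envelope MFU check from the numbers reported above.
n_params = 4e9                          # ~4B parameters
layers, hidden, seq = 36, 2560, 4096    # architecture details from above
tokens_per_sec = 2.57e6                 # reported throughput
num_gpus = 400
peak_per_gpu = 989e12                   # assumed H100 SXM dense BF16 peak

flops_per_token = 6 * n_params + 12 * layers * hidden * seq  # weights + attention
mfu = flops_per_token * tokens_per_sec / (num_gpus * peak_per_gpu)
print(f"MFU ~ {mfu:.1%}")  # ~18.5%, consistent with the reported 18.7% median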
Dataset Composition:
- Legal Sources:
  - Court of Cassation (Yargıtay): 10.3M sequences, ~3.43B tokens
  - Council of State (Danıştay): 151K sequences, ~0.11B tokens
  - Academic theses (YÖKTEZ): 21.1M sequences, ~9.61B tokens (after DocsOCR processing)
- General Turkish Sources:
  - FineWeb2: general Turkish web data
  - CulturaX: multilingual corpus (Turkish subset)
  - Total general Turkish: 212M sequences, ~96.17B tokens
- Additional Categories: English, Mathematics, Python code, and multilingual content (Spanish, Arabic, Russian, Chinese); approximate corpus shares are sketched below
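Taking the listed counts at face value, the rough corpus mix can be computed as below. This is a sketch: the "other" share is obtained by subtraction and is an assumption, since per-category token counts for the additional sources are not listed here.

# Rough corpus-share estimate from the counts above (the "other" bucket is
# derived by subtraction, not from a documented per-category count).
total = 270.8e9
legal = (3.43 + 0.11 + 9.61) * 1e9   # Yargıtay + Danıştay + YÖKTEZ
general_tr = 96.17e9                 # FineWeb2 + CulturaX (Turkish)
other = total - legal - general_tr   # English, math, code, other languages
for name, tokens in [("legal", legal), ("general Turkish", general_tr), ("other", other)]:
    print(f"{name}: {tokens / 1e9:.1f}B tokens ({tokens / total:.1%})")
# legal: 13.1B (4.9%), general Turkish: 96.2B (35.5%), other: 161.5B (59.6%)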
Training Hyperparameters:
- Sequence Length: 4,096 tokens
- Optimizer: Adam with a cosine learning rate schedule (sketched below)
- Max Learning Rate: 5×10⁻⁵
- Min Learning Rate: 5×10⁻⁶
- Weight Decay: 0.01
- Warmup Steps: 7,675 steps
- Max Steps: 153,508 steps
- Global Batch Size: 400
- Per-GPU Batch Size: 1
- Gradient Accumulation: 16
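The warmup and decay values above imply a schedule like the one sketched here. This is a minimal approximation; the exact NeMo/Megatron-Core scheduler implementation may differ in details such as the warmup shape.

import math

MAX_LR, MIN_LR = 5e-5, 5e-6
WARMUP_STEPS, MAX_STEPS = 7_675, 153_508

def lr_at(step: int) -> float:
    # Linear warmup to MAX_LR, then cosine decay to MIN_LR.
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return MIN_LR + (MAX_LR - MIN_LR) * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at(WARMUP_STEPS), lr_at(MAX_STEPS))  # 5e-05 at the peak, 5e-06 at the end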
Training Visualization
The following visualizations show the model's training progress and dataset distribution:
Qwen3-4B CPT Dataset Distribution (Single Phase).
Qwen3-4B CPT Training and Validation Loss Curves. Training and validation loss decrease consistently throughout training.
Benchmark Performance
The model was evaluated using the Muhakim reward model on Turkish legal tasks:
Benchmark Performance of 4B Decoder-Only Models Across Context Lengths Using the Muhakim Reward Model. Mecellem-Qwen3-4B-TR consistently outperforms the base Qwen3-4B model across all five legal quality objectives.
Rewards Comparison Analysis
The following visualization compares rewards across different token lengths for base vs CPT models:
Rewards Comparison: Base vs CPT Models Across Token Lengths. Mecellem-Qwen3-4B-TR shows consistent improvements over the base model across all context length settings, demonstrating the effectiveness of Turkish legal domain adaptation.
Usage
Installation
pip install transformers torch
Text Generation
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("newmindai/Mecellem-Qwen3-4B-TR")
model = AutoModelForCausalLM.from_pretrained("newmindai/Mecellem-Qwen3-4B-TR")

# Example prompt (Turkish: "contract termination in the Turkish legal system")
prompt = "Türk hukuk sisteminde sözleşme feshi"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate with nucleus sampling
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        top_p=0.9,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
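For GPU inference, the checkpoint can also be loaded in bfloat16 to match the BF16 training precision; torch_dtype and device_map are standard from_pretrained arguments (device_map="auto" additionally requires the accelerate package):

# Optional variant: BF16 weights with automatic device placement.
model = AutoModelForCausalLM.from_pretrained(
    "newmindai/Mecellem-Qwen3-4B-TR",
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",           # requires `pip install accelerate`
)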
Use Cases
- Turkish legal text generation
- Legal document summarization
- Legal question answering
- Legal text completion
- Domain-specific language modeling for Turkish legal domain
- Retrieval-Augmented Generation (RAG) applications (see the prompt sketch below)
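To illustrate the RAG use case, a retrieved-context prompt can be assembled as in the sketch below. This is illustrative only: the retriever, the retrieved_passages list, and the Bağlam/Soru/Cevap ("Context/Question/Answer") prompt template are hypothetical placeholders, not a documented interface of the model.

# Minimal RAG-style prompt sketch; `retrieved_passages` stands in for the
# output of a hypothetical retriever over a Turkish legal corpus.
retrieved_passages = [
    "First passage returned by the retriever ...",
    "Second passage returned by the retriever ...",
]
question = "Türk hukukunda sözleşmenin feshi hangi koşullara bağlıdır?"
# Turkish template: Bağlam = Context, Soru = Question, Cevap = Answer
prompt = "Bağlam:\n" + "\n\n".join(retrieved_passages) + f"\n\nSoru: {question}\nCevap:"
inputs = tokenizer(prompt, return_tensors="pt")  # then generate as in Text Generation above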
Acknowledgments
This work was supported by the EuroHPC Joint Undertaking through project etur46 with access to the MareNostrum 5 supercomputer, hosted by Barcelona Supercomputing Center (BSC), Spain. MareNostrum 5 is owned by EuroHPC JU and operated by BSC. We are grateful to the BSC support team for their assistance with job scheduling, environment configuration, and technical guidance throughout the project.
The numerical calculations reported in this work were performed in part at TÜBİTAK ULAKBİM, High Performance and Grid Computing Center (TRUBA resources). The authors also gratefully acknowledge MINERVA Support for expert guidance and collaboration opportunities in HPC-AI integration.
References
If you use this model, please cite our paper:
@article{mecellem2026,
  title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
  author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin, İclal and Sağbaş, Ömer Can},
  journal={arXiv preprint arXiv:2601.16018},
  year={2026},
  month={January},
  url={https://arxiv.org/abs/2601.16018},
  doi={10.48550/arXiv.2601.16018},
  eprint={2601.16018},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Base Model References
@article{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  journal={arXiv preprint arXiv:2505.09388},
  year={2025}
}