🗣️ Neura-FA-EN-1.9B

Neura-FA-EN-1.9B is an open‑source, ultra‑lightweight bilingual conversational model developed by the Neuracoder team (a leading Iranian AI company). With 1.9 billion parameters and based on the modern Qwen2 architecture (architecture only – not derived from any existing Qwen model), it is specifically designed for natural, fast, and local conversation in Persian (Farsi) and English.

Unlike giant multilingual models that require heavy GPUs or cloud APIs, Neura‑FA‑EN runs on laptops, CPU‑only systems, and even a Raspberry Pi 4. It gives Persian‑speaking developers, researchers, and hobbyists an offline assistant for everyday questions, simple summarisation, informal translation, and general knowledge retrieval – all under the permissive Apache 2.0 license.



✨ Key Features

  • Truly bilingual (Persian + English) – Understands and generates both languages fluently, with natural code‑switching (e.g., “بگو hello world به انگلیسی”).
  • Ultra‑lightweight – Only 1.9B parameters: ~3.8 GB of weights in FP16, ~1.9 GB in INT8. The INT8 build runs on devices with 4 GB of RAM.
  • Offline & private – No internet connection or API key needed after download.
  • Fast inference – 40–60 tok/s on T4 GPU, 8–12 tok/s on Intel i7 CPU, 2–3 tok/s on Raspberry Pi 4.
  • Long context – 32,768 tokens (≈24,000 Persian words), enough for long conversations or short stories.
  • Iranian‑made, Apache 2.0 – Free for commercial and personal use, with full transparency.
  • Research‑friendly – Released as a research model to help the Persian AI community fine‑tune, quantise, or build upon it.

🎯 Suitable Use Cases

  • Daily chit‑chat – Casual conversation, small talk, jokes, and friendly assistant tasks.
  • Simple Q&A – Answering general knowledge questions (e.g., “پایتخت فرانسه کجاست؟” / “What is the capital of France?”).
  • Informal translation – Translating short sentences or phrases between Persian and English (not professional/legal grade).
  • Light summarisation – Summarising a paragraph or a short article in Persian or English.
  • Brainstorming & writing help – Generating ideas, rewriting a sentence, fixing simple grammar.
  • Educational tool for language learning – Practicing Persian or English conversations (basic to intermediate level).
  • Offline assistant for edge devices – Embedded in chatbots, local web UIs, or Telegram bots (simple integration).

❌ Not suitable for:

  • Code generation, debugging, or programming assistance.
  • Complex mathematical reasoning or multi‑step logic.
  • Professional translation (e.g., legal, medical).
  • Long document processing (>32k tokens).
  • Any task requiring up‑to‑date information after mid‑2024.

📊 Evaluation & Performance Metrics

We evaluated Neura‑FA‑EN‑1.9B on standard Persian and English benchmarks for conversational models.

| Dataset | Metric | Score | Note |
|---|---|---|---|
| ParsiMMLU (5‑shot) | Accuracy | 48.7% | General knowledge in Persian |
| PersianQA | Exact Match | 56.2% | Reading comprehension (questions in Persian) |
| MMLU (English, 5‑shot) | Accuracy | 51.3% | General knowledge in English |
| XNLI (fa) | Accuracy | 62.1% | Natural language inference (Persian) |
| XNLI (en) | Accuracy | 68.5% | Natural language inference (English) |
| Perplexity (fa‑wikitext) | PPL | 18.3 | Fluency on Persian texts |

Interpretation: The model performs on par with much larger multilingual models (e.g., XLM‑R 3B) on Persian tasks while being 40% smaller. For English, it stays competitive with dedicated 1.5B models.
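For readers unfamiliar with two of the metrics above, the sketch below shows how exact match and perplexity are commonly computed. The function names and the whitespace normalisation rule are assumptions for illustration; the evaluation scripts actually used for this model are not described here.

```python
import math

def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings are identical after collapsing whitespace.

    The normalisation rule is an assumption; benchmarks differ in how
    they normalise answers before comparing them.
    """
    def normalise(s: str) -> str:
        return " ".join(s.split())
    return normalise(prediction) == normalise(reference)

def perplexity(token_log_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural log)."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 4))
```

Lower perplexity means the model finds the text less surprising, which is why it serves as a fluency indicator in the table.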


📈 Comparison with Similar‑Sized Models

| Model | Params | Persian MMLU | English MMLU | VRAM (FP16) | Speed (tok/s, T4) | License |
|---|---|---|---|---|---|---|
| Neura-FA-EN-1.9B | 1.9B | 48.7% | 51.3% | ~3.8 GB | 48 | Apache 2.0 |
| Arian‑2B (Persian) | 2.0B | 44.2% | 28.7% | ~4.0 GB | 45 | Apache 2.0 |
| Phi‑2 (English‑only) | 2.7B | N/A | 57.8% | ~5.4 GB | 40 | MIT |
| Gemma‑2B (English‑only) | 2.0B | N/A | 52.6% | ~4.0 GB | 52 | Gemma |

Key point: among the models compared here, Neura‑FA‑EN is the only one that scores competitively on both Persian and English MMLU.


🧪 Technical Details & Training Process

Built on the Qwen2 architecture (only the architecture, not derived from any existing model) and trained from scratch by Neuracoder.

Architecture

  • Layers: 28 decoder‑only layers.
  • Attention: Grouped Query Attention (GQA) – 12 query heads, 2 key/value heads.
  • Activation: SwiGLU.
  • Context length: 32,768 tokens.
  • Embedding size: 2048.
  • Intermediate size: 5632.
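To make the Grouped Query Attention entry concrete, the sketch below maps the 12 query heads onto the 2 key/value heads. The contiguous grouping (consecutive query heads share a KV head) is an assumption based on how GQA is commonly implemented; `kv_head_for` is a hypothetical helper, not part of the model code.

```python
NUM_QUERY_HEADS = 12  # from the architecture list
NUM_KV_HEADS = 2      # from the architecture list

def kv_head_for(query_head: int) -> int:
    """Index of the key/value head shared by the given query head."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 6 query heads per KV head
    return query_head // group_size

# Group the query heads by the KV head they read from.
groups: dict[int, list[int]] = {}
for q in range(NUM_QUERY_HEADS):
    groups.setdefault(kv_head_for(q), []).append(q)

print(groups)
```

Because only 2 key/value heads are stored instead of 12, the KV cache is 6 times smaller than it would be with one KV head per query head, which matters for memory use at long context lengths.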

Pre‑training

  • Data: 350 billion tokens – 60% Persian (web texts, books, news, forums), 35% English (common crawl, books, Wikipedia), 5% code (to preserve basic formatting).
  • Duration: 18 days on 8× NVIDIA A100 (80GB) using DeepSpeed ZeRO‑3.
  • Hyperparameters: AdamW (lr=3e-4), cosine decay, warmup 2000 steps, batch size 512, seq len 2048 (later extended to 8192 with RoPE scaling).
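The learning-rate schedule in the hyperparameter list can be sketched as a warmup phase followed by cosine decay. The peak rate and warmup length come from the list; the linear shape of the warmup, the total step count, and the final learning rate are not stated, so the values below are placeholders.

```python
import math

PEAK_LR = 3e-4         # from the hyperparameter list
WARMUP_STEPS = 2000    # from the hyperparameter list
TOTAL_STEPS = 300_000  # hypothetical: the total step count is not stated
MIN_LR = 0.0           # hypothetical: the final learning rate is not stated

def learning_rate(step: int) -> float:
    """Linear warmup (assumed) to PEAK_LR, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold at MIN_LR past the last step
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine

for step in (0, 1000, 2000, 151_000, 300_000):
    print(step, learning_rate(step))
```

The rate rises to its peak at step 2000, passes through half the peak at the midpoint of the decay phase, and reaches the floor at the final step.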

Supervised Fine‑Tuning (SFT)

  • Data: 150,000 conversation pairs in Persian and English:
    • 80,000 from public Persian chat datasets (ParsiNLU, FaChat).
    • 50,000 from translated and cleaned ShareGPT data.
    • 20,000 hand‑written by Neuracoder team for natural code‑switching and cultural relevance.
  • Format: {"system": "You are a helpful assistant.", "user": "...", "assistant": "..."}
  • Hyperparameters: 3 epochs, lr=1e-5, batch size 128, LoRA (rank=32) then full fine‑tune last 6 layers.
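A record in the SFT format above can be mapped onto the role/content message list that chat templates expect. This is a minimal sketch: `record_to_messages` is a hypothetical helper and the sample record is made up for illustration, not taken from the training data.

```python
import json

def record_to_messages(record: dict) -> list[dict]:
    """Convert one {"system", "user", "assistant"} record into chat messages."""
    messages = []
    if record.get("system"):  # the system prompt is treated as optional here
        messages.append({"role": "system", "content": record["system"]})
    messages.append({"role": "user", "content": record["user"]})
    messages.append({"role": "assistant", "content": record["assistant"]})
    return messages

# Illustrative record only.
line = '{"system": "You are a helpful assistant.", "user": "سلام!", "assistant": "سلام! چطور می‌تونم کمکت کنم؟"}'
messages = record_to_messages(json.loads(line))
print([m["role"] for m in messages])
```

The resulting list has the same shape as the `messages` variable used in the usage examples, so it can be passed to `tokenizer.apply_chat_template` in the same way.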

Validation

  • Evaluated every 500 steps on held‑out Persian and English test sets.
  • Final checkpoint chosen by lowest perplexity on Persian validation and highest MMLU score.
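How the two selection criteria were combined is not stated. One possible rule, sketched below, ranks the checkpoints on each criterion and picks the one with the best combined rank. Both the rule and the checkpoint numbers are hypothetical placeholders, not results from training.

```python
# Hypothetical checkpoints: placeholder numbers for illustration only.
checkpoints = [
    {"step": 500, "fa_ppl": 21.0, "mmlu": 0.47},
    {"step": 1000, "fa_ppl": 19.0, "mmlu": 0.50},
    {"step": 1500, "fa_ppl": 19.5, "mmlu": 0.51},
]

def select_checkpoint(ckpts: list[dict]) -> dict:
    """Pick the checkpoint with the lowest summed rank across both criteria."""
    by_ppl = sorted(ckpts, key=lambda c: c["fa_ppl"])                # lower is better
    by_mmlu = sorted(ckpts, key=lambda c: c["mmlu"], reverse=True)   # higher is better

    def rank_sum(c: dict) -> int:
        return by_ppl.index(c) + by_mmlu.index(c)

    # Ties on rank sum are broken by Persian perplexity (an assumption).
    return min(ckpts, key=lambda c: (rank_sum(c), c["fa_ppl"]))

best = select_checkpoint(checkpoints)
print(best["step"])
```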

⚡ Inference Speed & Hardware Requirements

| Hardware | Weight format | Avg tokens/sec (gen 256 tokens) | Memory usage |
|---|---|---|---|
| NVIDIA A100 (40GB) | FP16 | 78 tok/s | 4.1 GB |
| NVIDIA T4 (16GB) | FP16 | 48 tok/s | 3.9 GB |
| NVIDIA T4 (16GB) | INT8 | 55 tok/s | 2.3 GB |
| NVIDIA GTX 1060 (6GB) | FP16 | 28 tok/s | 3.9 GB |
| CPU (Intel i7-12700K) | INT8 | 9 tok/s | 2.1 GB |
| Raspberry Pi 4 (4GB) | INT8 (ONNX) | 2–3 tok/s | 1.6 GB |

Recommendation: Use FP16 on any GPU with 6+ GB of VRAM. For CPU or low‑memory devices, use the INT8 quantised version (available separately).
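As a rough cross-check when choosing a weight format, the size of the weights alone can be estimated from the parameter count. This is a back-of-the-envelope sketch: `weight_memory_gb` is a hypothetical helper, and it ignores the KV cache, activations, and runtime overhead, so measured usage will typically be somewhat higher.

```python
# Bytes needed to store one parameter in each weight format.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Weights-only size in GB (10**9 bytes) for a given format."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9

for fmt in ("fp16", "int8"):
    print(fmt, round(weight_memory_gb(1.9e9, fmt), 2), "GB")
```

For 1.9 billion parameters this gives about 3.8 GB in FP16 and about 1.9 GB in INT8, which is why FP16 fits comfortably on a 6 GB GPU.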


🚀 Usage Guide

Installation

pip install transformers torch accelerate sentencepiece

Example 1: Basic Persian conversation

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "neuracoder/neura-fa-en-1.9b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "به نظرت بهترین راه برای یادگیری زبان انگلیسی چیه؟"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
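For multi-turn conversations, earlier exchanges need to be kept in the `messages` list so the model sees them as context. The sketch below shows one way to maintain that history; `add_exchange` and the `max_exchanges` limit are hypothetical and simply keep the prompt from growing without bound.

```python
def add_exchange(history, user_text, assistant_text, max_exchanges=8):
    """Return a new history with one user/assistant exchange appended,
    keeping only the most recent `max_exchanges` exchanges."""
    updated = history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]
    # Each exchange is two messages, so the trimmed list still starts
    # with a user message.
    return updated[-2 * max_exchanges:]

history = []
for i in range(10):
    history = add_exchange(history, f"question {i}", f"answer {i}")

# Build the next prompt: past exchanges plus the new user message.
messages = history + [{"role": "user", "content": "next question"}]
print(len(history), messages[0]["content"])
```

The `messages` list built this way can be passed to `tokenizer.apply_chat_template` exactly as in Example 1, and the decoded response is then stored with another `add_exchange` call.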

Example 2: Mixed Persian‑English query

prompt = "یه جمله انگلیسی بنویس که معنی 'خورشید می‌تابد' رو برسونه"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
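# Pass do_sample=True explicitly; without sampling, generate() ignores temperature.
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.6, do_sample=True)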
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Example 3: Simple summarisation (English)

article = """The Persian cat is a long-haired breed characterized by its round face and short muzzle. 
It is one of the oldest cat breeds, originating from Persia (modern-day Iran)."""
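# Reuses `tokenizer` and `model` from Example 1.
prompt = f"Summarise the following text in one sentence:\n\n{article}"
# Wrap the prompt in the chat template, as in Examples 1 and 2.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Pass do_sample=True explicitly; without sampling, generate() ignores temperature.
outputs = model.generate(**inputs, max_new_tokens=80, temperature=0.3, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))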

Download the model directly

git lfs install
git clone https://huggingface.co/neuracoder/neura-fa-en-1.9b

Or via Python:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="neuracoder/neura-fa-en-1.9b", local_dir="./neura-fa-en-1.9b")
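Once the files are on disk, the model can be loaded without any network access. The sketch below assumes the snapshot was saved to `./neura-fa-en-1.9b` as in the download command above; `load_local` is a hypothetical helper name.

```python
import os

# Tell the Hugging Face libraries not to contact the Hub.
# This must be set before transformers / huggingface_hub are imported.
os.environ["HF_HUB_OFFLINE"] = "1"

def load_local(model_dir="./neura-fa-en-1.9b"):
    """Load tokenizer and model strictly from a local directory."""
    # Imported here so the environment variable above is already in effect.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)
    return tokenizer, model
```

Calling `tokenizer, model = load_local()` then gives the same two objects as in Example 1; the `torch_dtype` and `device_map` arguments shown there can be passed to `from_pretrained` in the same way.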

⚠️ Limitations

  • Not a code model – Cannot write or debug programs reliably.
  • Not a mathematical engine – Struggles with multi‑step arithmetic or symbolic reasoning.
  • Knowledge cutoff – Mid‑2024. Unaware of very recent events or new APIs.
  • Persian dialect – Trained on standard Persian (Farsi); may not understand Dari or Tajik well.
  • Formal translation – Not suitable for legal, medical, or highly technical documents.
  • Hallucinations – Like all LLMs, may produce plausible but incorrect facts.
  • Context length – While 32k is generous, very long documents may degrade attention quality.

🗺️ Roadmap

  • Q1 2026: Release of quantised versions (INT4, INT8, GGUF) for even lighter deployment.
  • Q2 2026: Neura‑FA‑EN‑3B – 3.5B parameters, expanded Persian vocabulary, improved reasoning.
  • Q3 2026: Fine‑tuned variant for formal translation (Persian ↔ English).
  • Ongoing: Open‑source training datasets (Persian conversational data) and evaluation benchmarks.

🤝 Contribute

This model is free and open‑source. You can help by:

  • Reporting bugs or suggesting improvements in the Discussions tab.
  • Providing high‑quality Persian conversational data (anonymised) to improve future versions.
  • Building tools (Gradio UI, Ollama modelfile, Telegram bot) using this model.
  • Financial sponsorship – Contact the Neuracoder team.
  • Spreading the word – Every user helps the Persian AI community grow.

📜 License

Apache License 2.0 – You may freely use, modify, distribute, and even sell this model as part of your product, provided you include the original license and copyright notice and mark any files you have modified.


📞 Contact


Made with ❤️ in Iran – the neuracoder team
Democratising conversational AI for Persian speakers: fast, local, and free for everyone.
