🗣️ Neura-FA-EN-1.9B
Neura-FA-EN-1.9B is an open‑source, ultra‑lightweight bilingual conversational model developed by the Neuracoder team (a leading Iranian AI company). With 1.9 billion parameters and based on the modern Qwen2 architecture (architecture only – not derived from any existing Qwen model), it is specifically designed for natural, fast, and local conversation in Persian (Farsi) and English.
Unlike giant multilingual models that require heavy GPUs or cloud APIs, Neura‑FA‑EN runs smoothly on laptops, CPU‑only systems, and even Raspberry Pi. It gives Persian‑speaking developers, researchers, and hobbyists a powerful offline assistant for everyday questions, simple summarization, informal translation, and general knowledge retrieval – all under a permissive Apache 2.0 license.
✨ Key Features
- Truly bilingual (Persian + English) – Understands and generates both languages fluently, with natural code‑switching (e.g., “بگو hello world به انگلیسی”, i.e. “say hello world in English”).
- Ultra‑lightweight – Only 1.9B parameters, roughly 3.8 GB of weights in FP16 (2 bytes per parameter) or about 1.9 GB in INT8. The INT8 build runs on devices with 4 GB of RAM.
- Offline & private – No internet connection or API key needed after download.
- Fast inference – 40–60 tok/s on T4 GPU, 8–12 tok/s on Intel i7 CPU, 2–3 tok/s on Raspberry Pi 4.
- Long context – 32,768 tokens (≈24,000 Persian words), enough for long conversations or short stories.
- Iranian‑made, Apache 2.0 – Free for commercial and personal use, with full transparency.
- Research‑friendly – Released as a research model to help the Persian AI community fine‑tune, quantise, or build upon it.
🎯 Suitable Use Cases
- Daily chit‑chat – Casual conversation, small talk, jokes, and friendly assistant tasks.
- Simple Q&A – Answering general knowledge questions (e.g., “پایتخت فرانسه کجاست؟” / “What is the capital of France?”).
- Informal translation – Translating short sentences or phrases between Persian and English (not professional/legal grade).
- Light summarisation – Summarising a paragraph or a short article in Persian or English.
- Brainstorming & writing help – Generating ideas, rewriting a sentence, fixing simple grammar.
- Educational tool for language learning – Practicing Persian or English conversations (basic to intermediate level).
- Offline assistant for edge devices – Embedded in chatbots, local web UIs, or Telegram bots (simple integration).
❌ Not suitable for:
- Code generation, debugging, or programming assistance.
- Complex mathematical reasoning or multi‑step logic.
- Professional translation (e.g., legal, medical).
- Long document processing (>32k tokens).
- Any task requiring up‑to‑date information after mid‑2024.
📊 Evaluation & Performance Metrics
We evaluated Neura‑FA‑EN‑1.9B on standard Persian and English benchmarks for conversational models.
| Dataset | Metric | Score | Note |
|---|---|---|---|
| ParsiMMLU (5‑shot) | Accuracy | 48.7% | General knowledge in Persian |
| PersianQA | Exact Match | 56.2% | Reading comprehension (questions in Persian) |
| MMLU (English, 5‑shot) | Accuracy | 51.3% | General knowledge in English |
| XNLI (fa) | Accuracy | 62.1% | Natural language inference (Persian) |
| XNLI (en) | Accuracy | 68.5% | Natural language inference (English) |
| Perplexity (fa‑wikitext) | PPL | 18.3 | Fluency on Persian texts |
Interpretation: On Persian tasks, the model performs on par with larger multilingual models (e.g., XLM‑R 3B) while having roughly 40% fewer parameters. On English, it stays competitive with dedicated 1.5B models.
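For readers who want to reproduce the two less familiar metrics in the table, here is a minimal sketch of Exact Match and perplexity. The normalisation rules (lower-casing, punctuation stripping, mapping Arabic yeh/kaf to their Persian forms) are assumptions for illustration, not the evaluation script behind the scores above.

```python
import math
import re
import unicodedata


def normalize_answer(text: str) -> str:
    """Normalise an answer before comparison (assumed scheme, not the official one)."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Map Arabic yeh and kaf to their Persian forms, a common Persian clean-up step.
    text = text.replace("\u064a", "\u06cc").replace("\u0643", "\u06a9")
    # Replace punctuation with spaces, then collapse repeated whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def exact_match(predictions, references) -> float:
    """Percentage of predictions that equal their reference after normalisation."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")
    hits = sum(normalize_answer(p) == normalize_answer(r)
               for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


def perplexity(token_logprobs) -> float:
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

Lower perplexity means the model assigns higher probability to the reference text, which is why the table reads it as a fluency signal.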
📈 Comparison with Similar‑Sized Models
| Model | Params | Persian MMLU | English MMLU | VRAM (FP16) | Speed (tok/s, T4) | License |
|---|---|---|---|---|---|---|
| Neura-FA-EN-1.9B | 1.9B | 48.7% | 51.3% | ~3.8 GB | 48 | Apache 2.0 |
| Arian‑2B (Persian) | 2.0B | 44.2% | 28.7% | ~4.0 GB | 45 | Apache 2.0 |
| Phi‑2 (2.7B, English‑only) | 2.7B | N/A | 57.8% | ~5.4 GB | 40 | MIT |
| Gemma‑2B (English‑only) | 2.0B | N/A | 52.6% | ~4.0 GB | 52 | Gemma |
Key points: Among the models compared here, Neura‑FA‑EN is the only one that performs well on both Persian and English.
🧪 Technical Details & Training Process
The model uses the Qwen2 architecture only. Its weights are not derived from any existing model and were trained from scratch by Neuracoder.
Architecture
- Layers: 28 decoder‑only layers.
- Attention: Grouped Query Attention (GQA) – 12 query heads, 2 key/value heads.
- Activation: SwiGLU.
- Context length: 32,768 tokens.
- Embedding size: 2048.
- Intermediate size: 5632.
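To make the Grouped Query Attention entry concrete, the sketch below shows how 12 query heads can share 2 key/value heads. It assumes the usual layout in which consecutive query heads form one group; the function names are illustrative and not part of any library.

```python
def kv_head_for_query_head(query_head: int,
                           num_query_heads: int = 12,
                           num_kv_heads: int = 2) -> int:
    """Return the index of the key/value head that a query head attends with."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly across key/value heads")
    if not 0 <= query_head < num_query_heads:
        raise IndexError("query head index out of range")
    group_size = num_query_heads // num_kv_heads  # 6 query heads per KV head here
    return query_head // group_size


def kv_cache_reduction(num_query_heads: int = 12, num_kv_heads: int = 2) -> float:
    """How many times smaller the KV cache is compared with one KV head per query head."""
    return num_query_heads / num_kv_heads


if __name__ == "__main__":
    print([kv_head_for_query_head(h) for h in range(12)])
    print(kv_cache_reduction())
```

With these head counts the key/value cache is six times smaller than it would be under standard multi-head attention, which helps on memory-constrained devices at long context lengths.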
Pre‑training
- Data: 350 billion tokens – 60% Persian (web texts, books, news, forums), 35% English (common crawl, books, Wikipedia), 5% code (to preserve basic formatting).
- Duration: 18 days on 8× NVIDIA A100 (80GB) using DeepSpeed ZeRO‑3.
- Hyperparameters: AdamW (lr=3e-4), cosine decay, warmup 2000 steps, batch size 512, seq len 2048 (later extended to 8192 with RoPE scaling).
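The schedule described above can be sketched as follows. Several details are assumptions rather than facts from the training run: warmup is taken to be linear, the minimum learning rate is taken to be 0, batch size 512 is read as sequences per step, and the step count is derived from 350B tokens at sequence length 2048 (ignoring the later extension to 8192).

```python
import math

PEAK_LR = 3e-4
WARMUP_STEPS = 2000
BATCH_SIZE = 512                  # sequences per optimizer step (assumed meaning)
SEQ_LEN = 2048                    # tokens per sequence before the RoPE extension
TOTAL_TOKENS = 350_000_000_000

# Tokens consumed per step = BATCH_SIZE * SEQ_LEN, so one pass over the data is:
TOTAL_STEPS = TOTAL_TOKENS // (BATCH_SIZE * SEQ_LEN)


def lr_at(step: int,
          peak_lr: float = PEAK_LR,
          warmup_steps: int = WARMUP_STEPS,
          total_steps: int = TOTAL_STEPS,
          min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1000, 2000, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, lr_at(step))
```

Under these assumptions the run is roughly 334,000 optimizer steps, so the 2000-step warmup covers well under 1% of training.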
Supervised Fine‑Tuning (SFT)
- Data: 150,000 conversation pairs in Persian and English:
  - 80,000 from public Persian chat datasets (ParsiNLU, FaChat).
  - 50,000 from translated and cleaned ShareGPT data.
  - 20,000 hand‑written by the Neuracoder team for natural code‑switching and cultural relevance.
- Format:
  {"system": "You are a helpful assistant.", "user": "...", "assistant": "..."}
- Hyperparameters: 3 epochs, lr=1e-5, batch size 128; LoRA (rank=32) first, then full fine‑tuning of the last 6 layers.
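If you want to fine‑tune on your own data in the same record format, one possible way to turn each record into the role/content message list expected by `apply_chat_template` is sketched below. Storing one JSON record per line (JSONL) is an assumption, and the helper names are hypothetical.

```python
import json


def record_to_messages(record: dict) -> list[dict]:
    """Convert one {"system", "user", "assistant"} record into chat messages."""
    messages = []
    system = record.get("system")
    if system:  # the system prompt is treated as optional
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": record["user"]})
    messages.append({"role": "assistant", "content": record["assistant"]})
    return messages


def load_sft_jsonl(lines) -> list[list[dict]]:
    """Parse an iterable of JSON lines, skipping blank ones."""
    return [record_to_messages(json.loads(line)) for line in lines if line.strip()]
```

Each resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False)` to render a training string in the model's chat format.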
Validation
- The model was evaluated every 500 steps on held‑out Persian and English test sets.
- The final checkpoint was chosen for the lowest perplexity on the Persian validation set and the highest MMLU score.
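The card does not say how the two selection criteria were combined when they disagree. As an illustration only, one simple option is a rank sum: rank checkpoints by Persian perplexity (lower is better) and by MMLU accuracy (higher is better), then pick the smallest combined rank. The `Checkpoint` class and the tie-break rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    step: int
    persian_ppl: float   # lower is better
    mmlu_acc: float      # higher is better


def select_checkpoint(checkpoints: list[Checkpoint]) -> Checkpoint:
    """Pick the checkpoint with the best combined rank across both metrics."""
    if not checkpoints:
        raise ValueError("no checkpoints to choose from")
    by_ppl = sorted(checkpoints, key=lambda c: c.persian_ppl)
    by_mmlu = sorted(checkpoints, key=lambda c: -c.mmlu_acc)
    ppl_rank = {c.step: i for i, c in enumerate(by_ppl)}
    mmlu_rank = {c.step: i for i, c in enumerate(by_mmlu)}
    # Ties on combined rank are broken in favour of lower Persian perplexity.
    return min(checkpoints,
               key=lambda c: (ppl_rank[c.step] + mmlu_rank[c.step], c.persian_ppl))
```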
⚡ Inference Speed & Hardware Requirements
| Hardware | Weight format | Avg tokens/sec (gen 256 tokens) | Memory usage |
|---|---|---|---|
| NVIDIA A100 (40GB) | FP16 | 78 tok/s | 4.1 GB |
| NVIDIA T4 (16GB) | FP16 | 48 tok/s | 3.9 GB |
| NVIDIA T4 (16GB) | INT8 | 55 tok/s | 2.3 GB |
| NVIDIA GTX 1060 (6GB) | FP16 | 28 tok/s | 3.9 GB |
| CPU (Intel i7-12700K) | INT8 | 9 tok/s | 2.1 GB |
| Raspberry Pi 4 (4GB) | INT8 (ONNX) | 2–3 tok/s | 1.6 GB |
Recommendation: Use FP16 on any GPU with 6+ GB VRAM. For CPU or low‑memory devices, use the INT8 quantised version (available separately).
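The recommendation can be written down as a small helper, together with a weight-only memory estimate (parameters times bytes per parameter). This is a rough sketch: the estimate excludes activations and the KV cache, which is why the measured figures in the table are somewhat higher, and the function names are illustrative.

```python
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, weight_format: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[weight_format] / 1e9


def recommended_format(has_gpu: bool, vram_gb: float = 0.0) -> str:
    """FP16 on a GPU with at least 6 GB of VRAM, INT8 otherwise."""
    return "fp16" if has_gpu and vram_gb >= 6 else "int8"


if __name__ == "__main__":
    fmt = recommended_format(has_gpu=True, vram_gb=16)   # e.g. a T4
    print(fmt, weight_memory_gb(1.9e9, fmt))
```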
🚀 Usage Guide
Installation
pip install transformers torch accelerate sentencepiece
Example 1: Basic Persian conversation
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "neuracoder/neura-fa-en-1.9b"

# Load the tokenizer and the FP16 weights; device_map="auto" places them on a GPU if one is available.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Persian prompt: "What do you think is the best way to learn English?"
prompt = "به نظرت بهترین راه برای یادگیری زبان انگلیسی چیه؟"
messages = [{"role": "user", "content": prompt}]

# Wrap the message in the model's chat template, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a reply and decode only the newly generated tokens.
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
Example 2: Mixed Persian‑English query
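# Reuses `tokenizer` and `model` from Example 1.
# Persian prompt: "Write an English sentence that conveys the meaning of 'the sun is shining'."
prompt = "یه جمله انگلیسی بنویس که معنی 'خورشید میتابد' رو برسونه"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect.
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))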
Example 3: Simple summarisation (English)
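# Reuses `tokenizer` and `model` from Example 1.
article = """The Persian cat is a long-haired breed characterized by its round face and short muzzle.
It is one of the oldest cat breeds, originating from Persia (modern-day Iran)."""
prompt = f"Summarise the following text in one sentence:\n\n{article}"
# Apply the chat template here too, so the prompt matches the format used in the other examples.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# A low temperature keeps the summary close to the source; do_sample=True is needed for it to take effect.
outputs = model.generate(**inputs, max_new_tokens=80, temperature=0.3, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))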
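The examples above are single-turn. For a multi-turn chat you would keep appending user and assistant messages to one list, and on small devices it can help to cap how much history is re-sent each turn. A minimal sketch of such a cap follows; `trim_history` is a hypothetical helper, not part of Transformers.

```python
def trim_history(messages: list[dict], max_messages: int = 8) -> list[dict]:
    """Keep a leading system message (if any) plus the most recent messages."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    recent = rest[-max_messages:] if max_messages > 0 else []
    # Chat templates generally expect the first non-system message to be from the user.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent


# Usage with the objects from Example 1:
# history.append({"role": "user", "content": "..."})
# text = tokenizer.apply_chat_template(trim_history(history), tokenize=False,
#                                      add_generation_prompt=True)
```

Trimming by message count is the simplest policy; trimming by token count is more precise but needs a tokenizer call per turn.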
Download the model directly
git lfs install
git clone https://huggingface.co/neuracoder/neura-fa-en-1.9b
Or via Python:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="neuracoder/neura-fa-en-1.9b", local_dir="./neura-fa-en-1.9b")
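Once the files are on disk, the model can be loaded without network access by pointing `from_pretrained` at the local directory. The check below is a small sketch that confirms the directory looks complete first. The expected file names (`config.json` plus at least one `.safetensors` or `.bin` weight file) are an assumption about the repository layout.

```python
from pathlib import Path


def looks_like_model_dir(path) -> bool:
    """Heuristic check that a directory holds a config and at least one weight file."""
    root = Path(path)
    has_config = (root / "config.json").is_file()
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    return has_config and has_weights


# Usage (requires transformers and the downloaded files):
# if looks_like_model_dir("./neura-fa-en-1.9b"):
#     tokenizer = AutoTokenizer.from_pretrained("./neura-fa-en-1.9b", local_files_only=True)
#     model = AutoModelForCausalLM.from_pretrained("./neura-fa-en-1.9b", local_files_only=True)
```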
⚠️ Limitations
- Not a code model – Cannot write or debug programs reliably.
- Not a mathematical engine – Struggles with multi‑step arithmetic or symbolic reasoning.
- Knowledge cutoff – Mid‑2024. Unaware of very recent events or new APIs.
- Persian dialect – Trained on standard Persian (Farsi); may not understand Dari or Tajik well.
- Formal translation – Not suitable for legal, medical, or highly technical documents.
- Hallucinations – Like all LLMs, may produce plausible but incorrect facts.
- Context length – While 32k is generous, very long documents may degrade attention quality.
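For inputs that approach or exceed the context window, a common workaround is to check the token budget up front and split long inputs into overlapping chunks that are processed separately. The sketch below works on plain lists of token ids; the chunk size and overlap are choices you would tune, not recommendations from the model authors.

```python
CONTEXT_LIMIT = 32_768  # context length stated on this card


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 limit: int = CONTEXT_LIMIT) -> bool:
    """True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= limit


def chunk_token_ids(token_ids: list, chunk_size: int, overlap: int = 0) -> list:
    """Split token ids into chunks of at most chunk_size, sharing `overlap` ids."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not token_ids:
        return []
    step = chunk_size - overlap
    last_start = max(len(token_ids) - overlap, 1)
    return [token_ids[i:i + chunk_size] for i in range(0, last_start, step)]


# Usage with the tokenizer from the Usage Guide:
# ids = tokenizer(long_text)["input_ids"]
# chunks = chunk_token_ids(ids, chunk_size=4096, overlap=128)
```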
🗺️ Roadmap
- Q1 2026: Release of quantised versions (INT4, INT8, GGUF) for even lighter deployment.
- Q2 2026: Neura‑FA‑EN‑3B – 3.5B parameters, expanded Persian vocabulary, improved reasoning.
- Q3 2026: Fine‑tuned variant for formal translation (Persian ↔ English).
- Ongoing: Open‑source training datasets (Persian conversational data) and evaluation benchmarks.
🤝 Contribute
This model is free and open‑source. You can help by:
- Reporting bugs or suggesting improvements in the Discussions tab.
- Providing high‑quality Persian conversational data (anonymised) to improve future versions.
- Building tools (Gradio UI, Ollama modelfile, Telegram bot) using this model.
- Financial sponsorship – Contact the Neuracoder team.
- Spreading the word – Every user helps the Persian AI community grow.
📜 License
Apache License 2.0 – You may freely use, modify, distribute, and even sell this model as part of your product, provided you include a copy of the license, retain the original copyright and attribution notices, and state any significant changes you make to the files.
📞 Contact
- Website: neuracoder.net (coming soon)
- Email: info@neuracoder.net
- Telegram: @Neuracoder
- GitHub: github.com/neura_coder
Made with ❤️ in Iran – the Neuracoder team
Democratizing conversational AI for Persian speakers: fast, local, and free for everyone.
