গড়েইপা লৌশিং (AI) | Bishnupriya Manipuri Language Development Project

Building open-source AI for 500,000+ Bishnupriya Manipuri speakers worldwide

Bishnupriya Manipuri (BPY) is spoken across Assam, Tripura, Manipur, and Bangladesh. Despite its rich cultural heritage, it is not supported by Google Translate, Microsoft Translator, or any major AI model.

We are changing that.

🎯 Our Mission

Develop and maintain open-source NLP resources for Bishnupriya Manipuri, including:

  • Machine Translation - English ↔ BPY
  • Language Models - LLM pretraining and fine-tuning
  • Training Datasets - Parallel corpora, monolingual text, evaluation sets
  • Tools & Research - Tokenizers, OCR, speech models

All resources are free, open-source, and community-driven under MIT/Apache 2.0 licenses.

🚀 Our Models

NLLB Bishnupriya Manipuri Series

Fine-tuned versions of Meta AI's NLLB-200 for English → BPY translation.

| Model | Status | Notes |
|---|---|---|
| nllb-bpy-beng-v8-5-3-merged | ✅ Production | Latest stable release. 95%+ accuracy. Fixes number+noun patterns. Live endpoint available. |
| nllb-bpy-beng-v8-5-3 | Adapter | LoRA weights for fine-tuning. Requires the base NLLB-200 model. |
| nllb-bpy-beng-v9-0 | 🔨 In progress | Training on Wikipedia and a scanned-book corpus. Target: 10k+ sentence pairs. |
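A minimal sketch of English → BPY inference with these models, using Hugging Face Transformers and PEFT. It assumes the Hub repo ids are the organization name from the citation URL plus the model names in the table, and that the adapter was trained on `facebook/nllb-200-distilled-600M`; all three ids are assumptions, so check the model cards before relying on them. The output language is selected the usual NLLB way, by forcing the `ben_Beng` token as the first decoder token.

```python
from typing import Iterator, List, Sequence

# Assumed Hub repo ids (org name from the citation URL + model names above).
MERGED_ID = "BishnupriyaManipuri/nllb-bpy-beng-v8-5-3-merged"
ADAPTER_ID = "BishnupriyaManipuri/nllb-bpy-beng-v8-5-3"
# Hypothetical base checkpoint; confirm against the adapter's adapter_config.json.
BASE_ID = "facebook/nllb-200-distilled-600M"

SRC_LANG = "eng_Latn"
TGT_LANG = "ben_Beng"  # BPY output is requested via NLLB's Bengali-script code


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def load_merged():
    """Load the merged (production) model; downloads weights on first use."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MERGED_ID, src_lang=SRC_LANG)
    model = AutoModelForSeq2SeqLM.from_pretrained(MERGED_ID)
    return tokenizer, model


def load_from_adapter():
    """Load the base model, attach the LoRA adapter, and merge it into the weights."""
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID, src_lang=SRC_LANG)
    base = AutoModelForSeq2SeqLM.from_pretrained(BASE_ID)
    model = PeftModel.from_pretrained(base, ADAPTER_ID).merge_and_unload()
    return tokenizer, model


def translate(texts: Sequence[str], tokenizer, model,
              batch_size: int = 8, max_new_tokens: int = 128) -> List[str]:
    """Translate English sentences, forcing ben_Beng as the first generated token."""
    target_id = tokenizer.convert_tokens_to_ids(TGT_LANG)
    results: List[str] = []
    for batch in batched(texts, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        generated = model.generate(**inputs, forced_bos_token_id=target_id,
                                   max_new_tokens=max_new_tokens)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results
```

Usage (downloads the weights on the first call): `tok, mdl = load_merged()` followed by `translate(["Fifty books"], tok, mdl)`. The heavy imports sit inside the loader functions only so the batching helper can be used without Transformers or PEFT installed.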

Live Demo: https://manipuri.com/articles/bpy.php#translator

Key Fixes in V8.5.3:

  • "Fifty books" → য়াংখেইহান লেরিক (previously the repeated লেরিকহান লেরিকহান)
  • Uses the ben_Beng language token as the target code so output renders correctly in Bengali script
  • Number+noun training patterns are weighted 1000x, eliminating the repetition bugs
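The card does not say how the 1000x weighting was implemented. One common way to upweight a small set of patterns is to replicate them in the training mix, sketched below under that assumption (the helper names are hypothetical). The snippet also includes a simple check for the adjacent-token repetition shown in the "Fifty books" example above, which could serve as a regression test on model outputs.

```python
import random
from typing import List, Tuple

# (English source, BPY target)
Pair = Tuple[str, str]


def build_training_mix(general: List[Pair], patterns: List[Pair],
                       weight: int = 1000, seed: int = 0) -> List[Pair]:
    """Return the general pairs plus each pattern pair repeated `weight` times, shuffled.

    Assumption: "1000x weighted" is interpreted here as 1000x replication.
    """
    mix = list(general) + [pair for pair in patterns for _ in range(weight)]
    random.Random(seed).shuffle(mix)  # fixed seed keeps the mix reproducible
    return mix


def has_adjacent_repeat(text: str) -> bool:
    """True if any whitespace-separated token is immediately repeated."""
    tokens = text.split()
    return any(a == b for a, b in zip(tokens, tokens[1:]))
```

Replication is the simplest form of weighting; a per-example loss weight would avoid inflating the dataset, at the cost of a custom training loop.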

📊 Datasets

Coming Soon: BPY Training Data Repository

We are building the first comprehensive open dataset for Bishnupriya Manipuri:

Planned releases:

  1. bpy-parallel-v1 - 10k+ English↔BPY sentence pairs from Wikipedia, books, and community submissions
  2. bpy-monolingual-v1 - 100k+ BPY sentences for LM pretraining
  3. bpy-eval-v1 - Standard test set for MT evaluation
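For a standard MT test set, chrF (character n-gram F-score) is a common metric for morphologically rich, low-resource languages. Below is a simplified, self-contained corpus-level chrF sketch for illustration only; it is not the project's evaluation code, and its scores may differ slightly from established implementations such as sacrebleu, which should be used for any reported numbers.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace as chrF does by default."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypotheses: List[str], references: List[str],
         max_n: int = 6, beta: float = 2.0) -> float:
    """Corpus-level character n-gram F-beta on a 0-100 scale (simplified chrF)."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    f_scores = []
    for n in range(1, max_n + 1):
        matches = hyp_total = ref_total = 0
        for hyp, ref in zip(hypotheses, references):
            h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
            matches += sum((h & r).values())  # clipped n-gram matches
            hyp_total += sum(h.values())
            ref_total += sum(r.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # no n-grams of this order on one side; skip the order
        precision, recall = matches / hyp_total, matches / ref_total
        if precision + recall == 0:
            f_scores.append(0.0)
        else:
            b2 = beta ** 2  # beta=2 weights recall twice as much as precision
            f_scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0
```

Working on characters rather than words gives partial credit when a translation gets the stem right but an affix wrong, which word-level BLEU would score as a complete miss.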

Current sources:

  • BPY Wikipedia: 25,958 articles → extracting clean sentences
  • Scanned books: Community contributions, OCR pipeline ready
  • Community submissions: Accepting parallel text at [contact link]
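Extracting clean sentences from the Wikipedia articles might look like the sketch below. It assumes article text has already been converted from wikitext to plain paragraphs, splits on the danda (।), double danda (॥), `?` and `!`, and keeps sentences that are mostly Bengali script. The thresholds and function names are hypothetical and would need tuning against real data.

```python
import re
from typing import Iterable, List

# Split after a danda (U+0964), double danda (U+0965), "?" or "!" followed by whitespace.
SENTENCE_BOUNDARY = re.compile(r"(?<=[\u0964\u0965?!])\s+")


def is_bengali_char(ch: str) -> bool:
    """Bengali Unicode block, plus the danda marks Bengali shares with Devanagari."""
    return 0x0980 <= ord(ch) <= 0x09FF or ch in "\u0964\u0965"


def bengali_ratio(text: str) -> float:
    """Fraction of non-space characters written in Bengali script."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_bengali_char(ch) for ch in chars) / len(chars)


def extract_sentences(paragraphs: Iterable[str], min_words: int = 3,
                      min_ratio: float = 0.8) -> List[str]:
    """Split plain-text paragraphs into sentences, keeping unique, mostly-Bengali ones."""
    seen = set()
    kept = []
    for para in paragraphs:
        for sentence in SENTENCE_BOUNDARY.split(para.strip()):
            sentence = sentence.strip()
            if len(sentence.split()) < min_words:
                continue  # too short to be a useful training sentence
            if bengali_ratio(sentence) < min_ratio:
                continue  # mostly Latin script, numbers, or leftover markup
            if sentence not in seen:
                seen.add(sentence)
                kept.append(sentence)
    return kept
```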

Want to contribute data? See CONTRIBUTING or email us.

🤝 Collaborate With Us

We welcome researchers, developers, and BPY speakers to join us.

We need help with:

  1. Data Collection - Scan books, transcribe text, translate sentences
  2. Model Training - Fine-tune LLMs, experiment with architectures
  3. Evaluation - Build test sets, human eval, error analysis
  4. Tools - OCR for Bengali script, tokenizers, text normalization
  5. Applications - Chatbots, TTS, ASR, educational tools
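For the text-normalization item, a reasonable first step is Unicode NFC normalization, sketched below (the function name is hypothetical). Bengali script has letters with two possible encodings: for example, য় can be stored as the single code point U+09DF or as য (U+09AF) plus the nukta U+09BC. Because U+09DC, U+09DD and U+09DF are Unicode composition exclusions, NFC maps all of them to the two-code-point form, so both spellings compare equal after normalization.

```python
import unicodedata


def normalize_bpy(text: str) -> str:
    """Apply Unicode NFC and collapse runs of whitespace to single spaces.

    ড় (U+09DC), ঢ় (U+09DD) and য় (U+09DF) are composition exclusions, so NFC
    rewrites them as base letter + nukta (U+09BC). Split vowel signs such as
    U+09C7 + U+09BE are composed into U+09CB. Zero-width joiner and
    non-joiner are not whitespace to str.split(), so they are preserved.
    """
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())
```

Zero-width joiner and non-joiner (U+200D, U+200C) are deliberately left in place, since they can change how conjuncts render.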

Tech Stack: PyTorch, Transformers, PEFT/LoRA, Hugging Face Hub, PHP/Python

Join the Community

📚 Resources & Research

Base Models Used:

  • NLLB-200 (Meta AI)

Papers & Docs:

  • NLLB: No Language Left Behind [Meta AI, 2022]
  • Fine-tuning methodology: [Technical report coming soon]

Language Info:

  • ISO 639-3: bpy
  • Script: Bengali (ISO 15924: Beng); the models use NLLB's ben_Beng language code
  • Speakers: ~500,000
  • Regions: Assam, Tripura, Manipur (India), Bangladesh

📜 License & Citation

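All models and datasets are released under the MIT License. Models fine-tuned from NLLB-200 also remain subject to the base model's CC-BY-NC 4.0 license, which restricts commercial use.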

If you use our work, please cite:

@misc{bishnupriya-manipuri-nllb-2026,
  title={NLLB Bishnupriya Manipuri: Open-Source Machine Translation},
  author={Bishnupriya Manipuri Language Development Project},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/BishnupriyaManipuri}
}

থাকাত | Thank you for supporting low-resource language AI.

গড়েইপা লৌশিং (AI) - Let's build AI together.
