---
library_name: transformers
license: llama3.1
language: bn
base_model: meta-llama/Llama-3.1-8B-Instruct
---

# Abegi-Llama3

## Model Card Summary

**Abegi-Llama3** is a Bangla-focused large language model fine-tuned from **Meta LLaMA‑3.1‑8B‑Instruct**. The model is optimized for Bangla (bn) conversational text generation and instruction-following tasks, while retaining the general-purpose reasoning and generation capabilities inherited from the base model.

---

## Model Details

### Model Description

Abegi-Llama3 is a decoder-only, Transformer-based causal language model fine-tuned to improve naturalness, fluency, and instruction-following behavior in Bangla. It is suitable for chat-style interactions, content generation, and educational or research use cases involving the Bangla language.

* **Developed by:** Promit123546
* **Model type:** Causal Language Model (Decoder-only Transformer)
* **Base model:** meta-llama/Llama-3.1-8B-Instruct
* **Language(s):** Bangla (bn), with partial English support inherited from the base model
* **License:** Llama 3.1 Community License (inherited from the base model)
* **Fine-tuned from:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources

* **Repository:** [https://huggingface.co/Promit123546/Abegi-Llama3](https://huggingface.co/Promit123546/Abegi-Llama3)
* **Base Model:** [https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)

---

## Uses

### Direct Use

The model can be used directly for:

* Bangla conversational agents and chatbots
* Bangla text generation and rewriting
* Question answering in Bangla
* Educational and experimental NLP applications

### Downstream Use

With further fine-tuning, the model can be adapted for:

* Domain-specific Bangla assistants (education, customer support, documentation)
* Bangla instruction-following systems
* Research on low-resource or regional language modeling

### Out-of-Scope Use

The model is **not recommended** for:

* Medical, legal, or financial decision-making
* High-stakes or safety-critical systems
* Generating harmful, misleading, or malicious content

---

## Bias, Risks, and Limitations

* The model may reflect biases present in the pretraining and fine-tuning data
* It can produce hallucinated or factually incorrect information
* Performance may degrade on tasks outside Bangla or conversational generation
* Cultural or linguistic nuances may not always be handled correctly

### Recommendations

* Verify critical outputs against trusted external sources
* Apply moderation and safety filters in production environments
* Avoid use in sensitive or high-risk applications without human oversight

---

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Promit123546/Abegi-Llama3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" and device_map="auto" pick a sensible dtype and device for an 8B model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "বাংলায় কৃত্রিম বুদ্ধিমত্তা কী?"  # "What is artificial intelligence?" (in Bangla)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
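Because the base model is an Instruct variant, chat-style prompts are typically formatted with the tokenizer's chat template rather than passed as raw text. The sketch below continues from the snippet above (reusing `tokenizer` and `model`) and assumes the fine-tune kept the Llama 3.1 chat template, which this card does not confirm; the system message and sampling settings are illustrative only.

```python
# Minimal chat-template sketch; reuses `tokenizer` and `model` from the quick-start above.
# Assumption: Abegi-Llama3 retains the Llama-3.1 chat template from its base model.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Always answer in Bangla."},
    {"role": "user", "content": "বাংলায় কৃত্রিম বুদ্ধিমত্তা কী?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=150, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the model was fine-tuned on a different prompt format, the plain-text example above is the safer default.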
---

## Training Details

### Training Data

* **Description:** Not publicly disclosed
* **Notes:** The model was fine-tuned on curated Bangla and instruction-style text data suitable for conversational generation

### Training Procedure

#### Preprocessing

* Tokenization using the LLaMA‑3.1 tokenizer
* Standard text normalization and prompt–response formatting

#### Training Hyperparameters

* **Training regime:** Mixed precision (fp16 or bf16)

#### Speeds, Sizes, Times

* **Checkpoint size:** ~8B parameters (same as the base model)
* **Training duration:** Not publicly disclosed

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

* Internal, informal Bangla prompt-based evaluation

#### Factors

* Fluency in Bangla
* Instruction adherence
* Coherence and relevance

#### Metrics

* Qualitative human evaluation

### Results

The model demonstrates fluent Bangla text generation and stable conversational behavior. No formal benchmark results have been published yet.

---

## Model Examination

No formal interpretability or probing studies have been conducted.

---

## Environmental Impact

Environmental impact metrics were not recorded during training. Carbon emissions can be estimated with the Machine Learning Impact calculator (Lacoste et al., 2019) if compute details become available.

---

## Technical Specifications

### Model Architecture and Objective

* Decoder-only Transformer architecture
* Autoregressive next-token prediction objective

### Compute Infrastructure

#### Hardware

* Not publicly disclosed

#### Software

* Python
* PyTorch
* Hugging Face Transformers

---

## Citation

If you use this model, please cite the base LLaMA‑3.1 model and this repository.

---

## Model Card Authors

* Promit123546

## Model Card Contact

For questions or issues, please use the discussion section on the Hugging Face model page.