---
library_name: transformers
license: llama3.1
language: bn
base_model: meta-llama/Llama-3.1-8B-Instruct
---

# Abegi-Llama3

## Model Card Summary

**Abegi-Llama3** is a Bangla-focused large language model fine-tuned from **Meta Llama-3.1-8B-Instruct**. The model is optimized for Bangla (bn) conversational text generation and instruction-following tasks, while retaining the general-purpose reasoning and generation capabilities inherited from the base model.

---

## Model Details

### Model Description

Abegi-Llama3 is a decoder-only, Transformer-based causal language model fine-tuned to improve naturalness, fluency, and instruction-following behavior in Bangla. It is suitable for chat-style interactions, content generation, and educational or research use cases involving the Bangla language.

* **Developed by:** Promit123546
* **Model type:** Causal language model (decoder-only Transformer)
* **Base model:** meta-llama/Llama-3.1-8B-Instruct
* **Language(s):** Bangla (bn), with partial English support inherited from the base model
* **License:** Llama 3.1 Community License (inherited from the base model)
* **Fine-tuned from:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources

* **Repository:** [https://huggingface.co/Promit123546/Abegi-Llama3](https://huggingface.co/Promit123546/Abegi-Llama3)
* **Base Model:** [https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)

---

## Uses

### Direct Use

The model can be used directly for:

* Bangla conversational agents and chatbots
* Bangla text generation and rewriting
* Question answering in Bangla
* Educational and experimental NLP applications

### Downstream Use

With further fine-tuning, the model can be adapted for:

* Domain-specific Bangla assistants (education, customer support, documentation)
* Bangla instruction-following systems
* Research on low-resource or regional language modeling
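
One common low-cost route to such adaptation is parameter-efficient fine-tuning with LoRA. The sketch below uses the `peft` library, which is an assumed dependency rather than anything this card specifies; the rank, alpha, and target modules are illustrative defaults, not the author's settings.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Wrap the model with LoRA adapters so only a small set of low-rank
# matrices is trained (illustrative hyperparameters, not the author's).
model = AutoModelForCausalLM.from_pretrained("Promit123546/Abegi-Llama3")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms only a small fraction is trainable
```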

### Out-of-Scope Use

The model is **not recommended** for:

* Medical, legal, or financial decision-making
* High-stakes or safety-critical systems
* Generating harmful, misleading, or malicious content

---

## Bias, Risks, and Limitations

* The model may reflect biases present in the training and fine-tuning data
* It can produce hallucinated or incorrect information
* Performance may degrade for tasks outside Bangla or conversational generation
* Cultural or linguistic nuances may not always be handled correctly

### Recommendations

* Verify critical outputs against trusted external sources
* Apply moderation and safety filters in production environments
* Avoid use in sensitive or high-risk applications without human oversight

---

## How to Get Started with the Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Promit123546/Abegi-Llama3"

# Load the tokenizer and model; bfloat16 with device_map="auto" keeps the
# 8B-parameter checkpoint within typical single-GPU memory budgets.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# "What is artificial intelligence?" (asked in Bangla)
prompt = "বাংলায় কৃত্রিম বুদ্ধিমত্তা কী?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
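
Because the base model is instruction-tuned, prompts wrapped in the tokenizer's chat template may yield better-aligned responses. This assumes the fine-tune preserved the Llama 3.1 chat format, which this card does not state explicitly:

```python
# Chat-style usage (assumes the Llama 3.1 chat template was preserved).
messages = [{"role": "user", "content": "বাংলায় কৃত্রিম বুদ্ধিমত্তা কী?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model replies
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=150)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```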

---

## Training Details

### Training Data

* **Description:** Not publicly disclosed
* **Notes:** The model was fine-tuned on curated Bangla and instruction-style text data suitable for conversational generation

### Training Procedure

#### Preprocessing

* Tokenization using the Llama-3.1 tokenizer
* Standard text normalization and prompt–response formatting (a hypothetical example follows this list)
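
The exact formatting scheme is not disclosed. As a purely illustrative sketch, assuming chat-style pairs, each prompt–response example could be serialized with the tokenizer's chat template before tokenization:

```python
from transformers import AutoTokenizer

# Illustrative only: the actual preprocessing pipeline is not published.
tokenizer = AutoTokenizer.from_pretrained("Promit123546/Abegi-Llama3")

example = {
    "prompt": "বাংলায় কৃত্রিম বুদ্ধিমত্তা কী?",  # user turn
    "response": "কৃত্রিম বুদ্ধিমত্তা হলো ...",    # target assistant turn
}

# Serialize the pair with the chat template, then tokenize with truncation.
text = tokenizer.apply_chat_template(
    [
        {"role": "user", "content": example["prompt"]},
        {"role": "assistant", "content": example["response"]},
    ],
    tokenize=False,  # return the formatted string rather than token ids
)
input_ids = tokenizer(text, truncation=True, max_length=4096)["input_ids"]
```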

#### Training Hyperparameters

* **Training regime:** Mixed precision (fp16 or bf16)
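
No further hyperparameters are published. A representative mixed-precision setup with Hugging Face `TrainingArguments` might look like the following; every value here is an illustrative assumption, not the actual configuration:

```python
from transformers import TrainingArguments

# Illustrative values only; the real fine-tuning settings are not disclosed.
training_args = TrainingArguments(
    output_dir="abegi-llama3-ft",   # hypothetical output path
    bf16=True,                      # bf16 mixed precision (use fp16=True on GPUs without bf16)
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8 per device
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=10,
)
```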

#### Speeds, Sizes, Times

* **Model size:** ~8B parameters (unchanged from the base model)
* **Training duration:** Not publicly disclosed

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

* Internal and informal Bangla prompt-based evaluation

#### Factors

* Fluency in Bangla
* Instruction adherence
* Coherence and relevance

#### Metrics

* Qualitative human evaluation

### Results

The model demonstrates fluent Bangla text generation and stable conversational behavior. No formal benchmark results are currently published.

---

## Model Examination

No formal interpretability or probing studies have been conducted.

---

## Environmental Impact

Environmental impact metrics were not recorded during training.

Carbon emissions may be estimated using the [Machine Learning Impact Calculator](https://mlco2.github.io/impact#compute) (Lacoste et al., 2019) if compute details become available.

---

## Technical Specifications

### Model Architecture and Objective

* Decoder-only Transformer architecture
* Autoregressive next-token prediction objective (formalized below)
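
Concretely, training minimizes the standard causal language-modeling loss, the negative log-likelihood of each token given its preceding context:

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
$$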

### Compute Infrastructure

#### Hardware

* Not publicly disclosed

#### Software

* Python
* PyTorch
* Hugging Face Transformers

---

## Citation

If you use this model, please cite the base Llama-3.1 model and this repository.

---

## Model Card Authors

* Promit123546

## Model Card Contact

For questions or issues, please use the discussion section of the Hugging Face model page.