|
|
--- |
|
|
library_name: peft |
|
|
license: gemma |
|
|
base_model: google/gemma-3n-E2B-it |
|
|
tags: |
|
|
- axolotl |
|
|
- base_model:adapter:google/gemma-3n-E2B-it |
|
|
- lora |
|
|
- transformers |
|
|
- bambara |
|
|
- bamanankan |
|
|
- low-resource-languages |
|
|
- african-languages |
|
|
- multilingual |
|
|
- Mali |
|
|
datasets: |
|
|
- sudoping01/bambara-instructions |
|
|
pipeline_tag: text-generation |
|
|
model-index: |
|
|
- name: maliba-llm |
|
|
results: [] |
|
|
language: |
|
|
- bm |
|
|
--- |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# MALIBA-LLM: Bambara Large Language Model [experimental] |
|
|
|
|
|
MALIBA-LLM is a fine-tuned version of [google/gemma-3n-E2B-it](https://huggingface.co/google/gemma-3n-E2B-it) for instruction following, text generation, and language understanding in Bambara (Bamanankan). As the first open-source large language model for Bambara, a language spoken by over 15 million people in West Africa, it aims to improve AI accessibility for Bambara-speaking communities in education, information retrieval, and digital inclusion.
|
|
|
|
|
This model supports Bambara with French/English code-switching for technical terms, reflecting natural linguistic patterns. It forms part of the broader MALIBA-AI project to advance AI for Malian languages.
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Base Model**: google/gemma-3n-E2B-it (instruction-tuned, multimodal-capable foundation) |
|
|
- **Adapter**: LoRA (Low-Rank Adaptation) |
|
|
- **Parameters**: Effective 2B (compressed via MatFormer architecture) |
|
|
- **Primary Language**: Bambara (Bamanankan)
- **Additional Languages**: All languages supported by the Gemma-3n foundation model
- **Context Window**: 4,096 tokens
- **Core Capabilities**:
  - Instruction following
  - Conversational reasoning
  - Knowledge retrieval
  - Content generation in Bambara
  - Translation (Bambara ↔ French/English)
  - Mathematical reasoning
  - Coding support
  - Logical problem-solving
  - Mali-specific knowledge (history, institutions, administration)
|
|
- **License**: Gemma (inherited from the base model's terms, matching the `license` field in this card's metadata)
|
|
|
|
|
## Intended Uses |
|
|
|
|
|
This model is intended for research and development in low-resource NLP, particularly: |
|
|
|
|
|
- Generating Bambara-language educational and informational content |
|
|
- Enabling conversational AI interfaces for Bambara speakers |
|
|
- Supporting preliminary translation tasks involving Bambara |
|
|
- Facilitating access to AI tools in underserved West African regions |
|
|
- Serving as a base for further fine-tuning in domain-specific applications |
|
|
|
|
|
This work contributes to a broader global push to ensure low-resource languages are not left behind in the AI era. |
|
|
|
|
|
## Limitations |
|
|
|
|
|
As an early-stage model for a low-resource language: |
|
|
|
|
|
- Performance degradation over long conversations, where contextual tracking and coherence may gradually decline
|
|
- Reliance on French code-switching for advanced vocabulary, which may not align with pure Bambara preferences |
|
|
- Potential grammatical inconsistencies in longer or intricate outputs |
|
|
- Inherited biases from the base model and training data, with no Bambara-specific safety safeguards
|
|
- Experimental status; outputs should be verified before practical use
|
|
|
|
|
Despite these limitations, the model retains all core capabilities of Gemma-3n, with additional instruction-following and conversational strength in Bambara. |
|
|
|
|
|
## Training Data |
|
|
|
|
|
The model was fine-tuned on a cleaned subset of the MALIBA-Instructions dataset (1M examples). For comprehensive details on the dataset, including sources and preparation, please refer to the [repository](https://huggingface.co/datasets/sudoping01/bambara-instructions). |
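For quick inspection, the same data referenced in the Axolotl configuration below can be loaded with 🤗 Datasets. This is a minimal sketch; it assumes the `cleaned` configuration name and `messages` field shown in that configuration:

```python
from datasets import load_dataset

# Load the "cleaned" configuration and train split referenced in the Axolotl config
ds = load_dataset("sudoping01/bambara-instructions", name="cleaned", split="train")

# Each example stores a chat-style conversation under the "messages" field
print(ds[0]["messages"])
```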
|
|
|
|
|
## Evaluation |
|
|
|
|
|
Evaluated on a 1% validation split, the model achieved a final validation loss of 0.4952 (93.4% reduction from initial 7.4595). Human assessments by native speakers indicate reasonable quality in conversational and knowledge-based tasks. |
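The reported reduction follows directly from the initial and final validation losses:

```python
# Relative reduction in validation loss over training
initial_loss, final_loss = 7.4595, 0.4952
print(f"{(initial_loss - final_loss) / initial_loss:.1%}")  # 93.4%
```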
|
|
|
|
|
 |
|
|
## Training Procedure |
|
|
|
|
|
Supervised fine-tuning was conducted with distributed training across 8 devices, peaking at 57.85 GiB memory usage. |
|
|
|
|
|
### Hyperparameters |
|
|
|
|
|
|
|
|
<details><summary>CONFIGURATIONS</summary> |
|
|
|
|
|
axolotl version: `0.12.2` |
|
|
```yaml |
|
|
base_model: google/gemma-3n-E2B-it |
|
|
hub_model_id: sudoping01/bambara-llm-exp3 |
|
|
plugins: |
|
|
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin |
|
|
cut_cross_entropy: true |
|
|
load_in_4bit: false |
|
|
gradient_checkpointing: true |
|
|
gradient_checkpointing_kwargs: |
|
|
use_reentrant: false |
|
|
ddp: true |
|
|
chat_template: gemma3n |
|
|
eot_tokens: |
|
|
- <end_of_turn> |
|
|
special_tokens: |
|
|
eot_token: <end_of_turn> |
|
|
datasets: |
|
|
- path: sudoping01/bambara-instructions |
|
|
type: chat_template |
|
|
split: train |
|
|
name: cleaned |
|
|
field_messages: messages |
|
|
message_property_mappings: |
|
|
role: role |
|
|
content: content |
|
|
val_set_size: 0.01 |
|
|
output_dir: ./outputs/bambara-gemma3n-lora-exp4 |
|
|
adapter: lora |
|
|
lora_r: 64 |
|
|
lora_alpha: 128 |
|
|
lora_dropout: 0.05 |
|
|
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' |
|
|
sequence_len: 4096 |
|
|
sample_packing: false |
|
|
pad_to_sequence_len: false |
|
|
micro_batch_size: 8 |
|
|
gradient_accumulation_steps: 2 |
|
|
num_epochs: 3 |
|
|
optimizer: adamw_8bit |
|
|
lr_scheduler: cosine |
|
|
learning_rate: 1.2e-4 |
|
|
warmup_ratio: 0.03 |
|
|
weight_decay: 0.01 |
|
|
bf16: auto |
|
|
tf32: false |
|
|
logging_steps: 10 |
|
|
saves_per_epoch: 2 |
|
|
evals_per_epoch: 2 |
|
|
``` |
|
|
|
|
|
</details><br> |
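As a quick sanity check on the configuration above, the effective global batch size follows from the micro-batch size, the gradient-accumulation steps, and the 8 devices mentioned above. This is a sketch assuming pure data parallelism:

```python
# Effective global batch size implied by the config, assuming 8 data-parallel devices
micro_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 8
print(micro_batch_size * gradient_accumulation_steps * num_devices)  # 128
```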
|
|
|
|
|
|
|
|
### Training Results |
|
|
|
|
|
| Epoch | Step  | Training Loss | Validation Loss | Memory (GiB) |
|-------|-------|---------------|-----------------|--------------|
| 0     | 0     | -             | 7.4595          | 19.86        |
| 0.5   | 3521  | 0.8265        | 0.7787          | 57.85        |
| 1.0   | 7042  | 0.7107        | 0.6745          | 57.85        |
| 1.5   | 10563 | 0.6363        | 0.6026          | 57.85        |
| 2.0   | 14084 | 0.5421        | 0.5429          | 57.85        |
| 2.5   | 17605 | 0.5733        | 0.5039          | 57.85        |
| 3.0   | 21126 | 0.5401        | 0.4952          | 57.85        |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Loading |
|
|
|
|
|
```python |
|
|
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "sudoping01/bambara-llm-exp3"

# Read the adapter config to locate the base model it was trained from
config = PeftConfig.from_pretrained(model_name)

# Load the Gemma-3n base model in bfloat16 across available devices
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the Bambara LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, model_name)
|
|
``` |
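If you prefer a standalone checkpoint without the PEFT wrapper, the LoRA weights can be merged into the base model. This continues from the loading snippet above; the output directory name is illustrative:

```python
# Optionally merge the LoRA weights into the base model for standalone use
merged_model = model.merge_and_unload()
merged_model.save_pretrained("maliba-llm-merged")  # illustrative output path
tokenizer.save_pretrained("maliba-llm-merged")
```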
|
|
|
|
|
### Inference |
|
|
|
|
|
Apply the Gemma chat template: |
|
|
|
|
|
```python |
|
|
messages = [ |
|
|
{"role": "user", "content": "I ni ce! I ka kɛnɛ wa?"} |
|
|
] |
|
|
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=512, |
|
|
temperature=1.0, |
|
|
top_p=0.9, |
|
|
do_sample=True |
|
|
) |
|
|
# Decode only the newly generated tokens (the output sequence also contains the prompt)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
|
|
``` |
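The same pattern works for the translation capability listed under Model Details. This sketch continues from the loading example; the prompt is illustrative (an English instruction with a Bambara sentence, reflecting the model's code-switching support):

```python
# Illustrative Bambara -> French translation request using the same chat template
messages = [
    {"role": "user", "content": "Translate this Bambara sentence into French: I ni ce! I ka kɛnɛ wa?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```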
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
This model promotes AI equity for low-resource languages but demands responsible use: |
|
|
|
|
|
- **Cultural Respect**: Outputs may not fully reflect all Bambara dialects or nuances; verify with native speakers. |
|
|
- **Bias Awareness**: Potential propagation of source data biases; not for sensitive or decision-making applications without oversight. |
|
|
- **Accessibility Barriers**: Computational requirements may limit deployment in target regions; a quantized-loading sketch follows this list.
|
|
- **Misuse Prevention**: Ensure applications align with community needs and ethical standards. |
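To ease the accessibility constraint noted above, one option is to load the base model in 4-bit precision before attaching the adapter. This is a sketch, assuming `bitsandbytes` is installed; the configuration has not been validated for this adapter:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization to cut memory requirements on constrained hardware
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3n-E2B-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3n-E2B-it")
model = PeftModel.from_pretrained(base_model, "sudoping01/bambara-llm-exp3")
```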
|
|
|
|
|
## Additional Information |
|
|
|
|
|
This model contributes to the MALIBA-AI project for African AI. Source code and data-generation pipelines are available on [GitHub](https://github.com/sudoping01/instructions-gen). Future enhancements include multimodal integration, as outlined in related research.
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@unpublished{diallo2025bambara,
  title  = {Bambara Large Language Model},
  author = {Diallo, Seydou},
  note   = {Unpublished manuscript},
  year   = {2025},
  month  = jul
}
|
|
``` |
|
|
|
|
|
## Framework Versions |
|
|
|
|
|
- PEFT 0.17.0 |
|
|
- Transformers 4.55.2 |
|
|
- PyTorch 2.6.0+cu124 |
|
|
- Datasets 4.0.0 |
|
|
- Tokenizers 0.21.4 |