# Madlad-400-3B-MT ONNX Optimized

This repository contains an ONNX export of the [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) model, optimized for reduced memory consumption following the NLLB optimization approach.

## Model Description

- **Base Model**: jbochi/madlad400-3b-mt
- **Optimization**: Component separation for reduced RAM usage
- **Target**: Mobile and edge deployment
- **Format**: ONNX with separated components

## File Structure

### Optimized Components (`/model/`)

- `madlad_encoder.onnx` - Encoder component
- `madlad_decoder.onnx` - Decoder component
- `madlad_decoder.onnx_data` - Decoder weights data
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `spiece.model` - SentencePiece tokenizer model
- `inference_script.py` - Python inference script

### Original Models (`/original_models/`)

- Complete original ONNX exports, kept for reference

## Optimization Benefits

1. **Memory Reduction**: Shared components are separated to avoid duplication
2. **Mobile Ready**: Suitable for deployment on mobile devices
3. **Modular**: Components can be loaded independently as needed

## Usage

```python
# Basic usage with the optimized models
from transformers import T5Tokenizer
import onnxruntime as ort

# Load the tokenizer from the model subfolder
tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)

# Load the separated ONNX components
encoder_session = ort.InferenceSession("model/madlad_encoder.onnx")
decoder_session = ort.InferenceSession("model/madlad_decoder.onnx")

# For the full generation loop, see inference_script.py
```

## Translation Example

```python
# Input format: <2xx> text (where xx is the target language code)
text = "<2pt> I love pizza!"  # Translate to Portuguese
# Expected output: "Eu amo pizza!"
```

## Language Codes

This model supports translation to 400+ languages.
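The target-language prefix shown in the translation example can be built with a small helper; `build_prompt` is an illustrative name, not part of this repository:

```python
def build_prompt(text: str, target_lang: str) -> str:
    """Prepend the MADLAD-400 target-language token (<2xx>) to the source text."""
    return f"<2{target_lang}> {text}"

# Example:
prompt = build_prompt("I love pizza!", "pt")
# prompt == "<2pt> I love pizza!"
```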
Use the format `<2xx>`, where `xx` is the target language code:

- `<2pt>` - Portuguese
- `<2es>` - Spanish
- `<2fr>` - French
- `<2de>` - German
- And many more...

## Performance Notes

- **Original Model Size**: ~3.3B parameters
- **Memory Optimization**: Reduced RAM usage through component separation
- **Inference Speed**: Faster generation with separated components

## Technical Details

### Optimization Approach

This optimization follows the same principles used for NLLB models:

1. **Component Separation**: Split the encoder and decoder into separate files
2. **Weight Deduplication**: Avoid loading shared weights multiple times
3. **Memory Efficiency**: Load only the required components during inference

### Export Process

The models were exported using:

```bash
optimum-cli export onnx --model jbochi/madlad400-3b-mt --task text2text-generation-with-past --optimize O3
```

## Requirements

```
torch>=1.9.0
transformers>=4.20.0
onnxruntime>=1.12.0
sentencepiece>=0.1.95
optimum[onnxruntime]>=1.14.0
```

## Citation

```bibtex
@misc{madlad-onnx-optimized,
  title={Madlad-400-3B-MT ONNX Optimized},
  author={manancode},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/manancode/madlad400-3b-mt-onnx-optimized}
}
```

## Credits

- **Base Model**: [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) by @jbochi
- **Optimization Technique**: Inspired by NLLB ONNX optimizations
- **Export Tools**: HuggingFace Optimum

## License

This work is based on the original Madlad-400 model. Please refer to the original model's license terms.
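## Appendix: Greedy Decoding Sketch

Because the encoder and decoder ship as separate ONNX sessions, generation is a loop: run the encoder once, then repeatedly feed the growing decoder sequence until an end-of-sequence token appears. The sketch below shows that loop with plain callables standing in for the real sessions; the stub shapes, the token ids (T5-style `PAD_ID`/`EOS_ID`), and the function names are assumptions for illustration, not this repository's API — see `inference_script.py` for the actual inputs.

```python
import numpy as np

EOS_ID = 1  # assumed end-of-sequence id (T5 convention)
PAD_ID = 0  # assumed decoder start / pad id (T5 convention)

def greedy_decode(run_encoder, run_decoder, input_ids, max_new_tokens=32):
    """Greedy generation over separated encoder/decoder callables.

    run_encoder(input_ids) -> encoder hidden states
    run_decoder(decoder_ids, encoder_states) -> logits with shape
        (1, len(decoder_ids), vocab_size)
    """
    enc = run_encoder(input_ids)
    out = [PAD_ID]  # T5-style decoder start token
    for _ in range(max_new_tokens):
        logits = run_decoder(np.array([out]), enc)
        next_id = int(np.argmax(logits[0, -1]))  # greedy pick at the last position
        out.append(next_id)
        if next_id == EOS_ID:
            break
    return out[1:]  # drop the start token

# Tiny stubs standing in for the real ONNX sessions: the "decoder"
# predicts token 5 for three steps, then EOS.
def stub_encoder(ids):
    return np.zeros((1, len(ids), 4))

def stub_decoder(dec_ids, enc):
    logits = np.zeros((1, dec_ids.shape[1], 8))
    step = dec_ids.shape[1] - 1
    logits[0, -1, 5 if step < 3 else EOS_ID] = 1.0
    return logits

print(greedy_decode(stub_encoder, stub_decoder, [3, 4, 2]))  # -> [5, 5, 5, 1]
```

With the actual models, `run_encoder` and `run_decoder` would wrap `encoder_session.run(...)` and `decoder_session.run(...)` using the input and output names baked into the export.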