# Madlad-400-3B-MT ONNX Optimized
This repository contains an ONNX export of the [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) model,
restructured for reduced memory consumption following the NLLB optimization approach.
## Model Description
- **Base Model**: jbochi/madlad400-3b-mt
- **Optimization**: Component separation for reduced RAM usage
- **Target**: Mobile and edge deployment
- **Format**: ONNX with separated components
## Files Structure
### Optimized Components (`/model/`)
- `madlad_encoder.onnx` - Encoder component
- `madlad_decoder.onnx` - Decoder component
- `madlad_decoder.onnx_data` - Decoder weights data
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `spiece.model` - SentencePiece tokenizer model
- `inference_script.py` - Python inference script
### Original Models (`/original_models/`)
- Complete original ONNX exports for reference
## Optimization Benefits
1. **Memory Reduction**: Separated shared components to avoid duplication
2. **Mobile Ready**: Optimized for deployment on mobile devices
3. **Modular**: Components can be loaded independently as needed
## Usage
```python
# Basic usage with the optimized models
from transformers import T5Tokenizer
import onnxruntime as ort

# Load tokenizer
tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)

# Load ONNX models
encoder_session = ort.InferenceSession("model/madlad_encoder.onnx")
decoder_session = ort.InferenceSession("model/madlad_decoder.onnx")

# For detailed inference, see inference_script.py
```
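With the encoder and decoder in separate sessions, generation is a plain greedy loop: encode the input once, then repeatedly feed the growing decoder sequence back in until the end-of-sequence token appears. The sketch below is illustrative, not code from this repository: it abstracts each session as a callable because the exact ONNX input/output names depend on the export (see `inference_script.py` for the real bindings), and the decoder start token (T5's pad id, 0) is an assumption carried over from the T5 family.

```python
import numpy as np

def greedy_decode(run_encoder, run_decoder, input_ids, eos_id, max_len=64):
    """Greedy generation over separated encoder/decoder components.

    run_encoder(input_ids, attention_mask) -> encoder hidden states
    run_decoder(decoder_input_ids, encoder_hidden, attention_mask) -> logits

    Both are assumed to be thin wrappers around ort.InferenceSession.run
    using the export's actual input/output names.
    """
    attention_mask = np.ones_like(input_ids)
    encoder_hidden = run_encoder(input_ids, attention_mask)  # encode once

    decoder_ids = np.array([[0]], dtype=np.int64)  # T5-style: start from pad (id 0)
    for _ in range(max_len):
        logits = run_decoder(decoder_ids, encoder_hidden, attention_mask)
        next_id = int(logits[0, -1].argmax())  # greedy: most likely next token
        decoder_ids = np.concatenate([decoder_ids, [[next_id]]], axis=1)
        if next_id == eos_id:
            break
    return decoder_ids[0, 1:]  # drop the start token
```

The returned ids would then go through `tokenizer.decode` to produce the translated text.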
## Translation Example
```python
# Input format: <2xx> text (where xx is target language code)
text = "<2pt> I love pizza!" # Translate to Portuguese
# Expected output: "Eu amo pizza!"
```
## Language Codes
This model supports translation to 400+ languages. Use the format `<2xx>` where `xx` is the target language code:
- `<2pt>` - Portuguese
- `<2es>` - Spanish
- `<2fr>` - French
- `<2de>` - German
- And many more...
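The `<2xx>` prefix is plain text prepended to the source sentence, so input preparation reduces to simple string formatting. A minimal helper (the name `to_madlad_input` is hypothetical, not part of this repository):

```python
def to_madlad_input(text: str, target_lang: str) -> str:
    """Prepend MADLAD-400's target-language token to the source text."""
    return f"<2{target_lang}> {text}"

# Example: request a Portuguese translation
prompt = to_madlad_input("I love pizza!", "pt")
# prompt == "<2pt> I love pizza!"
```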
## Performance Notes
- **Original Model Size**: ~3.3B parameters
- **Memory Optimization**: Reduced RAM usage through component separation
- **Inference Speed**: Optimized for faster generation with separated components
## Technical Details
### Optimization Approach
This optimization follows the same principles used for NLLB models:
1. **Component Separation**: Split encoder/decoder into separate files
2. **Weight Deduplication**: Avoid loading shared weights multiple times
3. **Memory Efficiency**: Load only required components during inference
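One way to realize point 3 (load only required components) is to defer session creation until a component is first used. The `LazyComponent` wrapper below is an illustrative sketch under that assumption, not code shipped in this repository:

```python
class LazyComponent:
    """Create an expensive resource (e.g. an ONNX session) on first access only."""

    def __init__(self, factory):
        self._factory = factory  # zero-arg callable that builds the resource
        self._value = None

    def get(self):
        if self._value is None:        # not built yet: pay the cost now
            self._value = self._factory()
        return self._value             # later calls reuse the same instance

# With onnxruntime this would look like (paths assumed from this repo's layout):
# encoder = LazyComponent(lambda: ort.InferenceSession("model/madlad_encoder.onnx"))
# decoder = LazyComponent(lambda: ort.InferenceSession("model/madlad_decoder.onnx"))
# encoder.get().run(...)  # the decoder stays unloaded until actually needed
```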
### Export Process
The models were exported using:
```bash
# optimum-cli requires a positional output directory; the name here is an example
optimum-cli export onnx \
  --model jbochi/madlad400-3b-mt \
  --task text2text-generation-with-past \
  --optimize O3 \
  madlad400-3b-mt-onnx/
```
## Requirements
```
torch>=1.9.0
transformers>=4.20.0
onnxruntime>=1.12.0
sentencepiece>=0.1.95
optimum[onnxruntime]>=1.14.0
```
## Citation
```bibtex
@misc{madlad-onnx-optimized,
title={Madlad-400-3B-MT ONNX Optimized},
author={manancode},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/manancode/madlad400-3b-mt-onnx-optimized}
}
```
## Credits
- **Base Model**: [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) by @jbochi
- **Optimization Technique**: Inspired by NLLB ONNX optimizations
- **Export Tools**: HuggingFace Optimum
## License
This work is based on the original Madlad-400 model. Please refer to the original model's license terms.
|