# Madlad-400-3B-MT ONNX Optimized

This repository contains an optimized ONNX export of the [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) model, restructured for reduced memory consumption following the same approach used for NLLB ONNX exports.

## Model Description

- **Base Model**: jbochi/madlad400-3b-mt
- **Optimization**: Component separation for reduced RAM usage
- **Target**: Mobile and edge deployment
- **Format**: ONNX with separated components

## Files Structure

### Optimized Components (`/model/`)
- `madlad_encoder.onnx` - Encoder component
- `madlad_decoder.onnx` - Decoder component  
- `madlad_decoder.onnx_data` - Decoder weights data
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `spiece.model` - SentencePiece tokenizer model
- `inference_script.py` - Python inference script

### Original Models (`/original_models/`)
- Complete original ONNX exports for reference

## Optimization Benefits

1. **Memory Reduction**: Separated shared components to avoid duplication
2. **Mobile Ready**: Optimized for deployment on mobile devices
3. **Modular**: Components can be loaded independently as needed

## Usage

```python
# Basic usage with the optimized models
from transformers import T5Tokenizer
import onnxruntime as ort

# Load the SentencePiece tokenizer from the `model/` subfolder
tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)

# Load the ONNX sessions. Note: `madlad_decoder.onnx_data` must sit in the
# same directory as `madlad_decoder.onnx` so its external weights resolve.
encoder_session = ort.InferenceSession("model/madlad_encoder.onnx")
decoder_session = ort.InferenceSession("model/madlad_decoder.onnx")

# For a full generation loop, see inference_script.py
```
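For reference, the encoder/decoder round trip can be sketched as a simple greedy loop. This is a minimal sketch, not the code in `inference_script.py`; the tensor names used below (`input_ids`, `attention_mask`, `encoder_hidden_states`, `encoder_attention_mask`) and the single-logits output are assumptions about this export — verify them with `session.get_inputs()` / `session.get_outputs()` before use:

```python
import numpy as np

def greedy_decode(encoder_session, decoder_session, input_ids,
                  eos_id=1, decoder_start_id=0, max_new_tokens=128):
    """Greedy generation over separated encoder/decoder ONNX sessions.

    Tensor names are assumptions about this export; check them with
    session.get_inputs() / session.get_outputs().
    """
    attention_mask = np.ones_like(input_ids)

    # The encoder runs once per input sentence.
    encoder_hidden = encoder_session.run(
        None, {"input_ids": input_ids, "attention_mask": attention_mask}
    )[0]

    # T5-family models start decoding from the pad token (id 0).
    decoder_ids = np.array([[decoder_start_id]], dtype=np.int64)
    for _ in range(max_new_tokens):
        logits = decoder_session.run(None, {
            "input_ids": decoder_ids,
            "encoder_hidden_states": encoder_hidden,
            "encoder_attention_mask": attention_mask,
        })[0]
        next_id = int(logits[0, -1].argmax())  # greedy: take the top logit
        decoder_ids = np.concatenate(
            [decoder_ids, np.array([[next_id]], dtype=np.int64)], axis=1
        )
        if next_id == eos_id:
            break
    return decoder_ids[0].tolist()
```

With the real sessions, the returned ids would then be passed to `tokenizer.decode(...)`. Loading both sessions up front trades memory for latency; since the components are separate files, the encoder session can also be released once `encoder_hidden` is computed.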

## Translation Example

```python
# Input format: <2xx> text (where xx is target language code)
text = "<2pt> I love pizza!"  # Translate to Portuguese
# Expected output: "Eu amo pizza!"
```

## Language Codes

This model supports translation to 400+ languages. Use the format `<2xx>` where `xx` is the target language code:
- `<2pt>` - Portuguese
- `<2es>` - Spanish  
- `<2fr>` - French
- `<2de>` - German
- And many more...
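A tiny helper (hypothetical, not part of this repository) makes the prefix convention explicit:

```python
def with_target_lang(text: str, lang_code: str) -> str:
    """Prefix text with the <2xx> target-language token MADLAD-400 expects."""
    return f"<2{lang_code}> {text}"

with_target_lang("I love pizza!", "pt")  # "<2pt> I love pizza!"
```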

## Performance Notes

- **Original Model Size**: ~3.3B parameters
- **Memory Optimization**: Reduced peak RAM usage through component separation; the encoder and decoder sessions can be loaded and released independently
- **Inference Speed**: Optimized for faster generation with separated components

## Technical Details

### Optimization Approach

This optimization follows the same principles used for NLLB models:

1. **Component Separation**: Split encoder/decoder into separate files
2. **Weight Deduplication**: Avoid loading shared weights multiple times
3. **Memory Efficiency**: Load only required components during inference

### Export Process

The models were exported using:
```bash
optimum-cli export onnx --model jbochi/madlad400-3b-mt --task text2text-generation-with-past --optimize O3
```

## Requirements

```
torch>=1.9.0
transformers>=4.20.0  
onnxruntime>=1.12.0
sentencepiece>=0.1.95
optimum[onnxruntime]>=1.14.0
```

## Citation

```bibtex
@misc{madlad-onnx-optimized,
  title={Madlad-400-3B-MT ONNX Optimized},
  author={manancode},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/manancode/madlad400-3b-mt-onnx-optimized}
}
```

## Credits

- **Base Model**: [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) by @jbochi
- **Optimization Technique**: Inspired by NLLB ONNX optimizations
- **Export Tools**: HuggingFace Optimum

## License

This work is based on the original Madlad-400 model. Please refer to the original model's license terms.