# Madlad-400-3B-MT ONNX Optimized

This repository contains an ONNX export of the [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) model, optimized for reduced memory consumption following the NLLB optimization approach.

## Model Description

- **Base Model**: jbochi/madlad400-3b-mt
- **Optimization**: Component separation for reduced RAM usage
- **Target**: Mobile and edge deployment
- **Format**: ONNX with separated components

## File Structure

### Optimized Components (`/model/`)

- `madlad_encoder.onnx` - Encoder component
- `madlad_decoder.onnx` - Decoder component
- `madlad_decoder.onnx_data` - Decoder weights data
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `spiece.model` - SentencePiece tokenizer model
- `inference_script.py` - Python inference script

### Original Models (`/original_models/`)

- Complete original ONNX exports, kept for reference

## Optimization Benefits

1. **Memory Reduction**: Shared components are separated to avoid duplication
2. **Mobile Ready**: Suitable for deployment on mobile devices
3. **Modular**: Components can be loaded independently as needed

## Usage

```python
# Basic usage with the optimized models
from transformers import T5Tokenizer
import onnxruntime as ort

# Load the tokenizer from the model subfolder
tokenizer = T5Tokenizer.from_pretrained(
    "manancode/madlad400-3b-mt-onnx-optimized", subfolder="model"
)

# Load the separated ONNX components
encoder_session = ort.InferenceSession("model/madlad_encoder.onnx")
decoder_session = ort.InferenceSession("model/madlad_decoder.onnx")

# For the full generation loop, see inference_script.py
```

## Translation Example

```python
# Input format: <2xx> text (where xx is the target language code)
text = "<2pt> I love pizza!"  # Translate to Portuguese
# Expected output: "Eu amo pizza!"
```

## Language Codes

This model supports translation to 400+ languages.
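The target-language prefix shown in the translation example can be built with a small helper; `build_prompt` is an illustrative name, not part of this repository:

```python
def build_prompt(text: str, target_lang: str) -> str:
    """Prepend the MADLAD-400 target-language token (<2xx>) to the source text."""
    return f"<2{target_lang}> {text}"

# Example:
prompt = build_prompt("I love pizza!", "pt")
# prompt == "<2pt> I love pizza!"
```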
Use the format `<2xx>`, where `xx` is the target language code:

- `<2pt>` - Portuguese
- `<2es>` - Spanish
- `<2fr>` - French
- `<2de>` - German
- And many more...

## Performance Notes

- **Original Model Size**: ~3.3B parameters
- **Memory Optimization**: Reduced RAM usage through component separation
- **Inference Speed**: Faster generation with separated components

## Technical Details

### Optimization Approach

This optimization follows the same principles used for NLLB models:

1. **Component Separation**: Split the encoder and decoder into separate files
2. **Weight Deduplication**: Avoid loading shared weights multiple times
3. **Memory Efficiency**: Load only the required components during inference

### Export Process

The models were exported using:

```bash
optimum-cli export onnx --model jbochi/madlad400-3b-mt --task text2text-generation-with-past --optimize O3
```

## Requirements

```
torch>=1.9.0
transformers>=4.20.0
onnxruntime>=1.12.0
sentencepiece>=0.1.95
optimum[onnxruntime]>=1.14.0
```

## Citation

```bibtex
@misc{madlad-onnx-optimized,
  title={Madlad-400-3B-MT ONNX Optimized},
  author={manancode},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/manancode/madlad400-3b-mt-onnx-optimized}
}
```

## Credits

- **Base Model**: [jbochi/madlad400-3b-mt](https://huggingface.co/jbochi/madlad400-3b-mt) by @jbochi
- **Optimization Technique**: Inspired by NLLB ONNX optimizations
- **Export Tools**: HuggingFace Optimum

## License

This work is based on the original Madlad-400 model. Please refer to the original model's license terms.
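## Appendix: Greedy Decoding Sketch

Because the encoder and decoder ship as separate ONNX sessions, generation is a loop: run the encoder once, then repeatedly feed the growing decoder sequence until an end-of-sequence token appears. The sketch below shows that loop with plain callables standing in for the real sessions; the stub shapes, the token ids (T5-style `PAD_ID`/`EOS_ID`), and the function names are assumptions for illustration, not this repository's API — see `inference_script.py` for the actual inputs.

```python
import numpy as np

EOS_ID = 1  # assumed end-of-sequence id (T5 convention)
PAD_ID = 0  # assumed decoder start / pad id (T5 convention)

def greedy_decode(run_encoder, run_decoder, input_ids, max_new_tokens=32):
    """Greedy generation over separated encoder/decoder callables.

    run_encoder(input_ids) -> encoder hidden states
    run_decoder(decoder_ids, encoder_states) -> logits with shape
        (1, len(decoder_ids), vocab_size)
    """
    enc = run_encoder(input_ids)
    out = [PAD_ID]  # T5-style decoder start token
    for _ in range(max_new_tokens):
        logits = run_decoder(np.array([out]), enc)
        next_id = int(np.argmax(logits[0, -1]))  # greedy pick at the last position
        out.append(next_id)
        if next_id == EOS_ID:
            break
    return out[1:]  # drop the start token

# Tiny stubs standing in for the real ONNX sessions: the "decoder"
# predicts token 5 for three steps, then EOS.
def stub_encoder(ids):
    return np.zeros((1, len(ids), 4))

def stub_decoder(dec_ids, enc):
    logits = np.zeros((1, dec_ids.shape[1], 8))
    step = dec_ids.shape[1] - 1
    logits[0, -1, 5 if step < 3 else EOS_ID] = 1.0
    return logits

print(greedy_decode(stub_encoder, stub_decoder, [3, 4, 2]))  # -> [5, 5, 5, 1]
```

With the actual models, `run_encoder` and `run_decoder` would wrap `encoder_session.run(...)` and `decoder_session.run(...)` using the input and output names baked into the export.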