---
license: mit
tags:
- amop-optimized
- onnx
---
# AMOP-Optimized ONNX Model: {repo_name}

This model was automatically optimized for CPU inference using the **Adaptive Model Optimization Pipeline (AMOP)**.

- **Base Model:** [{model_id}](https://huggingface.co/{model_id})
- **Optimization Date:** {optimization_date}
## Optimization Details

The following AMOP ONNX pipeline stages were applied (a sketch of a comparable pipeline follows this list):

- **Pruning:** {pruning_status} ({pruning_percent}%)
- **Quantization & ONNX Conversion:** Enabled ({quant_type} quantization)
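AMOP's internals are not published here, so the following is only a minimal sketch of how a comparable prune-export-quantize pipeline can be built with `torch.nn.utils.prune` and Hugging Face Optimum. The model name, directory names, pruning amount, and the `avx512_vnni` quantization preset are illustrative assumptions, not the actual AMOP configuration:

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM
from optimum.onnxruntime import ORTModelForCausalLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

BASE_MODEL = "gpt2"      # hypothetical stand-in for the real {model_id}
PRUNE_AMOUNT = 0.30      # hypothetical stand-in for {pruning_percent}

# Stage 1: unstructured L1 magnitude pruning of every linear layer.
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=PRUNE_AMOUNT)
        prune.remove(module, "weight")  # bake the zeroed weights in permanently
model.save_pretrained("amop-pruned")

# Stage 2: export the pruned checkpoint to ONNX via Optimum.
onnx_model = ORTModelForCausalLM.from_pretrained("amop-pruned", export=True)
onnx_model.save_pretrained("amop-onnx")

# Stage 3: dynamic INT8 quantization with ONNX Runtime.
quantizer = ORTQuantizer.from_pretrained("amop-onnx")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="amop-onnx-quantized", quantization_config=qconfig)
```

Dynamic quantization (`is_static=False`) needs no calibration data, which makes it a common default for CPU-targeted INT8 conversion.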
## How to Use

This model is in ONNX format and can be run with Hugging Face Optimum's ONNX Runtime backend. Make sure you have `optimum`, `onnxruntime`, and `transformers` installed (e.g. `pip install "optimum[onnxruntime]" transformers`).
```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "{repo_id}"

# Load the optimized ONNX model and its tokenizer from the Hub.
model = ORTModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate text from a prompt; inference runs through ONNX Runtime.
prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt")
gen_tokens = model.generate(**inputs)
print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))
```
## AMOP Pipeline Log

<details>
<summary>Click to expand</summary>

```
{pipeline_log}
```

</details>