microsoft/orca-math-word-problems-200k
Viewer • Updated • 200k • 7.66k • 486
This is an ONNX-converted and INT8-quantized version of Microsoft's Phi-3.5-mini-instruct model, optimized for deployment on edge devices and Qualcomm Snapdragon hardware.
✅ ONNX format for cross-platform compatibility
✅ INT8 quantization for reduced memory footprint
✅ Optimized for Qualcomm AI Hub deployment
✅ Includes tokenizer and configuration files
✅ Ready for edge deployment
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/phi-3.5-mini-instruct-onnx")
# Create ONNX Runtime session
providers = ['CPUExecutionProvider'] # or ['CUDAExecutionProvider'] for GPU
session = ort.InferenceSession("model.onnx", providers=providers)
# Prepare input
text = "Hello, how can I help you today?"
inputs = tokenizer(text, return_tensors="np")
# Run inference
outputs = session.run(None, {"input_ids": inputs["input_ids"]})
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer
model = ORTModelForCausalLM.from_pretrained("your-username/phi-3.5-mini-instruct-onnx")
tokenizer = AutoTokenizer.from_pretrained("your-username/phi-3.5-mini-instruct-onnx")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
This model is optimized for deployment on Qualcomm devices through AI Hub:
model.onnx - Main ONNX model filemodel.onnx_data - Model weights (external data format)tokenizer.json - Fast tokenizerconfig.json - Model configurationspecial_tokens_map.json - Special tokens mappingtokenizer_config.json - Tokenizer configurationIf you use this model, please cite:
@article{phi3,
title={Phi-3 Technical Report},
author={Microsoft},
year={2024}
}
This model is released under the MIT License, same as the original Phi-3.5 model.