---
library_name: onnx
tags:
- text-generation
- phi-3
- onnx
- int4
- cpu
- inference4j
license: mit
pipeline_tag: text-generation
---
# Phi-3-mini-4k-instruct — ONNX (INT4)

INT4-quantized ONNX export of [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), a 3.8B-parameter lightweight language model from Microsoft. Optimized for CPU inference with int4 RTN block-32 quantization.

Mirrored for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.
## Original Source

- **Repository:** [microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
- **License:** MIT
## Usage with inference4j

```java
try (TextGenerator gen = TextGenerator.builder().build()) {
    GenerationResult result = gen.generate("What is Java in one sentence?");
    System.out.println(result.text());
}
```
## Model Details

| Property | Value |
|----------|-------|
| Architecture | Phi-3 (3.8B parameters, 32 layers, 3072 hidden) |
| Task | Text generation / chat |
| Context length | 4096 tokens |
| Quantization | INT4 RTN block-32 acc-level-4 |
| Original framework | PyTorch (transformers) |
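For intuition, the "INT4 RTN block-32" scheme above rounds each weight to the nearest representable 4-bit integer, with one shared scale per block of 32 consecutive weights. The sketch below is illustrative only (the actual export was produced with Microsoft's ONNX quantization tooling, and the real format also stores zero points and packed nibbles); it just shows the round-to-nearest math.

```java
// Minimal sketch of symmetric INT4 round-to-nearest (RTN) quantization
// with block size 32. Illustrative, not the actual export pipeline.
public class RtnInt4Sketch {
    static final int BLOCK = 32;

    // Quantize then dequantize, one scale per block of 32 values,
    // so the returned array shows the precision loss of int4 storage.
    static float[] rtnBlock32(float[] w) {
        float[] out = new float[w.length];
        for (int b = 0; b < w.length; b += BLOCK) {
            int end = Math.min(b + BLOCK, w.length);
            float maxAbs = 0f;
            for (int i = b; i < end; i++) {
                maxAbs = Math.max(maxAbs, Math.abs(w[i]));
            }
            // Signed int4 covers [-8, 7]; map the block's largest magnitude to 7.
            float scale = (maxAbs == 0f) ? 1f : maxAbs / 7f;
            for (int i = b; i < end; i++) {
                int q = Math.round(w[i] / scale);     // round to nearest integer
                q = Math.max(-8, Math.min(7, q));     // clamp to the int4 range
                out[i] = q * scale;                   // dequantized value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] w = {0.10f, -0.52f, 0.31f, 0.70f};
        System.out.println(java.util.Arrays.toString(rtnBlock32(w)));
    }
}
```

Each value is reconstructed to within half a quantization step (`scale / 2`) of the original, which is why block-wise scales matter: a smaller block adapts the scale to local weight magnitudes and shrinks that error.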
## License

This model is licensed under the [MIT License](https://opensource.org/licenses/MIT). Original model by [Microsoft](https://huggingface.co/microsoft).