---
base_model: microsoft/Phi-4-reasoning-plus
---
This is an FP8-Dynamic quantization of [Phi-4-reasoning-plus](https://huggingface.co/microsoft/Phi-4-reasoning-plus).

Phi-4-reasoning-plus, developed by Microsoft Research, is a 14-billion-parameter language model specialized in reasoning, particularly in math, science, and coding. It is fine-tuned from Phi-4 and combines supervised fine-tuning with reinforcement learning to improve accuracy while remaining usable in memory-constrained and latency-sensitive environments. Responses have two sections: a detailed chain-of-thought reasoning section followed by a concise solution. Despite its relatively compact size, the model performs strongly across a wide range of complex reasoning tasks and stays coherent over long inputs, which makes it well suited to deep, multi-step reasoning applications.
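The two-section response format can be separated programmatically. The following is a minimal sketch, assuming the reasoning section is wrapped in `<think>...</think>` tags; this card does not state the delimiter, so check the base model's documentation before relying on it. The helper name `split_reasoning` is hypothetical.

```python
import re

# Assumption: the reasoning section is delimited by <think>...</think> tags.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, solution) from a raw model response.

    If no reasoning block is found, reasoning is empty and the whole
    response is treated as the solution.
    """
    match = THINK_PATTERN.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    solution = response[match.end():].strip()
    return reasoning, solution


raw = "<think>2 + 2 equals 4.</think>The answer is 4."
reasoning, solution = split_reasoning(raw)
print(reasoning)  # 2 + 2 equals 4.
print(solution)   # The answer is 4.
```

Keeping the reasoning separate lets an application log or hide the chain of thought and show only the solution.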
## Evaluations
The quantized model recovers 100.0% of the base model's average accuracy (70.77 for both) on the benchmarks below.

| __English__ | __[Phi-4-reasoning-plus](https://huggingface.co/microsoft/Phi-4-reasoning-plus)__ | __[Phi-4-reasoning-plus-FP8-Dynamic (this)](https://huggingface.co/cortecs/Phi-4-reasoning-plus-FP8-Dynamic)__ |
|:------------|------:|------:|
| Avg.        | 70.77 | 70.77 |
| ARC         | 65.7  | 65.5  |
| HellaSwag   | 69    | 69.5  |
| MMLU        | 77.61 | 77.3  |

We did not check for data contamination.
Evaluation was done using the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) with `limit=1000`, which caps each task at 1,000 examples.

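The averages and the recovery figure can be reproduced from the per-benchmark scores in the table. This sketch assumes that accuracy recovery means the quantized model's average score divided by the base model's average score; the card does not define the term, so treat that formula as an assumption.

```python
# Scores copied from the table above: (base model, FP8-Dynamic quantization).
scores = {
    "ARC": (65.7, 65.5),
    "HellaSwag": (69.0, 69.5),
    "MMLU": (77.61, 77.3),
}

base_avg = sum(base for base, _ in scores.values()) / len(scores)
quant_avg = sum(quant for _, quant in scores.values()) / len(scores)

# Assumed definition: recovery = quantized average / base average, in percent.
recovery = 100 * quant_avg / base_avg

print(f"base avg:  {base_avg:.2f}")   # 70.77
print(f"quant avg: {quant_avg:.2f}")  # 70.77
print(f"recovery:  {recovery:.1f}%")  # 100.0%
```

The individual benchmarks move in both directions (ARC and MMLU drop slightly, HellaSwag rises), so the differences cancel out in the average.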
## Usage
Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):

```
python -m vllm.entrypoints.openai.api_server --model cortecs/Phi-4-reasoning-plus-FP8-Dynamic --max-model-len 32768 --gpu-memory-utilization 0.95
```
Access the model:
```
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d ' {
    "model": "cortecs/Phi-4-reasoning-plus-FP8-Dynamic",
    "prompt": "San Francisco is a"
} '
```

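The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is listening on `localhost:8000`, as in the curl example. The helper names `build_request` and `complete` are hypothetical, and the `max_tokens` field is an optional addition that is not part of the curl request above.

```python
import json
import urllib.request

# Assumption: the vLLM server is listening on localhost:8000.
API_URL = "http://localhost:8000/v1/completions"
MODEL = "cortecs/Phi-4-reasoning-plus-FP8-Dynamic"


def build_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build the same POST request as the curl example, plus max_tokens."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("San Francisco is a"))` prints the generated continuation. A reasoning model typically needs a much larger `max_tokens` value than the default used here to finish its chain of thought.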