---
pipeline_tag: text-generation
tags:
- NPU
---

# Phi-4-mini

Run **Phi-4-mini** optimized for **Qualcomm NPUs** with [nexaSDK](https://sdk.nexa.ai).

## Quickstart

1. **Install nexaSDK** and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai).
2. **Activate your device** with your access token:

   ```bash
   nexa config set license '<access_token>'
   ```

3. **Run the model** on the Qualcomm NPU in one line:

   ```bash
   nexa infer NexaAI/phi4-mini-npu-turbo
   ```

## Model Description

**Phi-4-mini** is a ~3.8B-parameter instruction-tuned model from Microsoft’s Phi-4 family.
Trained on a blend of synthetic “textbook-style” data, filtered public web content, curated books/Q&A, and high-quality supervised chat data, it emphasizes **reasoning-dense** capabilities while maintaining a compact footprint. This NPU **Turbo** build uses Nexa’s Qualcomm backend (QNN/Hexagon) to deliver **lower latency** and **higher throughput** on-device, with support for a **128K-token context** and efficient long-context memory handling.
|
| | ## Features |
| |
|
| | * **Lightweight yet capable**: strong reasoning (math/logic) in a compact 3.8B model. |
| | * **Instruction-following**: enhanced SFT + DPO alignment for reliable chat. |
| | * **Content generation**: drafting, completion, summarization, code comments, and more. |
| | * **Conversational AI**: context-aware assistants/agents with long-context support (128K). |
| | * **NPU-Turbo path**: INT8/INT4 quantization, op fusion, and KV-cache residency for Snapdragon® NPUs via nexaSDK. |
| | * **Customizable**: fine-tune/adapt for domain-specific or enterprise use. |
| |
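As a rough illustration of the quantization mentioned above, here is a minimal sketch of symmetric per-tensor INT8 weight quantization in plain Python. This is not nexaSDK's actual implementation (which targets the QNN/Hexagon backend and also uses INT4 grouping and op fusion); the function names and the single-scale scheme are illustrative assumptions.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = (max(abs(w) for w in weights) / 127.0) or 1.0  # avoid a zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight differs from the original by at most scale / 2.
```

Real deployments typically use per-channel scales and calibration data to keep this rounding error from accumulating across layers.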
|
| | ## Use Cases |
| |
|
| | * Personal & enterprise chatbots |
| | * On-device/offline assistants (latency-bound scenarios) |
| | * Document/report/email summarization |
| | * Education, tutoring, and STEM reasoning tools |
| | * Vertical applications (e.g., healthcare, finance, legal) with appropriate safeguards |
| |
|

## Inputs and Outputs

**Input**:

* Text prompts or conversation history (chat-format, tokenized sequences).

**Output**:

* Generated text: responses, explanations, or creative content.
* Optionally, raw logits/probabilities for advanced downstream tasks.
|
| | ## License |
| | This model is released under the **Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0)** license. |
| | Non-commercial use, modification, and redistribution are permitted with attribution. |
| | For commercial licensing, please contact **dev@nexa.ai**. |
| |
|
| | ## References |
| | 📰 [Phi-4-mini Microsoft Blog](https://aka.ms/phi4-feb2025) <br> |
| | 📖 [Phi-4-mini Technical Report](https://aka.ms/phi-4-multimodal/techreport) <br> |
| | 👩🍳 [Phi Cookbook](https://github.com/microsoft/PhiCookBook) <br> |
| | 🚀 [Model paper](https://huggingface.co/papers/2503.01743) |