---
title: Llama-PHP
emoji: 🏆
colorFrom: green
colorTo: green
sdk: docker
pinned: true
short_description: Llama-PHP Demo
license: mit
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/64d4ca84887f55fb6ee86b87/AkUWiFa4keIpuIbQy2gFm.png
---

# 🏆 Llama PHP Demo
This Hugging Face Space demonstrates **[llama.php](https://github.com/enacimie/llama-php)**, a robust PHP wrapper for running local Large Language Models with `llama.cpp` as the inference engine.

## 🌟 About llama.php

llama.php is a modular, productive PHP wrapper that lets you run Large Language Models completely offline. It exposes a clean API, similar in spirit to the OpenAI or Hugging Face clients but 100% self-contained, bringing the power of LLMs to PHP applications without external dependencies.

## ✨ Features Demonstrated

- **Local Inference**: Runs completely offline using the CPU
- **GGUF Support**: Works with quantized models (Q4_K_M, Q5_K_S, etc.)
- **Chat Templates**: Includes templates for Qwen, Llama 3, Mistral, and more
- **Text Generation**: Generate responses to prompts
- **Embeddings**: Create vector embeddings from text
- **JSON Output**: Force structured JSON output with schema validation
- **Secure Execution**: Proper shell argument escaping to prevent injection
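
The "Secure Execution" point matters because prompts ultimately reach an external `llama.cpp` process through the shell. A minimal sketch of the idea using only PHP builtins — the binary name and flags below are illustrative, not llama.php's actual API:

```php
<?php
// Sketch of the "Secure Execution" idea: every user-controlled value is
// passed through escapeshellarg() before reaching the shell, so quotes,
// semicolons, and backticks inside a prompt cannot inject extra commands.
// NOTE: the binary name and flags are illustrative, not llama.php's API.
function buildCommand(string $modelPath, string $prompt, int $maxTokens): string
{
    return sprintf(
        'llama-cli -m %s -p %s -n %d',
        escapeshellarg($modelPath),
        escapeshellarg($prompt),
        $maxTokens
    );
}

// Even a hostile prompt such as "x'; rm -rf /" stays inert inside the
// single quotes that escapeshellarg() adds.
```

Concatenating raw user input into a shell string is the classic injection mistake; routing every value through `escapeshellarg()` is the standard PHP defense.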

## 🚀 How to Use This Demo

1. **Text Generation**: Enter a prompt in the text box and click "Generate"
2. **Chat Mode**: Start a conversation with the model in the chat interface
3. **Embedding Demo**: Convert text to vector embeddings
4. **JSON Mode**: Generate structured JSON output based on a schema
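
JSON Mode constrains generation so the model's output must match a schema. A purely illustrative example — the field names here are invented for this description, not taken from the Space:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "sentiment": { "type": "string", "enum": ["positive", "neutral", "negative"] },
    "confidence": { "type": "number" }
  },
  "required": ["name", "sentiment"]
}
```

With a schema like this, the model can only emit an object with those keys, which makes the output safe to parse programmatically.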

Adjust parameters like temperature, max tokens, and top-p to control the generation behavior.
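
What temperature and top-p actually do can be sketched in a few lines. This is a generic illustration of the sampling math, not llama.php's or llama.cpp's internal implementation:

```python
import math
import random

def sample(logits, temperature=0.8, top_p=0.9):
    """Pick a token index: temperature rescales the logits, then top-p
    (nucleus) filtering keeps only the smallest set of tokens whose
    cumulative probability reaches top_p."""
    # Temperature: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: sort by probability, keep the head of the list.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept tokens and draw one at random.
    z = sum(probs[i] for i in kept)
    r, acc = random.random() * z, 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

Lowering the temperature makes the most likely token dominate (more deterministic output); lowering top-p trims away the unlikely tail of the vocabulary.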

## ⚙️ Technical Details

- **PHP Version**: 8.2
- **Inference Engine**: llama.cpp
- **Model**: Qwen3-0.6B-Q4_K_M (quantized for efficient CPU inference)
- **Embedding Model**: Qwen3-Embedding-0.6B-Q4_K_M
- **Docker Base**: Custom image with PHP 8.2 and llama.cpp binaries
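
The embedding model above turns text into vectors; comparing two texts then typically reduces to cosine similarity. A plain-Python illustration of that comparison step (not the Space's actual code):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 means the
    vectors point in the same direction (very similar texts), 0.0 means
    they are orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

This is the usual building block for semantic search: embed a query, embed your documents, and rank by similarity.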

## 🤝 Credits

This demo is powered by **[llama.php](https://github.com/enacimie/llama-php)**, created by Eduardo Nacimiento-García.

- **GitHub Repository**: https://github.com/enacimie/llama-php
- **Original Models**: Qwen3 series from Alibaba
- **Inference Backend**: https://github.com/ggerganov/llama.cpp

## 📜 License

This demo and the underlying llama.php library are released under the MIT License.

---

*Note: Due to resource limitations on Hugging Face Spaces, generation might be slower than on dedicated hardware. The model runs entirely on the CPU with a limited context window.*