---
title: Llama-PHP
emoji: 🏆
colorFrom: green
colorTo: green
sdk: docker
pinned: true
short_description: Llama-PHP Demo
license: mit
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/64d4ca84887f55fb6ee86b87/AkUWiFa4keIpuIbQy2gFm.png
---

# 🏆 Llama PHP Demo
This Hugging Face Space demonstrates **[llama.php](https://github.com/enacimie/llama-php)**, a robust PHP wrapper for running local Large Language Models with `llama.cpp` as the inference engine.

## 🌟 About llama.php

llama.php is a modular, productive PHP wrapper that lets you run Large Language Models completely offline. It exposes a clean API, similar in spirit to the OpenAI or Hugging Face clients but 100% self-contained, bringing the power of LLMs to PHP applications without external dependencies.

## ✨ Features Demonstrated

- **Local Inference**: Runs completely offline using the CPU
- **GGUF Support**: Works with quantized models (Q4_K_M, Q5_K_S, etc.)
- **Chat Templates**: Includes templates for Qwen, Llama 3, Mistral, and more
- **Text Generation**: Generate responses to prompts
- **Embeddings**: Create vector embeddings from text
- **JSON Output**: Force structured JSON output with schema validation
- **Secure Execution**: Proper shell argument escaping to prevent injection
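
The "Secure Execution" point matters because prompts ultimately reach an external `llama.cpp` process through the shell. A minimal sketch of the idea using only PHP builtins — the binary name and flags below are illustrative, not llama.php's actual API:

```php
<?php
// Sketch of the "Secure Execution" idea: every user-controlled value is
// passed through escapeshellarg() before reaching the shell, so quotes,
// semicolons, and backticks inside a prompt cannot inject extra commands.
// NOTE: the binary name and flags are illustrative, not llama.php's API.
function buildCommand(string $modelPath, string $prompt, int $maxTokens): string
{
    return sprintf(
        'llama-cli -m %s -p %s -n %d',
        escapeshellarg($modelPath),
        escapeshellarg($prompt),
        $maxTokens
    );
}

// Even a hostile prompt such as "x'; rm -rf /" stays inert inside the
// single quotes that escapeshellarg() adds.
```

Concatenating raw user input into a shell string is the classic injection mistake; routing every value through `escapeshellarg()` is the standard PHP defense.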

## 🚀 How to Use This Demo

1. **Text Generation**: Enter a prompt in the text box and click "Generate"
2. **Chat Mode**: Start a conversation with the model in the chat interface
3. **Embedding Demo**: Convert text to vector embeddings
4. **JSON Mode**: Generate structured JSON output based on a schema
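
JSON Mode constrains generation so the model's output must match a schema. A purely illustrative example — the field names here are invented for this description, not taken from the Space:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "sentiment": { "type": "string", "enum": ["positive", "neutral", "negative"] },
    "confidence": { "type": "number" }
  },
  "required": ["name", "sentiment"]
}
```

With a schema like this, the model can only emit an object with those keys, which makes the output safe to parse programmatically.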

Adjust parameters like temperature, max tokens, and top-p to control the generation behavior.
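
What temperature and top-p actually do can be sketched in a few lines. This is a generic illustration of the sampling math, not llama.php's or llama.cpp's internal implementation:

```python
import math
import random

def sample(logits, temperature=0.8, top_p=0.9):
    """Pick a token index: temperature rescales the logits, then top-p
    (nucleus) filtering keeps only the smallest set of tokens whose
    cumulative probability reaches top_p."""
    # Temperature: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: sort by probability, keep the head of the list.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept tokens and draw one at random.
    z = sum(probs[i] for i in kept)
    r, acc = random.random() * z, 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

Lowering the temperature makes the most likely token dominate (more deterministic output); lowering top-p trims away the unlikely tail of the vocabulary.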

## ⚙️ Technical Details

- **PHP Version**: 8.2
- **Inference Engine**: llama.cpp
- **Model**: Qwen3-0.6B-Q4_K_M (quantized for efficient CPU inference)
- **Embedding Model**: Qwen3-Embedding-0.6B-Q4_K_M
- **Docker Base**: Custom image with PHP 8.2 and llama.cpp binaries
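
The embedding model above turns text into vectors; comparing two texts then typically reduces to cosine similarity. A plain-Python illustration of that comparison step (not the Space's actual code):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 means the
    vectors point in the same direction (very similar texts), 0.0 means
    they are orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

This is the usual building block for semantic search: embed a query, embed your documents, and rank by similarity.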

## 🤝 Credits

This demo is powered by **[llama.php](https://github.com/enacimie/llama-php)**, created by Eduardo Nacimiento-García.

- **GitHub Repository**: https://github.com/enacimie/llama-php
- **Original Models**: Qwen3 series from Alibaba
- **Inference Backend**: https://github.com/ggerganov/llama.cpp

## 📜 License

This demo and the underlying llama.php library are released under the MIT License.

---

*Note: Due to resource limitations on Hugging Face Spaces, generation might be slower than on dedicated hardware. The model runs entirely on the CPU with a limited context window.*