---
title: LLM Structured Output Docker
emoji: πŸ€–
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Get structured JSON responses from LLM using Docker
tags:
  - llama-cpp
  - gguf
  - json-schema
  - structured-output
  - llm
  - docker
  - gradio
---

# πŸ€– LLM Structured Output (Docker Version)

A Dockerized application that returns structured responses from local GGUF language models in a specified JSON format.

## ✨ Key Features

- Docker containerized for easy deployment on HuggingFace Spaces
- Local GGUF model support via llama-cpp-python
- Optimized for containers with configurable resources
- JSON schema support for structured output
- Gradio web interface for convenient interaction
- REST API for integration with other applications
- Memory efficient thanks to GGUF quantized models

## πŸš€ Deployment on HuggingFace Spaces

This version is designed specifically for HuggingFace Spaces with the Docker SDK:

1. Clone this repository
2. Push to HuggingFace Spaces with `sdk: docker` in `README.md`
3. The application will build and deploy automatically

## 🐳 Local Docker Usage

Build the image:

```bash
docker build -t llm-structured-output .
```

Run the container:

```bash
docker run -p 7860:7860 -e MODEL_REPO="lmstudio-community/gemma-3n-E4B-it-text-GGUF" llm-structured-output
```

With custom configuration:

```bash
docker run -p 7860:7860 \
  -e MODEL_REPO="lmstudio-community/gemma-3n-E4B-it-text-GGUF" \
  -e MODEL_FILENAME="gemma-3n-E4B-it-Q8_0.gguf" \
  -e N_CTX="4096" \
  -e MAX_NEW_TOKENS="512" \
  llm-structured-output
```

## 🌐 Application Access

Once the container is running, open the Gradio interface at http://localhost:7860 (the port published with `-p 7860:7860`).

πŸ“ Environment Variables

Configure the application using environment variables:

Variable Default Description
MODEL_REPO lmstudio-community/gemma-3n-E4B-it-text-GGUF HuggingFace model repository
MODEL_FILENAME gemma-3n-E4B-it-Q8_0.gguf Model file name
N_CTX 4096 Context window size
N_GPU_LAYERS 0 GPU layers (0 for CPU-only)
N_THREADS 4 CPU threads
MAX_NEW_TOKENS 256 Maximum response length
TEMPERATURE 0.1 Generation temperature
HUGGINGFACE_TOKEN `` HF token for private models
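
For local runs, the same variables can be kept in a Compose file instead of a long `docker run` command. A minimal sketch, assuming the defaults from the table above (the service name is illustrative and not part of this repo):

```yaml
# docker-compose.yml — illustrative sketch; the service name "llm" is an assumption
services:
  llm:
    build: .
    ports:
      - "7860:7860"
    environment:
      MODEL_REPO: lmstudio-community/gemma-3n-E4B-it-text-GGUF
      MODEL_FILENAME: gemma-3n-E4B-it-Q8_0.gguf
      N_CTX: "4096"
      N_THREADS: "4"
```

With this in place, `docker compose up --build` replaces the manual build-and-run steps.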

## πŸ“‹ Usage Examples

Example JSON Schema:

```json
{
  "type": "object",
  "properties": {
    "summary": {"type": "string"},
    "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["summary", "sentiment"]
}
```
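
Even with schema-constrained generation, it is cheap to double-check a response before trusting it downstream. A stdlib-only sketch that hand-rolls the checks implied by the schema above (in practice a library such as `jsonschema` would cover the full spec; the `response` dict here is made up for illustration):

```python
def check_response(resp: dict) -> list[str]:
    """Return a list of violations of the example schema; empty means valid."""
    errors = []
    # "required": ["summary", "sentiment"]
    for field in ("summary", "sentiment"):
        if field not in resp:
            errors.append(f"missing required field: {field}")
    # "sentiment" must come from the enum
    if "sentiment" in resp and resp["sentiment"] not in ("positive", "negative", "neutral"):
        errors.append("sentiment outside enum")
    # "confidence" is optional but must be a number in [0, 1]
    conf = resp.get("confidence")
    if conf is not None and not (isinstance(conf, (int, float)) and 0 <= conf <= 1):
        errors.append("confidence outside [0, 1]")
    return errors

# Made-up response standing in for actual model output.
response = {"summary": "Customer is happy with the product.",
            "sentiment": "positive", "confidence": 0.9}
errors = check_response(response)  # [] when the output matches the schema
```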

Example Prompt:

```
Analyze this review: "The product exceeded my expectations! Great quality and fast delivery."
```
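
The README mentions a REST API but does not document its routes, so the exact endpoint and field names are assumptions here. A sketch of assembling a request body that pairs the example prompt with the example schema (the `/generate` path and the `json_schema` / `max_new_tokens` keys are hypothetical; check the auto-generated FastAPI docs of the running container for the real contract):

```python
import json

prompt = ('Analyze this review: "The product exceeded my expectations! '
          'Great quality and fast delivery."')

schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["summary", "sentiment"],
}

# Hypothetical request body — field names are assumptions, not the documented API.
payload = json.dumps({"prompt": prompt, "json_schema": schema, "max_new_tokens": 256})
```

With the container running, something like `curl -X POST http://localhost:7860/generate -H "Content-Type: application/json" -d "$payload"` would exercise it, once the route is confirmed against the FastAPI documentation.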

## πŸ”§ Docker Optimizations

This Docker version includes several optimizations:

- Reduced memory usage with a smaller context window and batch sizes
- CPU-optimized configuration by default
- Efficient layer caching for faster builds
- Security: runs as a non-root user
- Multi-stage build capabilities for production
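
The caching and non-root points above correspond to familiar Dockerfile patterns. A hedged sketch, not the repo's actual Dockerfile (file names, paths, and the user name are assumptions):

```dockerfile
FROM python:3.10-slim

# Copy requirements first so the dependency layer is cached across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Run as a non-root user for security.
RUN useradd --create-home appuser
USER appuser
WORKDIR /home/appuser/app
COPY --chown=appuser . .

EXPOSE 7860
CMD ["python", "app.py"]
```

Ordering the `COPY requirements.txt` and `pip install` steps before copying the source tree is what makes rebuilds fast: editing application code invalidates only the final layers.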

πŸ—οΈ Architecture

  • Base Image: Python 3.10 slim
  • ML Backend: llama-cpp-python with OpenBLAS
  • Web Interface: Gradio 4.x
  • API: FastAPI with automatic documentation
  • Model Storage: Downloaded on first run to /app/models/

## πŸ’‘ Performance Tips

1. Memory: start with smaller models (7B parameters or less)
2. CPU: adjust `N_THREADS` to match the available cores
3. Context: reduce `N_CTX` if you run into memory issues
4. Batch size: lower `N_BATCH` in memory-constrained environments
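
Tip 2 can be automated when launching the container. A small sketch for picking `N_THREADS` from the host's core count (the cap of 8 is an arbitrary assumption, not a recommendation from this repo):

```python
import os

def pick_n_threads(cap: int = 8) -> int:
    """Use the logical core count reported by the OS, bounded by a cap."""
    cores = os.cpu_count() or 1  # cpu_count() may return None on exotic platforms
    return max(1, min(cores, cap))

# The result can then be passed as -e N_THREADS=... to docker run.
n_threads = pick_n_threads()
```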

πŸ” Troubleshooting

Container fails to start:

  • Check available memory (minimum 4GB recommended)
  • Verify model repository accessibility
  • Ensure proper environment variable formatting

Model download issues:

  • Check internet connectivity in container
  • Verify HUGGINGFACE_TOKEN for private models
  • Ensure sufficient disk space

Performance issues:

  • Reduce N_CTX and MAX_NEW_TOKENS
  • Adjust N_THREADS to match CPU cores
  • Consider using smaller/quantized models

## πŸ“„ License

MIT License - see the LICENSE file for details.


For more information about HuggingFace Spaces Docker configuration, see the [Spaces config reference](https://huggingface.co/docs/hub/spaces-config-reference).