---
title: LLM Structured Output Docker
emoji: πŸ€–
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Get structured JSON responses from LLM using Docker
tags:
  - llama-cpp
  - gguf
  - json-schema
  - structured-output
  - llm
  - docker
  - gradio
---

# πŸ€– LLM Structured Output (Docker Version)

A Dockerized application that returns structured responses from local GGUF language models in a specified JSON format.

## ✨ Key Features

- Docker containerized for easy deployment on HuggingFace Spaces
- Local GGUF model support via llama-cpp-python
- Optimized for containers with configurable resources
- JSON schema support for structured output
- Gradio web interface for convenient interaction
- REST API for integration with other applications
- Memory efficient thanks to GGUF quantized models

## πŸš€ Deployment on HuggingFace Spaces

This version is designed specifically for HuggingFace Spaces with the Docker SDK:

1. Clone this repository
2. Push to HuggingFace Spaces with `sdk: docker` in `README.md`
3. The application will build and deploy automatically

## 🐳 Local Docker Usage

Build the image:

```bash
docker build -t llm-structured-output .
```

Run the container:

```bash
docker run -p 7860:7860 -e MODEL_REPO="lmstudio-community/gemma-3n-E4B-it-text-GGUF" llm-structured-output
```

With custom configuration:

```bash
docker run -p 7860:7860 \
  -e MODEL_REPO="lmstudio-community/gemma-3n-E4B-it-text-GGUF" \
  -e MODEL_FILENAME="gemma-3n-E4B-it-Q8_0.gguf" \
  -e N_CTX="4096" \
  -e MAX_NEW_TOKENS="512" \
  llm-structured-output
```

## 🌐 Application Access

Once the container is running, open the Gradio interface at http://localhost:7860 (the port published with `-p 7860:7860`).

πŸ“ Environment Variables

Configure the application using environment variables:

Variable Default Description
MODEL_REPO lmstudio-community/gemma-3n-E4B-it-text-GGUF HuggingFace model repository
MODEL_FILENAME gemma-3n-E4B-it-Q8_0.gguf Model file name
N_CTX 4096 Context window size
N_GPU_LAYERS 0 GPU layers (0 for CPU-only)
N_THREADS 4 CPU threads
MAX_NEW_TOKENS 256 Maximum response length
TEMPERATURE 0.1 Generation temperature
HUGGINGFACE_TOKEN `` HF token for private models
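
For local runs, the same variables can be kept in a Compose file instead of a long `docker run` command. A minimal sketch, assuming the defaults from the table above (the service name is illustrative and not part of this repo):

```yaml
# docker-compose.yml — illustrative sketch; the service name "llm" is an assumption
services:
  llm:
    build: .
    ports:
      - "7860:7860"
    environment:
      MODEL_REPO: lmstudio-community/gemma-3n-E4B-it-text-GGUF
      MODEL_FILENAME: gemma-3n-E4B-it-Q8_0.gguf
      N_CTX: "4096"
      N_THREADS: "4"
```

With this in place, `docker compose up --build` replaces the manual build-and-run steps.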

## πŸ“‹ Usage Examples

Example JSON Schema:

```json
{
  "type": "object",
  "properties": {
    "summary": {"type": "string"},
    "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["summary", "sentiment"]
}
```
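
Even with schema-constrained generation, it is cheap to double-check a response before trusting it downstream. A stdlib-only sketch that hand-rolls the checks implied by the schema above (in practice a library such as `jsonschema` would cover the full spec; the `response` dict here is made up for illustration):

```python
def check_response(resp: dict) -> list[str]:
    """Return a list of violations of the example schema; empty means valid."""
    errors = []
    # "required": ["summary", "sentiment"]
    for field in ("summary", "sentiment"):
        if field not in resp:
            errors.append(f"missing required field: {field}")
    # "sentiment" must come from the enum
    if "sentiment" in resp and resp["sentiment"] not in ("positive", "negative", "neutral"):
        errors.append("sentiment outside enum")
    # "confidence" is optional but must be a number in [0, 1]
    conf = resp.get("confidence")
    if conf is not None and not (isinstance(conf, (int, float)) and 0 <= conf <= 1):
        errors.append("confidence outside [0, 1]")
    return errors

# Made-up response standing in for actual model output.
response = {"summary": "Customer is happy with the product.",
            "sentiment": "positive", "confidence": 0.9}
errors = check_response(response)  # [] when the output matches the schema
```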

Example Prompt:

```
Analyze this review: "The product exceeded my expectations! Great quality and fast delivery."
```
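
The README mentions a REST API but does not document its routes, so the exact endpoint and field names are assumptions here. A sketch of assembling a request body that pairs the example prompt with the example schema (the `/generate` path and the `json_schema` / `max_new_tokens` keys are hypothetical; check the auto-generated FastAPI docs of the running container for the real contract):

```python
import json

prompt = ('Analyze this review: "The product exceeded my expectations! '
          'Great quality and fast delivery."')

schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["summary", "sentiment"],
}

# Hypothetical request body — field names are assumptions, not the documented API.
payload = json.dumps({"prompt": prompt, "json_schema": schema, "max_new_tokens": 256})
```

With the container running, something like `curl -X POST http://localhost:7860/generate -H "Content-Type: application/json" -d "$payload"` would exercise it, once the route is confirmed against the FastAPI documentation.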

## πŸ”§ Docker Optimizations

This Docker version includes several optimizations:

- Reduced memory usage with a smaller context window and batch sizes
- CPU-optimized configuration by default
- Efficient layer caching for faster builds
- Security: runs as a non-root user
- Multi-stage build capabilities for production
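
The caching and non-root points above correspond to familiar Dockerfile patterns. A hedged sketch, not the repo's actual Dockerfile (file names, paths, and the user name are assumptions):

```dockerfile
FROM python:3.10-slim

# Copy requirements first so the dependency layer is cached across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Run as a non-root user for security.
RUN useradd --create-home appuser
USER appuser
WORKDIR /home/appuser/app
COPY --chown=appuser . .

EXPOSE 7860
CMD ["python", "app.py"]
```

Ordering the `COPY requirements.txt` and `pip install` steps before copying the source tree is what makes rebuilds fast: editing application code invalidates only the final layers.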

πŸ—οΈ Architecture

  • Base Image: Python 3.10 slim
  • ML Backend: llama-cpp-python with OpenBLAS
  • Web Interface: Gradio 4.x
  • API: FastAPI with automatic documentation
  • Model Storage: Downloaded on first run to /app/models/

## πŸ’‘ Performance Tips

1. Memory: start with smaller models (7B parameters or less)
2. CPU: adjust `N_THREADS` to match the available cores
3. Context: reduce `N_CTX` if you run into memory issues
4. Batch size: lower `N_BATCH` in memory-constrained environments
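
Tip 2 can be automated when launching the container. A small sketch for picking `N_THREADS` from the host's core count (the cap of 8 is an arbitrary assumption, not a recommendation from this repo):

```python
import os

def pick_n_threads(cap: int = 8) -> int:
    """Use the logical core count reported by the OS, bounded by a cap."""
    cores = os.cpu_count() or 1  # cpu_count() may return None on exotic platforms
    return max(1, min(cores, cap))

# The result can then be passed as -e N_THREADS=... to docker run.
n_threads = pick_n_threads()
```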

πŸ” Troubleshooting

Container fails to start:

  • Check available memory (minimum 4GB recommended)
  • Verify model repository accessibility
  • Ensure proper environment variable formatting

Model download issues:

  • Check internet connectivity in container
  • Verify HUGGINGFACE_TOKEN for private models
  • Ensure sufficient disk space

Performance issues:

  • Reduce N_CTX and MAX_NEW_TOKENS
  • Adjust N_THREADS to match CPU cores
  • Consider using smaller/quantized models

## πŸ“„ License

MIT License - see the LICENSE file for details.


For more information about HuggingFace Spaces Docker configuration, see the [Spaces config reference](https://huggingface.co/docs/hub/spaces-config-reference).