lyangas committed
Commit 824d8df · 1 Parent(s): ba2ded2

init repo

Files changed (6)
  1. .env.example +29 -0
  2. .gitignore +70 -0
  3. README.md +214 -7
  4. app.py +422 -0
  5. config.py +62 -0
  6. requirements.txt +18 -0
.env.example ADDED
@@ -0,0 +1,29 @@
+ # Example environment configuration for HF Spaces
+ # Copy this file to .env and modify as needed
+
+ # Model Configuration
+ MODEL_REPO=lmstudio-community/gemma-3n-E4B-it-text-GGUF
+ MODEL_FILENAME=gemma-3n-E4B-it-Q8_0.gguf
+ MODEL_PATH=./models/gemma-3n-E4B-it-Q8_0.gguf
+ HUGGINGFACE_TOKEN=
+
+ # GPU Optimization Settings (for HF Spaces with GPU)
+ N_CTX=8192
+ N_GPU_LAYERS=-1
+ N_THREADS=8
+ N_BATCH=1024
+ USE_MLOCK=false
+ USE_MMAP=true
+ F16_KV=true
+ SEED=42
+
+ # Server Settings
+ HOST=0.0.0.0
+ GRADIO_PORT=7860
+
+ # Generation Settings
+ MAX_NEW_TOKENS=512
+ TEMPERATURE=0.1
+
+ # File Upload Settings
+ MAX_FILE_SIZE=10485760
.gitignore ADDED
@@ -0,0 +1,70 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Virtual environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Models
+ models/
+ *.gguf
+ *.bin
+ *.safetensors
+
+ # Logs
+ *.log
+ logs/
+
+ # OS
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+
+ # Jupyter
+ .ipynb_checkpoints/
+
+ # Gradio
+ flagged/
+ gradio_cached_examples/
+
+ # Temporary files
+ tmp/
+ temp/
+ *.tmp
+ *.temp
README.md CHANGED
@@ -1,13 +1,220 @@
  ---
- title: Free Llm Structure Output
- emoji: 😻
- colorFrom: green
- colorTo: red
+ title: LLM Structured Output
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
- sdk_version: 5.43.1
+ sdk_version: 4.44.1
  app_file: app.py
  pinned: false
- license: gemma
+ license: mit
+ hardware: t4-small
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🤖 LLM Structured Output - Hugging Face Spaces
+
+ An application for generating structured responses with local GGUF models via llama-cpp-python, optimized to run on Hugging Face Spaces with GPU support.
+
+ ## ✨ Features
+
+ - 🚀 **GPU acceleration**: Optimized for GPUs on HF Spaces using the `@spaces.GPU` decorator
+ - 📊 **Structured output**: Generates responses that conform to a JSON schema
+ - 🎯 **High accuracy**: Uses local GGUF models
+ - 🎨 **Convenient interface**: Modern Gradio UI
+ - 🔧 **Flexible configuration**: Supports different models and parameters
+ - ⚡ **Smart resource management**: GPU sessions are allocated for 120 seconds per request
+
+ ## 🚀 Quick Start
+
+ ### Deploying to Hugging Face Spaces
+
+ 1. Create a new Space on Hugging Face
+ 2. Choose the Space type: **Gradio**
+ 3. Choose the hardware: **GPU** (T4 or higher)
+ 4. Upload the project files
+ 5. The Space starts automatically
+
+ ### Running Locally
+
+ ```bash
+ # Clone the repository
+ git clone <your-repo>
+ cd free_llm_structure_output
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the application
+ python app.py
+ ```
+
+ ## 📋 Project Structure
+
+ ```
+ free_llm_structure_output/
+ ├── app.py               # Main Gradio application
+ ├── config.py            # Configuration for HF Spaces
+ ├── requirements.txt     # Python dependencies
+ └── README.md            # Documentation
+ ```
+
+ ## ⚙️ Configuration
+
+ The main parameters are set via environment variables or in `config.py` (see the override sketch at the end of this section):
+
+ ### Model settings
+ - `MODEL_REPO`: Model repository on Hugging Face (default: lmstudio-community/gemma-3n-E4B-it-text-GGUF)
+ - `MODEL_FILENAME`: Model file name (default: gemma-3n-E4B-it-Q8_0.gguf)
+ - `HUGGINGFACE_TOKEN`: HF token for private models
+
+ ### GPU optimization
+ - `N_GPU_LAYERS`: Number of layers offloaded to the GPU (-1 for all)
+ - `N_CTX`: Context size (8192 for GPU)
+ - `N_BATCH`: Batch size (1024 for GPU)
+ - `N_THREADS`: Number of threads (8 for HF Spaces)
+
+ ### Generation
+ - `MAX_NEW_TOKENS`: Maximum response length (512)
+ - `TEMPERATURE`: Generation temperature (0.1)
+
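+ Because `config.py` reads these variables via `os.getenv()` when it is first imported, they can also be overridden programmatically. A minimal sketch (the values are illustrative, not recommendations):
+
+ ```python
+ import os
+
+ # Set overrides BEFORE importing Config; config.py evaluates its defaults at import time.
+ # (A local .env file, if present, takes precedence for the keys it defines.)
+ os.environ["N_CTX"] = "4096"
+ os.environ["TEMPERATURE"] = "0.2"
+
+ from config import Config
+ print(Config.N_CTX, Config.TEMPERATURE)  # -> 4096 0.2
+ ```
+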
+ ## 🎯 Usage
+
+ ### Basic example
+
+ 1. **Enter a prompt**: Describe what you want to analyze
+ 2. **Provide a JSON schema**: Define the structure of the response
+ 3. **Click "Generate Response"**: Receive a structured response
+
+ ### Example JSON schema
+
+ ```json
+ {
+   "type": "object",
+   "properties": {
+     "summary": {
+       "type": "string",
+       "description": "Brief summary"
+     },
+     "sentiment": {
+       "type": "string",
+       "enum": ["positive", "negative", "neutral"],
+       "description": "Emotional tone"
+     },
+     "confidence": {
+       "type": "number",
+       "minimum": 0,
+       "maximum": 1,
+       "description": "Confidence level"
+     }
+   },
+   "required": ["summary", "sentiment"]
+ }
+ ```
+
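+ The app embeds the schema in the prompt, then pulls the first JSON object out of the model's raw text. A minimal sketch of that parsing step, mirroring the logic in `app.py`:
+
+ ```python
+ import json
+
+ def extract_json(raw: str) -> dict:
+     """Extract the first {...} block from raw model output, as app.py does."""
+     start, end = raw.find("{"), raw.rfind("}") + 1
+     if start == -1 or end <= start:
+         raise ValueError("no JSON object found in model output")
+     return json.loads(raw[start:end])
+
+ print(extract_json('Sure! {"summary": "ok", "sentiment": "positive"} Done.'))
+ ```
+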
+ ## 🔧 Advanced Settings
+
+ ### Environment variables for HF Spaces
+
+ Create a `.env` file in the Space settings, or set the variables directly:
+
+ ```env
+ # Model
+ MODEL_REPO=lmstudio-community/gemma-3n-E4B-it-text-GGUF
+ MODEL_FILENAME=gemma-3n-E4B-it-Q8_0.gguf
+ HUGGINGFACE_TOKEN=your_token_here
+
+ # GPU settings
+ N_GPU_LAYERS=-1
+ N_CTX=8192
+ N_BATCH=1024
+ N_THREADS=8
+
+ # Generation
+ MAX_NEW_TOKENS=512
+ TEMPERATURE=0.1
+ ```
+
+ ### Using other models
+
+ Any GGUF model from the Hugging Face Hub is supported:
+
+ ```python
+ # In config.py or via environment variables
+ MODEL_REPO = "microsoft/Phi-3-mini-4k-instruct-gguf"
+ MODEL_FILENAME = "Phi-3-mini-4k-instruct-q4.gguf"
+ ```
+
+ ## 📊 Performance
+
+ ### Recommended HF Spaces configurations
+
+ | Model size | GPU  | N_CTX | N_BATCH | N_GPU_LAYERS |
+ |------------|------|-------|---------|--------------|
+ | 3B-7B      | T4   | 4096  | 512     | -1           |
+ | 7B-13B     | A10G | 8192  | 1024    | -1           |
+ | 13B+       | A100 | 16384 | 2048    | -1           |
+
+ ### Speed optimization
+
+ - Use quantized models (Q4_0, Q8_0)
+ - Tune `N_BATCH` to fit your GPU memory
+ - Set `N_GPU_LAYERS=-1` for full GPU offloading (see the sketch below)
+
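+ These settings map onto the `Llama(...)` constructor call in `app.py`. A trimmed sketch using the T4 row from the table above (assumes the model file has already been downloaded):
+
+ ```python
+ from llama_cpp import Llama
+
+ # Trimmed version of the constructor call in app.py, with T4-sized values.
+ llm = Llama(
+     model_path="./models/gemma-3n-E4B-it-Q8_0.gguf",
+     n_ctx=4096,       # context window
+     n_batch=512,      # prompt-processing batch size
+     n_gpu_layers=-1,  # offload all layers to the GPU
+ )
+ ```
+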
+ ## 🛠️ Debugging
+
+ ### Model loading problems
+
+ 1. Check that the model is available on the HF Hub (see the snippet below)
+ 2. Make sure `HUGGINGFACE_TOKEN` is correct
+ 3. Check the amount of GPU memory
+ 4. Switch to a less resource-hungry model
+
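+ To test model access outside the app, you can run the same download call `app.py` makes at startup (a token is only needed for gated or private repositories):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Same call app.py performs at startup; raises if the repo or file is unavailable.
+ path = hf_hub_download(
+     repo_id="lmstudio-community/gemma-3n-E4B-it-text-GGUF",
+     filename="gemma-3n-E4B-it-Q8_0.gguf",
+     local_dir="./models",
+ )
+ print(path)
+ ```
+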
+ ### Logs
+
+ Enable verbose logging:
+
+ ```python
+ import logging
+ logging.basicConfig(level=logging.DEBUG)
+ ```
+
+ ## 🎨 Usage Examples
+
+ The schemas below are compact shorthand; an expanded version is shown at the end of this section.
+
+ ### Text analysis
+ ```
+ Prompt: "Analyze the review: 'Great product, highly recommended!'"
+ Schema: {"sentiment": "string", "rating": "number", "keywords": "array"}
+ ```
+
+ ### Data extraction
+ ```
+ Prompt: "Extract company information from the text"
+ Schema: {"name": "string", "industry": "string", "employees": "number"}
+ ```
+
+ ### Structure generation
+ ```
+ Prompt: "Create a Python learning plan"
+ Schema: {"weeks": "array", "topics": "array", "hours": "number"}
+ ```
+
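+ Written out as full JSON Schema, the first shorthand example would look roughly like this (field types assumed from the shorthand):
+
+ ```python
+ import json
+
+ # Expanded JSON Schema for the "Text analysis" shorthand above.
+ schema = {
+     "type": "object",
+     "properties": {
+         "sentiment": {"type": "string"},
+         "rating": {"type": "number"},
+         "keywords": {"type": "array", "items": {"type": "string"}},
+     },
+     "required": ["sentiment"],
+ }
+ print(json.dumps(schema, indent=2))
+ ```
+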
+ ## 📄 License
+
+ MIT License
+
+ ## 🤝 Support
+
+ - 🐛 Bug reports: open an Issue
+ - 💡 Suggestions: start a Discussion
+ - 📧 Direct contact: via Hugging Face
+
+ ## 🔗 Useful Links
+
+ - [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces)
+ - [Gradio Documentation](https://gradio.app/docs/)
+ - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
+ - [GGUF Models Hub](https://huggingface.co/models?library=gguf)
+
+ ---
+
+ ⭐ **Like the project?** Star it and share it with your colleagues!
app.py ADDED
@@ -0,0 +1,422 @@
+ import spaces
+ import os
+ import json
+ import subprocess
+ from llama_cpp import Llama
+ import gradio as gr
+ from huggingface_hub import hf_hub_download
+ from typing import Optional, Dict, Any, Union
+ from PIL import Image
+ from pydantic import BaseModel
+ import logging
+ from config import Config
+
+ # Setup logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Get Hugging Face token
+ huggingface_token = os.getenv("HUGGINGFACE_TOKEN")
+
+ # Download model if needed
+ def download_model_if_needed():
+     """Download model from Hugging Face if it doesn't exist locally"""
+     model_path = Config.get_model_path()
+
+     if os.path.exists(model_path):
+         logger.info(f"Model already exists at: {model_path}")
+         return model_path
+
+     # Check alternative locations for HF Spaces
+     alternative_paths = [
+         f"./models/{Config.MODEL_FILENAME}",
+         f"/tmp/models/{Config.MODEL_FILENAME}",
+         f"./{Config.MODEL_FILENAME}"
+     ]
+
+     for alt_path in alternative_paths:
+         if os.path.exists(alt_path):
+             logger.info(f"Found model at alternative location: {alt_path}")
+             return alt_path
+
+     logger.info(f"Downloading model {Config.MODEL_REPO}/{Config.MODEL_FILENAME}...")
+
+     # Create models directory if it doesn't exist
+     models_dir = Config.get_models_dir()
+     os.makedirs(models_dir, exist_ok=True)
+
+     try:
+         # Download model
+         model_path = hf_hub_download(
+             repo_id=Config.MODEL_REPO,
+             filename=Config.MODEL_FILENAME,
+             local_dir=models_dir,
+             token=huggingface_token if huggingface_token else None
+         )
+
+         logger.info(f"Model downloaded to: {model_path}")
+         return model_path
+     except Exception as e:
+         logger.error(f"Failed to download model: {e}")
+         raise
+
+ # Download model at startup
+ try:
+     download_model_if_needed()
+ except Exception as e:
+     logger.error(f"Error downloading model: {e}")
+
+ # Global variables for model management
+ llm = None
+ llm_model = None
+
+ class StructuredOutputRequest(BaseModel):
+     prompt: str
+     image: Optional[str] = None  # base64 encoded image
+     json_schema: Dict[str, Any]
+
+ def _validate_json_schema(schema: str) -> Dict[str, Any]:
+     """Validate and parse JSON schema"""
+     try:
+         parsed_schema = json.loads(schema)
+         return parsed_schema
+     except json.JSONDecodeError as e:
+         raise ValueError(f"Invalid JSON schema: {e}")
+
+ def _format_prompt_with_schema(prompt: str, json_schema: Dict[str, Any]) -> str:
+     """Format prompt for structured output generation"""
+     schema_str = json.dumps(json_schema, ensure_ascii=False, indent=2)
+
+     formatted_prompt = f"""User: {prompt}
+
+ Please respond in strict accordance with the following JSON schema:
+
+ ```json
+ {schema_str}
+ ```
+
+ Return ONLY valid JSON without additional comments or explanations."""
+
+     return formatted_prompt
+
+ @spaces.GPU(duration=120, concurrency_limit=1)
+ def generate_structured_response(
+     prompt: str,
+     json_schema_str: str,
+     image: Optional[Image.Image] = None,
+     model: str = Config.MODEL_FILENAME,
+     max_tokens: int = Config.MAX_NEW_TOKENS,
+     temperature: float = Config.TEMPERATURE,
+     top_p: float = 0.9,
+     top_k: int = 40,
+     repeat_penalty: float = 1.1,
+ ) -> Dict[str, Any]:
+     """
+     Generate structured response from local GGUF model with GPU acceleration
+     """
+     global llm
+     global llm_model
+
+     try:
+         # Load or reload model if needed
+         if llm is None or llm_model != model:
+             logger.info(f"Loading model: {model}")
+
+             # Find model path
+             model_path = Config.get_model_path()
+             if not os.path.exists(model_path):
+                 # Try alternative paths
+                 alternative_paths = [
+                     f"./models/{model}",
+                     f"/tmp/models/{model}",
+                     f"./{model}"
+                 ]
+
+                 for alt_path in alternative_paths:
+                     if os.path.exists(alt_path):
+                         model_path = alt_path
+                         break
+                 else:  # for-else: runs only if no alternative path matched
+                     raise FileNotFoundError(f"Model file not found: {model}")
+
+             # Initialize Llama model with GPU optimization
+             llm = Llama(
+                 model_path=model_path,
+                 n_ctx=Config.N_CTX,
+                 n_batch=Config.N_BATCH,
+                 n_gpu_layers=Config.N_GPU_LAYERS,  # use all GPU layers
+                 use_mlock=Config.USE_MLOCK,
+                 use_mmap=Config.USE_MMAP,
+                 vocab_only=False,
+                 f16_kv=Config.F16_KV,
+                 logits_all=False,
+                 embedding=False,
+                 n_threads=Config.N_THREADS,
+                 last_n_tokens_size=128,
+                 lora_base=None,
+                 lora_path=None,
+                 seed=Config.SEED,
+                 verbose=True,
+                 main_gpu=0,  # use first GPU
+                 tensor_split=None,
+                 rope_scaling_type=None,
+                 rope_freq_base=0.0,
+                 rope_freq_scale=0.0,
+             )
+
+             llm_model = model
+             logger.info("Model successfully loaded with GPU acceleration")
+
+         # Validate and parse JSON schema
+         try:
+             parsed_schema = _validate_json_schema(json_schema_str)
+         except Exception as e:
+             return {
+                 "error": f"Schema validation error: {str(e)}",
+                 "raw_response": ""
+             }
+
+         # Format prompt
+         formatted_prompt = _format_prompt_with_schema(prompt, parsed_schema)
+
+         # Warning about images (not supported in this implementation)
+         if image is not None:
+             logger.warning("Image processing is not supported with this local model")
+
+         # Generate response with GPU optimization
+         logger.info("Generating response with GPU acceleration...")
+
+         response = llm(
+             formatted_prompt,
+             max_tokens=max_tokens,
+             temperature=temperature,
+             # caution: the "\n\n" stop sequence also halts generation at the
+             # first blank line the model emits
+             stop=["User:", "\n\n", "Assistant:", "Human:"],
+             echo=False,
+             top_p=top_p,
+             top_k=top_k,
+             repeat_penalty=repeat_penalty,
+             presence_penalty=0.0,
+             frequency_penalty=0.0,
+         )
+
+         # Extract generated text
+         generated_text = response['choices'][0]['text']
+
+         # Attempt to parse JSON response
+         try:
+             # Find JSON in response
+             json_start = generated_text.find('{')
+             json_end = generated_text.rfind('}') + 1
+
+             if json_start != -1 and json_end > json_start:
+                 json_str = generated_text[json_start:json_end]
+                 parsed_response = json.loads(json_str)
+                 return {
+                     "success": True,
+                     "data": parsed_response,
+                     "raw_response": generated_text
+                 }
+             else:
+                 return {
+                     "error": "Could not find JSON in model response",
+                     "raw_response": generated_text
+                 }
+
+         except json.JSONDecodeError as e:
+             return {
+                 "error": f"JSON parsing error: {e}",
+                 "raw_response": generated_text
+             }
+
+     except Exception as e:
+         logger.error(f"Unexpected error: {e}")
+         return {
+             "error": f"Generation error: {str(e)}"
+         }
+
+ def process_request(prompt: str,
+                     json_schema: str,
+                     image: Optional[Image.Image] = None) -> str:
+     """
+     Process request through Gradio interface
+     """
+     if not prompt.strip():
+         return json.dumps({"error": "Prompt cannot be empty"}, ensure_ascii=False, indent=2)
+
+     if not json_schema.strip():
+         return json.dumps({"error": "JSON schema cannot be empty"}, ensure_ascii=False, indent=2)
+
+     result = generate_structured_response(prompt, json_schema, image)
+     return json.dumps(result, ensure_ascii=False, indent=2)
+
+ # Examples for demonstration
+ example_schema = """{
+   "type": "object",
+   "properties": {
+     "summary": {
+       "type": "string",
+       "description": "Brief summary of the response"
+     },
+     "sentiment": {
+       "type": "string",
+       "enum": ["positive", "negative", "neutral"],
+       "description": "Emotional tone"
+     },
+     "confidence": {
+       "type": "number",
+       "minimum": 0,
+       "maximum": 1,
+       "description": "Confidence level in the response"
+     },
+     "keywords": {
+       "type": "array",
+       "items": {
+         "type": "string"
+       },
+       "description": "Key words"
+     }
+   },
+   "required": ["summary", "sentiment", "confidence"]
+ }"""
+
+ example_prompt = "Analyze the following text and provide a structured assessment: 'The company's new product received enthusiastic user reviews. Sales exceeded all expectations by 150%.'"
+
+ def create_gradio_interface():
+     """Create Gradio interface optimized for HF Spaces"""
+
+     with gr.Blocks(title="LLM Structured Output - HF Spaces", theme=gr.themes.Soft()) as demo:
+         gr.Markdown("# 🤖 LLM with Structured Output")
+         gr.Markdown("✨ **Running on Hugging Face Spaces with GPU acceleration**")
+         gr.Markdown(f"🚀 Model: **{Config.MODEL_REPO}/{Config.MODEL_FILENAME}**")
+         gr.Markdown("✅ **Status**: Model ready with GPU acceleration via @spaces.GPU decorator")
+
+         with gr.Row():
+             with gr.Column():
+                 prompt_input = gr.Textbox(
+                     label="Prompt for model",
+                     placeholder="Enter your request...",
+                     lines=5,
+                     value=example_prompt
+                 )
+
+                 image_input = gr.Image(
+                     label="Image (optional, for multimodal models)",
+                     type="pil"
+                 )
+
+                 schema_input = gr.Textbox(
+                     label="JSON schema for response structure",
+                     placeholder="Enter JSON schema...",
+                     lines=15,
+                     value=example_schema
+                 )
+
+                 submit_btn = gr.Button("🚀 Generate Response", variant="primary", size="lg")
+
+             with gr.Column():
+                 output = gr.Textbox(
+                     label="Structured Response",
+                     lines=20,
+                     interactive=False
+                 )
+
+         submit_btn.click(
+             fn=process_request,
+             inputs=[prompt_input, schema_input, image_input],
+             outputs=output
+         )
+
+         # Examples
+         gr.Markdown("## 📋 Usage Examples")
+
+         examples = gr.Examples(
+             examples=[
+                 [
+                     "Describe today's weather in New York",
+                     """{
+   "type": "object",
+   "properties": {
+     "temperature": {"type": "number"},
+     "description": {"type": "string"},
+     "humidity": {"type": "number"}
+   }
+ }""",
+                     None
+                 ],
+                 [
+                     "Create a Python learning plan for one month",
+                     """{
+   "type": "object",
+   "properties": {
+     "weeks": {
+       "type": "array",
+       "items": {
+         "type": "object",
+         "properties": {
+           "week_number": {"type": "integer"},
+           "topics": {"type": "array", "items": {"type": "string"}},
+           "practice_hours": {"type": "number"}
+         }
+       }
+     },
+     "total_hours": {"type": "number"}
+   }
+ }""",
+                     None
+                 ],
+                 [
+                     "Analyze this business proposal and extract key metrics",
+                     """{
+   "type": "object",
+   "properties": {
+     "feasibility_score": {"type": "number", "minimum": 0, "maximum": 10},
+     "risk_factors": {"type": "array", "items": {"type": "string"}},
+     "investment_required": {"type": "number"},
+     "expected_roi": {"type": "number"},
+     "timeline_months": {"type": "integer"}
+   },
+   "required": ["feasibility_score", "risk_factors"]
+ }""",
+                     None
+                 ]
+             ],
+             inputs=[prompt_input, schema_input, image_input]
+         )
+
+         # Model information
+         gr.Markdown(f"""
+ ## ℹ️ Model Information
+
+ - **Model**: {Config.MODEL_REPO}/{Config.MODEL_FILENAME}
+ - **Local path**: {Config.MODEL_PATH}
+ - **Context window**: {Config.N_CTX} tokens
+ - **Batch size**: {Config.N_BATCH}
+ - **GPU layers**: {Config.N_GPU_LAYERS if Config.N_GPU_LAYERS >= 0 else "All (GPU accelerated)"}
+ - **CPU threads**: {Config.N_THREADS}
+ - **Maximum response length**: {Config.MAX_NEW_TOKENS} tokens
+ - **Temperature**: {Config.TEMPERATURE}
+ - **Memory lock**: {"Enabled" if Config.USE_MLOCK else "Disabled"}
+ - **Memory mapping**: {"Enabled" if Config.USE_MMAP else "Disabled"}
+ - **GPU Acceleration**: Enabled via @spaces.GPU decorator (120 seconds duration)
+
+ 💡 **Tips**:
+ - Use clear and specific JSON schemas for better results
+ - The model is optimized for GPU acceleration on Hugging Face Spaces
+ - Structured output helps ensure consistent API responses
+ - GPU sessions are allocated for 120 seconds per request
+
+ 🎯 **Perfect for**: API response generation, data extraction, content analysis, and structured data creation
+ """)
+
+     return demo
+
+ if __name__ == "__main__":
+     # Create and launch Gradio interface for HF Spaces
+     demo = create_gradio_interface()
+     demo.launch(
+         server_name=Config.HOST,
+         server_port=Config.GRADIO_PORT,
+         share=False,
+         debug=False,  # disabled for production
+         show_error=True
+     )
config.py ADDED
@@ -0,0 +1,62 @@
+ import os
+ from typing import Optional
+
+ def _load_env_file(env_file: str = ".env") -> None:
+     """Load KEY=VALUE pairs from a .env file into os.environ.
+
+     Called below BEFORE the Config class body runs, so the os.getenv()
+     defaults in Config pick up values from .env. Simple parser: comments
+     and blank lines are skipped; quotes around values are kept verbatim.
+     """
+     if os.path.exists(env_file):
+         with open(env_file, 'r') as f:
+             for line in f:
+                 line = line.strip()
+                 if line and not line.startswith('#') and '=' in line:
+                     key, value = line.split('=', 1)
+                     os.environ[key.strip()] = value.strip()
+
+ # Automatically load from .env on import, before Config evaluates its defaults
+ _load_env_file()
+
+ class Config:
+     """Application configuration for Hugging Face Spaces with GPU support"""
+
+     # Model settings - optimized for HF Spaces with GPU
+     MODEL_REPO: str = os.getenv("MODEL_REPO", "lmstudio-community/gemma-3n-E4B-it-text-GGUF")
+     MODEL_FILENAME: str = os.getenv("MODEL_FILENAME", "gemma-3n-E4B-it-Q8_0.gguf")
+     MODEL_PATH: str = os.getenv("MODEL_PATH", "./models/gemma-3n-E4B-it-Q8_0.gguf")
+     HUGGINGFACE_TOKEN: str = os.getenv("HUGGINGFACE_TOKEN", "")
+
+     # Model loading settings - optimized for HF Spaces GPU
+     N_CTX: int = int(os.getenv("N_CTX", "8192"))  # larger context for GPU
+     N_GPU_LAYERS: int = int(os.getenv("N_GPU_LAYERS", "-1"))  # use all GPU layers
+     N_THREADS: int = int(os.getenv("N_THREADS", "8"))  # more threads for HF GPU
+     N_BATCH: int = int(os.getenv("N_BATCH", "1024"))  # larger batch for GPU
+     USE_MLOCK: bool = os.getenv("USE_MLOCK", "false").lower() == "true"  # keep disabled
+     USE_MMAP: bool = os.getenv("USE_MMAP", "true").lower() == "true"  # keep memory mapping
+     F16_KV: bool = os.getenv("F16_KV", "true").lower() == "true"  # use 16-bit keys and values
+     SEED: int = int(os.getenv("SEED", "42"))  # random seed for reproducibility
+
+     # Server settings - HF Spaces compatible
+     HOST: str = os.getenv("HOST", "0.0.0.0")
+     GRADIO_PORT: int = int(os.getenv("GRADIO_PORT", "7860"))  # standard Hugging Face Spaces port
+
+     # Generation settings - optimized for GPU performance
+     MAX_NEW_TOKENS: int = int(os.getenv("MAX_NEW_TOKENS", "512"))  # increased for GPU
+     TEMPERATURE: float = float(os.getenv("TEMPERATURE", "0.1"))
+
+     # File upload settings
+     MAX_FILE_SIZE: int = int(os.getenv("MAX_FILE_SIZE", "10485760"))  # 10 MB
+     ALLOWED_IMAGE_EXTENSIONS: set = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}
+
+     @classmethod
+     def is_model_available(cls) -> bool:
+         """Check if local model file exists"""
+         return os.path.exists(cls.MODEL_PATH)
+
+     @classmethod
+     def get_model_path(cls) -> str:
+         """Get absolute path to model file"""
+         return os.path.abspath(cls.MODEL_PATH)
+
+     @classmethod
+     def get_models_dir(cls) -> str:
+         """Get models directory path"""
+         return os.path.dirname(cls.MODEL_PATH)
+
+     @classmethod
+     def load_from_env_file(cls, env_file: str = ".env") -> None:
+         """Reload a .env file (kept for compatibility).
+
+         Note: the class attributes above are evaluated once at import time,
+         so calling this later only affects subsequent os.getenv() lookups.
+         """
+         _load_env_file(env_file)
requirements.txt ADDED
@@ -0,0 +1,18 @@
+ # Core dependencies for Hugging Face Spaces with GPU support
+ huggingface_hub==0.25.2
+ spaces
+
+ # GPU-optimized llama-cpp-python (prebuilt CUDA 12.4 wheel for Python 3.10)
+ # llama-cpp-python>=0.3.4
+ https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.16-cu124/llama_cpp_python-0.3.16-cp310-cp310-linux_x86_64.whl
+
+ # Web interface
+ gradio==4.44.1
+
+ # Data processing
+ pillow>=9.0.0,<11.0.0
+ pydantic==2.10.6
+ numpy>=1.24.0,<2.0.0
+
+ # HTTP requests
+ requests>=2.28.0