Upload 5 files
- Dockerfile (1).txt +26 -0
- README (4).md +191 -0
- app (3).py +365 -0
- gitattributes.txt +35 -0
- requirements (2).txt +5 -0
Dockerfile (1).txt
ADDED
@@ -0,0 +1,26 @@
FROM python:3.11-slim

WORKDIR /app

# Install the system dependencies needed for llama-cpp-python
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for better Docker layer caching)
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application
COPY app.py .

# Expose the port
EXPOSE 7860

# Startup command
CMD ["python", "-m", "gunicorn", "--bind", "0.0.0.0:7860", "--workers", "1", "--timeout", "120", "app:app"]
README (4).md
ADDED
@@ -0,0 +1,191 @@
---
title: Ollama API Space
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
---

# 🚀 Ollama API Space

A Hugging Face Space that provides a REST API interface for Ollama models, allowing you to run local LLMs through a web API.

## 🌟 Features

- **Model Management**: List and pull Ollama models
- **Text Generation**: Generate text using any available Ollama model
- **REST API**: Simple HTTP endpoints for easy integration
- **Health Monitoring**: Built-in health checks and status monitoring
- **OpenWebUI Integration**: Compatible with OpenWebUI for a full chat interface

## 🚀 Quick Start

### 1. Deploy to Hugging Face Spaces

1. Fork this repository or create a new Space
2. Upload these files to your Space
3. **No environment variables needed** - Ollama runs inside the Space!
4. Wait for the build to complete (may take 10-15 minutes due to the Ollama installation)

### 2. Local Development

```bash
# Clone the repository
git clone <your-repo-url>
cd ollama-space

# Install dependencies
pip install -r requirements.txt

# Install Ollama locally
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama in another terminal
ollama serve

# Run the application
python app.py
```

## 📡 API Endpoints

### GET `/api/models`
List all available Ollama models.

**Response:**
```json
{
  "status": "success",
  "models": ["llama2", "codellama", "neural-chat"],
  "count": 3
}
```
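
From the command line, listing models might look like this (a quick sketch; the Space URL is a placeholder for your own deployment):

```bash
# Placeholder URL - substitute your own Space
curl https://your-username-ollama-space.hf.space/api/models
```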

### POST `/api/models/pull`
Pull a model from Ollama.

**Request Body:**
```json
{
  "name": "llama2"
}
```

**Response:**
```json
{
  "status": "success",
  "model": "llama2"
}
```
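
A command-line sketch of the same call, again assuming a placeholder Space URL:

```bash
# Pull a model through the API (URL is a placeholder)
curl -X POST https://your-username-ollama-space.hf.space/api/models/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "llama2"}'
```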

### POST `/api/generate`
Generate text using a model.

**Request Body:**
```json
{
  "model": "llama2",
  "prompt": "Hello, how are you?",
  "temperature": 0.7,
  "max_tokens": 100
}
```

**Response:**
```json
{
  "status": "success",
  "response": "Hello! I'm doing well, thank you for asking...",
  "model": "llama2",
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 15,
    "total_tokens": 22
  }
}
```
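
For reference, the same request from the command line, using the body documented above and a placeholder Space URL:

```bash
# Generate text with llama2 (URL is a placeholder)
curl -X POST https://your-username-ollama-space.hf.space/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2", "prompt": "Hello, how are you?", "temperature": 0.7, "max_tokens": 100}'
```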

### GET `/health`
Health check endpoint.

**Response:**
```json
{
  "status": "healthy",
  "ollama_connection": "connected",
  "available_models": 3
}
```

## 🔧 Configuration

### Environment Variables

- `OLLAMA_BASE_URL`: URL to your Ollama instance (default: `http://localhost:11434` - **Ollama runs inside this Space!**)
- `MODELS_DIR`: Directory for storing models (default: `/models`)
- `ALLOWED_MODELS`: Comma-separated list of allowed models (default: all models)

**Note**: This Space now includes Ollama installed directly inside it, so you don't need an external Ollama instance!

### Supported Models

By default, the following models are allowed:
- `llama2`
- `llama2:13b`
- `llama2:70b`
- `codellama`
- `neural-chat`

You can customize this list by setting the `ALLOWED_MODELS` environment variable.
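
For example, restricting the Space to two models when running the container described under Docker Support below might look like this (the value is illustrative; on Hugging Face you would set it as a Space variable instead):

```bash
# Illustrative value - only llama2 and codellama would be allowed
docker run -p 7860:7860 -e ALLOWED_MODELS=llama2,codellama ollama-space
```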

## 🌐 Integration with OpenWebUI

This Space is designed to work seamlessly with OpenWebUI. You can:

1. Use this Space as a backend API for OpenWebUI
2. Configure OpenWebUI to connect to this Space's endpoints (see the sketch below)
3. Enjoy a full chat interface with your local Ollama models
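
One possible setup, sketched here rather than prescribed, is to run OpenWebUI yourself and point its `OLLAMA_BASE_URL` variable at the Space. The image tag and the assumption that the Space is reachable at its public URL are illustrative:

```bash
# Sketch: run OpenWebUI against the Space (URL is a placeholder)
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=https://your-username-ollama-space.hf.space \
  ghcr.io/open-webui/open-webui:main
```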

## 🐳 Docker Support

The Space includes a Dockerfile for containerized deployment:

```bash
# Build the image
docker build -t ollama-space .

# Run the container
docker run -p 7860:7860 -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ollama-space
```

## 🔒 Security Considerations

- The Space only allows access to models specified in `ALLOWED_MODELS`
- All API endpoints are publicly accessible (consider adding authentication for production use)
- The Space connects to your Ollama instance - ensure proper network security

## 🚨 Troubleshooting

### Common Issues

1. **Connection to Ollama failed**: Check if Ollama is running and accessible
2. **Model not found**: Ensure the model is available in your Ollama instance
3. **Timeout errors**: Large models may take time to load - increase timeout values

### Health Check

Use the `/health` endpoint to monitor the Space's status and Ollama connection.
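
For example (placeholder URL):

```bash
# Quick status check against your own Space URL
curl https://your-username-ollama-space.hf.space/health
```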

## 📝 License

This project is open source and available under the MIT License.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📞 Support

If you encounter any issues or have questions, please open an issue on the repository.
app (3).py
ADDED
@@ -0,0 +1,365 @@
from flask import Flask, request, jsonify, Response
import os
import logging
import time
from llama_cpp import Llama
import requests
import tempfile

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

# TOKEN CONFIGURATION
MAX_CONTEXT_TOKENS = 1024 * 8
MAX_GENERATION_TOKENS = 1024 * 4

MODELS = [
    {
        "url": "https://huggingface.co/Novaciano/Qwen2.5-0.5B-NSFW_Amoral_Christmas-GGUF/resolve/main/Qwen2.5-0.5b-NSFW_Amoral_Christmas.gguf",
        "name": "qwen2.5-0.5b-nsfw-amoral-christmas"
    },
    {
        "url": "https://huggingface.co/afrideva/dolphin-2_6-phi-2_oasst2_chatML_V2-GGUF/resolve/main/dolphin-2_6-phi-2_oasst2_chatml_v2.q4_k_m.gguf",
        "name": "phi-2"
    }
]

class LLMManager:
    def __init__(self, models_config):
        self.models = {}
        self.models_config = models_config
        self.load_all_models()

    def load_all_models(self):
        """Load every configured model into RAM."""
        for model_config in self.models_config:
            try:
                model_name = model_config["name"]
                logging.info(f"🚀 Cargando modelo: {model_name}")

                temp_path = self._download_model(model_config["url"])

                actual_size = os.path.getsize(temp_path)
                actual_gb = actual_size / (1024*1024*1024)
                logging.info(f"📊 Tamaño descargado para {model_name}: {actual_gb:.2f} GB")

                logging.info(f"🔄 Cargando {model_name} en RAM…")
                llm_instance = Llama(
                    model_path=temp_path,
                    n_ctx=MAX_CONTEXT_TOKENS,
                    n_batch=128,
                    n_threads=6,
                    n_threads_batch=6,
                    use_mlock=True,
                    mmap=True,
                    low_vram=False,
                    vocab_only=False
                )

                os.remove(temp_path)

                self.models[model_name] = {
                    "instance": llm_instance,
                    "loaded": True,
                    "config": model_config
                }
                logging.info(f"✅ Modelo {model_name} cargado")

            except Exception as e:
                logging.error(f"❌ Error cargando modelo {model_config['name']}: {e}")
                self.models[model_config["name"]] = {
                    "instance": None,
                    "loaded": False,
                    "config": model_config,
                    "error": str(e)
                }

    def _download_model(self, model_url):
        """Download a model to a temporary file and return its path."""
        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".gguf")
        temp_path = temp_file.name
        temp_file.close()

        logging.info("📥 Descargando modelo…")

        response = requests.get(model_url, stream=True, timeout=300)
        response.raise_for_status()

        downloaded = 0
        with open(temp_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
                    downloaded += len(chunk)

        return temp_path

    def get_model(self, model_name):
        """Get a model entry by name."""
        return self.models.get(model_name)

    def chat_completion(self, model_name, messages, **kwargs):
        """Generate a response with a specific model."""
        model_data = self.get_model(model_name)

        if not model_data or not model_data["loaded"]:
            error_msg = f"Modelo {model_name} no cargado"
            if model_data and "error" in model_data:
                error_msg += f": {model_data['error']}"
            return {"error": error_msg}

        response = model_data["instance"].create_chat_completion(
            messages=messages,
            **kwargs
        )

        response["provider"] = "telechars-ai"
        response["model"] = model_name
        return response

    def get_loaded_models(self):
        """Get the list of successfully loaded models."""
        loaded = []
        for name, data in self.models.items():
            if data["loaded"]:
                loaded.append(name)
        return loaded

    def get_all_models_status(self):
        """Get the status of every configured model."""
        status = {}
        for name, data in self.models.items():
            status[name] = {
                "loaded": data["loaded"],
                "url": data["config"]["url"]
            }
            if "error" in data:
                status[name]["error"] = data["error"]
        return status

# Initialize the manager with all models
llm_manager = LLMManager(MODELS)

@app.route('/')
def home():
    loaded_models = llm_manager.get_loaded_models()
    status_html = "<ul>"
    for model_name, model_data in llm_manager.models.items():
        status = "✅ SÍ" if model_data["loaded"] else "❌ NO"
        status_html += f"<li>{model_name}: {status}</li>"
    status_html += "</ul>"

    return f'''
    <!DOCTYPE html>
    <html>
    <head>
        <title>TeleChars AI API</title>
        <style>
            body {{ font-family: Arial, sans-serif; margin: 40px; }}
            .config {{ background: #f0f0f0; padding: 15px; border-radius: 5px; margin-bottom: 20px; }}
            .endpoint {{ background: #e8f4f8; padding: 10px; border-left: 4px solid #2196F3; margin: 10px 0; }}
        </style>
    </head>
    <body>
        <h1>TeleChars AI API</h1>

        <div class="config">
            <h3>⚙️ Configuración</h3>
            <p><strong>Max Context Tokens:</strong> {MAX_CONTEXT_TOKENS}</p>
            <p><strong>Max Generation Tokens:</strong> {MAX_GENERATION_TOKENS}</p>
        </div>

        <h2>📦 Modelos cargados:</h2>
        {status_html}
        <p>Total modelos: {len(loaded_models)}/{len(MODELS)}</p>

        <h2>🔗 Endpoints disponibles:</h2>
        <div class="endpoint">
            <strong>GET /generate/<mensaje>[?params]</strong><br>
            Devuelve solo el texto generado. Parámetros opcionales:<br>
            • system= (instrucciones del sistema)<br>
            • temperature= (0.0-2.0)<br>
            • top_p= (0.0-1.0)<br>
            • model= (nombre del modelo)<br>
            • max_tokens= (máximo tokens a generar, default: {MAX_GENERATION_TOKENS})
        </div>

        <div class="endpoint">
            <strong>POST /v1/chat/completions</strong><br>
            Compatible con OpenAI API
        </div>

        <div class="endpoint">
            <strong>GET /health</strong><br>
            Estado del servicio
        </div>

        <div class="endpoint">
            <strong>GET /models</strong><br>
            Lista todos los modelos disponibles
        </div>
    </body>
    </html>
    '''

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():
    try:
        data = request.get_json()
        messages = data.get('messages', [])
        model_name = data.get('model', MODELS[0]["name"])

        if model_name not in llm_manager.models:
            return jsonify({"error": f"Modelo '{model_name}' no encontrado. Modelos disponibles: {list(llm_manager.models.keys())}"}), 400

        kwargs = {}
        for key in data.keys():
            if key not in ['messages', 'model']:
                kwargs[key] = data[key]

        # Apply the token limit if none was specified
        if 'max_tokens' not in kwargs:
            kwargs['max_tokens'] = MAX_GENERATION_TOKENS

        result = llm_manager.chat_completion(model_name, messages, **kwargs)

        if "error" in result:
            return jsonify(result), 500

        return jsonify(result), 200

    except Exception as e:
        return jsonify({"error": str(e)}), 500

@app.route('/generate/<path:user_message>', methods=['GET'])
def generate_endpoint(user_message):
    """GET endpoint for generating responses - returns plain text only."""
    try:
        # Read GET parameters with default values
        system_instruction = request.args.get('system', 'Eres un asistente útil.')
        temperature = float(request.args.get('temperature', 0.7))
        top_p = float(request.args.get('top_p', 0.95))
        model_name = request.args.get('model', MODELS[0]["name"])
        max_tokens = int(request.args.get('max_tokens', MAX_GENERATION_TOKENS))

        # Validate ranges
        if not 0 <= temperature <= 2:
            return Response(
                f"Error: El parámetro 'temperature' debe estar entre 0 y 2",
                status=400,
                mimetype='text/plain'
            )

        if not 0 <= top_p <= 1:
            return Response(
                f"Error: El parámetro 'top_p' debe estar entre 0 y 1",
                status=400,
                mimetype='text/plain'
            )

        # Cap max_tokens at the configured maximum
        if max_tokens > MAX_GENERATION_TOKENS:
            max_tokens = MAX_GENERATION_TOKENS

        # Check that the requested model exists
        if model_name not in llm_manager.models:
            return Response(
                f"Error: Modelo '{model_name}' no encontrado. Modelos disponibles: {', '.join(llm_manager.models.keys())}",
                status=400,
                mimetype='text/plain'
            )

        # Build the chat messages
        messages = [
            {"role": "system", "content": system_instruction},
            {"role": "user", "content": user_message}
        ]

        # Set generation parameters
        kwargs = {
            "temperature": temperature,
            "top_p": top_p,
            "max_tokens": max_tokens,
            "stream": False
        }

        # Generate the response
        result = llm_manager.chat_completion(model_name, messages, **kwargs)

        if "error" in result:
            return Response(
                f"Error: {result['error']}",
                status=500,
                mimetype='text/plain'
            )

        response_text = result.get("choices", [{}])[0].get("message", {}).get("content", "")

        if not response_text:
            response_text = "No se generó respuesta"

        # Return plain text only
        return Response(
            response_text,
            status=200,
            mimetype='text/plain'
        )

    except ValueError as e:
        return Response(
            f"Error: Parámetros inválidos - {str(e)}. Asegúrate de que temperature, top_p y max_tokens sean números válidos.",
            status=400,
            mimetype='text/plain'
        )
    except Exception as e:
        return Response(
            f"Error: {str(e)}",
            status=500,
            mimetype='text/plain'
        )

@app.route('/health', methods=['GET'])
def health():
    loaded_models = llm_manager.get_loaded_models()
    return jsonify({
        "status": "healthy" if len(loaded_models) > 0 else "error",
        "loaded_models": loaded_models,
        "total_models": len(MODELS),
        "config": {
            "max_context_tokens": MAX_CONTEXT_TOKENS,
            "max_generation_tokens": MAX_GENERATION_TOKENS
        }
    })

@app.route('/models', methods=['GET'])
def list_models():
    """Endpoint listing all models and their status."""
    return jsonify({
        "available_models": MODELS,
        "status": llm_manager.get_all_models_status(),
        "config": {
            "max_context_tokens": MAX_CONTEXT_TOKENS,
            "max_generation_tokens": MAX_GENERATION_TOKENS
        }
    })

@app.route('/models/<model_name>', methods=['GET'])
def get_model_status(model_name):
    """Endpoint returning the status of a specific model."""
    model_data = llm_manager.get_model(model_name)
    if not model_data:
        return jsonify({"error": f"Modelo '{model_name}' no encontrado"}), 404

    return jsonify({
        "model": model_name,
        "loaded": model_data["loaded"],
        "url": model_data["config"]["url"],
        "error": model_data.get("error"),
        "config": {
            "max_context_tokens": MAX_CONTEXT_TOKENS,
            "max_generation_tokens": MAX_GENERATION_TOKENS
        }
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=7860, debug=False)
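
A minimal way to exercise the `/generate/<message>` route defined above once the Space is running; the base URL is a placeholder and the query values are illustrative:

```bash
# Placeholder base URL - substitute your own Space
curl "https://your-space.hf.space/generate/Hello?model=phi-2&temperature=0.7&max_tokens=128"
```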
gitattributes.txt
ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
requirements (2).txt
ADDED
@@ -0,0 +1,5 @@
llama-cpp-python==0.3.1
gunicorn>=21.2.0
flask>=2.3.3
requests>=2.31.0
psutil>=5.9.6