Upload 4 files
llamacpp_python/base/call-model.py
ADDED
@@ -0,0 +1,20 @@
#### 6. **Handling Errors**
#- If the chat application cannot connect to the model server, check the following (a quick connectivity probe is sketched right after this list):
#  - Is the model server running?
#  - Is the `MODEL_ENDPOINT` URL correct?
#  - Are there any firewall or network restrictions blocking the connection?
#  - Are the ports correctly mapped (if using Docker)?
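
#A minimal sketch of such a probe, for illustration only: the `probe_model_server` helper is not part of the application, and it assumes the simulated server from `model_server.py`, which exposes only `/generate`.

#```python
import os
import requests

def probe_model_server(endpoint=None):
    """Illustrative helper: return True if the model server answers at /generate."""
    endpoint = endpoint or os.getenv("MODEL_ENDPOINT", "http://localhost:8001")
    try:
        # A tiny POST with a short timeout; any HTTP reply means the server is reachable.
        reply = requests.post(f"{endpoint}/generate", json={"prompt": "ping"}, timeout=5)
        return reply.ok
    except requests.exceptions.RequestException:
        return False
#```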

#- Add error handling in the chat application to handle cases where the model server is unavailable:

#```python
import os
import requests

# Resolve the model server URL from the environment (same default as chat-app.py)
model_service = os.getenv("MODEL_ENDPOINT", "http://localhost:8001")

def call_model(prompt):
    try:
        url = f"{model_service}/generate"
        payload = {"prompt": prompt}
        response = requests.post(url, json=payload, timeout=10)  # Add a timeout
        response.raise_for_status()  # Raise an error for bad status codes
        return response.json().get("response", "No response from model")
    except requests.exceptions.RequestException as e:
        return f"Error connecting to the model server: {e}"
#```
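
#With the server from `model_server.py` running, `call_model("Hello, model!")` returns `Generated response for: Hello, model!`; if the server is down, the `except` branch returns a string starting with `Error connecting to the model server:` instead of raising.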
llamacpp_python/base/chat-app.py
ADDED
@@ -0,0 +1,38 @@
#### 2. **Set the `MODEL_ENDPOINT` Environment Variable**
#In your chat application, ensure the `MODEL_ENDPOINT` environment variable is set to the correct URL of the model server. For example:
#```python
import os
import requests

# Get the model endpoint from the environment variable
model_service = os.getenv("MODEL_ENDPOINT", "http://localhost:8001")

# Example function to call the model server
def call_model(prompt):
    url = f"{model_service}/generate"
    payload = {"prompt": prompt}
    response = requests.post(url, json=payload)
    if response.status_code == 200:
        return response.json().get("response", "No response from model")
    else:
        return f"Error: {response.status_code}"

# Test the connection
if __name__ == "__main__":
    prompt = "Hello, model!"
    result = call_model(prompt)
    print(result)
#```

#### 3. **Test the Connection**
#Run the chat application and test the connection to the model server:

#```bash
# Set the MODEL_ENDPOINT environment variable
#export MODEL_ENDPOINT="http://localhost:8001"

# Run the chat application
#python chat-app.py
#```

#If everything is set up correctly, the chat application should be able to call the model server and receive a response; with the simulated server in `model_server.py`, that response is `Generated response for: Hello, model!`.
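
#The `__main__` block above sends a single test prompt. A minimal interactive loop built on the same `call_model` function might look like the sketch below; it is an illustration only and is not wired into the tutorial:

#```python
def chat_loop():
    """Sketch only: read prompts from stdin and print the model server's replies until the user quits."""
    print("Type a prompt (or 'quit' to exit).")
    while True:
        user_prompt = input("> ").strip()
        if user_prompt.lower() in {"quit", "exit"}:
            break
        if user_prompt:
            print(call_model(user_prompt))
#```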
llamacpp_python/base/docker-compose.yml
ADDED
@@ -0,0 +1,59 @@
#### 4. **Deploying in a Containerized Environment**
#If you're using Docker or Podman, ensure the containers can communicate with each other. For example:

#- **Docker Compose**:
#  Create a `docker-compose.yml` file to define both the model server and the chat application:

#```yaml
version: "3"
services:
  model_server:
    image: my_model_server_image
    ports:
      - "8001:8001"
    environment:
      - PORT=8001
    networks:
      - my_network
  chat_app:
    image: my_chat_app_image
    environment:
      - MODEL_ENDPOINT=http://model_server:8001
    depends_on:
      - model_server
    networks:
      - my_network

networks:
  my_network:
#```

#- The `MODEL_ENDPOINT` for the chat application is set to `http://model_server:8001`, which uses Docker's internal DNS to resolve the model server's container name. From the host, the same server is reachable at `http://localhost:8001` through the published port. Start both services with `docker compose up -d` (or `docker-compose up -d` with the older CLI).

#- **Docker Networking**:
#  If you're not using Docker Compose, you can create a custom network and attach both containers to it:

#  ```bash
#  # Create a custom network
#  docker network create my_network

#  # Run the model server container
#  docker run -d --name model_server --network my_network -p 8001:8001 my_model_server_image

#  # Run the chat application container
#  docker run -d --name chat_app --network my_network -e MODEL_ENDPOINT=http://model_server:8001 my_chat_app_image
#  ```

#### 5. **Testing the Endpoint**
#To ensure the model server is working as expected, you can test the endpoint directly using `curl` or a tool like Postman; an equivalent check from Python is sketched at the end of this file:

#```bash
#curl -X POST http://localhost:8001/generate -H "Content-Type: application/json" -d '{"prompt": "Hello, model!"}'
#```

#Expected response:
#```json
#{
#  "response": "Generated response for: Hello, model!"
#}
#```
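
#The same check can be done from Python with `requests`. This mirrors the `curl` call above and the `call_model` helper from the chat application; it is only a sketch and assumes the server is reachable at `http://localhost:8001`:

#```python
#import requests
#
#reply = requests.post(
#    "http://localhost:8001/generate",
#    json={"prompt": "Hello, model!"},
#    timeout=10,
#)
#print(reply.status_code)  # 200 when the server is up
#print(reply.json())       # {'response': 'Generated response for: Hello, model!'}
#```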
llamacpp_python/base/model_server.py
ADDED
@@ -0,0 +1,22 @@
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    # Read the JSON body; fall back to an empty dict if none was sent
    data = request.get_json(silent=True) or {}
    prompt = data.get("prompt", "")
    # Simulate a response from the model
    response = f"Generated response for: {prompt}"
    return jsonify({"response": response})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8001)

#Run the server:
#```bash
#python model_server.py
#```

#This server listens on `http://0.0.0.0:8001` and exposes a `/generate` endpoint for generating responses.
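
#The `/generate` route can also be exercised without starting the server, using Flask's built-in test client. This is a sketch for illustration; the `smoke_test` helper below is not part of the tutorial and is not called anywhere:

#```python
def smoke_test():
    """Sketch only: call /generate through Flask's test client and check the simulated reply."""
    client = app.test_client()
    reply = client.post("/generate", json={"prompt": "Hello, model!"})
    assert reply.status_code == 200
    assert reply.get_json() == {"response": "Generated response for: Hello, model!"}
#```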