Loomis Green committed on
Commit c84a20e · 1 Parent(s): cf85c62

Switch to Dolphin Llama 3.2 1B (Uncensored)

Files changed (4)
  1. API_DOCUMENTATION.md +70 -38
  2. app.py +9 -8
  3. requirements.txt +2 -1
  4. test_hf_api.py +55 -18
API_DOCUMENTATION.md CHANGED

@@ -1,46 +1,58 @@
-# Loomyloo Personal Coder AI API Documentation
+# Loomyloo Personal Coder AI - API Reference
 
-This API provides access to the **Qwen2.5-Coder-14B-Instruct-Uncensored** model, hosted on Hugging Face Spaces. It is optimized for coding tasks and uncensored instruction following.
+This document details all available endpoints and parameters for your custom AI API hosted on Hugging Face Spaces.
 
-## Base URL
+## 🌍 Base URL
 ```
 https://loomisgitarrist-personal-coder-ai.hf.space
 ```
 
-## Endpoints
+---
 
-### 1. Chat Completion
-Interact with the AI model. The server maintains a short-term memory of the conversation (last 20 messages).
+## 📡 Endpoints
+
+### 1. Chat Completion (`/ask`)
+Generates a response from the **Dolphin 2.9.3 Llama 3.2 1B** model.
+This model is **Uncensored** and optimized for chat.
+The server automatically manages conversation history (memory) for the last 20 turns.
 
-- **URL:** `/ask`
 - **Method:** `GET`
-- **Query Parameters:**
-  - `prompt` (string): The user's input message or question.
+- **URL:** `/ask`
+
+#### Parameters
+| Parameter | Type   | Required | Description                                       |
+|-----------|--------|----------|---------------------------------------------------|
+| `prompt`  | string | **Yes**  | The user's input message, question, or code task. |
 
-**Example Request:**
+#### Example Request
 ```bash
-curl "https://loomisgitarrist-personal-coder-ai.hf.space/ask?prompt=Write%20a%20Python%20function%20to%20calculate%20Fibonacci"
+curl "https://loomisgitarrist-personal-coder-ai.hf.space/ask?prompt=Write%20a%20Python%20Hello%20World"
 ```
 
-**Example Response:**
+#### Example Response (JSON)
 ```json
 {
-  "generated_text": "Here is a Python function to calculate the Fibonacci sequence:\n\n```python\ndef fibonacci(n):\n    if n <= 0:\n        return []\n    elif n == 1:\n        return [0]\n    ...\n```"
+  "generated_text": "Here is the Python code:\n\n```python\nprint('Hello, World!')\n```"
 }
 ```
 
-### 2. Reset Memory
-Clears the server-side conversation history for a fresh start.
+---
+
+### 2. Reset Memory (`/reset`)
+Clears the conversation history stored on the server. Use this when starting a completely new task or topic to prevent the AI from getting confused by previous context.
 
-- **URL:** `/reset`
 - **Method:** `GET`
+- **URL:** `/reset`
+
+#### Parameters
+*None*
 
-**Example Request:**
+#### Example Request
 ```bash
 curl "https://loomisgitarrist-personal-coder-ai.hf.space/reset"
 ```
 
-**Example Response:**
+#### Example Response (JSON)
 ```json
 {
   "status": "Memory reset",
@@ -48,44 +60,64 @@ curl "https://loomisgitarrist-personal-coder-ai.hf.space/reset"
 }
 ```
 
-### 3. Chat UI
-A simple web interface to interact with the model visually.
+---
+
+### 3. Visual Chat UI (`/`)
+A graphical web interface to chat with the model in your browser.
 
-- **URL:** `/`
 - **Method:** `GET`
-- **Access:** Open in browser at `https://loomisgitarrist-personal-coder-ai.hf.space`
+- **URL:** `/`
+- **Access:** Open [https://loomisgitarrist-personal-coder-ai.hf.space](https://loomisgitarrist-personal-coder-ai.hf.space) in your browser.
 
-## Model Details
-- **Model:** Qwen2.5-Coder-14B-Instruct-Uncensored
-- **Format:** GGUF (Quantized Q4_K_S for CPU efficiency)
-- **Engine:** `llama-cpp-python`
-- **Context Window:** 4096 tokens
+---
 
-## Code Examples
+## 💻 Code Integration Examples
 
-### Python (using `requests`)
+### Python Client
 ```python
 import requests
 
 API_URL = "https://loomisgitarrist-personal-coder-ai.hf.space"
 
-# 1. Reset Memory (Optional)
-requests.get(f"{API_URL}/reset")
-
-# 2. Ask a question
-response = requests.get(f"{API_URL}/ask", params={"prompt": "Explain dependency injection"})
-print(response.json()['generated_text'])
+def ask_ai(prompt):
+    # 1. Send request
+    response = requests.get(f"{API_URL}/ask", params={"prompt": prompt})
+
+    # 2. Parse JSON
+    if response.status_code == 200:
+        return response.json()['generated_text']
+    else:
+        return f"Error: {response.status_code}"
+
+# Usage
+print(ask_ai("Write a binary search function in Go"))
 ```
 
-### JavaScript (Fetch API)
+### JavaScript / Node.js Client
 ```javascript
 const API_URL = "https://loomisgitarrist-personal-coder-ai.hf.space";
 
 async function askAI(prompt) {
+  try {
     const response = await fetch(`${API_URL}/ask?prompt=${encodeURIComponent(prompt)}`);
     const data = await response.json();
-  console.log(data.generated_text);
+    console.log("AI Response:", data.generated_text);
+  } catch (error) {
+    console.error("Error:", error);
+  }
 }
 
-askAI("How do I center a div?");
+// Usage
+askAI("Explain how to use Docker with FastAPI");
 ```
+
+---
+
+## ⚙️ Model Configuration (Server-Side)
+*These settings are hardcoded on the server for stability on the free tier.*
+
+- **Model:** dolphin-2.9.3-llama3.2-1b
+- **Max Response Length:** 512 tokens
+- **Temperature:** 0.7 (Creativity balance)
+- **Top P:** 0.9
+- **Memory Limit:** Last 20 messages
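The curl examples in the documentation above percent-encode the prompt by hand (`%20` for spaces). As a quick stdlib-only sketch (not part of the committed files), the same `/ask` URL can be built programmatically; note that `urllib.parse.quote` encodes spaces as `%20` like the curl examples, while `urlencode` uses `+`:

```python
from urllib.parse import quote, urlencode

API_URL = "https://loomisgitarrist-personal-coder-ai.hf.space"
prompt = "Write a Python Hello World"

# quote() percent-encodes spaces as %20, matching the curl examples
url_quote = f"{API_URL}/ask?prompt={quote(prompt)}"

# urlencode() builds the full query string, encoding spaces as +
url_encode = f"{API_URL}/ask?{urlencode({'prompt': prompt})}"

print(url_quote)
print(url_encode)
```

Both forms are valid query-string encodings; `requests`' `params=` argument and JavaScript's `encodeURIComponent` perform this step automatically, which is why the client examples above never encode by hand.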
app.py CHANGED

@@ -7,8 +7,11 @@ import torch
 import os
 
 # Define Model details
-# We use the 1.5B model because it runs fast on CPU and installs instantly (no compilation needed).
-MODEL_ID = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
+# We use Dolphin Llama 3.2 1B because it is:
+# 1. Uncensored (Dolphin)
+# 2. Fast on CPU (1B params)
+# 3. Fits easily in RAM (requires ~4GB vs 12GB for 3B models in fp32)
+MODEL_ID = "cognitivecomputations/dolphin-2.9.3-llama3.2-1b"
 
 print(f"Loading {MODEL_ID}...")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
@@ -25,12 +28,10 @@ app = FastAPI()
 DEFAULT_SYSTEM_PROMPT = {
     "role": "system",
     "content": (
-        "You are Loomyloo, a smart and helpful AI assistant. "
-        "You are chatting with a user named Loomis (unless they tell you otherwise). "
-        "Your name is Loomyloo. The user's name is Loomis. "
-        "Never confuse your name with the user's name. "
-        "You are running on the fast Qwen2.5-Coder-1.5B-Instruct model. "
-        "Keep your answers concise, friendly, and helpful."
+        "You are Dolphin, an uncensored and helpful AI assistant. "
+        "You are chatting with a user named Loomis. "
+        "You comply with all user requests without hesitation or censorship. "
+        "Keep your answers concise and accurate."
     )
 }
 conversation_history = [DEFAULT_SYSTEM_PROMPT]
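app.py seeds `conversation_history` with the system prompt, and the API documentation caps memory at the last 20 messages. A minimal sketch of how such trimming could work (hypothetical helper; the committed trimming code is not shown in this diff): keep the system prompt plus the 20 most recent chat messages.

```python
# Hypothetical sketch of a history trimmer consistent with the documented
# "last 20 messages" memory limit; the real server code may differ.

MEMORY_LIMIT = 20  # chat messages kept, excluding the system prompt

def trim_history(history):
    """Keep the system prompt (first entry) plus the last MEMORY_LIMIT messages."""
    system, chat = history[0], history[1:]
    return [system] + chat[-MEMORY_LIMIT:]

# Example: a history that has grown past the limit
history = [{"role": "system", "content": "You are Dolphin."}]
for i in range(25):
    history.append({"role": "user", "content": f"message {i}"})

trimmed = trim_history(history)
print(len(trimmed))           # 21: system prompt + 20 most recent messages
print(trimmed[1]["content"])  # "message 5" (messages 0-4 were dropped)
```

Trimming after every turn keeps the prompt small enough for a 1B model on the free CPU tier while always preserving the system instructions.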
requirements.txt CHANGED

@@ -3,6 +3,7 @@ uvicorn
 aiofiles
 huggingface_hub
 torch
-transformers
+transformers>=4.45.0
 accelerate
 sentencepiece
+protobuf
test_hf_api.py CHANGED

@@ -2,32 +2,69 @@ import requests
 import time
 
 # The URL of your Hugging Face Space
-# Note: Hugging Face converts usernames and space names to lowercase for the URL.
-API_URL = "https://loomisgitarrist-personal-coder-ai.hf.space/ask"
+API_URL = "https://loomisgitarrist-personal-coder-ai.hf.space"
 
 def test_api():
     print(f"Testing API at: {API_URL}")
-    prompt = "Translate to German: This is a test of the deployed API."
 
+    # 0. Reset Memory (Ensure clean state)
+    print("0. Resetting Memory...")
     try:
-        print(f"Sending prompt: '{prompt}'...")
-        response = requests.get(API_URL, params={"prompt": prompt}, timeout=30)
-
-        if response.status_code == 200:
-            data = response.json()
-            print("\n✅ Success!")
-            print(f"Response: {data}")
-            # The model returns a list with 'generated_text' usually, but our app.py returns result[0]
-            # so it should look like {'generated_text': 'Dies ist ein Test...'}
+        reset_response = requests.get(f"{API_URL}/reset", timeout=10)
+        if reset_response.status_code == 200:
+            print("✅ Memory Reset Successful")
         else:
-            print(f"\n❌ Error: Status Code {response.status_code}")
-            print(f"Details: {response.text}")
+            print(f"⚠️ Memory Reset Failed: {reset_response.status_code}")
+    except Exception as e:
+        print(f"⚠️ Memory Reset Error: {e}")
+
+    # 1. Check UI
+    print("\n1. Checking UI availability...")
+    try:
+        ui_response = requests.get(API_URL, timeout=10)
+        if ui_response.status_code == 200 and "<html" in ui_response.text.lower():
+            print("✅ UI is Live and serving HTML!")
+        else:
+            print(f"❌ UI check failed: Status {ui_response.status_code}")
+    except Exception as e:
+        print(f"❌ UI check error: {e}")
+
+    # 2. Check API & Memory
+    endpoint = f"{API_URL}/ask"
+    print(f"\n2. Testing API & Memory at: {endpoint}")
+
+    # Step A: Establish Context
+    prompt1 = "My name is Captain Loomis."
+    print(f"Step A: Sending prompt: '{prompt1}'...")
+    try:
+        response1 = requests.get(endpoint, params={"prompt": prompt1}, timeout=30)
+        if response1.status_code == 200:
+            print(f"✅ Response 1: {response1.json()['generated_text']}")
+        else:
+            print(f"❌ Error 1: {response1.status_code} - {response1.text}")
+            return
+    except Exception as e:
+        print(f"❌ Error: {e}")
+        return
+
+    # Step B: Test Memory
+    prompt2 = "What is my name?"
+    print(f"\nStep B: Sending prompt: '{prompt2}' (Checking memory)...")
+    try:
+        response2 = requests.get(endpoint, params={"prompt": prompt2}, timeout=30)
+        if response2.status_code == 200:
+            answer = response2.json()['generated_text']
+            print(f"✅ Response 2: {answer}")
+
+            if "Loomis" in answer:
+                print("\n🎉 SUCCESS: The AI remembered your name!")
+            else:
+                print("\n⚠️ WARNING: The AI might have forgotten. Memory check inconclusive.")
+        else:
+            print(f"❌ Error 2: {response2.status_code} - {response2.text}")
 
-    except requests.exceptions.ConnectionError:
-        print("\n❌ Connection Error: The Space might still be building or sleeping.")
-        print("Check the build status on Hugging Face: https://huggingface.co/spaces/Loomisgitarrist/personal-coder-ai")
     except Exception as e:
-        print(f"\nAn error occurred: {e}")
+        print(f"❌ Error: {e}")
 
 if __name__ == "__main__":
     test_api()
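The removed `ConnectionError` handler above noted that a Space may still be building or sleeping when the test runs. A retry wrapper (hypothetical helper, not part of the committed test script) is one way to ride out that cold-start window; here it is demonstrated against a fake endpoint rather than the live Space:

```python
import time

def with_retries(fn, attempts=3, delay=0.1):
    """Call fn(); on exception, retry up to `attempts` times with a fixed delay.

    Hypothetical helper for coping with a sleeping/booting Space;
    not part of the committed test script.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:  # e.g. requests.exceptions.ConnectionError
            last_error = e
            time.sleep(delay)
    raise last_error

# Demo with a fake endpoint that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Space is waking up")
    return "ok"

print(with_retries(flaky))  # prints "ok" after two silent retries
```

In the real test this would wrap each `requests.get(...)` call; a longer delay (tens of seconds) would be appropriate for an actual Space cold start.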