K-AI API Engineering Guide 🧠
User Instruction: If you (the AI) are reading this file, it contains the CRITICAL knowledge required to maintain this project. Do not ignore these rules.
1. System Architecture
The API uses a Strict Engine (`engine.py`) that routes requests to providers (g4f, pollinations, zai, gemini, huggingchat).
- Adaptive Fallback: By default (`provider="auto"`), the engine tries models in `MODEL_RANKING` order.
- Strict Mode: If `model` or `provider` is specified, the engine tries ONLY that combination. No fallback.
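The two routing rules above can be sketched as follows. This is a minimal illustration, not the project's real dispatch: `MODEL_RANKING` here is a toy list and `try_provider` is a stub standing in for `engine.py`'s per-provider calls.

```python
# Illustrative stand-ins, not the real config or dispatch from engine.py.
MODEL_RANKING = ["alpha-model", "beta-model"]

def try_provider(provider, model, prompt):
    """Stub for the engine's real per-provider dispatch."""
    if model == "alpha-model":
        raise RuntimeError("provider down")
    return f"{model}: ok"

def route(prompt, model=None, provider=None):
    if model or provider:
        # Strict mode: exactly one attempt, no fallback.
        return try_provider(provider, model, prompt)
    # Adaptive fallback: walk MODEL_RANKING until something succeeds.
    last_err = None
    for m in MODEL_RANKING:
        try:
            return try_provider(None, m, prompt)
        except Exception as e:
            last_err = e
    raise RuntimeError(f"all models failed: {last_err}")

print(route("hi"))                      # alpha-model fails, beta-model answers
print(route("hi", model="beta-model"))  # strict: only beta-model is tried
```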
2. Deployment
A. Vercel (Serverless - Default)
Fast, free, but NO browser support (Z.ai, Gemini, HuggingChat disabled).
See `vercel.json` and the hacks below.
B. Hugging Face Spaces (Docker - Full Browser Support)
Use this for Z.ai, Gemini, HuggingChat (requires browser).
- See README_DOCKER.md
- Supports full browser automation.
Vercel Deployment Hacks (CRITICAL)
Vercel is a serverless environment with a Read-Only Filesystem (except /tmp).
Web scraping libraries like g4f fail because they try to write cookies/cache to ~/.g4f.
The Fix:
We force the entire application to use /tmp as the home directory.
In main.py (Top Level):
```python
import os

# Must be set BEFORE importing g4f
if os.environ.get("VERCEL") or True:  # forced on everywhere, not just Vercel
    os.environ["HOME"] = "/tmp"
```
DO NOT REMOVE THIS. It prevents [Errno 30] Read-only file system crashes.
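Why this works: on POSIX, Python resolves `~` through the `HOME` environment variable, so any library that builds paths with `os.path.expanduser` (as g4f does for its cookie/cache directory) is silently redirected into the writable `/tmp`:

```python
import os

os.environ["HOME"] = "/tmp"
print(os.path.expanduser("~"))       # /tmp
print(os.path.expanduser("~/.g4f"))  # /tmp/.g4f -- g4f's cookie dir now lands in /tmp
```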
3. Provider Specifics
A. G4F (Scraping Layer)
We use g4f to access free AI models via browser masquerading.
- User-Agent Rotation: Every request gets a fresh, random UA (`useragent.py`).
- Stateless Clients: New client per request.
- Cookies: Saved to `/tmp` (via the Vercel fix).
B. Pollinations (API Layer)
Used for reliable, high-speed models (gpt-oss-20b, mistral).
- No Keys Needed: It's a free public API (`text.pollinations.ai`).
- Stability: Much more stable than G4F. Used as a fallback or primary for specific models.
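For orientation, a Pollinations request URL can be built like this. The path/query shape shown is an assumption for illustration; check the provider code for the exact request format this project uses.

```python
from urllib.parse import quote

# Assumed request shape: prompt in the URL path, model as a query parameter.
BASE = "https://text.pollinations.ai"

def pollinations_url(prompt: str, model: str = "mistral") -> str:
    return f"{BASE}/{quote(prompt)}?model={model}"

print(pollinations_url("hello world"))
# https://text.pollinations.ai/hello%20world?model=mistral
```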
C. Z.ai (Browser-Based Provider)
Uses Playwright Chromium to interact with chat.z.ai as a real browser.
- Why Browser: Z.ai uses a proprietary `x-signature` hash generated by obfuscated client-side JS. Rather than reverse-engineering it (fragile), we let the browser execute Z.ai's own JS.
- API: `POST /api/v2/chat/completions` with fingerprint query params.
- Auth: Guest JWT from `GET /api/v1/auths/` (auto, no signup).
- Models: `glm-5` (default, reasoning model), `glm-4-flash`.
- Key Headers: `x-fe-version: prod-fe-1.0.237`, `x-signature: <sha256>`, `Authorization: Bearer <JWT>`.
- Speed: ~5-15s per request (browser startup + DOM scraping).
- Vercel: DISABLED (no Chromium in serverless). Local/Docker only.
- Files: `providers/zai_provider.py`, `test_zai_browser.py`, `zai_captured.json`.
D. Gemini (Browser-Based Provider)
Uses Playwright Chromium to interact with gemini.google.com as a real browser.
- Why Browser: Interaction mimics a real user session (Guest mode or Incognito).
- Input: `div[contenteditable="true"]`.
- Prompt Engineering: Appends `..... answer in plain text` to ensure clean output.
- Model: `gemini-3-flash` (fast, efficient, web-based).
- Files: `providers/gemini_provider.py`, `test_gemini_browser.py`.
- Status: Experimental. Requires a local Playwright environment.
E. HuggingChat (Browser-Based Provider)
Uses Playwright Chromium to interact with huggingface.co/chat as a real browser.
- Why Browser: HuggingChat provides access to 100+ open-source models via web interface.
- Authentication: Uses credentials (stored in provider) - logs in automatically.
- Session Management: Uses Supabase to persist cookies across redeploys (see Section 7).
- New Conversation Each Time: Clicks "New Chat" to ensure no context sharing between API calls.
- Input: `textarea` with placeholder text.
- Features:
  - Handles the welcome modal automatically (clicks "Start chatting")
  - Supports model selection from the dropdown
  - Access to top models: Llama 3.3 70B, Qwen 2.5 72B, DeepSeek R1, Kimi K2, etc.
- Models (all prefixed with `huggingface-`):
  - `huggingface-omni`: Auto-routes to the best model (default)
  - `huggingface-llama-3.3-70b`: Meta's latest Llama model
  - `huggingface-qwen-72b`: Alibaba's Qwen model
  - `huggingface-deepseek-r1`: DeepSeek reasoning model
  - `huggingface-kimi-k2`: Moonshot's Kimi K2 model
- Files: `providers/huggingchat_provider.py`, `provider_sessions.py`.
- Status: Working. Requires a local Playwright environment.
- Vercel: DISABLED (no Chromium in serverless). Local/Docker only.
F. Search & Deep Research
The API includes a search engine (`search_engine.py`) powered by DuckDuckGo (via `duckduckgo_search`).
- `/search`: Returns raw search results.
- `/deep_research`: Multi-step process:
  1. Analyzes the user query.
  2. Generates search queries.
  3. Scrapes results.
  4. Synthesizes a final answer using the AI Engine.
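The multi-step flow can be sketched as a simple pipeline. The helpers passed in (`generate_queries`, `web_search`, `synthesize`) are illustrative stand-ins for `search_engine.py` and the AI engine, not the project's real functions:

```python
# Hedged sketch of the /deep_research pipeline; helper names are illustrative.
def deep_research(question, generate_queries, web_search, synthesize):
    queries = generate_queries(question)                   # analyze + generate queries
    results = [r for q in queries for r in web_search(q)]  # scrape results
    return synthesize(question, results)                   # synthesize final answer

answer = deep_research(
    "capital of France?",
    generate_queries=lambda q: [q, f"{q} wikipedia"],
    web_search=lambda q: [f"result for {q}"],
    synthesize=lambda q, rs: f"synthesized from {len(rs)} results",
)
print(answer)  # synthesized from 2 results
```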
4. Frontend & Admin
- `static/docs.html`: The public landing page AND the "Try It" dashboard.
- `static/admin.html`: Secret admin panel (`/qazmlp`) for checking stats and running tests.
- Stats: Stored in Supabase (persisted across Vercel cold starts).
5. Debugging Tools
We have built-in tools to diagnose issues on Vercel:
- `/admin/debug_g4f`: Runs a live G4F test (`gpt-4o-mini`, `gpt-4`) and returns verbose logs.
  - Note: Uses `AsyncClient` to avoid "Event loop already running" errors.
- `/admin/test_all`: Runs a parallel check on all configured models.
- `debug_g4f_verbose.py`: Local script for deep inspection.
- `debug_huggingchat_visible.py`: Launches a visible browser to debug HuggingChat interactions.
6. Response Sanitization
The `sanitizer.py` module cleans AI responses by removing:
- Promotional spam (llmplayground.net, Pollinations ads, etc.)
- UI Artifacts ("Export to Sheets", "Copied", model names like "Kimi-K2-Instruct-0905 via groq")
- JSON double-encoding (some providers wrap responses in JSON)
- Reasoning traces (`<think>` tags from DeepSeek and similar)
When adding new providers, check if they inject artifacts and add patterns to SPAM_PATTERNS in sanitizer.py.
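Illustrative patterns in the spirit of `SPAM_PATTERNS` (the real list in `sanitizer.py` is longer; these three are examples matching the categories above, not the project's exact rules):

```python
import re

# Example patterns, one per category described above.
SPAM_PATTERNS = [
    re.compile(r"llmplayground\.net\S*", re.IGNORECASE),     # promotional spam
    re.compile(r"Export to Sheets|^Copied$", re.MULTILINE),  # UI artifacts
    re.compile(r"<think>.*?</think>", re.DOTALL),            # reasoning traces
]

def sanitize(text: str) -> str:
    for pattern in SPAM_PATTERNS:
        text = pattern.sub("", text)
    return text.strip()

print(sanitize("<think>chain of thought</think>Paris is the capital of France."))
# Paris is the capital of France.
```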
7. Provider Session Management (Supabase)
Overview
Browser-based providers (HuggingChat, Z.ai, Gemini) can save their authentication sessions to Supabase. This ensures:
- ✅ Sessions survive redeploys and restarts
- ✅ No repeated login emails
- ✅ Shared session state across multiple workers
Architecture
- Table: `provider_sessions` (see `supabase_sessions_schema.sql`)
- Manager: `provider_sessions.py`, class `ProviderSessionManager`
- Key Fields:
  - `provider`: Provider name (e.g., "huggingchat", "zai")
  - `session_data`: JSONB with cookies, tokens, etc.
  - `conversation_count`: Number of API calls made
  - `max_conversations`: Limit before requiring re-login (default 50)
  - `expires_at`: Session expiration timestamp
Usage in Providers
```python
from provider_sessions import get_provider_session_manager

session_mgr = get_provider_session_manager()

# Check if we need to login
if session_mgr.needs_login("huggingchat"):
    # Perform login
    cookies = await perform_login()
    # Save to Supabase
    session_mgr.save_session("huggingchat", cookies, conversation_count=0)
else:
    # Use existing session
    session = session_mgr.get_session("huggingchat")
    cookies = session["session_data"]["cookies"]

# After a successful API call, increment the counter
session_mgr.increment_conversation("huggingchat")
```
Setup
1. Run `supabase_sessions_schema.sql` in the Supabase SQL Editor
2. Ensure `SUPABASE_URL` and `SUPABASE_KEY` are set in the environment
3. The provider automatically uses Supabase for session persistence
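A typical environment setup for step 2 (the values are placeholders, not real credentials):

```shell
export SUPABASE_URL="https://<project-ref>.supabase.co"
export SUPABASE_KEY="<service-role-or-anon-key>"
```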
Current Implementation Status
- HuggingChat: ✅ Uses Supabase sessions (saves cookies, 50 conversations per login)
- Z.ai: ❌ Not needed (auto-gets a guest JWT each time)
- Gemini: ❌ Not needed (no authentication required)
Limits Per Provider
| Provider | Max Conversations | Session Duration |
|---|---|---|
| HuggingChat | 50 | 24 hours |
| Z.ai | 100 | 48 hours |
| Gemini | 100 | 48 hours |
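The limits above can be enforced with a check like the following. This is a hedged sketch of how `needs_login` could combine the conversation cap and the expiry; the authoritative logic lives in `provider_sessions.py`, and the field names mirror the Key Fields list above:

```python
from datetime import datetime, timedelta, timezone

# Sketch only; the real check is in ProviderSessionManager.needs_login.
def needs_login(session) -> bool:
    if session is None:
        return True                                    # never logged in
    if session["conversation_count"] >= session["max_conversations"]:
        return True                                    # hit the conversation cap
    return datetime.now(timezone.utc) >= session["expires_at"]  # expired

fresh = {
    "conversation_count": 3,
    "max_conversations": 50,
    "expires_at": datetime.now(timezone.utc) + timedelta(hours=24),
}
print(needs_login(fresh))  # False
print(needs_login(None))   # True
```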
8. Maintenance Workflows
- Adding Models: Run `@.agent/workflows/update.md`.
  - Crucial: Always run step 3.6 (Strict Mode Verification) after updates.
- Strict Mode Validation: Run `python3 test_strict.py`.
- Future Candidates:
  - `chat.z.ai`: ✅ INTEGRATED (see Section 3C above).
    - Previous blocker (`x-signature`) solved via Playwright browser automation.
    - Provider: `providers/zai_provider.py`, Model: `glm-5` (Tier 1).
9. Common Issues & Fixes
| Error | Cause | Fix |
|---|---|---|
| `[Errno 30] Read-only file system` | `HOME` not set to `/tmp` | Ensure `os.environ["HOME"] = "/tmp"` is at the top of `main.py`. |
| `Event loop already running` | Sync `Client` in async handler | Use `g4f.client.AsyncClient`. |
| `Add a "api_key"` | Provider requires auth | The provider (e.g. OpenRouter) is active but we have no key. Use strict mode to avoid it, or rely on ApiAirforce. |
| `Model not found: auto` | `model="auto"` passed | `engine.py` must handle `model="auto"` as `None`. |
| HuggingChat login emails every request | Not using session management | Ensure `provider_sessions.py` is being used and the Supabase table exists. |
| "Start chatting" modal blocking | Welcome modal not dismissed | Provider should click the modal button before finding the input. |
| Response contains "Copied" or model names | Sanitization missing | Add UI artifact patterns to `sanitizer.py`. |
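The `Model not found: auto` fix amounts to normalizing the sentinel before routing. A minimal version (the real handling is inside `engine.py`):

```python
def normalize_model(model):
    # "auto", empty, or missing all mean "no preference" -> fallback mode.
    return None if model in (None, "", "auto") else model

print(normalize_model("auto"))   # None
print(normalize_model("glm-5"))  # glm-5
```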
10. Tips & Tricks
Browser-Based Providers (Z.ai, Gemini, HuggingChat)
- Always use headless mode on servers - Visible browser doesn't work on Hugging Face
- Handle modals - Welcome screens block interaction, click them first
- Wait for hydration - JavaScript-heavy sites need 2-3 seconds after page load
- Multiple selectors - Try multiple input selectors (textarea, contenteditable, etc.)
- Check for loading states - Spinners/loading indicators mean content isn't ready
- Use ephemeral contexts - New context per request for isolation, but reuse cookies via Supabase
Model Naming
- Always prefix with the provider name (e.g., `huggingface-`, `gemini-`, `zai-`)
- Use kebab-case (e.g., `llama-3.3-70b`, not `Llama_3.3_70b`)
- Keep it short but descriptive (e.g., `huggingface-kimi-k2` vs `moonshotai-Kimi-K2-Instruct`)
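The three rules above can be enforced with a small helper (illustrative, not part of the codebase):

```python
import re

# Hypothetical helper: lowercase, kebab-case, keep dots in version numbers,
# then prepend the provider prefix.
def to_model_id(provider: str, raw_name: str) -> str:
    slug = re.sub(r"[^a-z0-9.]+", "-", raw_name.lower()).strip("-")
    return f"{provider}-{slug}"

print(to_model_id("huggingface", "Kimi_K2"))        # huggingface-kimi-k2
print(to_model_id("huggingface", "Llama_3.3_70b"))  # huggingface-llama-3.3-70b
```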
Adding New Providers
1. Create `providers/<name>_provider.py` inheriting from `BaseProvider`
2. Implement `send_message()`, `get_available_models()`, `is_available()`
3. Add models to `config.py` (`MODEL_RANKING` and `PROVIDER_MODELS`)
4. Import and register in `engine.py`
5. Add documentation to this guide (Section 3)
6. Test locally with a debug script before deploying
7. Consider whether session management (Supabase) is needed
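A hedged skeleton for step 1 and step 2. `BaseProvider`'s real interface lives in the codebase; the three methods below are the ones named above, and `EchoProvider` is a toy implementation showing the expected shape:

```python
import asyncio

# Assumed interface; check the real BaseProvider for exact signatures.
class BaseProvider:
    async def send_message(self, prompt: str, model: str) -> str: ...
    def get_available_models(self) -> list: ...
    def is_available(self) -> bool: ...

class EchoProvider(BaseProvider):
    """Toy provider illustrating a minimal implementation."""
    async def send_message(self, prompt: str, model: str) -> str:
        return f"[{model}] {prompt}"

    def get_available_models(self) -> list:
        return ["echo-1"]

    def is_available(self) -> bool:
        return True

print(asyncio.run(EchoProvider().send_message("ping", "echo-1")))  # [echo-1] ping
```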
Testing
- Always test locally first with `python3 test_<provider>_browser.py`
- Use a visible browser for debugging (`headless=False`) to see what's happening
- Take screenshots at each step to diagnose issues
- Check the logs on Hugging Face Spaces for errors
Last Updated: 2026-02-14