---
title: Open Finance LLM 8B
emoji: 🐉
colorFrom: red
colorTo: red
sdk: docker
pinned: false
app_port: 7860
suggested_hardware: l4x1
---

# Open Finance LLM 8B

OpenAI-compatible API powered by `DragonLLM/Qwen-Open-Finance-R-8B`.

## Deployment

| Platform | Backend | Dockerfile | Use Case |
|----------|---------|------------|----------|
| Hugging Face Spaces | Transformers | `Dockerfile` | Development, L4 GPU |
| Koyeb | vLLM | `Dockerfile.koyeb` | Production, L40s GPU |

## Features

- OpenAI-compatible API
- Tool/function calling support
- Streaming responses
- Rate limiting (30 requests/min, 500 requests/hour)
- Usage statistics via `/v1/stats`

## Quick Start

```bash
curl -X POST "https://your-endpoint/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DragonLLM/Qwen-Open-Finance-R-8B",
    "messages": [{"role": "user", "content": "What is compound interest?"}],
    "max_tokens": 500
  }'
```

```python
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="DragonLLM/Qwen-Open-Finance-R-8B",
    messages=[{"role": "user", "content": "What is compound interest?"}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```

## Configuration

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_TOKEN_LC2` | Yes | - | Hugging Face token |
| `MODEL` | No | `DragonLLM/Qwen-Open-Finance-R-8B` | Model name |
| `PORT` | No | `8000` (vLLM) / `7860` (Transformers) | Server port |

**vLLM-specific (Koyeb):**

- `ENABLE_AUTO_TOOL_CHOICE=true` - Enable tool calling
- `TOOL_CALL_PARSER=hermes` - Parser for Qwen models
- `MAX_MODEL_LEN=8192` - Max context length
- `GPU_MEMORY_UTILIZATION=0.90` - GPU memory fraction

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/stats` | GET | Usage statistics |
| `/health` | GET | Health check |

## Technical Specs

- **Model**: DragonLLM/Qwen-Open-Finance-R-8B (8B parameters)
- **vLLM backend**: `vllm-openai:latest` with the hermes tool parser
- **Transformers backend**: Transformers 4.45.0+ with PyTorch 2.5.0+ (CUDA 12.4)
- **Minimum VRAM**: 20 GB (L4); 48 GB (L40s) recommended

## Development

```bash
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8080
pytest tests/ -v
```

## License

MIT License
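
## Example: Tool Calling

Tool calling uses the standard OpenAI `tools` schema (vLLM parses the model's output with the hermes parser). The sketch below builds a request payload with one hypothetical tool, `get_stock_price` — the tool name and parameters are illustrative, not part of this API:

```python
import json


def build_tool_call_request(question: str) -> dict:
    """Build an OpenAI-compatible chat-completion payload with one tool attached."""
    return {
        "model": "DragonLLM/Qwen-Open-Finance-R-8B",
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical tool for illustration only.
                    "name": "get_stock_price",
                    "description": "Look up the latest price for a ticker symbol",
                    "parameters": {
                        "type": "object",
                        "properties": {"ticker": {"type": "string"}},
                        "required": ["ticker"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
        "max_tokens": 500,
    }


payload = build_tool_call_request("What is AAPL trading at?")
print(json.dumps(payload, indent=2))
```

The same `tools` list can be passed directly to `client.chat.completions.create(...)` with the `openai` client shown in the Quick Start.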
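
## Example: Streaming

Streamed completions arrive as server-sent events in the standard OpenAI chunk format. As a sketch of what is on the wire (the chunk shape below is the standard OpenAI one, assumed rather than verified against this server), a minimal parser for a single `data:` line:

```python
import json


def parse_sse_chunk(line: str):
    """Return the delta text from one SSE line of a streaming response, or None."""
    if not line.startswith("data: "):
        return None
    body = line[len("data: "):].strip()
    if body == "[DONE]":  # end-of-stream sentinel
        return None
    delta = json.loads(body)["choices"][0]["delta"]
    return delta.get("content")


sample = 'data: {"choices": [{"delta": {"content": "Compound"}}]}'
print(parse_sse_chunk(sample))
```

In practice, passing `stream=True` to `client.chat.completions.create` with the `openai` client handles this parsing for you and yields chunk objects directly.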
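
## Example: Handling Rate Limits

With limits of 30 requests/min and 500 requests/hour, clients should expect throttled requests. Assuming the server signals throttling with HTTP 429 (the conventional status code — not confirmed by this README), a minimal retry helper with exponential backoff might look like:

```python
import time


def send_with_retry(send, max_retries=3, base_delay=1.0):
    """Call send() until it returns a non-429 status, backing off exponentially."""
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return send()  # final attempt, result returned as-is


# Fake transport for illustration: rate-limited twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
status, body = send_with_retry(lambda: next(responses), base_delay=0.01)
print(status, body)  # 200 ok
```

Here `send` is any zero-argument callable returning `(status, body)`, so the helper wraps `requests.post` or an `openai` client call equally well.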