---
title: Open Finance LLM 8B
emoji: 🐉
colorFrom: red
colorTo: red
sdk: docker
pinned: false
app_port: 7860
suggested_hardware: l4x1
---

# Open Finance LLM 8B

OpenAI-compatible API powered by `DragonLLM/Qwen-Open-Finance-R-8B`.

## Deployment

| Platform | Backend | Dockerfile | Use Case |
|---|---|---|---|
| Hugging Face Spaces | Transformers | `Dockerfile` | Development, L4 GPU |
| Koyeb | vLLM | `Dockerfile.koyeb` | Production, L40S GPU |

## Features

- OpenAI-compatible API
- Tool/function calling support
- Streaming responses
- Rate limiting (30 req/min, 500 req/hour)
- Statistics tracking via `/v1/stats`
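
Tool calling uses the standard OpenAI `tools` schema. A minimal sketch, with a hypothetical `get_stock_price` tool (the tool name, parameters, and stand-in implementation below are illustrative, not part of this repo) and a local dispatcher for the tool calls the model returns:

```python
import json

# Hypothetical tool definition in the OpenAI "tools" schema; pass it as
# tools=TOOLS in client.chat.completions.create(...).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Look up the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def dispatch_tool_call(name, arguments_json, registry):
    """Execute a tool call returned by the model against local functions."""
    args = json.loads(arguments_json)
    return registry[name](**args)

# Local stand-in implementation for the hypothetical tool (placeholder data).
def get_stock_price(ticker):
    return {"ticker": ticker, "price": 123.45}

# When a response contains tool_calls, run each one locally, e.g.:
result = dispatch_tool_call(
    "get_stock_price", '{"ticker": "AAPL"}', {"get_stock_price": get_stock_price}
)
```

The tool result is then sent back as a `role: "tool"` message so the model can finish its answer.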

## Quick Start

cURL:

```bash
curl -X POST "https://your-endpoint/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DragonLLM/Qwen-Open-Finance-R-8B",
    "messages": [{"role": "user", "content": "What is compound interest?"}],
    "max_tokens": 500
  }'
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="DragonLLM/Qwen-Open-Finance-R-8B",
    messages=[{"role": "user", "content": "What is compound interest?"}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```
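
Streaming works through the same client by passing `stream=True`. The helper below simply accumulates the content deltas; it assumes the chunk shape of the OpenAI Python SDK:

```python
def collect_stream(stream):
    """Join the content deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the final one) carry no content
            parts.append(delta)
    return "".join(parts)

# Usage with the client from the Quick Start:
# stream = client.chat.completions.create(
#     model="DragonLLM/Qwen-Open-Finance-R-8B",
#     messages=[{"role": "user", "content": "What is compound interest?"}],
#     max_tokens=500,
#     stream=True,
# )
# print(collect_stream(stream))
```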

## Configuration

| Variable | Required | Default | Description |
|---|---|---|---|
| `HF_TOKEN_LC2` | Yes | - | Hugging Face token |
| `MODEL` | No | `DragonLLM/Qwen-Open-Finance-R-8B` | Model name |
| `PORT` | No | 8000 (vLLM) / 7860 (Transformers) | Server port |

vLLM-specific (Koyeb):

- `ENABLE_AUTO_TOOL_CHOICE=true` - enable tool calling
- `TOOL_CALL_PARSER=hermes` - parser for Qwen models
- `MAX_MODEL_LEN=8192` - max context length
- `GPU_MEMORY_UTILIZATION=0.90` - GPU memory fraction
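
These variables correspond to flags of vLLM's OpenAI-compatible server. A sketch of the equivalent launch command (an assumption about how `Dockerfile.koyeb` wires them, not a copy of it):

```shell
vllm serve DragonLLM/Qwen-Open-Finance-R-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --port 8000
```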

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/stats` | GET | Usage statistics |
| `/health` | GET | Health check |
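
A quick way to verify a deployment is up, using the same placeholder endpoint as the Quick Start:

```shell
curl -s https://your-endpoint/health
curl -s https://your-endpoint/v1/models
curl -s https://your-endpoint/v1/stats
```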

## Technical Specs

- Model: `DragonLLM/Qwen-Open-Finance-R-8B` (8B parameters)
- vLLM backend: `vllm-openai:latest` with the hermes tool parser
- Transformers backend: 4.45.0+ with PyTorch 2.5.0+ (CUDA 12.4)
- Minimum VRAM: 20 GB (L4); recommended 48 GB (L40S)

## Development

```bash
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8080
pytest tests/ -v
```

## License

MIT License