---
title: Open Finance LLM 8B
emoji: 🐉
colorFrom: red
colorTo: red
sdk: docker
pinned: false
app_port: 7860
suggested_hardware: l4x1
---

# Open Finance LLM 8B

OpenAI-compatible API powered by `DragonLLM/Qwen-Open-Finance-R-8B`.

## Deployment

| Platform | Backend | Dockerfile | Use Case |
|---|---|---|---|
| Hugging Face Spaces | Transformers | `Dockerfile` | Development, L4 GPU |
| Koyeb | vLLM | `Dockerfile.koyeb` | Production, L40S GPU |

## Features

- OpenAI-compatible API
- Tool/function calling support
- Streaming responses
- Rate limiting (30 req/min, 500 req/hour)
- Statistics tracking via `/v1/stats`
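
Tool calling uses the standard OpenAI `tools` schema. A minimal sketch, with a hypothetical `get_stock_price` tool (the tool name, parameters, and stand-in implementation below are illustrative, not part of this repo) and a local dispatcher for the tool calls the model returns:

```python
import json

# Hypothetical tool definition in the OpenAI "tools" schema; pass it as
# tools=TOOLS in client.chat.completions.create(...).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Look up the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def dispatch_tool_call(name, arguments_json, registry):
    """Execute a tool call returned by the model against local functions."""
    args = json.loads(arguments_json)
    return registry[name](**args)

# Local stand-in implementation for the hypothetical tool (placeholder data).
def get_stock_price(ticker):
    return {"ticker": ticker, "price": 123.45}

# When a response contains tool_calls, run each one locally, e.g.:
result = dispatch_tool_call(
    "get_stock_price", '{"ticker": "AAPL"}', {"get_stock_price": get_stock_price}
)
```

The tool result is then sent back as a `role: "tool"` message so the model can finish its answer.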

## Quick Start

cURL:

```bash
curl -X POST "https://your-endpoint/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DragonLLM/Qwen-Open-Finance-R-8B",
    "messages": [{"role": "user", "content": "What is compound interest?"}],
    "max_tokens": 500
  }'
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="DragonLLM/Qwen-Open-Finance-R-8B",
    messages=[{"role": "user", "content": "What is compound interest?"}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```
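
Streaming works through the same client by passing `stream=True`. The helper below simply accumulates the content deltas; it assumes the chunk shape of the OpenAI Python SDK:

```python
def collect_stream(stream):
    """Join the content deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the final one) carry no content
            parts.append(delta)
    return "".join(parts)

# Usage with the client from the Quick Start:
# stream = client.chat.completions.create(
#     model="DragonLLM/Qwen-Open-Finance-R-8B",
#     messages=[{"role": "user", "content": "What is compound interest?"}],
#     max_tokens=500,
#     stream=True,
# )
# print(collect_stream(stream))
```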

## Configuration

| Variable | Required | Default | Description |
|---|---|---|---|
| `HF_TOKEN_LC2` | Yes | - | Hugging Face token |
| `MODEL` | No | `DragonLLM/Qwen-Open-Finance-R-8B` | Model name |
| `PORT` | No | 8000 (vLLM) / 7860 (Transformers) | Server port |

vLLM-specific (Koyeb):

- `ENABLE_AUTO_TOOL_CHOICE=true` - enable tool calling
- `TOOL_CALL_PARSER=hermes` - parser for Qwen models
- `MAX_MODEL_LEN=8192` - max context length
- `GPU_MEMORY_UTILIZATION=0.90` - GPU memory fraction
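
These variables correspond to flags of vLLM's OpenAI-compatible server. A sketch of the equivalent launch command (an assumption about how `Dockerfile.koyeb` wires them, not a copy of it):

```shell
vllm serve DragonLLM/Qwen-Open-Finance-R-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --port 8000
```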

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/stats` | GET | Usage statistics |
| `/health` | GET | Health check |
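
A quick way to verify a deployment is up, using the same placeholder endpoint as the Quick Start:

```shell
curl -s https://your-endpoint/health
curl -s https://your-endpoint/v1/models
curl -s https://your-endpoint/v1/stats
```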

## Technical Specs

- Model: `DragonLLM/Qwen-Open-Finance-R-8B` (8B parameters)
- vLLM backend: `vllm-openai:latest` with the hermes tool parser
- Transformers backend: 4.45.0+ with PyTorch 2.5.0+ (CUDA 12.4)
- Minimum VRAM: 20 GB (L4); recommended 48 GB (L40S)

## Development

```bash
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8080
pytest tests/ -v
```

## License

MIT License