Fix device placement for tokenizer outputs before model inference 64c014e jeanbaptdzd committed on Nov 23
Refactor: Address code shortcomings and align with HF best practices dc14519 jeanbaptdzd committed on Nov 23
Fix OpenAI API compatibility: support tool_choice='required' and response_format a82e45b jeanbaptdzd committed on Nov 19
feat: Add rate limiting, stats tracking, and fix critical issues 67befa7 jeanbaptdzd committed on Nov 17
Refactor: Remove RAG, upgrade vLLM 0.9.2, add optimization mode da484d7 jeanbaptdzd committed on Nov 2
feat: FastAPI vLLM service with OpenAI-compatible endpoints and PRIIPs extractor 6851411 jeanbaptdzd committed on Oct 28