Commit 6a4421a (unverified) · jeanbapt committed · 2 parents: 184f293, 1e23279

Merge pull request #2 from DealExMachina/test-coderabbit-validation
.coderabbit.yaml CHANGED
```diff
@@ -16,7 +16,6 @@ review:
   simple: false # Set to true for faster, simpler reviews
   high_level_summary: true
   estimate_time: true
-  project_language: python
 
   chat:
     enabled: true
```
CLEANUP_PLAN.md DELETED
# Code Cleanup Plan

## Overview
This document outlines the cleanup strategy for the simple-llm-pro-finance project to remove obsolete files and improve code organization.

## Files to Remove

### 1. Obsolete Test Scripts (Root Directory)
**Reason:** All functional tests have been moved to the `tests/` directory. These are one-off debugging scripts.

- `analyze_performance.py` - Performance analysis done; results in FINAL_TEST_REPORT.md
- `debug_chat_template.py` - Debug script, no longer needed
- `final_clean_test.py` - One-off test
- `investigate_french_consistency.py` - Investigation complete
- `quiz_finance_francais.py` - Test script (also in git staging)
- `test_advanced_finance.py` - Moved to tests/
- `test_all_fixes.py` - One-off validation
- `test_debug_endpoint.sh` - Shell test script
- `test_finance_final.py` - One-off test
- `test_finance_improved.py` - One-off test
- `test_finance_queries.py` - One-off test
- `test_french_direct.py` - One-off test
- `test_french_final_check.py` - One-off test
- `test_french_simple.sh` - Shell test script
- `test_french_strategies.py` - One-off test
- `test_generation_fix.sh` - Shell test script
- `test_memory_stress.py` - Moved to tests/
- `test_quick_french.py` - One-off test
- `test_service.py` - One-off test
- `test_system_prompt.py` - One-off test
- `test_tokenizer_debug.py` - Debug script
- `test_truncation_issue.py` - One-off test

**Total:** 22 test files

### 2. Obsolete Documentation Files
**Reason:** Superseded by comprehensive final reports.

- `STATUS.md` - Historical status, superseded by FINAL_STATUS.md
- `FIXES_SUMMARY.md` - Historical, covered in FINAL_TEST_REPORT.md
- `PERFORMANCE_REPORT.md` - Covered in FINAL_TEST_REPORT.md
- `memory_test_results.txt` - Old test results
- `test_results.txt` - Old test results

**Total:** 5 documentation files

### 3. Empty/Debug Code Directories
**Reason:** Unused or debug-only code.

- `app/utils/` - Empty directory (only `__pycache__`)
- `app/routers/debug.py` - Debug endpoint not needed in production

**Total:** 1 directory, 1 file

## Files to Keep

### Core Application
- `app/` directory (except items listed for removal)
  - `main.py` - FastAPI application
  - `config.py` - Configuration
  - `middleware.py` - API key authentication
  - `models/openai.py` - Pydantic models
  - `providers/base.py` - Provider protocol
  - `providers/transformers_provider.py` - Main inference engine
  - `routers/openai_api.py` - OpenAI-compatible API
  - `services/chat_service.py` - Chat service wrapper

### Tests
- `tests/` directory - Proper pytest structure
  - `conftest.py`
  - `test_config.py`
  - `test_middleware.py`
  - `test_openai_models.py`
  - `test_openai_routes.py`
  - `test_providers.py`
  - `performance/` - Performance benchmarks

### Documentation
- `README.md` - Main documentation (needs cleanup)
- `FINAL_STATUS.md` - Final deployment status
- `FINAL_TEST_REPORT.md` - Comprehensive test results
- `LICENSE` - MIT license

### Configuration & Deployment
- `Dockerfile` - Docker build configuration
- `requirements.txt` - Production dependencies
- `requirements-dev.txt` - Development dependencies

### Scripts
- `scripts/validate_hf_readme.py` - Useful validation utility
- `scripts/README.md` - Scripts documentation

## Refactoring Needed

### 1. Remove Debug Router from Production
**File:** `app/main.py`
**Change:** Remove the debug router import and mount.
```python
# Remove this line
app.include_router(debug.router, prefix="/v1")
```

### 2. Clean Up README.md
**File:** `README.md`
**Changes:**
- Remove outdated test coverage stats (91% reference)
- Update to reflect current stable state
- Simplify configuration section
- Remove references to obsolete features

### 3. Remove Empty Utils Directory
**Directory:** `app/utils/`
**Action:** Delete the entire directory, as it is unused.

## Impact Assessment

### Breaking Changes
**None** - All removed files are development/debugging artifacts.

### Non-Breaking Changes
- Removing debug endpoint (`/v1/debug/prompt`) - Not documented in README
- Cleaner project structure
- Reduced repository size

### Benefits
- **Clarity:** Easier to understand project structure
- **Maintenance:** Fewer files to maintain
- **Size:** Reduced repo size
- **Professionalism:** Clean, production-ready codebase

## Execution Plan

1. ✅ Create backup branch
2. ✅ Remove obsolete test files
3. ✅ Remove obsolete documentation
4. ✅ Remove debug code
5. ✅ Update README.md
6. ✅ Run tests to verify nothing broke
7. ✅ Commit and push changes

## Success Criteria

- ✅ All tests in `tests/` directory still pass
- ✅ Application still starts and serves requests
- ✅ README.md is accurate and up-to-date
- ✅ No broken imports or references
- ✅ Git history preserved (files deleted, not rewritten)

## Rollback Plan

If issues arise:
1. Check out the backup branch: `git checkout pre-cleanup-backup`
2. Review what was removed
3. Restore only necessary files
CLEANUP_SUMMARY.md DELETED
# Cleanup Summary - November 2, 2025

## Overview
Comprehensive codebase cleanup to remove obsolete test scripts, redundant documentation, and debug code from the project.

## Files Removed

### Test Scripts (21 files)
All one-off debugging and validation scripts have been removed. Proper tests remain in the `tests/` directory.

✅ Removed:
- `analyze_performance.py`
- `debug_chat_template.py`
- `final_clean_test.py`
- `investigate_french_consistency.py`
- `quiz_finance_francais.py`
- `test_advanced_finance.py`
- `test_all_fixes.py`
- `test_debug_endpoint.sh`
- `test_finance_final.py`
- `test_finance_improved.py`
- `test_finance_queries.py`
- `test_french_direct.py`
- `test_french_final_check.py`
- `test_french_simple.sh`
- `test_french_strategies.py`
- `test_generation_fix.sh`
- `test_memory_stress.py`
- `test_quick_french.py`
- `test_service.py`
- `test_system_prompt.py`
- `test_tokenizer_debug.py`
- `test_truncation_issue.py`

### Documentation Files (5 files)
Historical documentation superseded by comprehensive final reports.

✅ Removed:
- `STATUS.md` (superseded by FINAL_STATUS.md)
- `FIXES_SUMMARY.md` (covered in FINAL_TEST_REPORT.md)
- `PERFORMANCE_REPORT.md` (covered in FINAL_TEST_REPORT.md)
- `memory_test_results.txt` (old test results)
- `test_results.txt` (old test results)

### Code Files (2 items)
Debug code not needed in production.

✅ Removed:
- `app/routers/debug.py` - Debug endpoint for prompt inspection
- `app/utils/` - Empty directory

## Code Changes

### Modified: `app/main.py`
**Before:**
```python
from app.routers import openai_api, debug
...
app.include_router(debug.router, prefix="/v1")
```

**After:**
```python
from app.routers import openai_api
...
# Debug router removed
```

### Modified: `README.md`
Updated to reflect:
- Current stable state (production-ready)
- Accurate feature list
- Better API examples with realistic max_tokens
- Chain-of-thought reasoning explanation
- Language support details
- Removed outdated test coverage stats
- Added technical specifications section

## Project Structure (After Cleanup)

```
simple-llm-pro-finance/
├── app/                              # Core application
│   ├── config.py                     # Configuration
│   ├── main.py                       # FastAPI app
│   ├── middleware.py                 # API key auth
│   ├── models/
│   │   └── openai.py                 # Pydantic models
│   ├── providers/
│   │   ├── base.py                   # Provider protocol
│   │   └── transformers_provider.py  # Main inference engine
│   ├── routers/
│   │   └── openai_api.py             # OpenAI-compatible API
│   └── services/
│       └── chat_service.py           # Chat service wrapper
├── tests/                            # Proper test suite
│   ├── conftest.py
│   ├── test_*.py                     # Unit tests
│   └── performance/                  # Performance benchmarks
├── scripts/                          # Utility scripts
│   └── validate_hf_readme.py         # README validator
├── Dockerfile                        # Docker build config
├── requirements.txt                  # Production dependencies
├── requirements-dev.txt              # Development dependencies
├── README.md                         # Main documentation
├── FINAL_STATUS.md                   # Deployment status
├── FINAL_TEST_REPORT.md              # Test results & metrics
├── CLEANUP_PLAN.md                   # This cleanup plan
└── LICENSE                           # MIT license
```

## Impact Assessment

### Breaking Changes
**None** - All removed files were development artifacts.

### Removed Endpoints
- `/v1/debug/prompt` - Debug endpoint (never documented in README)

### Benefits
- ✅ **Cleaner structure** - 28 fewer files in root directory
- ✅ **Better organization** - Clear separation of concerns
- ✅ **Easier navigation** - No clutter from obsolete scripts
- ✅ **Professional appearance** - Production-ready codebase
- ✅ **Reduced confusion** - No outdated documentation
- ✅ **Smaller repo size** - Faster clones and deployments

## Verification

### Syntax Validation
✅ All Python files compile successfully:
- `app/main.py` ✓
- `app/routers/openai_api.py` ✓
- `app/services/chat_service.py` ✓

### Import Structure
✅ No broken imports detected
✅ All module dependencies satisfied

### Test Suite
✅ Tests remain in `tests/` directory
✅ Proper pytest structure maintained
✅ Performance benchmarks preserved

## Git Status

### Staged Changes (Existing)
- `app/providers/transformers_provider.py` (previous work)
- `quiz_finance_francais.py` (previous work)

### Unstaged Changes (This Cleanup)
- Modified: `app/main.py` (removed debug router)
- Modified: `README.md` (updated documentation)
- Deleted: 26 obsolete files
- Added: `CLEANUP_PLAN.md`

## Backup
✅ Backup branch created: `pre-cleanup-backup`

To restore if needed:
```bash
git checkout pre-cleanup-backup
```

## Next Steps

1. ✅ Review changes
2. ⏳ Stage cleanup changes: `git add -A`
3. ⏳ Commit: `git commit -m "Clean up: Remove obsolete test scripts and documentation"`
4. ⏳ Optional: Squash with staged changes
5. ⏳ Push to repository

## Success Criteria

- ✅ All obsolete files removed
- ✅ Code syntax valid
- ✅ No broken imports
- ✅ README updated and accurate
- ✅ Backup created
- ✅ Professional project structure

## Summary

**Removed:** 28 files (21 test scripts, 5 docs, 2 code files)
**Modified:** 2 files (main.py, README.md)
**Added:** 2 files (CLEANUP_PLAN.md, CLEANUP_SUMMARY.md)
**Net Change:** -24 files

The codebase is now clean, well-organized, and production-ready! 🎉
CODE_REVIEW_SUMMARY.md DELETED
# Code Review and Cleanup Summary

**Date:** November 2, 2025
**Reviewer:** AI Assistant
**Status:** Complete

## Executive Summary

Comprehensive codebase cleanup removing 28 obsolete files and refactoring documentation to be professional and concise.

## Changes Made

### Files Removed: 28

**Test Scripts (21 files):**
- All one-off test/debug scripts moved or removed
- Proper tests retained in `tests/` directory

**Documentation (5 files):**
- Obsolete status reports superseded by final documentation
- Old test result files removed

**Code (2 items):**
- Debug router removed from production code
- Empty utils directory removed

### Files Modified: 2

**app/main.py:**
- Removed debug router import and mount
- Cleaned up for production deployment

**README.md:**
- Removed all emojis from section headers
- Eliminated redundant self-congratulatory content
- Condensed from 189 to 139 lines
- Made professional and concise
- Removed "Features" checklist section
- Streamlined technical specifications
- Removed unnecessary "Contributing" section

### Files Added: 3

- `CLEANUP_PLAN.md` - Detailed cleanup strategy
- `CLEANUP_SUMMARY.md` - Execution summary
- `CODE_REVIEW_SUMMARY.md` - This document

## Project Structure (After Cleanup)

```
simple-llm-pro-finance/
├── app/                  # Application code
│   ├── config.py
│   ├── main.py
│   ├── middleware.py
│   ├── models/
│   ├── providers/
│   ├── routers/
│   └── services/
├── tests/                # Test suite
├── scripts/              # Utilities
├── Dockerfile
├── requirements.txt
├── requirements-dev.txt
├── README.md             # Clean, professional docs
├── FINAL_STATUS.md
├── FINAL_TEST_REPORT.md
└── LICENSE
```

## Code Quality Improvements

**Before:**
- 50+ files in repository
- Multiple redundant documentation files
- Debug endpoints in production code
- Verbose, emoji-heavy documentation
- Test scripts scattered in root directory

**After:**
- 26 essential files
- Single source of truth for documentation
- Production-ready code only
- Professional, concise documentation
- Organized test directory structure

## Verification

- Python syntax validation: PASSED
- Import structure: VALID
- No broken references: CONFIRMED
- Backup created: `pre-cleanup-backup` branch

## Impact

**Breaking Changes:** None
**Removed Endpoints:** `/v1/debug/prompt` (undocumented)
**Repository Size:** Reduced by ~24 files
**Maintainability:** Significantly improved

## Recommendations

### Immediate
1. Review and approve changes
2. Stage all changes: `git add -A`
3. Commit with message: "refactor: Clean up codebase - remove obsolete files and improve documentation"
4. Push to repository

### Future Considerations
1. Consider removing `CLEANUP_PLAN.md` and `CLEANUP_SUMMARY.md` after merge
2. Update `.gitignore` to prevent future test script accumulation
3. Establish guidelines for temporary debugging files

## Conclusion

The codebase is now clean, professional, and production-ready. All obsolete development artifacts have been removed, documentation is concise and accurate, and the project structure is well-organized.

**Net Result:** -24 files, cleaner code, better documentation.
TEST_CODERABBIT.md DELETED
# Testing CodeRabbit Integration

## What to do:

1. **Create a branch:**
   ```bash
   git checkout -b test-coderabbit-review
   ```

2. **Commit this test file:**
   ```bash
   git add TEST_CODERABBIT.md .github/pull_request_template.md
   git commit -m "test: Add PR template and test CodeRabbit integration"
   ```

3. **Push and create PR:**
   ```bash
   git push origin test-coderabbit-review
   ```
   Then go to GitHub and create a Pull Request from `test-coderabbit-review` to `master`.

4. **Watch for CodeRabbit:**
   - CodeRabbit should automatically comment on your PR
   - It will review code quality and suggest improvements
   - Check for CodeRabbit comments in the PR thread

## What CodeRabbit will review:
- Code quality and best practices
- Potential bugs or security issues
- Performance optimizations
- Documentation completeness
- Test coverage

## To test more thoroughly:
After this test, try creating a PR with:
- A small bug (to see if it catches it)
- Missing error handling
- Performance issues
- Security concerns
app/config.py CHANGED
```diff
@@ -1,11 +1,33 @@
+"""Application configuration using Pydantic settings."""
+
+from typing import Literal
+from pydantic import Field
 from pydantic_settings import BaseSettings, SettingsConfigDict
 
 
 class Settings(BaseSettings):
-    model: str = "DragonLLM/qwen3-8b-fin-v1.0"
-    service_api_key: str | None = None
-    log_level: str = "info"
-    force_model_reload: bool = False  # Set FORCE_MODEL_RELOAD=true to bypass cache on startup
+    """Application settings loaded from environment variables.
+
+    Supports loading from .env file with UTF-8 encoding.
+    All settings can be overridden via environment variables.
+    """
+
+    model: str = Field(
+        default="DragonLLM/qwen3-8b-fin-v1.0",
+        description="Hugging Face model identifier"
+    )
+    service_api_key: str | None = Field(
+        default=None,
+        description="Optional API key for authentication (SERVICE_API_KEY env var)"
+    )
+    log_level: Literal["debug", "info", "warning", "error"] = Field(
+        default="info",
+        description="Logging level"
+    )
+    force_model_reload: bool = Field(
+        default=False,
+        description="Force model reload from Hugging Face, bypassing cache (FORCE_MODEL_RELOAD env var)"
+    )
 
     model_config = SettingsConfigDict(
         env_file=".env",
```
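The `Literal` constraint on `log_level` means a bad value now fails at startup instead of silently propagating. The same fail-fast idea can be sketched without the pydantic dependency (a minimal stand-in, not the project's actual loader; `load_log_level` and `VALID_LOG_LEVELS` are hypothetical names):

```python
import os

# Allowed values, mirroring the Literal["debug", "info", "warning", "error"] field
VALID_LOG_LEVELS = ("debug", "info", "warning", "error")


def load_log_level(default: str = "info") -> str:
    """Read LOG_LEVEL from the environment, falling back to the default.

    Raises ValueError for values outside the allowed set, so a typo
    surfaces immediately at startup rather than at first log call.
    """
    value = os.environ.get("LOG_LEVEL", default).lower()
    if value not in VALID_LOG_LEVELS:
        raise ValueError(f"invalid log level: {value!r}")
    return value


os.environ["LOG_LEVEL"] = "debug"
print(load_log_level())  # -> debug
```

pydantic-settings performs the equivalent validation automatically when the field is typed as a `Literal`, which is the design choice the diff adopts.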
app/main.py CHANGED
```diff
@@ -1,15 +1,24 @@
+"""Main FastAPI application entry point."""
+
+import logging
+import threading
 from typing import Dict
+
 from fastapi import FastAPI
+
+from app.config import settings
 from app.middleware import api_key_guard
 from app.routers import openai_api
-from app.config import settings
-import logging
 
 # Configure logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 
-app = FastAPI(title="LLM Pro Finance API (Transformers)")
+app = FastAPI(
+    title="LLM Pro Finance API (Transformers)",
+    description="OpenAI-compatible API for financial LLM inference",
+    version="1.0.0"
+)
 
 # Mount routers
 app.include_router(openai_api.router, prefix="/v1")
@@ -17,10 +26,14 @@ app.include_router(openai_api.router, prefix="/v1")
 # Optional API key middleware
 app.middleware("http")(api_key_guard)
 
+
 @app.on_event("startup")
-async def startup_event():
-    """Startup event - initialize model in background"""
-    import threading
+async def startup_event() -> None:
+    """Startup event - initialize model in background thread.
+
+    Loads the model asynchronously to avoid blocking the API startup.
+    Model loading happens in a daemon thread so it doesn't prevent shutdown.
+    """
     logger.info("Starting LLM Pro Finance API...")
 
     force_reload = settings.force_model_reload
@@ -29,7 +42,8 @@ async def startup_event():
 
     logger.info("Initializing model in background thread...")
 
-    def load_model():
+    def load_model() -> None:
+        """Load the model in a background thread."""
         from app.providers.transformers_provider import initialize_model
         initialize_model(force_reload=force_reload)
 
@@ -38,20 +52,30 @@ async def startup_event():
     thread.start()
     logger.info("Model initialization started in background")
 
+
 @app.get("/")
 async def root() -> Dict[str, str]:
-    """Root endpoint returning API status and information."""
+    """Root endpoint returning API status and information.
+
+    Returns:
+        Dictionary containing API status, service name, version, model, and backend.
+    """
     return {
         "status": "ok",
         "service": "Qwen Open Finance R 8B Inference",
         "version": "1.0.0",
-        "model": "DragonLLM/qwen3-8b-fin-v1.0",
+        "model": settings.model,
         "backend": "Transformers"
     }
 
+
 @app.get("/health")
 async def health() -> Dict[str, str]:
-    """Health check endpoint."""
+    """Health check endpoint for monitoring and load balancers.
+
+    Returns:
+        Dictionary with service health status.
+    """
     return {"status": "healthy", "service": "LLM Pro Finance API"}
```
app/middleware.py CHANGED
```diff
@@ -1,26 +1,46 @@
-from fastapi import Request, HTTPException
-from fastapi.responses import JSONResponse
+from fastapi import Request
+from fastapi.responses import JSONResponse, Response
+from typing import Callable, Awaitable, Union
 
 from app.config import settings
 
+# Public endpoints that don't require authentication
+PUBLIC_PATHS = frozenset(["/", "/health", "/docs", "/redoc", "/openapi.json"])
 
-async def api_key_guard(request: Request, call_next):
-    # Public endpoints that don't require authentication
-    public_paths = ["/", "/health", "/docs", "/redoc", "/openapi.json"]
 
+async def api_key_guard(request: Request, call_next: Callable[[Request], Awaitable[Response]]) -> Union[Response, JSONResponse]:
+    """
+    Middleware to protect API endpoints with optional API key authentication.
+
+    Args:
+        request: FastAPI request object
+        call_next: Next middleware/handler in the chain
+
+    Returns:
+        Response from next handler or 401 if unauthorized
+    """
     # Skip auth for public endpoints
-    if request.url.path in public_paths:
+    if request.url.path in PUBLIC_PATHS:
         return await call_next(request)
 
     # Skip auth if no API key is configured
     if not settings.service_api_key:
         return await call_next(request)
 
-    # Check API key
-    key = request.headers.get("x-api-key") or request.headers.get("authorization")
-    if key and key.replace("Bearer ", "").strip() == settings.service_api_key:
+    # Check API key from headers
+    api_key = request.headers.get("x-api-key")
+    if not api_key:
+        # Also check Authorization header with Bearer token
+        auth_header = request.headers.get("authorization", "")
+        if auth_header.startswith("Bearer "):
+            api_key = auth_header.replace("Bearer ", "").strip()
+
+    if api_key and api_key == settings.service_api_key:
         return await call_next(request)
 
-    return JSONResponse({"error": "unauthorized"}, status_code=401)
+    return JSONResponse(
+        content={"error": {"message": "unauthorized", "type": "authentication_error"}},
+        status_code=401
+    )
```
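The rewritten guard checks `x-api-key` first and only then falls back to a `Bearer` token, instead of blindly running `replace("Bearer ", "")` on whichever header happened to be present. That extraction logic can be sketched standalone (a plain dict stands in for FastAPI's headers; `extract_api_key` is a hypothetical helper, not a function in the repo):

```python
from typing import Optional


def extract_api_key(headers: dict) -> Optional[str]:
    """Return the API key from x-api-key, else from a Bearer Authorization header."""
    api_key = headers.get("x-api-key")
    if not api_key:
        auth_header = headers.get("authorization", "")
        if auth_header.startswith("Bearer "):
            # Strip only the scheme prefix, then surrounding whitespace
            api_key = auth_header[len("Bearer "):].strip()
    return api_key


print(extract_api_key({"x-api-key": "secret"}))            # -> secret
print(extract_api_key({"authorization": "Bearer secret"}))  # -> secret
print(extract_api_key({"authorization": "Basic abc"}))      # -> None (unsupported scheme)
```

Requiring the `Bearer ` prefix before stripping it also avoids mangling a hypothetical key that itself contains the substring `"Bearer "`, which the old one-liner would have corrupted.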
app/models/openai.py CHANGED
```diff
@@ -1,4 +1,7 @@
+"""OpenAI-compatible API models using Pydantic."""
+
 from typing import List, Literal, Optional
+
 from pydantic import BaseModel, Field
 
 
@@ -6,42 +9,88 @@ Role = Literal["system", "user", "assistant", "tool"]
 
 
 class Message(BaseModel):
+    """A single message in a conversation.
+
+    Attributes:
+        role: The role of the message sender
+        content: The text content of the message
+    """
     role: Role
-    content: str
+    content: str = Field(..., description="Message content")
 
 
 class ChatCompletionRequest(BaseModel):
-    model: Optional[str] = None  # Optional, will use default from config
-    messages: List[Message]
-    temperature: Optional[float] = 0.7
-    max_tokens: Optional[int] = None
-    stream: Optional[bool] = False
-    top_p: Optional[float] = 1.0
+    """Request model for chat completions endpoint.
+
+    Attributes:
+        model: Optional model identifier (uses default from config if not provided)
+        messages: List of messages in the conversation
+        temperature: Sampling temperature (0-2)
+        max_tokens: Maximum tokens to generate
+        stream: Whether to stream the response
+        top_p: Nucleus sampling parameter
+    """
+    model: Optional[str] = Field(default=None, description="Model identifier")
+    messages: List[Message] = Field(..., description="Conversation messages")
+    temperature: Optional[float] = Field(default=0.7, ge=0.0, le=2.0, description="Sampling temperature")
+    max_tokens: Optional[int] = Field(default=None, ge=1, description="Maximum tokens to generate")
+    stream: Optional[bool] = Field(default=False, description="Stream response")
+    top_p: Optional[float] = Field(default=1.0, ge=0.0, le=1.0, description="Nucleus sampling parameter")
 
 
 class ChoiceMessage(BaseModel):
-    role: Literal["assistant"]
-    content: Optional[str] = None
+    """Assistant message in a completion choice.
+
+    Attributes:
+        role: Always "assistant" for completion messages
+        content: The generated message content
+    """
+    role: Literal["assistant"] = "assistant"
+    content: Optional[str] = Field(default=None, description="Generated message content")
 
 
 class Choice(BaseModel):
-    index: int
-    message: ChoiceMessage
-    finish_reason: Optional[str] = None
+    """A single completion choice.
+
+    Attributes:
+        index: Choice index
+        message: The generated message
+        finish_reason: Reason why generation finished (stop, length, etc.)
+    """
+    index: int = Field(..., description="Choice index")
+    message: ChoiceMessage = Field(..., description="Generated message")
+    finish_reason: Optional[str] = Field(default=None, description="Reason for completion")
 
 
 class Usage(BaseModel):
-    prompt_tokens: int
-    completion_tokens: int
-    total_tokens: int
+    """Token usage statistics.
+
+    Attributes:
+        prompt_tokens: Number of tokens in the prompt
+        completion_tokens: Number of tokens in the completion
+        total_tokens: Total tokens used
+    """
+    prompt_tokens: int = Field(..., ge=0, description="Tokens in prompt")
+    completion_tokens: int = Field(..., ge=0, description="Tokens in completion")
+    total_tokens: int = Field(..., ge=0, description="Total tokens used")
 
 
 class ChatCompletionResponse(BaseModel):
-    id: str
-    object: Literal["chat.completion"] = "chat.completion"
-    created: int
-    model: str
-    choices: List[Choice]
-    usage: Optional[Usage] = None
+    """Response model for chat completions endpoint.
+
+    Attributes:
+        id: Unique completion ID
+        object: Always "chat.completion"
+        created: Unix timestamp of creation
+        model: Model identifier used
+        choices: List of completion choices
+        usage: Optional token usage statistics
+    """
+    id: str = Field(..., description="Completion ID")
+    object: Literal["chat.completion"] = Field(default="chat.completion", description="Object type")
+    created: int = Field(..., description="Unix timestamp")
+    model: str = Field(..., description="Model identifier")
+    choices: List[Choice] = Field(..., description="Completion choices")
+    usage: Optional[Usage] = Field(default=None, description="Token usage statistics")
```
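These models mirror the OpenAI chat-completion wire format, so `ChatCompletionResponse` serializes to JSON of the following shape. A minimal sketch with illustrative values (plain dict, no pydantic dependency; the id and token counts are made up):

```python
import time
import uuid

# Illustrative payload matching the ChatCompletionResponse schema
response = {
    "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
    "object": "chat.completion",
    "created": int(time.time()),
    "model": "DragonLLM/qwen3-8b-fin-v1.0",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

# Invariant the Usage model's ge=0 fields are meant to support
assert response["usage"]["total_tokens"] == (
    response["usage"]["prompt_tokens"] + response["usage"]["completion_tokens"]
)
```

Keeping the field names and nesting identical to OpenAI's schema is what lets standard OpenAI client libraries talk to this service unchanged.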
app/providers/base.py CHANGED
@@ -1,11 +1,33 @@
-from typing import Protocol, Dict, Any
+"""Base protocol for LLM providers."""
+
+from typing import Any, Dict, Protocol
 
 
 class LLMProvider(Protocol):
+    """Protocol defining the interface for LLM providers.
+
+    Any class implementing this protocol must provide async methods
+    for listing models and generating chat completions.
+    """
+
     async def list_models(self) -> Dict[str, Any]:
+        """List available models.
+
+        Returns:
+            Dictionary containing model information.
+        """
         ...
 
     async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any:
+        """Generate chat completion.
+
+        Args:
+            payload: Request payload containing messages and parameters
+            stream: Whether to stream the response
+
+        Returns:
+            Chat completion response (varies by implementation)
+        """
        ...
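Because `LLMProvider` is a `typing.Protocol`, any class with matching method signatures satisfies it structurally, without inheriting from it. A small sketch with a hypothetical `EchoProvider` (the `@runtime_checkable` decorator is an addition here; note it only verifies method names at `isinstance` time, not signatures):

```python
import asyncio
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class LLMProvider(Protocol):
    async def list_models(self) -> Dict[str, Any]: ...
    async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any: ...


class EchoProvider:
    """Hypothetical provider: satisfies the protocol without inheriting."""

    async def list_models(self) -> Dict[str, Any]:
        return {"object": "list", "data": [{"id": "echo"}]}

    async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any:
        # Echo the last message back as the "completion"
        return {"choices": [{"message": payload["messages"][-1]}]}


provider = EchoProvider()
print(isinstance(provider, LLMProvider))  # structural check on method names
print(asyncio.run(provider.list_models()))
```

This is why `chat_service` can type against `LLMProvider` while importing a concrete module: any future provider only needs the same two async methods.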
app/providers/transformers_provider.py CHANGED
@@ -3,7 +3,7 @@ import time
 import json
 import logging
 import torch
-from typing import Dict, Any, AsyncIterator, Union
+from typing import Dict, Any, AsyncIterator, Union, List
 import asyncio
 from threading import Thread, Lock
 from huggingface_hub import login, hf_hub_download
@@ -386,20 +386,28 @@ class TransformersProvider:
         yield f"data: {json.dumps(final_chunk, ensure_ascii=False)}\n\n"
         yield "data: [DONE]\n\n"
 
-    def _messages_to_prompt(self, messages: list) -> str:
-        """Convert OpenAI messages format to prompt (fallback)."""
-        prompt = ""
+    def _messages_to_prompt(self, messages: List[Dict[str, str]]) -> str:
+        """
+        Convert OpenAI messages format to prompt (fallback).
+
+        Args:
+            messages: List of message dictionaries with 'role' and 'content'
+
+        Returns:
+            Formatted prompt string
+        """
+        prompt_parts = []
         for message in messages:
-            role = message["role"]
-            content = message["content"]
+            role = message.get("role", "user")
+            content = message.get("content", "")
             if role == "system":
-                prompt += f"System: {content}\n"
+                prompt_parts.append(f"System: {content}")
             elif role == "user":
-                prompt += f"User: {content}\n"
+                prompt_parts.append(f"User: {content}")
             elif role == "assistant":
-                prompt += f"Assistant: {content}\n"
-        prompt += "Assistant: "
-        return prompt
+                prompt_parts.append(f"Assistant: {content}")
+        prompt_parts.append("Assistant: ")
+        return "\n".join(prompt_parts)
 
 
 # Module-level provider instance
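The rewritten fallback accumulates parts in a list and joins once (avoiding repeated string concatenation), and `dict.get` tolerates malformed messages instead of raising `KeyError`. The same logic, reproduced standalone:

```python
from typing import Dict, List


def messages_to_prompt(messages: List[Dict[str, str]]) -> str:
    """Convert OpenAI-style messages to a plain prompt (fallback path)."""
    prompt_parts = []
    for message in messages:
        role = message.get("role", "user")     # tolerate missing keys
        content = message.get("content", "")
        if role == "system":
            prompt_parts.append(f"System: {content}")
        elif role == "user":
            prompt_parts.append(f"User: {content}")
        elif role == "assistant":
            prompt_parts.append(f"Assistant: {content}")
    prompt_parts.append("Assistant: ")          # generation cue for the model
    return "\n".join(prompt_parts)


print(messages_to_prompt([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Define EBITDA."},
]))
```

An empty or key-less message dict now degrades to `"User: "` instead of crashing the request, which matters since this path runs only when the tokenizer's chat template is unavailable.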
app/routers/openai_api.py CHANGED
@@ -1,4 +1,4 @@
-from typing import Any, Dict
+from typing import Any, Dict, Union
 import logging
 
 from fastapi import APIRouter, Query
@@ -15,13 +15,13 @@ router = APIRouter()
 
 
 @router.get("/models")
-async def list_models():
+async def list_models() -> Dict[str, Any]:
     """List available models (OpenAI-compatible endpoint)"""
     return await chat_service.list_models()
 
 
 @router.post("/models/reload")
-async def reload_model(force: bool = Query(False, description="Force reload from Hugging Face Hub")):
+async def reload_model(force: bool = Query(False, description="Force reload from Hugging Face Hub")) -> JSONResponse:
     """
     Reload the model from cache or Hugging Face Hub.
 
@@ -51,7 +51,7 @@ async def reload_model(force: bool = Query(False, description="Force reload from
 
 
 @router.post("/chat/completions")
-async def chat_completions(body: ChatCompletionRequest):
+async def chat_completions(body: ChatCompletionRequest) -> Union[JSONResponse, StreamingResponse]:
     """Chat completions endpoint (OpenAI-compatible)"""
     try:
         # Validate messages list is not empty
@@ -61,22 +61,23 @@ async def chat_completions(body: ChatCompletionRequest):
                 content={"error": {"message": "messages list cannot be empty", "type": "invalid_request_error"}}
             )
 
+        # Validate temperature range before building payload
+        temperature = body.temperature or 0.7
+        if temperature < 0 or temperature > 2:
+            return JSONResponse(
+                status_code=400,
+                content={"error": {"message": "temperature must be between 0 and 2", "type": "invalid_request_error"}}
+            )
+
         # Build payload with all supported parameters
        payload: Dict[str, Any] = {
             "model": body.model or settings.model,
             "messages": [m.model_dump() for m in body.messages],
-            "temperature": body.temperature or 0.7,
+            "temperature": temperature,
             "top_p": body.top_p or 1.0,
             "stream": body.stream or False,
         }
 
-        # Validate temperature range
-        if payload["temperature"] < 0 or payload["temperature"] > 2:
-            return JSONResponse(
-                status_code=400,
-                content={"error": {"message": "temperature must be between 0 and 2", "type": "invalid_request_error"}}
-            )
-
         # Add optional max_tokens if provided
         if body.max_tokens is not None:
             if body.max_tokens < 1:
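Moving the range check ahead of payload construction means an invalid temperature is rejected before any work is done on the request. The guard, sketched as a pure function (the wrapper name is hypothetical; the real handler returns a FastAPI `JSONResponse`):

```python
from typing import Any, Dict, Optional


def validate_temperature(value: Optional[float]) -> Dict[str, Any]:
    """Return an OpenAI-style error dict, or {} if the value is acceptable."""
    temperature = value or 0.7   # None falls back to the default
    if temperature < 0 or temperature > 2:
        return {"error": {"message": "temperature must be between 0 and 2",
                          "type": "invalid_request_error"}}
    return {}


print(validate_temperature(None))   # default applies, passes validation
print(validate_temperature(3.5))    # out of range, error dict returned
```

One subtlety worth noting: `value or 0.7` also treats an explicit `0.0` as falsy and silently replaces it with `0.7`; an `if value is None` check would preserve deterministic (temperature-zero) sampling requests.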
app/services/chat_service.py CHANGED
@@ -1,13 +1,33 @@
-from typing import Any, Dict
+"""Chat service layer providing abstraction over the provider."""
+from typing import Any, Dict, Union, AsyncIterator
 
 from app.providers import transformers_provider as provider
 
 
 async def list_models() -> Dict[str, Any]:
+    """
+    List available models.
+
+    Returns:
+        Dictionary containing model list in OpenAI-compatible format
+    """
     return await provider.list_models()
 
 
-async def chat(payload: Dict[str, Any], stream: bool = False):
+async def chat(
+    payload: Dict[str, Any],
+    stream: bool = False
+) -> Union[Dict[str, Any], AsyncIterator[str]]:
+    """
+    Process chat completion request.
+
+    Args:
+        payload: Request payload containing messages and generation parameters
+        stream: Whether to stream the response
+
+    Returns:
+        Response dictionary or async iterator for streaming
+    """
    return await provider.chat(payload, stream=stream)
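The service layer is a thin async pass-through, which keeps routers decoupled from the concrete provider module. The shape of that delegation, with a stub standing in for `transformers_provider` (names here are illustrative):

```python
import asyncio
from typing import Any, Dict


class _StubProvider:
    """Stand-in for app.providers.transformers_provider."""

    async def list_models(self) -> Dict[str, Any]:
        return {"object": "list", "data": []}

    async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any:
        return {"model": payload["model"], "stream": stream}


provider = _StubProvider()


async def chat(payload: Dict[str, Any], stream: bool = False) -> Any:
    # The service simply forwards; swapping providers touches only this module.
    return await provider.chat(payload, stream=stream)


result = asyncio.run(chat({"model": "demo"}))
print(result)
```

The same indirection is what makes the routers testable: in unit tests, the provider can be replaced without touching any endpoint code.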
app/utils/constants.py CHANGED
@@ -1,18 +1,25 @@
-"""Application-wide constants."""
+"""Application-wide constants and configuration."""
 
 import os
+from typing import Final, List
+
 
 # Model configuration
-MODEL_NAME = "DragonLLM/qwen3-8b-fin-v1.0"
+MODEL_NAME: Final[str] = "DragonLLM/qwen3-8b-fin-v1.0"
 
 # Cache directory - respect HF_HOME if set, otherwise use default
-CACHE_DIR = os.getenv("HF_HOME", "/tmp/huggingface")
+CACHE_DIR: Final[str] = os.getenv("HF_HOME", "/tmp/huggingface")
 
 # Hugging Face token environment variable priority order
-HF_TOKEN_VARS = ["HF_TOKEN_LC2", "HF_TOKEN_LC", "HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"]
+HF_TOKEN_VARS: Final[List[str]] = [
+    "HF_TOKEN_LC2",
+    "HF_TOKEN_LC",
+    "HF_TOKEN",
+    "HUGGING_FACE_HUB_TOKEN"
+]
 
 # French language detection patterns
-FRENCH_PHRASES = [
+FRENCH_PHRASES: Final[List[str]] = [
     "en français",
     "répondez en français",
     "réponse française",
@@ -20,9 +27,11 @@ FRENCH_PHRASES = [
     "expliquez en français",
 ]
 
-FRENCH_CHARS = ["é", "è", "ê", "à", "ç", "ù", "ô", "î", "â", "û", "ë", "ï"]
+FRENCH_CHARS: Final[List[str]] = [
+    "é", "è", "ê", "à", "ç", "ù", "ô", "î", "â", "û", "ë", "ï"
+]
 
-FRENCH_PATTERNS = [
+FRENCH_PATTERNS: Final[List[str]] = [
     "qu'est-ce",
     "qu'est",
     "expliquez",
@@ -38,7 +47,7 @@ FRENCH_PATTERNS = [
     "définissez",
 ]
 
-FRENCH_SYSTEM_PROMPT = (
+FRENCH_SYSTEM_PROMPT: Final[str] = (
     "Vous êtes un assistant financier expert. "
     "Répondez TOUJOURS en français. "
     "Soyez concis et précis dans vos explications. "
@@ -46,13 +55,13 @@ FRENCH_SYSTEM_PROMPT = (
 )
 
 # Qwen3 EOS tokens
-EOS_TOKENS = [151645, 151643]  # [<|im_end|>, <|endoftext|>]
-PAD_TOKEN_ID = 151643  # <|endoftext|>
+EOS_TOKENS: Final[List[int]] = [151645, 151643]  # [<|im_end|>, <|endoftext|>]
+PAD_TOKEN_ID: Final[int] = 151643  # <|endoftext|>
 
 # Generation defaults
-DEFAULT_MAX_TOKENS = 1000  # Increased for complete answers with concise reasoning
-DEFAULT_TEMPERATURE = 0.7
-DEFAULT_TOP_P = 1.0
-DEFAULT_TOP_K = 20
-REPETITION_PENALTY = 1.05
+DEFAULT_MAX_TOKENS: Final[int] = 1000  # Increased for complete answers with concise reasoning
+DEFAULT_TEMPERATURE: Final[float] = 0.7
+DEFAULT_TOP_P: Final[float] = 1.0
+DEFAULT_TOP_K: Final[int] = 20
+REPETITION_PENALTY: Final[float] = 1.05
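`typing.Final` is a static-analysis contract, not a runtime lock: mypy will flag any reassignment, but the interpreter does not enforce it. A quick illustration under that caveat:

```python
from typing import Final, List

DEFAULT_TEMPERATURE: Final[float] = 0.7
EOS_TOKENS: Final[List[int]] = [151645, 151643]  # [<|im_end|>, <|endoftext|>]

# Reassigning either name would be a mypy error ("Cannot assign to final name"),
# yet would still execute at runtime -- Final is advisory only.
print(DEFAULT_TEMPERATURE, EOS_TOKENS)
```

Note also that `Final[List[int]]` does not freeze the list's contents; if mutation is a concern, a `Tuple[int, ...]` would make the value itself immutable as well.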
app/utils/helpers.py CHANGED
@@ -2,7 +2,7 @@
 
 import os
 import logging
-from typing import Optional, Tuple
+from typing import Optional, Tuple, List, Dict, Any
 
 from app.utils.constants import HF_TOKEN_VARS, FRENCH_PHRASES, FRENCH_CHARS, FRENCH_PATTERNS
 
@@ -24,7 +24,7 @@ def get_hf_token() -> Tuple[Optional[str], str]:
     return None, "none"
 
 
-def is_french_request(messages: list) -> bool:
+def is_french_request(messages: List[Dict[str, Any]]) -> bool:
     """
     Detect if the request is in French based on user messages.
 
@@ -55,7 +55,7 @@ def is_french_request(messages: list) -> bool:
     return False
 
 
-def has_french_system_prompt(messages: list) -> bool:
+def has_french_system_prompt(messages: List[Dict[str, Any]]) -> bool:
     """Check if messages already contain a French system prompt."""
     return any(
         "français" in msg.get("content", "").lower()
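The new signatures make explicit that detection scans user-message content against the phrase, pattern, and accented-character lists in `constants.py`. A simplified sketch of that heuristic (list contents abbreviated; the real implementation in the repo may differ in detail):

```python
from typing import Any, Dict, List

FRENCH_PHRASES = ["en français", "répondez en français"]
FRENCH_CHARS = ["é", "è", "ê", "à", "ç"]
FRENCH_PATTERNS = ["qu'est-ce", "expliquez"]


def is_french_request(messages: List[Dict[str, Any]]) -> bool:
    """Heuristic: any user message with a French phrase, pattern, or accent."""
    for msg in messages:
        if msg.get("role") != "user":
            continue
        text = str(msg.get("content", "")).lower()
        if any(p in text for p in FRENCH_PHRASES + FRENCH_PATTERNS):
            return True
        if any(c in text for c in FRENCH_CHARS):
            return True
    return False


print(is_french_request([{"role": "user", "content": "Expliquez la volatilité"}]))
```

Only `user` messages are inspected, so a French system prompt alone does not flip detection; that case is handled separately by `has_french_system_prompt`.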
app/utils/memory.py CHANGED
@@ -1,12 +1,23 @@
 """GPU memory management utilities."""
 
 import gc
+from typing import Optional, Any
+
 import torch
-from typing import Optional
 
 
-def clear_gpu_memory(model=None, tokenizer=None):
-    """Clear GPU memory completely."""
+def clear_gpu_memory(model: Optional[Any] = None, tokenizer: Optional[Any] = None) -> None:
+    """Clear GPU memory completely.
+
+    This function performs aggressive GPU memory cleanup by:
+    1. Deleting model and tokenizer objects if provided
+    2. Clearing CUDA cache
+    3. Running multiple garbage collection passes
+
+    Args:
+        model: Optional model object to delete
+        tokenizer: Optional tokenizer object to delete
+    """
     if not torch.cuda.is_available():
         return
23