Commit 6a4421a (unverified) · jeanbapt committed · 2 parents: 184f293, 1e23279

Merge pull request #2 from DealExMachina/test-coderabbit-validation
.coderabbit.yaml CHANGED
```diff
@@ -16,7 +16,6 @@ review:
   simple: false # Set to true for faster, simpler reviews
   high_level_summary: true
   estimate_time: true
-  project_language: python
 
   chat:
     enabled: true
```
CLEANUP_PLAN.md DELETED
# Code Cleanup Plan

## Overview
This document outlines the cleanup strategy for the simple-llm-pro-finance project to remove obsolete files and improve code organization.

## Files to Remove

### 1. Obsolete Test Scripts (Root Directory)
**Reason:** All functional tests have been moved to the `tests/` directory. These are one-off debugging scripts.

- `analyze_performance.py` - Performance analysis done; results in FINAL_TEST_REPORT.md
- `debug_chat_template.py` - Debug script, no longer needed
- `final_clean_test.py` - One-off test
- `investigate_french_consistency.py` - Investigation complete
- `quiz_finance_francais.py` - Test script (also in git staging)
- `test_advanced_finance.py` - Moved to tests/
- `test_all_fixes.py` - One-off validation
- `test_debug_endpoint.sh` - Shell test script
- `test_finance_final.py` - One-off test
- `test_finance_improved.py` - One-off test
- `test_finance_queries.py` - One-off test
- `test_french_direct.py` - One-off test
- `test_french_final_check.py` - One-off test
- `test_french_simple.sh` - Shell test script
- `test_french_strategies.py` - One-off test
- `test_generation_fix.sh` - Shell test script
- `test_memory_stress.py` - Moved to tests/
- `test_quick_french.py` - One-off test
- `test_service.py` - One-off test
- `test_system_prompt.py` - One-off test
- `test_tokenizer_debug.py` - Debug script
- `test_truncation_issue.py` - One-off test

**Total:** 22 test files

### 2. Obsolete Documentation Files
**Reason:** Superseded by comprehensive final reports.

- `STATUS.md` - Historical status, superseded by FINAL_STATUS.md
- `FIXES_SUMMARY.md` - Historical, covered in FINAL_TEST_REPORT.md
- `PERFORMANCE_REPORT.md` - Covered in FINAL_TEST_REPORT.md
- `memory_test_results.txt` - Old test results
- `test_results.txt` - Old test results

**Total:** 5 documentation files

### 3. Empty/Debug Code Directories
**Reason:** Unused or debug-only code.

- `app/utils/` - Empty directory (only `__pycache__`)
- `app/routers/debug.py` - Debug endpoint not needed in production

**Total:** 1 directory, 1 file

## Files to Keep

### Core Application
- `app/` directory (except items listed for removal)
  - `main.py` - FastAPI application
  - `config.py` - Configuration
  - `middleware.py` - API key authentication
  - `models/openai.py` - Pydantic models
  - `providers/base.py` - Provider protocol
  - `providers/transformers_provider.py` - Main inference engine
  - `routers/openai_api.py` - OpenAI-compatible API
  - `services/chat_service.py` - Chat service wrapper

### Tests
- `tests/` directory - Proper pytest structure
  - `conftest.py`
  - `test_config.py`
  - `test_middleware.py`
  - `test_openai_models.py`
  - `test_openai_routes.py`
  - `test_providers.py`
  - `performance/` - Performance benchmarks

### Documentation
- `README.md` - Main documentation (needs cleanup)
- `FINAL_STATUS.md` - Final deployment status
- `FINAL_TEST_REPORT.md` - Comprehensive test results
- `LICENSE` - MIT license

### Configuration & Deployment
- `Dockerfile` - Docker build configuration
- `requirements.txt` - Production dependencies
- `requirements-dev.txt` - Development dependencies

### Scripts
- `scripts/validate_hf_readme.py` - Useful validation utility
- `scripts/README.md` - Scripts documentation

## Refactoring Needed

### 1. Remove Debug Router from Production
**File:** `app/main.py`
**Change:** Remove the debug router import and mount.
```python
# Remove this line
app.include_router(debug.router, prefix="/v1")
```

### 2. Clean Up README.md
**File:** `README.md`
**Changes:**
- Remove outdated test coverage stats (91% reference)
- Update to reflect current stable state
- Simplify configuration section
- Remove references to obsolete features

### 3. Remove Empty Utils Directory
**Directory:** `app/utils/`
**Action:** Delete the entire directory, as it is unused.

## Impact Assessment

### Breaking Changes
**None** - All removed files are development/debugging artifacts.

### Non-Breaking Changes
- Removing debug endpoint (`/v1/debug/prompt`) - Not documented in README
- Cleaner project structure
- Reduced repository size

### Benefits
- **Clarity:** Easier to understand project structure
- **Maintenance:** Fewer files to maintain
- **Size:** Reduced repo size
- **Professionalism:** Clean, production-ready codebase

## Execution Plan

1. ✅ Create backup branch
2. ✅ Remove obsolete test files
3. ✅ Remove obsolete documentation
4. ✅ Remove debug code
5. ✅ Update README.md
6. ✅ Run tests to verify nothing broke
7. ✅ Commit and push changes

## Success Criteria

- ✅ All tests in `tests/` directory still pass
- ✅ Application still starts and serves requests
- ✅ README.md is accurate and up-to-date
- ✅ No broken imports or references
- ✅ Git history preserved (files deleted, not rewritten)

## Rollback Plan

If issues arise:
1. Check out the backup branch: `git checkout pre-cleanup-backup`
2. Review what was removed
3. Restore only necessary files
CLEANUP_SUMMARY.md DELETED
# Cleanup Summary - November 2, 2025

## Overview
Comprehensive codebase cleanup to remove obsolete test scripts, redundant documentation, and debug code from the project.

## Files Removed

### Test Scripts (21 files)
All one-off debugging and validation scripts have been removed. Proper tests remain in the `tests/` directory.

✅ Removed:
- `analyze_performance.py`
- `debug_chat_template.py`
- `final_clean_test.py`
- `investigate_french_consistency.py`
- `quiz_finance_francais.py`
- `test_advanced_finance.py`
- `test_all_fixes.py`
- `test_debug_endpoint.sh`
- `test_finance_final.py`
- `test_finance_improved.py`
- `test_finance_queries.py`
- `test_french_direct.py`
- `test_french_final_check.py`
- `test_french_simple.sh`
- `test_french_strategies.py`
- `test_generation_fix.sh`
- `test_memory_stress.py`
- `test_quick_french.py`
- `test_service.py`
- `test_system_prompt.py`
- `test_tokenizer_debug.py`
- `test_truncation_issue.py`

### Documentation Files (5 files)
Historical documentation superseded by comprehensive final reports.

✅ Removed:
- `STATUS.md` (superseded by FINAL_STATUS.md)
- `FIXES_SUMMARY.md` (covered in FINAL_TEST_REPORT.md)
- `PERFORMANCE_REPORT.md` (covered in FINAL_TEST_REPORT.md)
- `memory_test_results.txt` (old test results)
- `test_results.txt` (old test results)

### Code Files (2 items)
Debug code not needed in production.

✅ Removed:
- `app/routers/debug.py` - Debug endpoint for prompt inspection
- `app/utils/` - Empty directory

## Code Changes

### Modified: `app/main.py`
**Before:**
```python
from app.routers import openai_api, debug
...
app.include_router(debug.router, prefix="/v1")
```

**After:**
```python
from app.routers import openai_api
...
# Debug router removed
```

### Modified: `README.md`
Updated to reflect:
- Current stable state (production-ready)
- Accurate feature list
- Better API examples with realistic max_tokens
- Chain-of-thought reasoning explanation
- Language support details
- Removed outdated test coverage stats
- Added technical specifications section

## Project Structure (After Cleanup)

```
simple-llm-pro-finance/
├── app/                              # Core application
│   ├── config.py                     # Configuration
│   ├── main.py                       # FastAPI app
│   ├── middleware.py                 # API key auth
│   ├── models/
│   │   └── openai.py                 # Pydantic models
│   ├── providers/
│   │   ├── base.py                   # Provider protocol
│   │   └── transformers_provider.py  # Main inference engine
│   ├── routers/
│   │   └── openai_api.py             # OpenAI-compatible API
│   └── services/
│       └── chat_service.py           # Chat service wrapper
├── tests/                            # Proper test suite
│   ├── conftest.py
│   ├── test_*.py                     # Unit tests
│   └── performance/                  # Performance benchmarks
├── scripts/                          # Utility scripts
│   └── validate_hf_readme.py         # README validator
├── Dockerfile                        # Docker build config
├── requirements.txt                  # Production dependencies
├── requirements-dev.txt              # Development dependencies
├── README.md                         # Main documentation
├── FINAL_STATUS.md                   # Deployment status
├── FINAL_TEST_REPORT.md              # Test results & metrics
├── CLEANUP_PLAN.md                   # This cleanup plan
└── LICENSE                           # MIT license
```

## Impact Assessment

### Breaking Changes
**None** - All removed files were development artifacts.

### Removed Endpoints
- `/v1/debug/prompt` - Debug endpoint (never documented in README)

### Benefits
- ✅ **Cleaner structure** - 28 fewer files in root directory
- ✅ **Better organization** - Clear separation of concerns
- ✅ **Easier navigation** - No clutter from obsolete scripts
- ✅ **Professional appearance** - Production-ready codebase
- ✅ **Reduced confusion** - No outdated documentation
- ✅ **Smaller repo size** - Faster clones and deployments

## Verification

### Syntax Validation
✅ All Python files compile successfully:
- `app/main.py` ✓
- `app/routers/openai_api.py` ✓
- `app/services/chat_service.py` ✓

### Import Structure
✅ No broken imports detected
✅ All module dependencies satisfied

### Test Suite
✅ Tests remain in `tests/` directory
✅ Proper pytest structure maintained
✅ Performance benchmarks preserved

## Git Status

### Staged Changes (Existing)
- `app/providers/transformers_provider.py` (previous work)
- `quiz_finance_francais.py` (previous work)

### Unstaged Changes (This Cleanup)
- Modified: `app/main.py` (removed debug router)
- Modified: `README.md` (updated documentation)
- Deleted: 26 obsolete files
- Added: `CLEANUP_PLAN.md`

## Backup
✅ Backup branch created: `pre-cleanup-backup`

To restore if needed:
```bash
git checkout pre-cleanup-backup
```

## Next Steps

1. ✅ Review changes
2. ⏳ Stage cleanup changes: `git add -A`
3. ⏳ Commit: `git commit -m "Clean up: Remove obsolete test scripts and documentation"`
4. ⏳ Optional: Squash with staged changes
5. ⏳ Push to repository

## Success Criteria

- ✅ All obsolete files removed
- ✅ Code syntax valid
- ✅ No broken imports
- ✅ README updated and accurate
- ✅ Backup created
- ✅ Professional project structure

## Summary

**Removed:** 28 files (21 test scripts, 5 docs, 2 code files)
**Modified:** 2 files (main.py, README.md)
**Added:** 2 files (CLEANUP_PLAN.md, CLEANUP_SUMMARY.md)
**Net Change:** -24 files

The codebase is now clean, well-organized, and production-ready! 🎉
CODE_REVIEW_SUMMARY.md DELETED
# Code Review and Cleanup Summary

**Date:** November 2, 2025
**Reviewer:** AI Assistant
**Status:** Complete

## Executive Summary

Comprehensive codebase cleanup removing 28 obsolete files and refactoring documentation to be professional and concise.

## Changes Made

### Files Removed: 28

**Test Scripts (21 files):**
- All one-off test/debug scripts moved or removed
- Proper tests retained in `tests/` directory

**Documentation (5 files):**
- Obsolete status reports superseded by final documentation
- Old test result files removed

**Code (2 items):**
- Debug router removed from production code
- Empty utils directory removed

### Files Modified: 2

**app/main.py:**
- Removed debug router import and mount
- Cleaned up for production deployment

**README.md:**
- Removed all emojis from section headers
- Eliminated redundant self-congratulatory content
- Condensed from 189 to 139 lines
- Made professional and concise
- Removed "Features" checklist section
- Streamlined technical specifications
- Removed unnecessary "Contributing" section

### Files Added: 3

- `CLEANUP_PLAN.md` - Detailed cleanup strategy
- `CLEANUP_SUMMARY.md` - Execution summary
- `CODE_REVIEW_SUMMARY.md` - This document

## Project Structure (After Cleanup)

```
simple-llm-pro-finance/
├── app/                  # Application code
│   ├── config.py
│   ├── main.py
│   ├── middleware.py
│   ├── models/
│   ├── providers/
│   ├── routers/
│   └── services/
├── tests/                # Test suite
├── scripts/              # Utilities
├── Dockerfile
├── requirements.txt
├── requirements-dev.txt
├── README.md             # Clean, professional docs
├── FINAL_STATUS.md
├── FINAL_TEST_REPORT.md
└── LICENSE
```

## Code Quality Improvements

**Before:**
- 50+ files in repository
- Multiple redundant documentation files
- Debug endpoints in production code
- Verbose, emoji-heavy documentation
- Test scripts scattered in root directory

**After:**
- 26 essential files
- Single source of truth for documentation
- Production-ready code only
- Professional, concise documentation
- Organized test directory structure

## Verification

- Python syntax validation: PASSED
- Import structure: VALID
- No broken references: CONFIRMED
- Backup created: `pre-cleanup-backup` branch

## Impact

**Breaking Changes:** None
**Removed Endpoints:** `/v1/debug/prompt` (undocumented)
**Repository Size:** Reduced by ~24 files
**Maintainability:** Significantly improved

## Recommendations

### Immediate
1. Review and approve changes
2. Stage all changes: `git add -A`
3. Commit with message: "refactor: Clean up codebase - remove obsolete files and improve documentation"
4. Push to repository

### Future Considerations
1. Consider removing `CLEANUP_PLAN.md` and `CLEANUP_SUMMARY.md` after merge
2. Update `.gitignore` to prevent future test script accumulation
3. Establish guidelines for temporary debugging files

## Conclusion

The codebase is now clean, professional, and production-ready. All obsolete development artifacts have been removed, documentation is concise and accurate, and the project structure is well-organized.

**Net Result:** -24 files, cleaner code, better documentation.
TEST_CODERABBIT.md DELETED
# Testing CodeRabbit Integration

## What to do:

1. **Create a branch:**
   ```bash
   git checkout -b test-coderabbit-review
   ```

2. **Commit this test file:**
   ```bash
   git add TEST_CODERABBIT.md .github/pull_request_template.md
   git commit -m "test: Add PR template and test CodeRabbit integration"
   ```

3. **Push and create PR:**
   ```bash
   git push origin test-coderabbit-review
   ```
   Then go to GitHub and create a Pull Request from `test-coderabbit-review` to `master`.

4. **Watch for CodeRabbit:**
   - CodeRabbit should automatically comment on your PR
   - It will review code quality and suggest improvements
   - Check for CodeRabbit comments in the PR thread

## What CodeRabbit will review:
- Code quality and best practices
- Potential bugs or security issues
- Performance optimizations
- Documentation completeness
- Test coverage

## To test more thoroughly:
After this test, try creating a PR with:
- A small bug (to see if it catches it)
- Missing error handling
- Performance issues
- Security concerns
app/config.py CHANGED
```diff
@@ -1,11 +1,33 @@
+"""Application configuration using Pydantic settings."""
+
+from typing import Literal
+from pydantic import Field
 from pydantic_settings import BaseSettings, SettingsConfigDict
 
 
 class Settings(BaseSettings):
-    model: str = "DragonLLM/qwen3-8b-fin-v1.0"
-    service_api_key: str | None = None
-    log_level: str = "info"
-    force_model_reload: bool = False  # Set FORCE_MODEL_RELOAD=true to bypass cache on startup
+    """Application settings loaded from environment variables.
+
+    Supports loading from .env file with UTF-8 encoding.
+    All settings can be overridden via environment variables.
+    """
+
+    model: str = Field(
+        default="DragonLLM/qwen3-8b-fin-v1.0",
+        description="Hugging Face model identifier"
+    )
+    service_api_key: str | None = Field(
+        default=None,
+        description="Optional API key for authentication (SERVICE_API_KEY env var)"
+    )
+    log_level: Literal["debug", "info", "warning", "error"] = Field(
+        default="info",
+        description="Logging level"
+    )
+    force_model_reload: bool = Field(
+        default=False,
+        description="Force model reload from Hugging Face, bypassing cache (FORCE_MODEL_RELOAD env var)"
+    )
 
     model_config = SettingsConfigDict(
         env_file=".env",
```
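The `Literal` constraint on `log_level` means a bad value now fails at startup instead of silently propagating. The same fail-fast idea can be sketched without the pydantic dependency (a minimal stand-in, not the project's actual loader; `load_log_level` and `VALID_LOG_LEVELS` are hypothetical names):

```python
import os

# Allowed values, mirroring the Literal["debug", "info", "warning", "error"] field
VALID_LOG_LEVELS = ("debug", "info", "warning", "error")


def load_log_level(default: str = "info") -> str:
    """Read LOG_LEVEL from the environment, falling back to the default.

    Raises ValueError for values outside the allowed set, so a typo
    surfaces immediately at startup rather than at first log call.
    """
    value = os.environ.get("LOG_LEVEL", default).lower()
    if value not in VALID_LOG_LEVELS:
        raise ValueError(f"invalid log level: {value!r}")
    return value


os.environ["LOG_LEVEL"] = "debug"
print(load_log_level())  # -> debug
```

pydantic-settings performs the equivalent validation automatically when the field is typed as a `Literal`, which is the design choice the diff adopts.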
app/main.py CHANGED
```diff
@@ -1,15 +1,24 @@
+"""Main FastAPI application entry point."""
+
+import logging
+import threading
 from typing import Dict
+
 from fastapi import FastAPI
+
+from app.config import settings
 from app.middleware import api_key_guard
 from app.routers import openai_api
-from app.config import settings
-import logging
 
 # Configure logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 
-app = FastAPI(title="LLM Pro Finance API (Transformers)")
+app = FastAPI(
+    title="LLM Pro Finance API (Transformers)",
+    description="OpenAI-compatible API for financial LLM inference",
+    version="1.0.0"
+)
 
 # Mount routers
 app.include_router(openai_api.router, prefix="/v1")
@@ -17,10 +26,14 @@ app.include_router(openai_api.router, prefix="/v1")
 # Optional API key middleware
 app.middleware("http")(api_key_guard)
 
+
 @app.on_event("startup")
-async def startup_event():
-    """Startup event - initialize model in background"""
-    import threading
+async def startup_event() -> None:
+    """Startup event - initialize model in background thread.
+
+    Loads the model asynchronously to avoid blocking the API startup.
+    Model loading happens in a daemon thread so it doesn't prevent shutdown.
+    """
     logger.info("Starting LLM Pro Finance API...")
 
     force_reload = settings.force_model_reload
@@ -29,7 +42,8 @@ async def startup_event():
 
     logger.info("Initializing model in background thread...")
 
-    def load_model():
+    def load_model() -> None:
+        """Load the model in a background thread."""
         from app.providers.transformers_provider import initialize_model
         initialize_model(force_reload=force_reload)
 
@@ -38,20 +52,30 @@ async def startup_event():
     thread.start()
     logger.info("Model initialization started in background")
 
+
 @app.get("/")
 async def root() -> Dict[str, str]:
-    """Root endpoint returning API status and information."""
+    """Root endpoint returning API status and information.
+
+    Returns:
+        Dictionary containing API status, service name, version, model, and backend.
+    """
     return {
         "status": "ok",
         "service": "Qwen Open Finance R 8B Inference",
         "version": "1.0.0",
-        "model": "DragonLLM/qwen3-8b-fin-v1.0",
+        "model": settings.model,
         "backend": "Transformers"
     }
 
+
 @app.get("/health")
 async def health() -> Dict[str, str]:
-    """Health check endpoint."""
+    """Health check endpoint for monitoring and load balancers.
+
+    Returns:
+        Dictionary with service health status.
+    """
     return {"status": "healthy", "service": "LLM Pro Finance API"}
```
app/middleware.py CHANGED
```diff
@@ -1,26 +1,46 @@
-from fastapi import Request, HTTPException
-from fastapi.responses import JSONResponse
+from fastapi import Request
+from fastapi.responses import JSONResponse, Response
+from typing import Callable, Awaitable, Union
 
 from app.config import settings
 
+# Public endpoints that don't require authentication
+PUBLIC_PATHS = frozenset(["/", "/health", "/docs", "/redoc", "/openapi.json"])
 
-async def api_key_guard(request: Request, call_next):
-    # Public endpoints that don't require authentication
-    public_paths = ["/", "/health", "/docs", "/redoc", "/openapi.json"]
 
+async def api_key_guard(request: Request, call_next: Callable[[Request], Awaitable[Response]]) -> Union[Response, JSONResponse]:
+    """
+    Middleware to protect API endpoints with optional API key authentication.
+
+    Args:
+        request: FastAPI request object
+        call_next: Next middleware/handler in the chain
+
+    Returns:
+        Response from next handler or 401 if unauthorized
+    """
     # Skip auth for public endpoints
-    if request.url.path in public_paths:
+    if request.url.path in PUBLIC_PATHS:
         return await call_next(request)
 
     # Skip auth if no API key is configured
     if not settings.service_api_key:
         return await call_next(request)
 
-    # Check API key
-    key = request.headers.get("x-api-key") or request.headers.get("authorization")
-    if key and key.replace("Bearer ", "").strip() == settings.service_api_key:
+    # Check API key from headers
+    api_key = request.headers.get("x-api-key")
+    if not api_key:
+        # Also check Authorization header with Bearer token
+        auth_header = request.headers.get("authorization", "")
+        if auth_header.startswith("Bearer "):
+            api_key = auth_header.replace("Bearer ", "").strip()
+
+    if api_key and api_key == settings.service_api_key:
         return await call_next(request)
 
-    return JSONResponse({"error": "unauthorized"}, status_code=401)
+    return JSONResponse(
+        content={"error": {"message": "unauthorized", "type": "authentication_error"}},
+        status_code=401
+    )
```
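The rewritten guard checks `x-api-key` first and only then falls back to a `Bearer` token, instead of blindly running `replace("Bearer ", "")` on whichever header happened to be present. That extraction logic can be sketched standalone (a plain dict stands in for FastAPI's headers; `extract_api_key` is a hypothetical helper, not a function in the repo):

```python
from typing import Optional


def extract_api_key(headers: dict) -> Optional[str]:
    """Return the API key from x-api-key, else from a Bearer Authorization header."""
    api_key = headers.get("x-api-key")
    if not api_key:
        auth_header = headers.get("authorization", "")
        if auth_header.startswith("Bearer "):
            # Strip only the scheme prefix, then surrounding whitespace
            api_key = auth_header[len("Bearer "):].strip()
    return api_key


print(extract_api_key({"x-api-key": "secret"}))            # -> secret
print(extract_api_key({"authorization": "Bearer secret"}))  # -> secret
print(extract_api_key({"authorization": "Basic abc"}))      # -> None (unsupported scheme)
```

Requiring the `Bearer ` prefix before stripping it also avoids mangling a hypothetical key that itself contains the substring `"Bearer "`, which the old one-liner would have corrupted.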
app/models/openai.py CHANGED
```diff
@@ -1,4 +1,7 @@
+"""OpenAI-compatible API models using Pydantic."""
+
 from typing import List, Literal, Optional
+
 from pydantic import BaseModel, Field
 
 
@@ -6,42 +9,88 @@ Role = Literal["system", "user", "assistant", "tool"]
 
 
 class Message(BaseModel):
+    """A single message in a conversation.
+
+    Attributes:
+        role: The role of the message sender
+        content: The text content of the message
+    """
     role: Role
-    content: str
+    content: str = Field(..., description="Message content")
 
 
 class ChatCompletionRequest(BaseModel):
-    model: Optional[str] = None  # Optional, will use default from config
-    messages: List[Message]
-    temperature: Optional[float] = 0.7
-    max_tokens: Optional[int] = None
-    stream: Optional[bool] = False
-    top_p: Optional[float] = 1.0
+    """Request model for chat completions endpoint.
+
+    Attributes:
+        model: Optional model identifier (uses default from config if not provided)
+        messages: List of messages in the conversation
+        temperature: Sampling temperature (0-2)
+        max_tokens: Maximum tokens to generate
+        stream: Whether to stream the response
+        top_p: Nucleus sampling parameter
+    """
+    model: Optional[str] = Field(default=None, description="Model identifier")
+    messages: List[Message] = Field(..., description="Conversation messages")
+    temperature: Optional[float] = Field(default=0.7, ge=0.0, le=2.0, description="Sampling temperature")
+    max_tokens: Optional[int] = Field(default=None, ge=1, description="Maximum tokens to generate")
+    stream: Optional[bool] = Field(default=False, description="Stream response")
+    top_p: Optional[float] = Field(default=1.0, ge=0.0, le=1.0, description="Nucleus sampling parameter")
 
 
 class ChoiceMessage(BaseModel):
-    role: Literal["assistant"]
-    content: Optional[str] = None
+    """Assistant message in a completion choice.
+
+    Attributes:
+        role: Always "assistant" for completion messages
+        content: The generated message content
+    """
+    role: Literal["assistant"] = "assistant"
+    content: Optional[str] = Field(default=None, description="Generated message content")
 
 
 class Choice(BaseModel):
-    index: int
-    message: ChoiceMessage
-    finish_reason: Optional[str] = None
+    """A single completion choice.
+
+    Attributes:
+        index: Choice index
+        message: The generated message
+        finish_reason: Reason why generation finished (stop, length, etc.)
+    """
+    index: int = Field(..., description="Choice index")
+    message: ChoiceMessage = Field(..., description="Generated message")
+    finish_reason: Optional[str] = Field(default=None, description="Reason for completion")
 
 
 class Usage(BaseModel):
-    prompt_tokens: int
-    completion_tokens: int
-    total_tokens: int
+    """Token usage statistics.
+
+    Attributes:
+        prompt_tokens: Number of tokens in the prompt
+        completion_tokens: Number of tokens in the completion
+        total_tokens: Total tokens used
+    """
+    prompt_tokens: int = Field(..., ge=0, description="Tokens in prompt")
+    completion_tokens: int = Field(..., ge=0, description="Tokens in completion")
+    total_tokens: int = Field(..., ge=0, description="Total tokens used")
 
 
 class ChatCompletionResponse(BaseModel):
-    id: str
-    object: Literal["chat.completion"] = "chat.completion"
-    created: int
-    model: str
-    choices: List[Choice]
-    usage: Optional[Usage] = None
+    """Response model for chat completions endpoint.
+
+    Attributes:
+        id: Unique completion ID
+        object: Always "chat.completion"
+        created: Unix timestamp of creation
+        model: Model identifier used
+        choices: List of completion choices
+        usage: Optional token usage statistics
+    """
+    id: str = Field(..., description="Completion ID")
+    object: Literal["chat.completion"] = Field(default="chat.completion", description="Object type")
+    created: int = Field(..., description="Unix timestamp")
+    model: str = Field(..., description="Model identifier")
+    choices: List[Choice] = Field(..., description="Completion choices")
+    usage: Optional[Usage] = Field(default=None, description="Token usage statistics")
```
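These models mirror the OpenAI chat-completion wire format, so `ChatCompletionResponse` serializes to JSON of the following shape. A minimal sketch with illustrative values (plain dict, no pydantic dependency; the id and token counts are made up):

```python
import time
import uuid

# Illustrative payload matching the ChatCompletionResponse schema
response = {
    "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
    "object": "chat.completion",
    "created": int(time.time()),
    "model": "DragonLLM/qwen3-8b-fin-v1.0",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

# Invariant the Usage model's ge=0 fields are meant to support
assert response["usage"]["total_tokens"] == (
    response["usage"]["prompt_tokens"] + response["usage"]["completion_tokens"]
)
```

Keeping the field names and nesting identical to OpenAI's schema is what lets standard OpenAI client libraries talk to this service unchanged.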
app/providers/base.py CHANGED
@@ -1,11 +1,33 @@
-from typing import Protocol, Dict, Any
+"""Base protocol for LLM providers."""
+
+from typing import Any, Dict, Protocol
 
 
 class LLMProvider(Protocol):
+    """Protocol defining the interface for LLM providers.
+
+    Any class implementing this protocol must provide async methods
+    for listing models and generating chat completions.
+    """
+
     async def list_models(self) -> Dict[str, Any]:
+        """List available models.
+
+        Returns:
+            Dictionary containing model information.
+        """
         ...
 
     async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any:
+        """Generate chat completion.
+
+        Args:
+            payload: Request payload containing messages and parameters
+            stream: Whether to stream the response
+
+        Returns:
+            Chat completion response (varies by implementation)
+        """
        ...
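Because `LLMProvider` is a `typing.Protocol`, any class with matching method signatures satisfies it structurally, without inheriting from it. A small sketch with a hypothetical `EchoProvider` (the `@runtime_checkable` decorator is an addition here; note it only verifies method names at `isinstance` time, not signatures):

```python
import asyncio
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class LLMProvider(Protocol):
    async def list_models(self) -> Dict[str, Any]: ...
    async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any: ...


class EchoProvider:
    """Hypothetical provider: satisfies the protocol without inheriting."""

    async def list_models(self) -> Dict[str, Any]:
        return {"object": "list", "data": [{"id": "echo"}]}

    async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any:
        # Echo the last message back as the "completion"
        return {"choices": [{"message": payload["messages"][-1]}]}


provider = EchoProvider()
print(isinstance(provider, LLMProvider))  # structural check on method names
print(asyncio.run(provider.list_models()))
```

This is why `chat_service` can type against `LLMProvider` while importing a concrete module: any future provider only needs the same two async methods.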
app/providers/transformers_provider.py CHANGED
@@ -3,7 +3,7 @@ import time
 import json
 import logging
 import torch
-from typing import Dict, Any, AsyncIterator, Union
+from typing import Dict, Any, AsyncIterator, Union, List
 import asyncio
 from threading import Thread, Lock
 from huggingface_hub import login, hf_hub_download
@@ -386,20 +386,28 @@ class TransformersProvider:
         yield f"data: {json.dumps(final_chunk, ensure_ascii=False)}\n\n"
         yield "data: [DONE]\n\n"
 
-    def _messages_to_prompt(self, messages: list) -> str:
-        """Convert OpenAI messages format to prompt (fallback)."""
-        prompt = ""
+    def _messages_to_prompt(self, messages: List[Dict[str, str]]) -> str:
+        """
+        Convert OpenAI messages format to prompt (fallback).
+
+        Args:
+            messages: List of message dictionaries with 'role' and 'content'
+
+        Returns:
+            Formatted prompt string
+        """
+        prompt_parts = []
         for message in messages:
-            role = message["role"]
-            content = message["content"]
+            role = message.get("role", "user")
+            content = message.get("content", "")
             if role == "system":
-                prompt += f"System: {content}\n"
+                prompt_parts.append(f"System: {content}")
             elif role == "user":
-                prompt += f"User: {content}\n"
+                prompt_parts.append(f"User: {content}")
             elif role == "assistant":
-                prompt += f"Assistant: {content}\n"
-        prompt += "Assistant: "
-        return prompt
+                prompt_parts.append(f"Assistant: {content}")
+        prompt_parts.append("Assistant: ")
+        return "\n".join(prompt_parts)
 
 
 # Module-level provider instance
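The rewritten fallback accumulates parts in a list and joins once (avoiding repeated string concatenation), and `dict.get` tolerates malformed messages instead of raising `KeyError`. The same logic, reproduced standalone:

```python
from typing import Dict, List


def messages_to_prompt(messages: List[Dict[str, str]]) -> str:
    """Convert OpenAI-style messages to a plain prompt (fallback path)."""
    prompt_parts = []
    for message in messages:
        role = message.get("role", "user")     # tolerate missing keys
        content = message.get("content", "")
        if role == "system":
            prompt_parts.append(f"System: {content}")
        elif role == "user":
            prompt_parts.append(f"User: {content}")
        elif role == "assistant":
            prompt_parts.append(f"Assistant: {content}")
    prompt_parts.append("Assistant: ")          # generation cue for the model
    return "\n".join(prompt_parts)


print(messages_to_prompt([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Define EBITDA."},
]))
```

An empty or key-less message dict now degrades to `"User: "` instead of crashing the request, which matters since this path runs only when the tokenizer's chat template is unavailable.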
app/routers/openai_api.py CHANGED
@@ -1,4 +1,4 @@
-from typing import Any, Dict
+from typing import Any, Dict, Union
 import logging
 
 from fastapi import APIRouter, Query
@@ -15,13 +15,13 @@ router = APIRouter()
 
 
 @router.get("/models")
-async def list_models():
+async def list_models() -> Dict[str, Any]:
     """List available models (OpenAI-compatible endpoint)"""
     return await chat_service.list_models()
 
 
 @router.post("/models/reload")
-async def reload_model(force: bool = Query(False, description="Force reload from Hugging Face Hub")):
+async def reload_model(force: bool = Query(False, description="Force reload from Hugging Face Hub")) -> JSONResponse:
     """
     Reload the model from cache or Hugging Face Hub.
 
@@ -51,7 +51,7 @@ async def reload_model(force: bool = Query(False, description="Force reload from
 
 
 @router.post("/chat/completions")
-async def chat_completions(body: ChatCompletionRequest):
+async def chat_completions(body: ChatCompletionRequest) -> Union[JSONResponse, StreamingResponse]:
     """Chat completions endpoint (OpenAI-compatible)"""
     try:
         # Validate messages list is not empty
@@ -61,22 +61,23 @@ async def chat_completions(body: ChatCompletionRequest):
                 content={"error": {"message": "messages list cannot be empty", "type": "invalid_request_error"}}
             )
 
+        # Validate temperature range before building payload
+        temperature = body.temperature or 0.7
+        if temperature < 0 or temperature > 2:
+            return JSONResponse(
+                status_code=400,
+                content={"error": {"message": "temperature must be between 0 and 2", "type": "invalid_request_error"}}
+            )
+
         # Build payload with all supported parameters
        payload: Dict[str, Any] = {
             "model": body.model or settings.model,
             "messages": [m.model_dump() for m in body.messages],
-            "temperature": body.temperature or 0.7,
+            "temperature": temperature,
             "top_p": body.top_p or 1.0,
             "stream": body.stream or False,
         }
 
-        # Validate temperature range
-        if payload["temperature"] < 0 or payload["temperature"] > 2:
-            return JSONResponse(
-                status_code=400,
-                content={"error": {"message": "temperature must be between 0 and 2", "type": "invalid_request_error"}}
-            )
-
         # Add optional max_tokens if provided
         if body.max_tokens is not None:
             if body.max_tokens < 1:
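Moving the range check ahead of payload construction means an invalid temperature is rejected before any work is done on the request. The guard, sketched as a pure function (the wrapper name is hypothetical; the real handler returns a FastAPI `JSONResponse`):

```python
from typing import Any, Dict, Optional


def validate_temperature(value: Optional[float]) -> Dict[str, Any]:
    """Return an OpenAI-style error dict, or {} if the value is acceptable."""
    temperature = value or 0.7   # None falls back to the default
    if temperature < 0 or temperature > 2:
        return {"error": {"message": "temperature must be between 0 and 2",
                          "type": "invalid_request_error"}}
    return {}


print(validate_temperature(None))   # default applies, passes validation
print(validate_temperature(3.5))    # out of range, error dict returned
```

One subtlety worth noting: `value or 0.7` also treats an explicit `0.0` as falsy and silently replaces it with `0.7`; an `if value is None` check would preserve deterministic (temperature-zero) sampling requests.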
app/services/chat_service.py CHANGED
@@ -1,13 +1,33 @@
-from typing import Any, Dict
+"""Chat service layer providing abstraction over the provider."""
+from typing import Any, Dict, Union, AsyncIterator
 
 from app.providers import transformers_provider as provider
 
 
 async def list_models() -> Dict[str, Any]:
+    """
+    List available models.
+
+    Returns:
+        Dictionary containing model list in OpenAI-compatible format
+    """
     return await provider.list_models()
 
 
-async def chat(payload: Dict[str, Any], stream: bool = False):
+async def chat(
+    payload: Dict[str, Any],
+    stream: bool = False
+) -> Union[Dict[str, Any], AsyncIterator[str]]:
+    """
+    Process chat completion request.
+
+    Args:
+        payload: Request payload containing messages and generation parameters
+        stream: Whether to stream the response
+
+    Returns:
+        Response dictionary or async iterator for streaming
+    """
    return await provider.chat(payload, stream=stream)
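The service layer is a thin async pass-through, which keeps routers decoupled from the concrete provider module. The shape of that delegation, with a stub standing in for `transformers_provider` (names here are illustrative):

```python
import asyncio
from typing import Any, Dict


class _StubProvider:
    """Stand-in for app.providers.transformers_provider."""

    async def list_models(self) -> Dict[str, Any]:
        return {"object": "list", "data": []}

    async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any:
        return {"model": payload["model"], "stream": stream}


provider = _StubProvider()


async def chat(payload: Dict[str, Any], stream: bool = False) -> Any:
    # The service simply forwards; swapping providers touches only this module.
    return await provider.chat(payload, stream=stream)


result = asyncio.run(chat({"model": "demo"}))
print(result)
```

The same indirection is what makes the routers testable: in unit tests, the provider can be replaced without touching any endpoint code.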
app/utils/constants.py CHANGED
@@ -1,18 +1,25 @@
-"""Application-wide constants."""
+"""Application-wide constants and configuration."""
 
 import os
+from typing import Final, List
+
 
 # Model configuration
-MODEL_NAME = "DragonLLM/qwen3-8b-fin-v1.0"
+MODEL_NAME: Final[str] = "DragonLLM/qwen3-8b-fin-v1.0"
 
 # Cache directory - respect HF_HOME if set, otherwise use default
-CACHE_DIR = os.getenv("HF_HOME", "/tmp/huggingface")
+CACHE_DIR: Final[str] = os.getenv("HF_HOME", "/tmp/huggingface")
 
 # Hugging Face token environment variable priority order
-HF_TOKEN_VARS = ["HF_TOKEN_LC2", "HF_TOKEN_LC", "HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"]
+HF_TOKEN_VARS: Final[List[str]] = [
+    "HF_TOKEN_LC2",
+    "HF_TOKEN_LC",
+    "HF_TOKEN",
+    "HUGGING_FACE_HUB_TOKEN"
+]
 
 # French language detection patterns
-FRENCH_PHRASES = [
+FRENCH_PHRASES: Final[List[str]] = [
     "en français",
     "répondez en français",
     "réponse française",
@@ -20,9 +27,11 @@ FRENCH_PHRASES = [
     "expliquez en français",
 ]
 
-FRENCH_CHARS = ["é", "è", "ê", "à", "ç", "ù", "ô", "î", "â", "û", "ë", "ï"]
+FRENCH_CHARS: Final[List[str]] = [
+    "é", "è", "ê", "à", "ç", "ù", "ô", "î", "â", "û", "ë", "ï"
+]
 
-FRENCH_PATTERNS = [
+FRENCH_PATTERNS: Final[List[str]] = [
     "qu'est-ce",
     "qu'est",
     "expliquez",
@@ -38,7 +47,7 @@ FRENCH_PATTERNS = [
     "définissez",
 ]
 
-FRENCH_SYSTEM_PROMPT = (
+FRENCH_SYSTEM_PROMPT: Final[str] = (
     "Vous êtes un assistant financier expert. "
     "Répondez TOUJOURS en français. "
     "Soyez concis et précis dans vos explications. "
@@ -46,13 +55,13 @@ FRENCH_SYSTEM_PROMPT = (
 )
 
 # Qwen3 EOS tokens
-EOS_TOKENS = [151645, 151643]  # [<|im_end|>, <|endoftext|>]
-PAD_TOKEN_ID = 151643  # <|endoftext|>
+EOS_TOKENS: Final[List[int]] = [151645, 151643]  # [<|im_end|>, <|endoftext|>]
+PAD_TOKEN_ID: Final[int] = 151643  # <|endoftext|>
 
 # Generation defaults
-DEFAULT_MAX_TOKENS = 1000  # Increased for complete answers with concise reasoning
-DEFAULT_TEMPERATURE = 0.7
-DEFAULT_TOP_P = 1.0
-DEFAULT_TOP_K = 20
-REPETITION_PENALTY = 1.05
+DEFAULT_MAX_TOKENS: Final[int] = 1000  # Increased for complete answers with concise reasoning
+DEFAULT_TEMPERATURE: Final[float] = 0.7
+DEFAULT_TOP_P: Final[float] = 1.0
+DEFAULT_TOP_K: Final[int] = 20
+REPETITION_PENALTY: Final[float] = 1.05
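`typing.Final` is a static-analysis contract, not a runtime lock: mypy will flag any reassignment, but the interpreter does not enforce it. A quick illustration under that caveat:

```python
from typing import Final, List

DEFAULT_TEMPERATURE: Final[float] = 0.7
EOS_TOKENS: Final[List[int]] = [151645, 151643]  # [<|im_end|>, <|endoftext|>]

# Reassigning either name would be a mypy error ("Cannot assign to final name"),
# yet would still execute at runtime -- Final is advisory only.
print(DEFAULT_TEMPERATURE, EOS_TOKENS)
```

Note also that `Final[List[int]]` does not freeze the list's contents; if mutation is a concern, a `Tuple[int, ...]` would make the value itself immutable as well.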
app/utils/helpers.py CHANGED
@@ -2,7 +2,7 @@
 
 import os
 import logging
-from typing import Optional, Tuple
+from typing import Optional, Tuple, List, Dict, Any
 
 from app.utils.constants import HF_TOKEN_VARS, FRENCH_PHRASES, FRENCH_CHARS, FRENCH_PATTERNS
 
@@ -24,7 +24,7 @@ def get_hf_token() -> Tuple[Optional[str], str]:
     return None, "none"
 
 
-def is_french_request(messages: list) -> bool:
+def is_french_request(messages: List[Dict[str, Any]]) -> bool:
     """
     Detect if the request is in French based on user messages.
 
@@ -55,7 +55,7 @@ def is_french_request(messages: list) -> bool:
     return False
 
 
-def has_french_system_prompt(messages: list) -> bool:
+def has_french_system_prompt(messages: List[Dict[str, Any]]) -> bool:
     """Check if messages already contain a French system prompt."""
     return any(
         "français" in msg.get("content", "").lower()
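The new signatures make explicit that detection scans user-message content against the phrase, pattern, and accented-character lists in `constants.py`. A simplified sketch of that heuristic (list contents abbreviated; the real implementation in the repo may differ in detail):

```python
from typing import Any, Dict, List

FRENCH_PHRASES = ["en français", "répondez en français"]
FRENCH_CHARS = ["é", "è", "ê", "à", "ç"]
FRENCH_PATTERNS = ["qu'est-ce", "expliquez"]


def is_french_request(messages: List[Dict[str, Any]]) -> bool:
    """Heuristic: any user message with a French phrase, pattern, or accent."""
    for msg in messages:
        if msg.get("role") != "user":
            continue
        text = str(msg.get("content", "")).lower()
        if any(p in text for p in FRENCH_PHRASES + FRENCH_PATTERNS):
            return True
        if any(c in text for c in FRENCH_CHARS):
            return True
    return False


print(is_french_request([{"role": "user", "content": "Expliquez la volatilité"}]))
```

Only `user` messages are inspected, so a French system prompt alone does not flip detection; that case is handled separately by `has_french_system_prompt`.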
app/utils/memory.py CHANGED
@@ -1,12 +1,23 @@
 """GPU memory management utilities."""
 
 import gc
+from typing import Optional, Any
+
 import torch
-from typing import Optional
 
 
-def clear_gpu_memory(model=None, tokenizer=None):
-    """Clear GPU memory completely."""
+def clear_gpu_memory(model: Optional[Any] = None, tokenizer: Optional[Any] = None) -> None:
+    """Clear GPU memory completely.
+
+    This function performs aggressive GPU memory cleanup by:
+    1. Deleting model and tokenizer objects if provided
+    2. Clearing CUDA cache
+    3. Running multiple garbage collection passes
+
+    Args:
+        model: Optional model object to delete
+        tokenizer: Optional tokenizer object to delete
+    """
     if not torch.cuda.is_available():
         return
23