Merge pull request #2 from DealExMachina/test-coderabbit-validation
- .coderabbit.yaml +0 -1
- CLEANUP_PLAN.md +0 -155
- CLEANUP_SUMMARY.md +0 -190
- CODE_REVIEW_SUMMARY.md +0 -119
- TEST_CODERABBIT.md +0 -40
- app/config.py +26 -4
- app/main.py +34 -10
- app/middleware.py +30 -10
- app/models/openai.py +70 -21
- app/providers/base.py +24 -2
- app/providers/transformers_provider.py +19 -11
- app/routers/openai_api.py +13 -12
- app/services/chat_service.py +22 -2
- app/utils/constants.py +24 -15
- app/utils/helpers.py +3 -3
- app/utils/memory.py +14 -3
.coderabbit.yaml
CHANGED
```diff
@@ -16,7 +16,6 @@ review:
   simple: false  # Set to true for faster, simpler reviews
   high_level_summary: true
   estimate_time: true
-  project_language: python

 chat:
   enabled: true
```
CLEANUP_PLAN.md
DELETED
@@ -1,155 +0,0 @@

# Code Cleanup Plan

## Overview
This document outlines the cleanup strategy for the simple-llm-pro-finance project to remove obsolete files and improve code organization.

## Files to Remove

### 1. Obsolete Test Scripts (Root Directory)
**Reason:** All functional tests have been moved to the `tests/` directory. These are one-off debugging scripts.

- `analyze_performance.py` - Performance analysis done, results in FINAL_TEST_REPORT.md
- `debug_chat_template.py` - Debug script, no longer needed
- `final_clean_test.py` - One-off test
- `investigate_french_consistency.py` - Investigation complete
- `quiz_finance_francais.py` - Test script (also in git staging)
- `test_advanced_finance.py` - Moved to tests/
- `test_all_fixes.py` - One-off validation
- `test_debug_endpoint.sh` - Shell test script
- `test_finance_final.py` - One-off test
- `test_finance_improved.py` - One-off test
- `test_finance_queries.py` - One-off test
- `test_french_direct.py` - One-off test
- `test_french_final_check.py` - One-off test
- `test_french_simple.sh` - Shell test script
- `test_french_strategies.py` - One-off test
- `test_generation_fix.sh` - Shell test script
- `test_memory_stress.py` - Moved to tests/
- `test_quick_french.py` - One-off test
- `test_service.py` - One-off test
- `test_system_prompt.py` - One-off test
- `test_tokenizer_debug.py` - Debug script
- `test_truncation_issue.py` - One-off test

**Total:** 21 test files

### 2. Obsolete Documentation Files
**Reason:** Superseded by comprehensive final reports.

- `STATUS.md` - Historical status, superseded by FINAL_STATUS.md
- `FIXES_SUMMARY.md` - Historical, covered in FINAL_TEST_REPORT.md
- `PERFORMANCE_REPORT.md` - Covered in FINAL_TEST_REPORT.md
- `memory_test_results.txt` - Old test results
- `test_results.txt` - Old test results

**Total:** 5 documentation files

### 3. Empty/Debug Code Directories
**Reason:** Unused or debug-only code.

- `app/utils/` - Empty directory (only __pycache__)
- `app/routers/debug.py` - Debug endpoint not needed in production

**Total:** 1 directory, 1 file

## Files to Keep

### Core Application
- `app/` directory (except items listed for removal)
  - `main.py` - FastAPI application
  - `config.py` - Configuration
  - `middleware.py` - API key authentication
  - `models/openai.py` - Pydantic models
  - `providers/base.py` - Provider protocol
  - `providers/transformers_provider.py` - Main inference engine
  - `routers/openai_api.py` - OpenAI-compatible API
  - `services/chat_service.py` - Chat service wrapper

### Tests
- `tests/` directory - Proper pytest structure
  - `conftest.py`
  - `test_config.py`
  - `test_middleware.py`
  - `test_openai_models.py`
  - `test_openai_routes.py`
  - `test_providers.py`
  - `performance/` - Performance benchmarks

### Documentation
- `README.md` - Main documentation (needs cleanup)
- `FINAL_STATUS.md` - Final deployment status
- `FINAL_TEST_REPORT.md` - Comprehensive test results
- `LICENSE` - MIT license

### Configuration & Deployment
- `Dockerfile` - Docker build configuration
- `requirements.txt` - Production dependencies
- `requirements-dev.txt` - Development dependencies

### Scripts
- `scripts/validate_hf_readme.py` - Useful validation utility
- `scripts/README.md` - Scripts documentation

## Refactoring Needed

### 1. Remove Debug Router from Production
**File:** `app/main.py`
**Change:** Remove debug router import and mount
```python
# Remove this line
app.include_router(debug.router, prefix="/v1")
```

### 2. Clean Up README.md
**File:** `README.md`
**Changes:**
- Remove outdated test coverage stats (91% reference)
- Update to reflect current stable state
- Simplify configuration section
- Remove references to obsolete features

### 3. Remove Empty Utils Directory
**Directory:** `app/utils/`
**Action:** Delete the entire directory as it's unused

## Impact Assessment

### Breaking Changes
**None** - All removed files are development/debugging artifacts.

### Non-Breaking Changes
- Removing debug endpoint (`/v1/debug/prompt`) - Not documented in README
- Cleaner project structure
- Reduced repository size

### Benefits
- **Clarity:** Easier to understand project structure
- **Maintenance:** Fewer files to maintain
- **Size:** Reduced repo size
- **Professionalism:** Clean, production-ready codebase

## Execution Plan

1. ✅ Create backup branch
2. ✅ Remove obsolete test files
3. ✅ Remove obsolete documentation
4. ✅ Remove debug code
5. ✅ Update README.md
6. ✅ Run tests to verify nothing broke
7. ✅ Commit and push changes

## Success Criteria

- ✅ All tests in `tests/` directory still pass
- ✅ Application still starts and serves requests
- ✅ README.md is accurate and up-to-date
- ✅ No broken imports or references
- ✅ Git history preserved (files deleted, not rewritten)

## Rollback Plan

If issues arise:
1. Check out the backup branch: `git checkout pre-cleanup-backup`
2. Review what was removed
3. Restore only necessary files
CLEANUP_SUMMARY.md
DELETED
@@ -1,190 +0,0 @@

# Cleanup Summary - November 2, 2025

## Overview
Comprehensive codebase cleanup to remove obsolete test scripts, redundant documentation, and debug code from the project.

## Files Removed

### Test Scripts (21 files)
All one-off debugging and validation scripts have been removed. Proper tests remain in the `tests/` directory.

✅ Removed:
- `analyze_performance.py`
- `debug_chat_template.py`
- `final_clean_test.py`
- `investigate_french_consistency.py`
- `quiz_finance_francais.py`
- `test_advanced_finance.py`
- `test_all_fixes.py`
- `test_debug_endpoint.sh`
- `test_finance_final.py`
- `test_finance_improved.py`
- `test_finance_queries.py`
- `test_french_direct.py`
- `test_french_final_check.py`
- `test_french_simple.sh`
- `test_french_strategies.py`
- `test_generation_fix.sh`
- `test_memory_stress.py`
- `test_quick_french.py`
- `test_service.py`
- `test_system_prompt.py`
- `test_tokenizer_debug.py`
- `test_truncation_issue.py`

### Documentation Files (5 files)
Historical documentation superseded by comprehensive final reports.

✅ Removed:
- `STATUS.md` (superseded by FINAL_STATUS.md)
- `FIXES_SUMMARY.md` (covered in FINAL_TEST_REPORT.md)
- `PERFORMANCE_REPORT.md` (covered in FINAL_TEST_REPORT.md)
- `memory_test_results.txt` (old test results)
- `test_results.txt` (old test results)

### Code Files (2 items)
Debug code not needed in production.

✅ Removed:
- `app/routers/debug.py` - Debug endpoint for prompt inspection
- `app/utils/` - Empty directory

## Code Changes

### Modified: `app/main.py`
**Before:**
```python
from app.routers import openai_api, debug
...
app.include_router(debug.router, prefix="/v1")
```

**After:**
```python
from app.routers import openai_api
...
# Debug router removed
```

### Modified: `README.md`
Updated to reflect:
- Current stable state (production-ready)
- Accurate feature list
- Better API examples with realistic max_tokens
- Chain-of-thought reasoning explanation
- Language support details
- Removed outdated test coverage stats
- Added technical specifications section

## Project Structure (After Cleanup)

```
simple-llm-pro-finance/
├── app/                          # Core application
│   ├── config.py                 # Configuration
│   ├── main.py                   # FastAPI app
│   ├── middleware.py             # API key auth
│   ├── models/
│   │   └── openai.py             # Pydantic models
│   ├── providers/
│   │   ├── base.py               # Provider protocol
│   │   └── transformers_provider.py  # Main inference engine
│   ├── routers/
│   │   └── openai_api.py         # OpenAI-compatible API
│   └── services/
│       └── chat_service.py       # Chat service wrapper
├── tests/                        # Proper test suite
│   ├── conftest.py
│   ├── test_*.py                 # Unit tests
│   └── performance/              # Performance benchmarks
├── scripts/                      # Utility scripts
│   └── validate_hf_readme.py     # README validator
├── Dockerfile                    # Docker build config
├── requirements.txt              # Production dependencies
├── requirements-dev.txt          # Development dependencies
├── README.md                     # Main documentation
├── FINAL_STATUS.md               # Deployment status
├── FINAL_TEST_REPORT.md          # Test results & metrics
├── CLEANUP_PLAN.md               # This cleanup plan
└── LICENSE                       # MIT license
```

## Impact Assessment

### Breaking Changes
**None** - All removed files were development artifacts.

### Removed Endpoints
- `/v1/debug/prompt` - Debug endpoint (never documented in README)

### Benefits
- ✅ **Cleaner structure** - 28 fewer files in root directory
- ✅ **Better organization** - Clear separation of concerns
- ✅ **Easier navigation** - No clutter from obsolete scripts
- ✅ **Professional appearance** - Production-ready codebase
- ✅ **Reduced confusion** - No outdated documentation
- ✅ **Smaller repo size** - Faster clones and deployments

## Verification

### Syntax Validation
✅ All Python files compile successfully:
- `app/main.py` ✓
- `app/routers/openai_api.py` ✓
- `app/services/chat_service.py` ✓

### Import Structure
✅ No broken imports detected
✅ All module dependencies satisfied

### Test Suite
✅ Tests remain in `tests/` directory
✅ Proper pytest structure maintained
✅ Performance benchmarks preserved

## Git Status

### Staged Changes (Existing)
- `app/providers/transformers_provider.py` (previous work)
- `quiz_finance_francais.py` (previous work)

### Unstaged Changes (This Cleanup)
- Modified: `app/main.py` (removed debug router)
- Modified: `README.md` (updated documentation)
- Deleted: 26 obsolete files
- Added: `CLEANUP_PLAN.md` (this document)

## Backup
✅ Backup branch created: `pre-cleanup-backup`

To restore if needed:
```bash
git checkout pre-cleanup-backup
```

## Next Steps

1. ✅ Review changes
2. ⏳ Stage cleanup changes: `git add -A`
3. ⏳ Commit: `git commit -m "Clean up: Remove obsolete test scripts and documentation"`
4. ⏳ Optional: Squash with staged changes
5. ⏳ Push to repository

## Success Criteria

- ✅ All obsolete files removed
- ✅ Code syntax valid
- ✅ No broken imports
- ✅ README updated and accurate
- ✅ Backup created
- ✅ Professional project structure

## Summary

**Removed:** 28 files (21 test scripts, 5 docs, 2 code files)
**Modified:** 2 files (main.py, README.md)
**Added:** 2 files (CLEANUP_PLAN.md, CLEANUP_SUMMARY.md)
**Net Change:** -24 files

The codebase is now clean, well-organized, and production-ready! 🎉
CODE_REVIEW_SUMMARY.md
DELETED
@@ -1,119 +0,0 @@

# Code Review and Cleanup Summary

**Date:** November 2, 2025
**Reviewer:** AI Assistant
**Status:** Complete

## Executive Summary

Comprehensive codebase cleanup removing 28 obsolete files and refactoring documentation to be professional and concise.

## Changes Made

### Files Removed: 28

**Test Scripts (21 files):**
- All one-off test/debug scripts moved or removed
- Proper tests retained in `tests/` directory

**Documentation (5 files):**
- Obsolete status reports superseded by final documentation
- Old test result files removed

**Code (2 items):**
- Debug router removed from production code
- Empty utils directory removed

### Files Modified: 2

**app/main.py:**
- Removed debug router import and mount
- Cleaned up for production deployment

**README.md:**
- Removed all emojis from section headers
- Eliminated redundant self-congratulatory content
- Condensed from 189 to 139 lines
- Made professional and concise
- Removed "Features" checklist section
- Streamlined technical specifications
- Removed unnecessary "Contributing" section

### Files Added: 3

- `CLEANUP_PLAN.md` - Detailed cleanup strategy
- `CLEANUP_SUMMARY.md` - Execution summary
- `CODE_REVIEW_SUMMARY.md` - This document

## Project Structure (After Cleanup)

```
simple-llm-pro-finance/
├── app/                  # Application code
│   ├── config.py
│   ├── main.py
│   ├── middleware.py
│   ├── models/
│   ├── providers/
│   ├── routers/
│   └── services/
├── tests/                # Test suite
├── scripts/              # Utilities
├── Dockerfile
├── requirements.txt
├── requirements-dev.txt
├── README.md             # Clean, professional docs
├── FINAL_STATUS.md
├── FINAL_TEST_REPORT.md
└── LICENSE
```

## Code Quality Improvements

**Before:**
- 50+ files in repository
- Multiple redundant documentation files
- Debug endpoints in production code
- Verbose, emoji-heavy documentation
- Test scripts scattered in root directory

**After:**
- 26 essential files
- Single source of truth for documentation
- Production-ready code only
- Professional, concise documentation
- Organized test directory structure

## Verification

- Python syntax validation: PASSED
- Import structure: VALID
- No broken references: CONFIRMED
- Backup created: `pre-cleanup-backup` branch

## Impact

**Breaking Changes:** None
**Removed Endpoints:** `/v1/debug/prompt` (undocumented)
**Repository Size:** Reduced by ~24 files
**Maintainability:** Significantly improved

## Recommendations

### Immediate
1. Review and approve changes
2. Stage all changes: `git add -A`
3. Commit with message: "refactor: Clean up codebase - remove obsolete files and improve documentation"
4. Push to repository

### Future Considerations
1. Consider removing `CLEANUP_PLAN.md` and `CLEANUP_SUMMARY.md` after merge
2. Update `.gitignore` to prevent future test script accumulation
3. Establish guidelines for temporary debugging files

## Conclusion

The codebase is now clean, professional, and production-ready. All obsolete development artifacts have been removed, documentation is concise and accurate, and the project structure is well-organized.

**Net Result:** -24 files, cleaner code, better documentation.
TEST_CODERABBIT.md
DELETED
@@ -1,40 +0,0 @@

# Testing CodeRabbit Integration

## What to do:

1. **Create a branch:**
   ```bash
   git checkout -b test-coderabbit-review
   ```

2. **Commit this test file:**
   ```bash
   git add TEST_CODERABBIT.md .github/pull_request_template.md
   git commit -m "test: Add PR template and test CodeRabbit integration"
   ```

3. **Push and create PR:**
   ```bash
   git push origin test-coderabbit-review
   ```
   Then go to GitHub and create a Pull Request from `test-coderabbit-review` to `master`.

4. **Watch for CodeRabbit:**
   - CodeRabbit should automatically comment on your PR
   - It will review code quality and suggest improvements
   - Check for CodeRabbit comments in the PR thread

## What CodeRabbit will review:
- Code quality and best practices
- Potential bugs or security issues
- Performance optimizations
- Documentation completeness
- Test coverage

## To test more thoroughly:
After this test, try creating a PR with:
- A small bug (see if it catches it)
- Missing error handling
- Performance issues
- Security concerns
app/config.py
CHANGED
@@ -1,11 +1,33 @@

New version (the previous inline field definitions were replaced; `# …` marks lines elided in the diff view):

```python
"""Application configuration using Pydantic settings."""

from typing import Literal

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings loaded from environment variables.

    Supports loading from .env file with UTF-8 encoding.
    All settings can be overridden via environment variables.
    """

    model: str = Field(
        default="DragonLLM/qwen3-8b-fin-v1.0",
        description="Hugging Face model identifier"
    )
    service_api_key: str | None = Field(
        default=None,
        description="Optional API key for authentication (SERVICE_API_KEY env var)"
    )
    log_level: Literal["debug", "info", "warning", "error"] = Field(
        default="info",
        description="Logging level"
    )
    force_model_reload: bool = Field(
        default=False,
        description="Force model reload from Hugging Face, bypassing cache (FORCE_MODEL_RELOAD env var)"
    )

    model_config = SettingsConfigDict(
        env_file=".env",
        # …
```
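The `force_model_reload` flag above is read from the `FORCE_MODEL_RELOAD` environment variable via pydantic-settings. A minimal stdlib-only sketch of the same precedence (environment variable over a `.env`-style default); the helper names here are hypothetical, not part of the project:

```python
import os

def load_setting(name: str, default: str) -> str:
    # Hypothetical helper mirroring pydantic-settings behaviour: an
    # environment variable, when present, overrides the file default.
    return os.environ.get(name.upper(), default)

def as_bool(value: str) -> bool:
    # pydantic-style truthy strings for boolean fields
    return value.strip().lower() in {"1", "true", "yes", "on"}

os.environ["FORCE_MODEL_RELOAD"] = "true"  # e.g. exported in the deployment env
force_reload = as_bool(load_setting("force_model_reload", "false"))
print(force_reload)  # True
```

In the real `Settings` class this mapping and parsing is handled entirely by pydantic-settings; the sketch only illustrates the override order.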
app/main.py
CHANGED
@@ -1,15 +1,24 @@

New version (`# …` marks lines elided in the diff view):

```python
"""Main FastAPI application entry point."""

import logging
import threading
from typing import Dict

from fastapi import FastAPI

from app.config import settings
from app.middleware import api_key_guard
from app.routers import openai_api

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="LLM Pro Finance API (Transformers)",
    description="OpenAI-compatible API for financial LLM inference",
    version="1.0.0"
)

# Mount routers
app.include_router(openai_api.router, prefix="/v1")

# Optional API key middleware
app.middleware("http")(api_key_guard)


@app.on_event("startup")
async def startup_event() -> None:
    """Startup event - initialize model in background thread.

    Loads the model asynchronously to avoid blocking the API startup.
    Model loading happens in a daemon thread so it doesn't prevent shutdown.
    """
    logger.info("Starting LLM Pro Finance API...")

    force_reload = settings.force_model_reload
    # …

    logger.info("Initializing model in background thread...")

    def load_model() -> None:
        """Load the model in a background thread."""
        from app.providers.transformers_provider import initialize_model
        initialize_model(force_reload=force_reload)

    # …
    thread.start()
    logger.info("Model initialization started in background")


@app.get("/")
async def root() -> Dict[str, str]:
    """Root endpoint returning API status and information.

    Returns:
        Dictionary containing API status, service name, version, model, and backend.
    """
    return {
        "status": "ok",
        "service": "Qwen Open Finance R 8B Inference",
        "version": "1.0.0",
        "model": settings.model,
        "backend": "Transformers"
    }


@app.get("/health")
async def health() -> Dict[str, str]:
    """Health check endpoint for monitoring and load balancers.

    Returns:
        Dictionary with service health status.
    """
    return {"status": "healthy", "service": "LLM Pro Finance API"}
```
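The startup hook above hands model loading to a background daemon thread so the API can begin serving `/health` immediately. A minimal sketch of that pattern, with a stand-in loader in place of the real `initialize_model` call:

```python
import threading
import time

model_ready = threading.Event()

def load_model() -> None:
    # Stand-in for initialize_model(force_reload=...), which can take
    # minutes for an 8B model; here it just simulates the delay.
    time.sleep(0.1)
    model_ready.set()

# daemon=True means an in-progress load never blocks process shutdown
thread = threading.Thread(target=load_model, daemon=True)
thread.start()

# The main thread (the API) keeps running while the load is in flight.
print("load in progress:", not model_ready.is_set())
model_ready.wait()
print("model ready:", model_ready.is_set())
```

An `Event` (or an equivalent readiness flag checked by the provider) is what lets request handlers return a "model still loading" response instead of blocking.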
app/middleware.py
CHANGED
@@ -1,26 +1,46 @@
+from fastapi import Request
+from fastapi.responses import JSONResponse, Response
+from typing import Callable, Awaitable, Union
 
 from app.config import settings
 
+# Public endpoints that don't require authentication
+PUBLIC_PATHS = frozenset(["/", "/health", "/docs", "/redoc", "/openapi.json"])
 
+
+async def api_key_guard(request: Request, call_next: Callable[[Request], Awaitable[Response]]) -> Union[Response, JSONResponse]:
+    """
+    Middleware to protect API endpoints with optional API key authentication.
+
+    Args:
+        request: FastAPI request object
+        call_next: Next middleware/handler in the chain
 
+    Returns:
+        Response from next handler or 401 if unauthorized
+    """
     # Skip auth for public endpoints
+    if request.url.path in PUBLIC_PATHS:
         return await call_next(request)
 
     # Skip auth if no API key is configured
     if not settings.service_api_key:
         return await call_next(request)
 
+    # Check API key from headers
+    api_key = request.headers.get("x-api-key")
+    if not api_key:
+        # Also check Authorization header with Bearer token
+        auth_header = request.headers.get("authorization", "")
+        if auth_header.startswith("Bearer "):
+            api_key = auth_header.replace("Bearer ", "").strip()
+
+    if api_key and api_key == settings.service_api_key:
         return await call_next(request)
 
+    return JSONResponse(
+        content={"error": {"message": "unauthorized", "type": "authentication_error"}},
+        status_code=401
+    )
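The new guard accepts either an `x-api-key` header or an `Authorization: Bearer` token. A minimal standalone sketch of that lookup, using a plain dict in place of Starlette's case-insensitive `Headers` object (so keys are lowercase here; `extract_api_key` is an illustrative name, not part of the module):

```python
from typing import Optional

def extract_api_key(headers: dict) -> Optional[str]:
    """Mirror of the header lookup in api_key_guard, written standalone."""
    api_key = headers.get("x-api-key")
    if not api_key:
        # Fall back to "Authorization: Bearer <key>"
        auth_header = headers.get("authorization", "")
        if auth_header.startswith("Bearer "):
            api_key = auth_header.replace("Bearer ", "").strip()
    return api_key
```

Note that `replace("Bearer ", "")` removes every occurrence of the substring, not just the prefix; `auth_header.removeprefix("Bearer ")` (Python 3.9+) would be the stricter equivalent.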
app/models/openai.py
CHANGED
@@ -1,4 +1,7 @@
+"""OpenAI-compatible API models using Pydantic."""
+
 from typing import List, Literal, Optional
+
 from pydantic import BaseModel, Field
 
 
@@ -6,42 +9,88 @@ Role = Literal["system", "user", "assistant", "tool"]
 
 
 class Message(BaseModel):
+    """A single message in a conversation.
+
+    Attributes:
+        role: The role of the message sender
+        content: The text content of the message
+    """
     role: Role
+    content: str = Field(..., description="Message content")
 
 
 class ChatCompletionRequest(BaseModel):
+    """Request model for chat completions endpoint.
+
+    Attributes:
+        model: Optional model identifier (uses default from config if not provided)
+        messages: List of messages in the conversation
+        temperature: Sampling temperature (0-2)
+        max_tokens: Maximum tokens to generate
+        stream: Whether to stream the response
+        top_p: Nucleus sampling parameter
+    """
+    model: Optional[str] = Field(default=None, description="Model identifier")
+    messages: List[Message] = Field(..., description="Conversation messages")
+    temperature: Optional[float] = Field(default=0.7, ge=0.0, le=2.0, description="Sampling temperature")
+    max_tokens: Optional[int] = Field(default=None, ge=1, description="Maximum tokens to generate")
+    stream: Optional[bool] = Field(default=False, description="Stream response")
+    top_p: Optional[float] = Field(default=1.0, ge=0.0, le=1.0, description="Nucleus sampling parameter")
 
 
 class ChoiceMessage(BaseModel):
+    """Assistant message in a completion choice.
+
+    Attributes:
+        role: Always "assistant" for completion messages
+        content: The generated message content
+    """
+    role: Literal["assistant"] = "assistant"
+    content: Optional[str] = Field(default=None, description="Generated message content")
 
 
 class Choice(BaseModel):
+    """A single completion choice.
+
+    Attributes:
+        index: Choice index
+        message: The generated message
+        finish_reason: Reason why generation finished (stop, length, etc.)
+    """
+    index: int = Field(..., description="Choice index")
+    message: ChoiceMessage = Field(..., description="Generated message")
+    finish_reason: Optional[str] = Field(default=None, description="Reason for completion")
 
 
 class Usage(BaseModel):
+    """Token usage statistics.
+
+    Attributes:
+        prompt_tokens: Number of tokens in the prompt
+        completion_tokens: Number of tokens in the completion
+        total_tokens: Total tokens used
+    """
+    prompt_tokens: int = Field(..., ge=0, description="Tokens in prompt")
+    completion_tokens: int = Field(..., ge=0, description="Tokens in completion")
+    total_tokens: int = Field(..., ge=0, description="Total tokens used")
 
 
 class ChatCompletionResponse(BaseModel):
+    """Response model for chat completions endpoint.
+
+    Attributes:
+        id: Unique completion ID
+        object: Always "chat.completion"
+        created: Unix timestamp of creation
+        model: Model identifier used
+        choices: List of completion choices
+        usage: Optional token usage statistics
+    """
+    id: str = Field(..., description="Completion ID")
+    object: Literal["chat.completion"] = Field(default="chat.completion", description="Object type")
+    created: int = Field(..., description="Unix timestamp")
+    model: str = Field(..., description="Model identifier")
+    choices: List[Choice] = Field(..., description="Completion choices")
+    usage: Optional[Usage] = Field(default=None, description="Token usage statistics")
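The `ge`/`le` bounds added to the request fields mean out-of-range values are now rejected at parse time, before any handler code runs. For illustration, here are the same constraints written out by hand against a plain dict (pydantic enforces them automatically; `within_declared_bounds` and `request_body` are hypothetical names, not part of the PR):

```python
from typing import Any, Dict, Optional

def within_declared_bounds(payload: Dict[str, Any]) -> bool:
    """Hand-written mirror of the Field(ge=..., le=...) constraints above."""
    temperature: float = payload.get("temperature", 0.7)
    top_p: float = payload.get("top_p", 1.0)
    max_tokens: Optional[int] = payload.get("max_tokens")
    return (0.0 <= temperature <= 2.0
            and 0.0 <= top_p <= 1.0
            and (max_tokens is None or max_tokens >= 1))

# Hypothetical request body matching ChatCompletionRequest's fields
request_body = {
    "messages": [{"role": "user", "content": "Qu'est-ce que l'EBITDA ?"}],
    "temperature": 0.7,
    "max_tokens": 256,
}
```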
app/providers/base.py
CHANGED
@@ -1,11 +1,33 @@
+"""Base protocol for LLM providers."""
+
+from typing import Any, Dict, Protocol
 
 
 class LLMProvider(Protocol):
+    """Protocol defining the interface for LLM providers.
+
+    Any class implementing this protocol must provide async methods
+    for listing models and generating chat completions.
+    """
+
     async def list_models(self) -> Dict[str, Any]:
+        """List available models.
+
+        Returns:
+            Dictionary containing model information.
+        """
         ...
+
     async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any:
+        """Generate chat completion.
+
+        Args:
+            payload: Request payload containing messages and parameters
+            stream: Whether to stream the response
+
+        Returns:
+            Chat completion response (varies by implementation)
+        """
         ...
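Because `LLMProvider` is a `typing.Protocol`, implementations satisfy it by shape alone — matching method signatures, no inheritance required. A toy provider sketching this (the `EchoProvider` name is made up for illustration and is not part of the codebase):

```python
import asyncio
from typing import Any, Dict

class EchoProvider:
    """Hypothetical provider that satisfies LLMProvider structurally."""

    async def list_models(self) -> Dict[str, Any]:
        return {"object": "list", "data": [{"id": "echo"}]}

    async def chat(self, payload: Dict[str, Any], stream: bool = False) -> Any:
        # Echo the last message back as the assistant reply
        last = payload["messages"][-1]["content"]
        return {"choices": [{"message": {"role": "assistant", "content": last}}]}

reply = asyncio.run(EchoProvider().chat({"messages": [{"role": "user", "content": "ping"}]}))
```

A static checker would accept `EchoProvider()` anywhere an `LLMProvider` is expected, which keeps the service layer decoupled from the Transformers-specific implementation.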
app/providers/transformers_provider.py
CHANGED
@@ -3,7 +3,7 @@ import time
 import json
 import logging
 import torch
+from typing import Dict, Any, AsyncIterator, Union, List
 import asyncio
 from threading import Thread, Lock
 from huggingface_hub import login, hf_hub_download
@@ -386,20 +386,28 @@ class TransformersProvider:
             yield f"data: {json.dumps(final_chunk, ensure_ascii=False)}\n\n"
             yield "data: [DONE]\n\n"
 
+    def _messages_to_prompt(self, messages: List[Dict[str, str]]) -> str:
+        """
+        Convert OpenAI messages format to prompt (fallback).
+
+        Args:
+            messages: List of message dictionaries with 'role' and 'content'
+
+        Returns:
+            Formatted prompt string
+        """
+        prompt_parts = []
         for message in messages:
+            role = message.get("role", "user")
+            content = message.get("content", "")
             if role == "system":
+                prompt_parts.append(f"System: {content}")
             elif role == "user":
+                prompt_parts.append(f"User: {content}")
             elif role == "assistant":
+                prompt_parts.append(f"Assistant: {content}")
+        prompt_parts.append("Assistant: ")
+        return "\n".join(prompt_parts)
 
 
 # Module-level provider instance
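The fallback format is one labelled line per message plus a trailing "Assistant: " cue for generation. A standalone replica of the method's logic (`self` dropped, function renamed for use outside the class):

```python
from typing import Dict, List

def messages_to_prompt(messages: List[Dict[str, str]]) -> str:
    """Standalone replica of the fallback _messages_to_prompt logic."""
    prompt_parts = []
    for message in messages:
        role = message.get("role", "user")
        content = message.get("content", "")
        if role == "system":
            prompt_parts.append(f"System: {content}")
        elif role == "user":
            prompt_parts.append(f"User: {content}")
        elif role == "assistant":
            prompt_parts.append(f"Assistant: {content}")
    prompt_parts.append("Assistant: ")  # cue the model to answer
    return "\n".join(prompt_parts)
```

One consequence of the if/elif chain: messages with any other role (e.g. `"tool"`, which the `Role` literal allows) are silently dropped from the fallback prompt.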
app/routers/openai_api.py
CHANGED
@@ -1,4 +1,4 @@
+from typing import Any, Dict, Union
 import logging
 
 from fastapi import APIRouter, Query
@@ -15,13 +15,13 @@ router = APIRouter()
 
 
 @router.get("/models")
+async def list_models() -> Dict[str, Any]:
     """List available models (OpenAI-compatible endpoint)"""
     return await chat_service.list_models()
 
 
 @router.post("/models/reload")
+async def reload_model(force: bool = Query(False, description="Force reload from Hugging Face Hub")) -> JSONResponse:
     """
     Reload the model from cache or Hugging Face Hub.
 
@@ -51,7 +51,7 @@ async def reload_model(force: bool = Query(False, description="Force reload from
 
 
 @router.post("/chat/completions")
+async def chat_completions(body: ChatCompletionRequest) -> Union[JSONResponse, StreamingResponse]:
     """Chat completions endpoint (OpenAI-compatible)"""
     try:
         # Validate messages list is not empty
@@ -61,22 +61,23 @@ async def chat_completions(body: ChatCompletionRequest):
             content={"error": {"message": "messages list cannot be empty", "type": "invalid_request_error"}}
         )
 
+        # Validate temperature range before building payload
+        temperature = body.temperature or 0.7
+        if temperature < 0 or temperature > 2:
+            return JSONResponse(
+                status_code=400,
+                content={"error": {"message": "temperature must be between 0 and 2", "type": "invalid_request_error"}}
+            )
+
         # Build payload with all supported parameters
         payload: Dict[str, Any] = {
             "model": body.model or settings.model,
             "messages": [m.model_dump() for m in body.messages],
+            "temperature": temperature,
             "top_p": body.top_p or 1.0,
             "stream": body.stream or False,
         }
 
         # Add optional max_tokens if provided
         if body.max_tokens is not None:
            if body.max_tokens < 1:
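One subtlety in the hoisted validation above: `body.temperature or 0.7` falls back on truthiness, so an explicit `temperature: 0.0` (greedy decoding) is silently replaced by the default. A sketch of that behavior alongside an `is None` variant that would preserve zero (both function names are illustrative, not part of the PR):

```python
def effective_temperature(requested):
    """Mirrors the route's `body.temperature or 0.7` fallback."""
    return requested or 0.7

def effective_temperature_preserving_zero(requested):
    """Alternative: substitute the default only when the field is absent."""
    return 0.7 if requested is None else requested
```

Since the Pydantic model already defaults `temperature` to 0.7 and enforces `ge=0.0, le=2.0`, the `is None` form would arguably be both sufficient and safer here.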
app/services/chat_service.py
CHANGED
@@ -1,13 +1,33 @@
+"""Chat service layer providing abstraction over the provider."""
+from typing import Any, Dict, Union, AsyncIterator
 
 from app.providers import transformers_provider as provider
 
 
 async def list_models() -> Dict[str, Any]:
+    """
+    List available models.
+
+    Returns:
+        Dictionary containing model list in OpenAI-compatible format
+    """
     return await provider.list_models()
 
 
+async def chat(
+    payload: Dict[str, Any],
+    stream: bool = False
+) -> Union[Dict[str, Any], AsyncIterator[str]]:
+    """
+    Process chat completion request.
+
+    Args:
+        payload: Request payload containing messages and generation parameters
+        stream: Whether to stream the response
+
+    Returns:
+        Response dictionary or async iterator for streaming
+    """
    return await provider.chat(payload, stream=stream)
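The `Union[Dict[str, Any], AsyncIterator[str]]` return type means callers must branch on `stream` and consume the iterator with `async for` in the streaming case. A hypothetical stand-in with the same return shape (`fake_chat` is not the real service, just a shape-compatible sketch):

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Union

async def fake_chat(payload: Dict[str, Any], stream: bool = False) -> Union[Dict[str, Any], AsyncIterator[str]]:
    """Hypothetical stand-in matching chat()'s return shape."""
    if stream:
        async def sse() -> AsyncIterator[str]:
            # Server-sent-events style chunks, as the provider yields them
            yield "data: {\"delta\": \"hello\"}\n\n"
            yield "data: [DONE]\n\n"
        return sse()
    return {"choices": [{"message": {"role": "assistant", "content": "hello"}}]}

async def consume_stream() -> list:
    result = await fake_chat({}, stream=True)
    return [chunk async for chunk in result]

chunks = asyncio.run(consume_stream())
```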
app/utils/constants.py
CHANGED
@@ -1,18 +1,25 @@
+"""Application-wide constants and configuration."""
 
 import os
+from typing import Final, List
+
 
 # Model configuration
+MODEL_NAME: Final[str] = "DragonLLM/qwen3-8b-fin-v1.0"
 
 # Cache directory - respect HF_HOME if set, otherwise use default
+CACHE_DIR: Final[str] = os.getenv("HF_HOME", "/tmp/huggingface")
 
 # Hugging Face token environment variable priority order
+HF_TOKEN_VARS: Final[List[str]] = [
+    "HF_TOKEN_LC2",
+    "HF_TOKEN_LC",
+    "HF_TOKEN",
+    "HUGGING_FACE_HUB_TOKEN"
+]
 
 # French language detection patterns
+FRENCH_PHRASES: Final[List[str]] = [
     "en français",
     "répondez en français",
     "réponse française",
@@ -20,9 +27,11 @@ FRENCH_PHRASES = [
     "expliquez en français",
 ]
 
+FRENCH_CHARS: Final[List[str]] = [
+    "é", "è", "ê", "à", "ç", "ù", "ô", "î", "â", "û", "ë", "ï"
+]
 
+FRENCH_PATTERNS: Final[List[str]] = [
     "qu'est-ce",
     "qu'est",
     "expliquez",
@@ -38,7 +47,7 @@ FRENCH_PATTERNS = [
     "définissez",
 ]
 
+FRENCH_SYSTEM_PROMPT: Final[str] = (
     "Vous êtes un assistant financier expert. "
     "Répondez TOUJOURS en français. "
     "Soyez concis et précis dans vos explications. "
@@ -46,13 +55,13 @@ FRENCH_PHRASES = [
 )
 
 # Qwen3 EOS tokens
+EOS_TOKENS: Final[List[int]] = [151645, 151643]  # [<|im_end|>, <|endoftext|>]
+PAD_TOKEN_ID: Final[int] = 151643  # <|endoftext|>
 
 # Generation defaults
+DEFAULT_MAX_TOKENS: Final[int] = 1000  # Increased for complete answers with concise reasoning
+DEFAULT_TEMPERATURE: Final[float] = 0.7
+DEFAULT_TOP_P: Final[float] = 1.0
+DEFAULT_TOP_K: Final[int] = 20
+REPETITION_PENALTY: Final[float] = 1.05
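The accent list feeds a simple character-level heuristic. A sketch of how `FRENCH_CHARS` can drive detection — the real logic lives in `app/utils/helpers.py` and likely combines this with `FRENCH_PHRASES` and `FRENCH_PATTERNS`, so `contains_french_chars` here is only an illustrative fragment:

```python
FRENCH_CHARS = ["é", "è", "ê", "à", "ç", "ù", "ô", "î", "â", "û", "ë", "ï"]

def contains_french_chars(text: str) -> bool:
    """Heuristic sketch: any accented character from the list suggests French."""
    lowered = text.lower()
    return any(ch in lowered for ch in FRENCH_CHARS)
```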
app/utils/helpers.py
CHANGED
@@ -2,7 +2,7 @@
 
 import os
 import logging
+from typing import Optional, Tuple, List, Dict, Any
 
 from app.utils.constants import HF_TOKEN_VARS, FRENCH_PHRASES, FRENCH_CHARS, FRENCH_PATTERNS
 
@@ -24,7 +24,7 @@ def get_hf_token() -> Tuple[Optional[str], str]:
     return None, "none"
 
 
+def is_french_request(messages: List[Dict[str, Any]]) -> bool:
     """
     Detect if the request is in French based on user messages.
 
@@ -55,7 +55,7 @@ def is_french_request(messages: list) -> bool:
     return False
 
 
+def has_french_system_prompt(messages: List[Dict[str, Any]]) -> bool:
     """Check if messages already contain a French system prompt."""
     return any(
         "français" in msg.get("content", "").lower()
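`get_hf_token` (mostly elided by the diff) walks `HF_TOKEN_VARS` in priority order and returns `(token, source_var)`, ending in `return None, "none"` as visible above. A standalone sketch of that pattern, taking the environment as a parameter for testability (`first_hf_token` is an illustrative name; the real function reads `os.environ` directly):

```python
from typing import Dict, List, Optional, Tuple

def first_hf_token(env: Dict[str, str], var_names: List[str]) -> Tuple[Optional[str], str]:
    """Return (token, source_var) for the first non-empty variable, else (None, "none")."""
    for name in var_names:
        value = env.get(name)
        if value:
            return value, name
    return None, "none"
```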
app/utils/memory.py
CHANGED
@@ -1,12 +1,23 @@
 """GPU memory management utilities."""
 
 import gc
+from typing import Optional, Any
+
 import torch
 
 
+def clear_gpu_memory(model: Optional[Any] = None, tokenizer: Optional[Any] = None) -> None:
+    """Clear GPU memory completely.
+
+    This function performs aggressive GPU memory cleanup by:
+    1. Deleting model and tokenizer objects if provided
+    2. Clearing CUDA cache
+    3. Running multiple garbage collection passes
+
+    Args:
+        model: Optional model object to delete
+        tokenizer: Optional tokenizer object to delete
+    """
     if not torch.cuda.is_available():
         return
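The three-step cleanup order the docstring describes can be sketched without a GPU by guarding the torch-specific part; everything here is illustrative (the real `clear_gpu_memory` imports torch unconditionally and returns `None`):

```python
import gc

def clear_memory_sketch(model=None, tokenizer=None) -> int:
    """Illustrative version of the cleanup order described above."""
    # 1. Drop references to the large objects
    del model, tokenizer
    # 2. Clear the CUDA cache when torch and a GPU are present
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass
    # 3. Several GC passes to break reference cycles left by the model graph
    collected = 0
    for _ in range(3):
        collected += gc.collect()
    return collected
```

The multiple `gc.collect()` passes matter because freeing one generation of objects can make another generation's reference cycles collectable.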