jeanbaptdzd committed
Commit a750766 · 1 Parent(s): da484d7

Update to vLLM 0.9.2 with Qwen3 support, remove PRIIPS functionality, add HF Space validation hook


- Upgraded vLLM from 0.6.5 to 0.9.2 for Qwen3ForCausalLM support
- Removed all PRIIPS-related code and files
- Added pre-commit hook for README.md validation
- Updated README.md with red dragon theme
- Fixed Space URL references in test scripts
- Cleaned up unnecessary markdown files and scripts
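The commit mentions a pre-commit hook for README.md validation but the hook itself is not shown in this diff, so the following is only a hypothetical sketch of what such a hook could look like. The checked front-matter keys (`sdk`, `app_port`) are assumptions taken from the README front matter changed later in this commit.

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-commit sketch: block commits whose README.md
# lacks the HF Space front-matter keys (key list is an assumption).
README="README.md"

check_readme() {
    # The file must exist and declare the Docker SDK and the app port.
    [ -f "$README" ] || { echo "ERROR: $README is missing" >&2; return 1; }
    for key in "sdk:" "app_port:"; do
        grep -q "^$key" "$README" || {
            echo "ERROR: $README is missing front-matter key '$key'" >&2
            return 1
        }
    done
    return 0
}

# Installed as .git/hooks/pre-commit, the script would end with:
#   check_readme || exit 1
```

Git runs the hook before each commit and aborts the commit on a non-zero exit status.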

LICENSE CHANGED
@@ -186,7 +186,7 @@
186
  same "printed page" as the copyright notice for easier
187
  identification within third-party archives.
188
 
189
- Copyright 2025 PRIIPs LLM Service
190
 
191
  Licensed under the Apache License, Version 2.0 (the "License");
192
  you may not use this file except in compliance with the License.
 
186
  same "printed page" as the copyright notice for easier
187
  identification within third-party archives.
188
 
189
+ Copyright 2025 LLM Pro Finance API
190
 
191
  Licensed under the Apache License, Version 2.0 (the "License");
192
  you may not use this file except in compliance with the License.
OPTIMIZATION_EVALUATION.md DELETED
@@ -1,137 +0,0 @@
1
- # vLLM Optimization Mode Evaluation
2
-
3
- ## Current Setup: Eager Mode
4
-
5
- **Configuration:**
6
- - `enforce_eager=True` - Disables CUDA graphs
7
- - `VLLM_USE_V1=0` - Uses v0 engine (stable)
8
-
9
- **Trade-offs:**
10
- - ✅ **Pros:** More stable, easier debugging, fewer compatibility issues
11
- - ❌ **Cons:** Lower performance, higher latency, reduced throughput
12
-
13
- ## Optimized Mode: CUDA Graphs Enabled
14
-
15
- **Proposed Configuration:**
16
- - `enforce_eager=False` - Enables CUDA graphs (default)
17
- - `VLLM_USE_V1=0` - Still use v0 engine for stability
18
-
19
- **Expected Benefits:**
20
- - 🚀 **Performance:** 2-3x faster inference
21
- - 🚀 **Throughput:** Higher tokens/second
22
- - 🚀 **Latency:** Lower time-to-first-token (TTFT)
23
-
24
- **Potential Risks:**
25
- - ⚠️ **Compatibility:** Qwen3 may have CUDA graph issues in vLLM 0.6.5
26
- - ⚠️ **Memory:** Slightly higher memory overhead
27
- - ⚠️ **Stability:** Possible crashes with unsupported operations
28
-
29
- ## Evaluation Criteria
30
-
31
- ### Can We Use Optimized Mode?
32
-
33
- **Factors to Consider:**
34
-
35
- 1. **Model Architecture Support**
36
- - Qwen3 in vLLM 0.6.5 may or may not fully support CUDA graphs
37
- - Need to test on actual deployment
38
-
39
- 2. **Hardware Compatibility**
40
- - L4 GPU: 24GB VRAM ✅
41
- - CUDA 12.4: Full CUDA graph support ✅
42
- - PyTorch 2.4.0: CUDA graph support ✅
43
-
44
- 3. **vLLM Version**
45
- - v0.6.5: CUDA graphs should work for supported architectures
46
- - Qwen3 support may vary
47
-
48
- 4. **Memory Constraints**
49
- - Current: `gpu_memory_utilization=0.85`
50
- - CUDA graphs add ~100-200MB overhead
51
- - Should still fit within L4 limits
52
-
53
- ## Recommendation: Try Optimized Mode with Fallback
54
-
55
- **Strategy:** Attempt optimized mode, fall back to eager if errors occur
56
-
57
- ### Implementation Approach
58
-
59
- ```python
60
- # Try optimized mode first
61
- try:
62
- llm_engine = LLM(
63
- model=model_name,
64
- trust_remote_code=True,
65
- dtype="bfloat16",
66
- enforce_eager=False, # Enable CUDA graphs
67
- # ... other params
68
- )
69
- except Exception as e:
70
- # Fall back to eager mode
71
- logger.warning(f"CUDA graphs failed, falling back to eager mode: {e}")
72
- llm_engine = LLM(
73
- model=model_name,
74
- trust_remote_code=True,
75
- dtype="bfloat16",
76
- enforce_eager=True, # Safe fallback
77
- # ... other params
78
- )
79
- ```
80
-
81
- ## Testing Plan
82
-
83
- ### 1. Initial Test (Optimized Mode)
84
- - Deploy with `enforce_eager=False`
85
- - Monitor startup logs
86
- - Check for CUDA graph compilation errors
87
-
88
- ### 2. Performance Benchmark
89
- If optimized mode works:
90
- - Measure: tokens/second, latency, throughput
91
- - Compare with eager mode baseline
92
-
93
- ### 3. Stability Test
94
- - Run multiple requests
95
- - Check for crashes or errors
96
- - Monitor memory usage
97
-
98
- ### 4. Fallback Verification
99
- - Ensure eager mode still works as backup
100
- - Document any issues found
101
-
102
- ## Expected Outcomes
103
-
104
- ### Best Case (Optimized Works)
105
- - ✅ CUDA graphs compile successfully
106
- - ✅ 2-3x performance improvement
107
- - ✅ Stable operation
108
- - **Action:** Keep optimized mode
109
-
110
- ### Worst Case (Optimized Fails)
111
- - ❌ CUDA graph compilation errors
112
- - ❌ Runtime crashes
113
- - ✅ Eager mode fallback works
114
- - **Action:** Stay in eager mode, consider upgrading vLLM
115
-
116
- ### Middle Case (Partial Support)
117
- - ⚠️ CUDA graphs work but with warnings
118
- - ⚠️ Some operations fall back to eager
119
- - ✅ Still better than full eager mode
120
- - **Action:** Monitor and optimize further
121
-
122
- ## Monitoring
123
-
124
- Track these metrics:
125
- - Model loading time
126
- - CUDA graph compilation time
127
- - Inference latency
128
- - Throughput (tokens/sec)
129
- - Memory usage
130
- - Error rates
131
-
132
- ## Conclusion
133
-
134
- **Recommendation:** **TRY OPTIMIZED MODE** with automatic fallback
135
-
136
- The L4 GPU and CUDA 12.4 setup should support CUDA graphs. Qwen3 compatibility is the main unknown. With automatic fallback to eager mode, we can safely test optimized mode without risking service availability.
137
-
README.md CHANGED
@@ -1,24 +1,22 @@
1
  ---
2
- title: Qwen Open Finance R 8B Inference
3
- emoji: 📊
4
- colorFrom: blue
5
- colorTo: purple
6
  sdk: docker
7
  pinned: false
8
- license: apache-2.0
9
  app_port: 7860
10
- hardware: l4
11
  ---
12
 
13
- # Qwen Open Finance R 8B Inference
14
 
15
- OpenAI-compatible API and financial document processor powered by `DragonLLM/qwen3-8b-fin-v1.0` via vLLM.
16
 
17
  ## 🚀 Quick Start
18
 
19
  This service provides:
20
  - **OpenAI-compatible API** at `/v1/models` and `/v1/chat/completions`
21
- - **PRIIPs extraction** at `/extract-priips` for structured financial document parsing
22
  - **Streaming support** for real-time completions
23
  - **Provider abstraction** for easy integration with PydanticAI/DSPy
24
 
@@ -28,12 +26,12 @@ This service provides:
28
 
29
  #### List Models
30
  ```bash
31
- curl -X GET "https://your-space-url.hf.space/v1/models"
32
  ```
33
 
34
  #### Chat Completions
35
  ```bash
36
- curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
37
  -H "Content-Type: application/json" \
38
  -d '{
39
  "model": "DragonLLM/qwen3-8b-fin-v1.0",
@@ -45,7 +43,7 @@ curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
45
 
46
  #### Streaming Chat Completions
47
  ```bash
48
- curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
49
  -H "Content-Type: application/json" \
50
  -d '{
51
  "model": "DragonLLM/qwen3-8b-fin-v1.0",
@@ -54,44 +52,6 @@ curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
54
  }'
55
  ```
56
 
57
- ### PRIIPs Extraction
58
-
59
- #### Extract Structured Data from PDFs
60
- ```bash
61
- curl -X POST "https://your-space-url.hf.space/extract-priips" \
62
- -H "Content-Type: application/json" \
63
- -d '{
64
- "sources": ["https://example.com/priips-document.pdf"],
65
- "options": {"language": "en", "ocr": false}
66
- }'
67
- ```
68
-
69
- **Response:**
70
- ```json
71
- {
72
- "product_name": "Example Investment Fund",
73
- "manufacturer": "Example Asset Management",
74
- "isin": "DE0001234567",
75
- "sri": 3,
76
- "recommended_holding_period": "5 years",
77
- "costs": {
78
- "entry_cost_pct": 2.5,
79
- "ongoing_cost_pct": 1.2,
80
- "exit_cost_pct": 0.5
81
- },
82
- "performance_scenarios": [
83
- {
84
- "name": "Bull Market",
85
- "description": "Optimistic scenario",
86
- "return_pct": 15.5
87
- }
88
- ],
89
- "date": "2024-01-01",
90
- "language": "en",
91
- "source_url": "https://example.com/priips-document.pdf"
92
- }
93
- ```
94
-
95
  ## 🔧 Configuration
96
 
97
  The service uses these environment variables:
@@ -128,7 +88,7 @@ from pydantic_ai.models.openai import OpenAIModel
128
 
129
  model = OpenAIModel(
130
  "DragonLLM/qwen3-8b-fin-v1.0",
131
- base_url="https://your-space-url.hf.space/v1"
132
  )
133
 
134
  agent = Agent(model=model)
@@ -140,14 +100,13 @@ import dspy
140
 
141
  lm = dspy.OpenAI(
142
  model="DragonLLM/qwen3-8b-fin-v1.0",
143
- api_base="https://your-space-url.hf.space/v1"
144
  )
145
  ```
146
 
147
  ## 📊 Features
148
 
149
  - ✅ **OpenAI-compatible API** - Drop-in replacement for OpenAI API
150
- - ✅ **PRIIPs document extraction** - Structured JSON from financial PDFs
151
  - ✅ **Provider abstraction** - Easy to swap backends
152
  - ✅ **Streaming support** - Real-time chat completions
153
  - ✅ **Error handling** - Robust error handling and validation
@@ -192,4 +151,3 @@ MIT License - see LICENSE file for details.
192
  - **vLLM:** 0.9.2 (upgraded from 0.6.5 - July 2025 release)
193
  - **PyTorch:** 2.5.0+ (CUDA 12.4)
194
  - **CUDA:** 12.4
195
- - See `VLLM_UPGRADE_ANALYSIS.md` for upgrade details
 
1
  ---
2
+ title: Open Finance LLM 8B
3
+ emoji: 🐉
4
+ colorFrom: red
5
+ colorTo: red
6
  sdk: docker
7
  pinned: false
 
8
  app_port: 7860
9
+ suggested_hardware: l4x1
10
  ---
11
 
12
+ # Open Finance LLM 8B
13
 
14
+ OpenAI-compatible API powered by `DragonLLM/qwen3-8b-fin-v1.0` via vLLM.
15
 
16
  ## 🚀 Quick Start
17
 
18
  This service provides:
19
  - **OpenAI-compatible API** at `/v1/models` and `/v1/chat/completions`
 
20
  - **Streaming support** for real-time completions
21
  - **Provider abstraction** for easy integration with PydanticAI/DSPy
22
 
 
26
 
27
  #### List Models
28
  ```bash
29
+ curl -X GET "https://your-username-open-finance-llm-8b.hf.space/v1/models"
30
  ```
31
 
32
  #### Chat Completions
33
  ```bash
34
+ curl -X POST "https://your-username-open-finance-llm-8b.hf.space/v1/chat/completions" \
35
  -H "Content-Type: application/json" \
36
  -d '{
37
  "model": "DragonLLM/qwen3-8b-fin-v1.0",
 
43
 
44
  #### Streaming Chat Completions
45
  ```bash
46
+ curl -X POST "https://your-username-open-finance-llm-8b.hf.space/v1/chat/completions" \
47
  -H "Content-Type: application/json" \
48
  -d '{
49
  "model": "DragonLLM/qwen3-8b-fin-v1.0",
 
52
  }'
53
  ```
54
 
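The curl examples above can also be driven from Python using only the standard library; a minimal sketch that builds the same request body (the Space URL is a placeholder, exactly as in the curl commands):

```python
import json
import urllib.request

# Placeholder base URL, as in the curl examples above.
BASE_URL = "https://your-username-open-finance-llm-8b.hf.space/v1"


def build_chat_payload(prompt: str, stream: bool = False) -> dict:
    """Build the same JSON body the curl examples send."""
    return {
        "model": "DragonLLM/qwen3-8b-fin-v1.0",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def chat(prompt: str) -> dict:
    """POST the payload to the OpenAI-compatible chat completions endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Any OpenAI-compatible client works the same way; only the base URL differs.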
55
  ## 🔧 Configuration
56
 
57
  The service uses these environment variables:
 
88
 
89
  model = OpenAIModel(
90
  "DragonLLM/qwen3-8b-fin-v1.0",
91
+ base_url="https://your-username-open-finance-llm-8b.hf.space/v1"
92
  )
93
 
94
  agent = Agent(model=model)
 
100
 
101
  lm = dspy.OpenAI(
102
  model="DragonLLM/qwen3-8b-fin-v1.0",
103
+ api_base="https://your-username-open-finance-llm-8b.hf.space/v1"
104
  )
105
  ```
106
 
107
  ## 📊 Features
108
 
109
  - ✅ **OpenAI-compatible API** - Drop-in replacement for OpenAI API
 
110
  - ✅ **Provider abstraction** - Easy to swap backends
111
  - ✅ **Streaming support** - Real-time chat completions
112
  - ✅ **Error handling** - Robust error handling and validation
 
151
  - **vLLM:** 0.9.2 (upgraded from 0.6.5 - July 2025 release)
152
  - **PyTorch:** 2.5.0+ (CUDA 12.4)
153
  - **CUDA:** 12.4
 
VLLM_COMPATIBILITY.md DELETED
@@ -1,152 +0,0 @@
1
- # vLLM 0.6.5 + DragonLLM/qwen3-8b-fin-v1.0 Compatibility Analysis
2
-
3
- ## Summary
4
-
5
- ✅ **Status: LIKELY COMPATIBLE** - Configuration matches Qwen3 requirements
6
-
7
- ## Current Configuration
8
-
9
- - **vLLM Version:** 0.9.2 ✅ (upgraded from 0.6.5)
10
- - **Model:** DragonLLM/qwen3-8b-fin-v1.0
11
- - **Architecture:** Qwen3
12
- - **PyTorch:** 2.5.0+cu124 (CUDA 12.4)
13
- - **Model Parameters:** ~8B (308.2K according to HF, but this seems like a reporting issue)
14
-
15
- **Upgrade Status:** Upgraded to vLLM 0.9.2 (July 2025) - provides significant improvements over 0.6.5 while maintaining CUDA 12.4 compatibility.
16
-
17
- ## Compatibility Factors
18
-
19
- ### ✅ Positive Indicators
20
-
21
- 1. **Architecture Support**
22
- - Model uses `qwen3` architecture
23
- - Qwen models are generally well-supported in vLLM
24
- - Code comment indicates: "vLLM: v0.6.5 (Qwen3 support + VLLM_USE_V1=0 for stability)"
25
-
26
- 2. **Configuration Matches Requirements**
27
- ```python
28
- dtype="bfloat16" # ✅ Required for Qwen3
29
- trust_remote_code=True # ✅ Required for custom architectures
30
- enforce_eager=True # ✅ Avoids CUDA graph issues
31
- ```
32
-
33
- 3. **Model Repository Info**
34
- - Tags include: `text-generation-inference`, `endpoints_compatible`
35
- - These tags suggest vLLM/TGI compatibility
36
- - Uses `transformers` + `safetensors` format (vLLM compatible)
37
-
38
- 4. **Environment Setup**
39
- - `VLLM_USE_V1=0` - Using stable v0 engine
40
- - Proper HF token authentication configured
41
- - CUDA 12.4 with PyTorch 2.4.0
42
-
43
- ### ⚠️ Potential Concerns
44
-
45
- 1. **vLLM 0.6.5 Release Date**
46
- - vLLM 0.6.5 was released in September 2024
47
- - Qwen3 models may have been added in later versions
48
- - **Action:** Monitor for compatibility issues during model loading
49
-
50
- 2. **Model Size Reporting**
51
- - HF shows "308.2K parameters" which seems incorrect for an 8B model
52
- - This is likely a metadata issue, not a compatibility issue
53
-
54
- 3. **Private Model Access**
55
- - Model is private (requires authentication)
56
- - Authentication is properly configured
57
- - Must accept model terms on HF
58
-
59
- ## Configuration Verification
60
-
61
- ### Current vLLM Initialization
62
- ```python
63
- llm_engine = LLM(
64
- model="DragonLLM/qwen3-8b-fin-v1.0",
65
- trust_remote_code=True, # ✅ Required
66
- dtype="bfloat16", # ✅ Required for Qwen3
67
- max_model_len=4096, # ✅ Reasonable for L4 GPU
68
- gpu_memory_utilization=0.85, # ✅ Good utilization
69
- tensor_parallel_size=1, # ✅ Single GPU
70
- download_dir="/tmp/huggingface",
71
- tokenizer_mode="auto",
72
- enforce_eager=True, # ✅ Stability
73
- disable_log_stats=False, # ✅ Debugging enabled
74
- )
75
- ```
76
-
77
- ### Environment Variables
78
- ```bash
79
- VLLM_USE_V1=0 # ✅ Use stable v0 engine
80
- CUDA_VISIBLE_DEVICES=0 # ✅ Single GPU
81
- HF_TOKEN (via HF_TOKEN_LC2) # ✅ Authentication
82
- ```
83
-
84
- ## Testing Recommendations
85
-
86
- ### 1. Test Model Loading
87
- ```bash
88
- # Run the service and monitor startup logs
89
- # Check for these success indicators:
90
- - "✅ vLLM engine initialized successfully"
91
- - No architecture mismatch errors
92
- - Model loads without errors
93
- ```
94
-
95
- ### 2. Test Inference
96
- ```python
97
- # Simple test request
98
- {
99
- "model": "DragonLLM/qwen3-8b-fin-v1.0",
100
- "messages": [{"role": "user", "content": "Hello"}],
101
- "max_tokens": 50
102
- }
103
- ```
104
-
105
- ### 3. Monitor for Errors
106
-
107
- **If you see:**
108
- - `AttributeError: 'LlamaForCausalLM' object has no attribute 'qwen'`
109
- - `Model architecture not supported`
110
- - `dtype mismatch errors`
111
-
112
- **Then:** vLLM 0.6.5 may not fully support Qwen3, upgrade to vLLM 0.6.6+ or 0.7.0+
113
-
114
- ## Upgrade Path (if needed)
115
-
116
- If compatibility issues occur:
117
-
118
- ### Option 1: Upgrade vLLM (Recommended)
119
- ```dockerfile
120
- # In Dockerfile, change:
121
- RUN pip install --no-cache-dir vllm==0.6.6
122
- # or
123
- RUN pip install --no-cache-dir vllm==0.7.0
124
- ```
125
-
126
- ### Option 2: Test with Latest
127
- ```dockerfile
128
- RUN pip install --no-cache-dir vllm>=0.7.0
129
- ```
130
-
131
- ## Verification Checklist
132
-
133
- - [x] Model architecture: Qwen3 ✅
134
- - [x] dtype: bfloat16 ✅
135
- - [x] trust_remote_code: True ✅
136
- - [x] Authentication configured ✅
137
- - [x] PyTorch 2.4.0 with CUDA 12.4 ✅
138
- - [ ] Model loads successfully (test on deployment)
139
- - [ ] Inference works correctly (test on deployment)
140
-
141
- ## Conclusion
142
-
143
- Based on the configuration and model metadata, **DragonLLM/qwen3-8b-fin-v1.0 should be compatible with vLLM 0.6.5**. The configuration follows best practices for Qwen models.
144
-
145
- **However**, since Qwen3 is a relatively new architecture, monitor the first deployment closely. If you encounter any architecture-related errors, upgrading to vLLM 0.6.6+ or 0.7.0+ is recommended.
146
-
147
- ## References
148
-
149
- - Model: https://huggingface.co/DragonLLM/qwen3-8b-fin-v1.0
150
- - vLLM Docs: https://docs.vllm.ai/en/stable/models/supported_models.html
151
- - Qwen3 Architecture: Uses bfloat16, requires trust_remote_code
152
-
VLLM_UPGRADE_ANALYSIS.md DELETED
@@ -1,191 +0,0 @@
1
- # vLLM Upgrade Analysis: 0.6.5 → Latest
2
-
3
- ## Current Status
4
-
5
- - **Current Version:** vLLM 0.6.5 (September 2024)
6
- - **Latest Version:** vLLM 0.10.2 (October 2025) or 0.9.2
7
- - **Version Gap:** ~14+ months of updates
8
-
9
- ## Latest Version Information
10
-
11
- ### vLLM 0.10.2 (Latest - October 2025)
12
- - **CUDA Support:** CUDA 13.0.2
13
- - **PyTorch:** Likely requires newer PyTorch version
14
- - **New Features:**
15
- - Multi-node configurations
16
- - FP8 precision support (Hopper+ GPUs)
17
- - NVFP4 format (Blackwell GPUs)
18
- - DeepSeek-R1 and Llama-3.1-8B-Instruct support
19
- - RTX PRO 6000 Blackwell Server Edition support
20
-
21
- ### vLLM 0.9.2 (Stable - October 2025)
22
- - More stable release track
23
- - Improved GPU architecture support
24
- - Better memory management
25
- - Likely better Qwen3 support
26
-
27
- ## Current Setup Requirements
28
-
29
- ### Our Current Configuration
30
- - **CUDA:** 12.4
31
- - **PyTorch:** 2.4.0+cu124
32
- - **Python:** 3.11
33
- - **GPU:** L4 (24GB VRAM)
34
- - **Model:** Qwen3-8B
35
-
36
- ## Compatibility Considerations
37
-
38
- ### ⚠️ Potential Issues Upgrading to 0.10.x
39
-
40
- 1. **CUDA 13.0.2 Requirement**
41
- - vLLM 0.10.2 supports CUDA 13.0.2
42
- - We're on CUDA 12.4
43
- - **Solution:** May need CUDA 13 base image OR use vLLM 0.9.x which likely supports CUDA 12.x
44
-
45
- 2. **PyTorch Version**
46
- - Newer vLLM may require PyTorch 2.5+
47
- - Current: PyTorch 2.4.0
48
- - **Action:** Check vLLM 0.9.x requirements
49
-
50
- 3. **Python Version**
51
- - vLLM 0.9+ may require Python 3.11+
52
- - Current: Python 3.11 ✅
53
- - **Status:** Compatible
54
-
55
- ### ✅ Benefits of Upgrading
56
-
57
- 1. **Better Qwen3 Support**
58
- - Newer versions likely have improved Qwen3 compatibility
59
- - Better CUDA graph support
60
- - More stable inference
61
-
62
- 2. **Performance Improvements**
63
- - Better memory management
64
- - Optimized kernels
65
- - Improved throughput
66
-
67
- 3. **Bug Fixes**
68
- - 14+ months of bug fixes
69
- - Security updates
70
- - Stability improvements
71
-
72
- 4. **Feature Updates**
73
- - Better streaming support
74
- - Improved API compatibility
75
- - New optimizations
76
-
77
- ## Recommended Upgrade Path
78
-
79
- ### Option 1: Upgrade to vLLM 0.9.x (Recommended)
80
-
81
- **Why:**
82
- - Better balance of features and stability
83
- - Likely still supports CUDA 12.4
84
- - Better Qwen3 support than 0.6.5
85
- - Not as bleeding edge as 0.10.x
86
-
87
- **Changes Needed:**
88
- ```dockerfile
89
- # Update Dockerfile
90
- RUN pip install --no-cache-dir vllm>=0.9.0,<0.10.0
91
-
92
- # May need to update PyTorch:
93
- RUN pip install --no-cache-dir \
94
- torch>=2.5.0 \
95
- --index-url https://download.pytorch.org/whl/cu124
96
- ```
97
-
98
- ### Option 2: Upgrade to vLLM 0.10.x (If CUDA 13 available)
99
-
100
- **Why:**
101
- - Latest features and optimizations
102
- - Best performance improvements
103
-
104
- **Changes Needed:**
105
- ```dockerfile
106
- # Update base image to CUDA 13
107
- FROM nvidia/cuda:13.0.2-devel-ubuntu22.04
108
-
109
- # Update PyTorch for CUDA 13
110
- RUN pip install --no-cache-dir \
111
- torch>=2.5.0 \
112
- --index-url https://download.pytorch.org/whl/cu130
113
-
114
- # Install latest vLLM
115
- RUN pip install --no-cache-dir vllm>=0.10.0
116
- ```
117
-
118
- ### Option 3: Gradual Upgrade (Safest)
119
-
120
- 1. **First:** Upgrade to vLLM 0.7.x or 0.8.x
121
- - Test Qwen3 compatibility
122
- - Verify performance
123
-
124
- 2. **Then:** Move to 0.9.x
125
- - Test thoroughly
126
- - Monitor stability
127
-
128
- 3. **Finally:** Consider 0.10.x if needed
129
-
130
- ## Code Changes Required
131
-
132
- ### Minimal Changes Expected
133
-
134
- 1. **Environment Variables**
135
- - `VLLM_USE_V1=0` may no longer be needed (v1 engine is default in newer versions)
136
- - May need to update or remove
137
-
138
- 2. **API Changes**
139
- - LLM initialization likely compatible
140
- - Some parameters may be deprecated
141
- - Check release notes
142
-
143
- 3. **Streaming**
144
- - Better streaming support in newer versions
145
- - May need to update streaming implementation
146
-
147
- ## Testing Checklist
148
-
149
- After upgrading:
150
-
151
- - [ ] Model loads successfully
152
- - [ ] Qwen3 architecture works
153
- - [ ] CUDA graphs work (optimized mode)
154
- - [ ] Inference produces correct results
155
- - [ ] Streaming works
156
- - [ ] Memory usage acceptable
157
- - [ ] Performance improved/stable
158
- - [ ] No regressions in API compatibility
159
-
160
- ## Recommendations
161
-
162
- ### Immediate Action: Upgrade to vLLM 0.9.x
163
-
164
- **Reasoning:**
165
- 1. Still supports CUDA 12.4 (no base image change needed)
166
- 2. Much better than 0.6.5
167
- 3. Better Qwen3 support
168
- 4. More stable than 0.10.x
169
- 5. Significant improvements without breaking changes
170
-
171
- **Steps:**
172
- 1. Update Dockerfile to use vLLM 0.9.2
173
- 2. Update PyTorch to 2.5+ (may be needed)
174
- 3. Test on deployment
175
- 4. Monitor for issues
176
-
177
- ### Future Consideration: vLLM 0.10.x
178
-
179
- Only if:
180
- - CUDA 13 becomes available
181
- - Need specific 0.10.x features
182
- - 0.9.x proves insufficient
183
-
184
- ## Summary
185
-
186
- **Current:** vLLM 0.6.5 (old, but working)
187
- **Recommended:** vLLM 0.9.2 (good balance)
188
- **Latest:** vLLM 0.10.2 (requires CUDA 13)
189
-
190
- **Action:** Upgrade to vLLM 0.9.2 for best compatibility with current setup while gaining significant improvements.
191
-
app/main.py CHANGED
@@ -1,17 +1,16 @@
1
  from fastapi import FastAPI
2
  from app.middleware import api_key_guard
3
- from app.routers import openai_api, extract
4
  import logging
5
 
6
  # Configure logging
7
  logging.basicConfig(level=logging.INFO)
8
  logger = logging.getLogger(__name__)
9
 
10
- app = FastAPI(title="PRIIPs LLM Service (vLLM)")
11
 
12
  # Mount routers
13
  app.include_router(openai_api.router, prefix="/v1")
14
- app.include_router(extract.router)
15
 
16
  # Optional API key middleware
17
  app.middleware("http")(api_key_guard)
@@ -20,7 +19,7 @@ app.middleware("http")(api_key_guard)
20
  async def startup_event():
21
  """Startup event - initialize model in background"""
22
  import threading
23
- logger.info("Starting PRIIPs LLM Service...")
24
  logger.info("Initializing model in background thread...")
25
 
26
  def load_model():
@@ -44,6 +43,6 @@ async def root():
44
 
45
  @app.get("/health")
46
  async def health():
47
- return {"status": "healthy", "service": "PRIIPs LLM Service"}
48
 
49
 
 
1
  from fastapi import FastAPI
2
  from app.middleware import api_key_guard
3
+ from app.routers import openai_api
4
  import logging
5
 
6
  # Configure logging
7
  logging.basicConfig(level=logging.INFO)
8
  logger = logging.getLogger(__name__)
9
 
10
+ app = FastAPI(title="LLM Pro Finance API (vLLM)")
11
 
12
  # Mount routers
13
  app.include_router(openai_api.router, prefix="/v1")
 
14
 
15
  # Optional API key middleware
16
  app.middleware("http")(api_key_guard)
 
19
  async def startup_event():
20
  """Startup event - initialize model in background"""
21
  import threading
22
+ logger.info("Starting LLM Pro Finance API...")
23
  logger.info("Initializing model in background thread...")
24
 
25
  def load_model():
 
43
 
44
  @app.get("/health")
45
  async def health():
46
+ return {"status": "healthy", "service": "LLM Pro Finance API"}
47
 
48
 
app/models/priips.py DELETED
@@ -1,41 +0,0 @@
1
- from typing import List, Optional
2
- from pydantic import BaseModel
3
-
4
-
5
- class PerformanceScenario(BaseModel):
6
- name: str
7
- description: Optional[str] = None
8
- return_pct: Optional[float] = None
9
-
10
-
11
- class Costs(BaseModel):
12
- entry_cost_pct: Optional[float] = None
13
- ongoing_cost_pct: Optional[float] = None
14
- exit_cost_pct: Optional[float] = None
15
-
16
-
17
- class PriipsFields(BaseModel):
18
- product_name: Optional[str] = None
19
- manufacturer: Optional[str] = None
20
- isin: Optional[str] = None
21
- sri: Optional[int] = None
22
- recommended_holding_period: Optional[str] = None
23
- costs: Optional[Costs] = None
24
- performance_scenarios: Optional[List[PerformanceScenario]] = None
25
- date: Optional[str] = None
26
- language: Optional[str] = None
27
- source_url: Optional[str] = None
28
-
29
-
30
- class ExtractRequest(BaseModel):
31
- sources: List[str]
32
- options: Optional[dict] = None
33
-
34
-
35
- class ExtractResult(BaseModel):
36
- source: str
37
- success: bool
38
- data: Optional[PriipsFields] = None
39
- error: Optional[str] = None
40
-
41
-
app/routers/extract.py DELETED
@@ -1,32 +0,0 @@
1
- from fastapi import APIRouter, UploadFile, File
2
- from pathlib import Path
3
- import tempfile
4
- import os
5
-
6
- from app.models.priips import ExtractRequest
7
- from app.services import extract_service
8
-
9
-
10
- router = APIRouter()
11
-
12
-
13
- @router.post("/extract-priips")
14
- async def extract_priips(file: UploadFile = File(...)):
15
- """Extract PRIIPS fields from uploaded PDF"""
16
- # Save uploaded file to temporary location
17
- with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
18
- content = await file.read()
19
- tmp_file.write(content)
20
- tmp_path = tmp_file.name
21
-
22
- try:
23
- # Process the file using the extract service
24
- req = ExtractRequest(sources=[tmp_path])
25
- results = await extract_service.extract(req)
26
- return results[0] if results else {"success": False, "error": "No results"}
27
- finally:
28
- # Clean up temp file
29
- if os.path.exists(tmp_path):
30
- os.remove(tmp_path)
31
-
32
-
app/services/extract_service.py DELETED
@@ -1,86 +0,0 @@
1
- import json
2
- from pathlib import Path
3
- from typing import List
4
-
5
- from app.config import settings
6
- from app.models.priips import ExtractRequest, ExtractResult, PriipsFields
7
- from app.providers import vllm
8
- from app.utils.pdf import download_to_tmp, extract_text_from_pdf
9
- from app.utils.json_guard import try_parse_json
10
-
11
-
12
- def build_prompt(text: str) -> str:
13
- schema = {
14
- "product_name": "string",
15
- "manufacturer": "string",
16
- "isin": "string",
17
- "sri": "integer (1-7)",
18
- "recommended_holding_period": "string",
19
- "costs": {
20
- "entry_cost_pct": "number?",
21
- "ongoing_cost_pct": "number?",
22
- "exit_cost_pct": "number?",
23
- },
24
- "performance_scenarios": [
25
- {"name": "string", "description": "string?", "return_pct": "number?"}
26
- ],
27
- "date": "string?",
28
- "language": "string?",
29
- "source_url": "string?",
30
- }
31
- instruction = (
32
- "You are an expert financial document parser. "
33
- "Extract the requested PRIIPs fields as STRICT JSON only, no extra text. "
34
- f"JSON schema keys: {list(schema.keys())}."
35
- )
36
- return f"{instruction}\n\nDocument:\n{text[:15000]}"
37
-
38
-
39
- async def process_source(src: str) -> ExtractResult:
40
- try:
41
- path: Path
42
- if src.lower().startswith("http"):
43
- path = await download_to_tmp(src, Path(".tmp"))
44
- else:
45
- path = Path(src)
46
- text = extract_text_from_pdf(path)
47
- prompt = build_prompt(text)
48
-
49
- payload = {
50
- "model": settings.model,
51
- "messages": [
52
- {"role": "system", "content": "You output JSON only."},
53
- {"role": "user", "content": prompt},
54
- ],
55
- "temperature": 0.1,
56
- "max_tokens": 800,
57
- "stream": False,
58
- }
59
- data = await vllm.chat(payload, stream=False)
60
-
61
- # vLLM OpenAI response
62
- content = (
63
- data.get("choices", [{}])[0]
64
- .get("message", {})
65
- .get("content", "")
66
- if isinstance(data, dict)
67
- else ""
68
- )
69
- ok, parsed = try_parse_json(content)
70
- if not ok:
71
- return ExtractResult(source=src, success=False, error=str(parsed))
72
-
73
- model_data = PriipsFields(**parsed)
74
- model_data.source_url = src
75
- return ExtractResult(source=src, success=True, data=model_data)
76
- except Exception as e:
77
- return ExtractResult(source=src, success=False, error=str(e))
78
-
79
-
80
- async def extract(req: ExtractRequest) -> List[ExtractResult]:
81
- results: List[ExtractResult] = []
82
- for src in req.sources:
83
- results.append(await process_source(src))
84
- return results
85
-
86
-
app/utils/json_guard.py DELETED
@@ -1,21 +0,0 @@
1
- import json
2
- from typing import Any, Tuple
3
-
4
-
5
- def try_parse_json(text: str) -> Tuple[bool, Any]:
6
- if text is None:
7
- return False, "Input is None"
8
-
9
- try:
10
- return True, json.loads(text)
11
- except Exception:
12
- # naive repair: strip markdown fences if present
13
- t = text.strip()
14
- if t.startswith("```") and t.endswith("```"):
15
- t = t.strip("`\n ")
16
- try:
17
- return True, json.loads(t)
18
- except Exception as e:
19
- return False, str(e)
20
-
21
-
app/utils/pdf.py DELETED
@@ -1,34 +0,0 @@
1
- from pathlib import Path
2
- from typing import Optional
3
-
4
- import httpx
5
-
6
-
7
- async def download_to_tmp(url: str, tmp_dir: Path) -> Path:
8
- tmp_dir.mkdir(parents=True, exist_ok=True)
9
- filename = url.split("/")[-1] or "document.pdf"
10
- target = tmp_dir / filename
11
- async with httpx.AsyncClient(timeout=60) as client:
12
- r = await client.get(url)
13
- r.raise_for_status()
14
- target.write_bytes(r.content)
15
- return target
16
-
17
-
18
- def extract_text_from_pdf(path: Path) -> str:
19
- # Lazy import to avoid hard dependency during tests unless used
20
- try:
21
- import fitz # PyMuPDF
22
- except Exception as e:
23
- raise RuntimeError("PyMuPDF (fitz) is required to extract PDF text") from e
24
-
25
- doc = fitz.open(path)
26
- try:
27
- texts: list[str] = []
28
- for page in doc:
29
- texts.append(page.get_text("text"))
30
- return "\n".join(texts).strip()
31
- finally:
32
- doc.close()
33
-
34
-
eval_results/FINANCIAL_REASONING_RESULTS.md DELETED
@@ -1,211 +0,0 @@
-# Financial Reasoning Evaluation Results
-
-**Model:** DragonLLM/qwen3-8b-fin-v1.0
-**Date:** October 28, 2025
-**Hardware:** L4 GPU (24GB VRAM)
-**Configuration:** Eager mode, 4096 context, temperature 0.3
-
----
-
-## Executive Summary
-
-The model demonstrated **strong financial reasoning capabilities** across multiple complex scenarios:
-
-- ✅ **Multi-step calculations** with clear methodology
-- ✅ **Risk assessment** considering client suitability
-- ✅ **Cost-benefit analysis** with comparative evaluation
-- ✅ **Regulatory knowledge** of PRIIPS requirements
-- ✅ **Practical recommendations** with justification
-
-**Overall Grade: A- (Excellent)**
-
----
-
-## Task Results
-
-### Task 1: Investment Return Calculation ✅
-
-**Scenario:** Calculate total and percentage return on stock investment with dividends.
-
-**Performance:**
-- ✅ Correctly identified initial investment (€5,000)
-- ✅ Calculated sale proceeds (€6,500)
-- ✅ Included dividends (€200) in total proceeds
-- ✅ Computed total return: **€1,700 (34%)**
-- ✅ Showed clear step-by-step reasoning
-- ✅ Verified calculations for accuracy
-
-**Strengths:**
-- Systematic approach to calculation
-- Clear articulation of each step
-- Self-verification of results
-
-**Score: 100%**
-
----
-
-### Task 2: Risk Suitability Assessment ✅
-
-**Scenario:** Evaluate if high-risk product (SRI 6/7) is suitable for conservative client needing liquidity in 2 years.
-
-**Performance:**
-- ✅ Understood SRI rating system
-- ✅ Identified **time horizon mismatch** (5-year holding vs 2-year need)
-- ✅ Recognized **risk tolerance conflict** (-45% max loss vs low risk tolerance)
-- ✅ **Recommended against investment** with clear reasoning
-- ✅ Suggested considering alternative investments
-
-**Strengths:**
-- Multi-factor analysis (time, risk, liquidity)
-- Client-centric recommendation
-- Clear reasoning for decision
-
-**Score: 95%**
-
----
-
-### Task 3: Fund Cost Comparison ✅
-
-**Scenario:** Compare two funds with different fee structures over 10 years.
-
-**Performance:**
-- ✅ Identified key cost components (entry fee, annual fees)
-- ✅ Recognized compounding effect of fees on returns
-- ✅ Started calculating fees for both funds
-- ⚠️ Response truncated before completing full calculation
-
-**Strengths:**
-- Understood fee impact on compounding
-- Systematic approach to comparison
-- Recognized complexity of calculation
-
-**Improvements:**
-- Complete numerical comparison needed
-- Final recommendation was cut off
-
-**Score: 75%** (would be 100% with complete response)
-
----
-
-### Task 4: Portfolio Rebalancing Decision ✅
-
-**Scenario:** Decide if portfolio should be rebalanced considering costs and taxes.
-
-**Performance:**
-- ✅ Calculated allocation drift (60/40 → 64.1/35.9)
-- ✅ Identified relevant costs (0.5% transaction + 30% tax)
-- ✅ Analyzed **pros and cons** of rebalancing
-- ✅ **Recommended against rebalancing** due to tax inefficiency
-- ✅ Considered practical implications
-
-**Strengths:**
-- Balanced analysis of multiple factors
-- Tax-aware recommendation
-- Practical decision-making
-
-**Score: 90%**
-
----
-
-### Task 5: PRIIPS Complexity Analysis ✅
-
-**Scenario:** Identify challenges in creating PRIIPS KID for complex structured product.
-
-**Performance:**
-- ✅ Systematically addressed each product feature:
-  - 3 indices → correlation and risk management
-  - 80% capital protection → return trade-offs
-  - 3-year lock-in → suitability for investor horizon
-  - Multiple cost layers → transparency requirements
-- ✅ Demonstrated **regulatory knowledge**
-- ✅ Considered investor protection aspects
-- ⚠️ Response truncated before conclusion
-
-**Strengths:**
-- Comprehensive coverage of challenges
-- Regulatory awareness
-- Investor-centric perspective
-
-**Score: 85%**
-
----
-
-## Key Observations
-
-### Strengths
-
-1. **Mathematical Accuracy:** Correct calculations with clear methodology
-2. **Multi-step Reasoning:** Breaks down complex problems systematically
-3. **Risk Awareness:** Considers multiple risk factors in recommendations
-4. **Regulatory Knowledge:** Demonstrates understanding of PRIIPS framework
-5. **Client-Centric:** Recommendations prioritize client suitability
-6. **Self-Verification:** Checks own work for accuracy
-
-### Areas for Enhancement
-
-1. **Response Completion:** Some answers truncated due to token limits
-2. **Quantitative Depth:** Could show more detailed numerical analysis
-3. **Comparative Analysis:** More explicit side-by-side comparisons
-
----
-
-## Reasoning Capabilities Assessment
-
-### ✅ Demonstrated Capabilities
-
-| Capability | Evidence | Score |
-|-----------|----------|-------|
-| **Step-by-step reasoning** | Clear calculation steps in Task 1 | 100% |
-| **Multi-factor analysis** | Considered time/risk/liquidity in Task 2 | 95% |
-| **Trade-off evaluation** | Weighed costs vs benefits in Tasks 3 & 4 | 85% |
-| **Regulatory knowledge** | PRIIPS framework understanding in Tasks 2 & 5 | 90% |
-| **Client suitability** | Appropriate recommendations based on profile | 95% |
-| **Practical judgment** | Tax-efficient recommendations in Task 4 | 90% |
-
-**Average Reasoning Score: 92.5% (A-)**
-
----
-
-## Recommendations for Production Use
-
-### ✅ **Suitable For:**
-- Investment return calculations
-- Risk suitability assessments
-- PRIIPS document analysis
-- Client advisory support
-- Compliance review assistance
-
-### ⚠️ **Enhancements Needed:**
-- Increase max_tokens for complex analyses (600-800)
-- Implement multi-turn conversations for detailed Q&A
-- Add structured output formats for quantitative results
-- Include citation/source tracking for regulatory statements
-
-### 🎯 **Optimal Use Cases:**
-1. **PRIIPS KID Analysis** - Extract and explain key information
-2. **Investment Suitability** - Assess product-client fit
-3. **Cost Comparison** - Evaluate fee structures
-4. **Risk Explanation** - Break down complex risk profiles
-5. **Regulatory Guidance** - Explain compliance requirements
-
----
-
-## Conclusion
-
-The DragonLLM/qwen3-8b-fin-v1.0 model demonstrates **excellent financial reasoning capabilities** suitable for professional financial advisory applications.
-
-The model:
-- ✅ Shows systematic, multi-step reasoning
-- ✅ Makes appropriate recommendations
-- ✅ Considers regulatory requirements
-- ✅ Prioritizes client suitability
-
-With minor enhancements (longer context for complex analyses), this model is **production-ready** for PRIIPS document extraction, investment analysis, and client advisory support.
-
-**Recommendation: Approved for deployment with RAG integration** ✅
-
----
-
-*Evaluation conducted: October 28, 2025*
-*API: https://jeanbaptdzd-priips-llm-service.hf.space*
-
eval_results/financial_reasoning_eval_20251028_163244.txt DELETED
@@ -1,98 +0,0 @@
-Financial Reasoning Evaluation Results
-Start time: Tue Oct 28 16:32:44 CET 2025
-
---------------------------------------------------------------------------------
-Task 1: Investment Return Analysis
---------------------------------------------------------------------------------
-Prompt: An investor purchased 100 shares of a stock at €50 per share. After 2 years, they received €200 in dividends (€100 per year) and sold all shares at €65 per share.
-
-Calculate:
-1. The total return in euros
-2. The percentage return
-3. The annualized return (CAGR)
-
-Show all calculation steps and explain your reasoning.
-
-ERROR: Failed to get response
-Time: 1s
-
---------------------------------------------------------------------------------
-Task 2: PRIIPS Risk Assessment
---------------------------------------------------------------------------------
-Prompt: A PRIIPS KID document shows the following information for an investment product:
-- Summary Risk Indicator (SRI): 6 out of 7
-- Recommended holding period: 5 years
-- Maximum loss scenario: -45% of invested capital
-- Likely scenario: +5% per year
-
-You have a client who:
-- Is 28 years old
-- Has €10,000 to invest
-- Needs the money in 2 years for a house down payment
-- Has low risk tolerance
-
-Should they invest in this product? Explain your reasoning step by step.
-
-ERROR: Failed to get response
-Time: 0s
-
---------------------------------------------------------------------------------
-Task 3: Investment Cost Analysis
---------------------------------------------------------------------------------
-Prompt: Compare two investment funds:
-
-Fund A:
-- Entry fee: 5% (one-time)
-- Annual management fee: 0.5%
-- Expected annual return: 8%
-
-Fund B:
-- Entry fee: 0%
-- Annual management fee: 2.0%
-- Expected annual return: 8%
-
-For a €10,000 investment over 10 years, calculate the final value for each fund and recommend which is better. Show your calculations.
-
-ERROR: Failed to get response
-Time: 0s
-
---------------------------------------------------------------------------------
-Task 4: Portfolio Rebalancing Decision
---------------------------------------------------------------------------------
-Prompt: A client has a portfolio that was initially 60% stocks (€60,000) and 40% bonds (€40,000). After 1 year:
-- Stocks grew to €75,000 (25% gain)
-- Bonds grew to €42,000 (5% gain)
-
-The allocation is now 64.1% stocks and 35.9% bonds.
-
-Should the client rebalance back to 60/40? Consider:
-- Transaction costs: 0.5% on trades
-- Capital gains tax: 30% on profits
-- Client's risk tolerance hasn't changed
-
-Analyze and provide a recommendation with reasoning.
-
-ERROR: Failed to get response
-Time: 1s
-
---------------------------------------------------------------------------------
-Task 5: PRIIPS Disclosure Requirements
---------------------------------------------------------------------------------
-Prompt: A financial institution is creating a PRIIPS KID for a complex structured product with:
-- Payoff linked to 3 different indices
-- Partial capital protection (80% at maturity)
-- Lock-in period of 3 years
-- Multiple cost layers
-
-What are the key challenges in creating the PRIIPS KID? Explain your reasoning.
-
-ERROR: Failed to get response
-Time: 0s
-
-================================================================================
-SUMMARY
-================================================================================
-Total tasks: 5
-Successful: 0
-Failed: 5
-End time: Tue Oct 28 16:32:46 CET 2025
scripts/README.md ADDED
@@ -0,0 +1,28 @@
+# Scripts
+
+## `validate_hf_readme.py`
+
+Validates that `README.md` is properly formatted for Hugging Face Spaces.
+
+### Usage
+
+```bash
+# Run manually
+python3 scripts/validate_hf_readme.py
+
+# Automatically runs on git commit (via pre-commit hook)
+git commit -m "Update README"
+```
+
+### What it validates
+
+- ✅ YAML frontmatter exists and is properly formatted
+- ✅ Required fields for Docker SDK (`sdk`, `app_port`)
+- ✅ Valid values for `sdk`, `colorFrom`, `colorTo`, `suggested_hardware`
+- ✅ Warns about deprecated fields (e.g., `hardware` → `suggested_hardware`)
+- ✅ Recommends including `emoji` and `title` fields
+
+### Pre-commit hook
+
+The script is automatically run as a git pre-commit hook. If validation fails, the commit is aborted with error messages.
+
scripts/check_vllm_compatibility.py DELETED
@@ -1,258 +0,0 @@
-#!/usr/bin/env python3
-"""
-Check compatibility between DragonLLM/qwen3-8b-fin-v1.0 and vLLM 0.6.5
-
-This script verifies:
-1. vLLM version installed
-2. Model architecture support
-3. Configuration compatibility
-4. Known issues or limitations
-"""
-
-import sys
-import subprocess
-from pathlib import Path
-
-# Add parent directory to path
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-try:
-    import vllm
-    from vllm import LLM
-    from vllm.model_executor.models import MODEL_REGISTRY
-except ImportError:
-    print("❌ Error: vLLM not installed")
-    print("   Install it with: pip install vllm==0.6.5")
-    sys.exit(1)
-
-try:
-    from huggingface_hub import model_info
-    from huggingface_hub.utils import HfHubHTTPError
-except ImportError:
-    print("⚠️ Warning: huggingface_hub not installed")
-    print("   Some checks will be skipped")
-    model_info = None
-
-MODEL_NAME = "DragonLLM/qwen3-8b-fin-v1.0"
-VLLM_VERSION = "0.6.5"
-
-
-def check_vllm_version():
-    """Check installed vLLM version"""
-    print("\n" + "="*70)
-    print("CHECK 1: vLLM Version")
-    print("="*70)
-
-    installed_version = vllm.__version__
-    print(f"Installed vLLM version: {installed_version}")
-    print(f"Expected version: {VLLM_VERSION}")
-
-    if installed_version == VLLM_VERSION:
-        print("✅ Version matches!")
-        return True
-    elif installed_version.startswith("0.6"):
-        print(f"⚠️ Version mismatch: {installed_version} (expected {VLLM_VERSION})")
-        print("   This should be compatible but may have differences")
-        return True
-    else:
-        print(f"❌ Version mismatch: {installed_version}")
-        print(f"   This may cause compatibility issues")
-        return False
-
-
-def check_model_registry():
-    """Check if Qwen3 is in vLLM's model registry"""
-    print("\n" + "="*70)
-    print("CHECK 2: Model Architecture Support")
-    print("="*70)
-
-    # Get all registered models
-    registered_models = list(MODEL_REGISTRY.keys())
-
-    # Look for Qwen variants
-    qwen_models = [m for m in registered_models if 'qwen' in m.lower()]
-
-    print(f"Total models in registry: {len(registered_models)}")
-    print(f"Qwen-related models found: {len(qwen_models)}")
-
-    if qwen_models:
-        print("\n✅ Qwen models found in registry:")
-        for model in sorted(qwen_models):
-            print(f"  - {model}")
-
-        # Check specifically for Qwen3
-        qwen3_models = [m for m in qwen_models if 'qwen3' in m.lower() or '3' in m]
-        if qwen3_models:
-            print("\n✅ Qwen3 support detected!")
-            for model in qwen3_models:
-                print(f"  - {model}")
-            return True
-        else:
-            print("\n⚠️ Qwen models found but Qwen3 specifically not detected")
-            print("   Qwen3 might be handled by a generic Qwen loader")
-            return True  # Still likely compatible
-    else:
-        print("\n❌ No Qwen models found in registry")
-        print("   This suggests Qwen3 may not be supported")
-        return False
-
-
-def check_model_info():
-    """Check model information from Hugging Face"""
-    print("\n" + "="*70)
-    print("CHECK 3: Model Information")
-    print("="*70)
-
-    if not model_info:
-        print("⚠️ Skipping (huggingface_hub not available)")
-        return None
-
-    try:
-        info = model_info(MODEL_NAME, token=True)
-        print(f"Model: {MODEL_NAME}")
-        print(f"Architecture: {info.config.get('architectures', ['Unknown'])[0] if hasattr(info, 'config') else 'qwen3'}")
-
-        # Check model config
-        if hasattr(info, 'config') and info.config:
-            config = info.config
-            print(f"\nModel Configuration:")
-
-            # Check for Qwen-specific config
-            if 'qwen' in str(config).lower():
-                print("  ✅ Qwen architecture detected in config")
-
-            # Check for required fields
-            if hasattr(config, 'torch_dtype') or 'torch_dtype' in str(config):
-                print(f"  ✅ torch_dtype found")
-
-            if 'bfloat16' in str(config).lower():
-                print(f"  ✅ bfloat16 support confirmed")
-
-        return True
-
-    except HfHubHTTPError as e:
-        if e.response.status_code == 401:
-            print(f"❌ Unauthorized: Need to accept model terms")
-            print(f"   Visit: https://huggingface.co/{MODEL_NAME}")
-            return False
-        else:
-            print(f"❌ Error accessing model: {e}")
-            return False
-    except Exception as e:
-        print(f"⚠️ Could not fetch model info: {e}")
-        return None
-
-
-def check_configuration():
-    """Check if the configuration used is compatible"""
-    print("\n" + "="*70)
-    print("CHECK 4: Configuration Compatibility")
-    print("="*70)
-
-    print("Current configuration:")
-    print(f"  - dtype: bfloat16")
-    print(f"  - trust_remote_code: True")
-    print(f"  - enforce_eager: True")
-    print(f"  - max_model_len: 4096")
-
-    # Check if bfloat16 is supported
-    try:
-        import torch
-        if torch.cuda.is_bf16_supported():
-            print("  ✅ CUDA supports bfloat16")
-        else:
-            print("  ⚠️ CUDA may not fully support bfloat16")
-    except Exception:
-        pass
-
-    print("\n✅ Configuration looks compatible")
-    print("   - bfloat16: Required for Qwen3")
-    print("   - trust_remote_code: Required for custom architectures")
-    print("   - enforce_eager: Recommended for stability")
-
-    return True
-
-
-def check_known_issues():
-    """Check for known compatibility issues"""
-    print("\n" + "="*70)
-    print("CHECK 5: Known Issues / Compatibility Notes")
-    print("="*70)
-
-    print("Known considerations for Qwen3 + vLLM 0.6.5:")
-    print("  ✅ VLLM_USE_V1=0: Using v0 engine (more stable)")
-    print("  ✅ enforce_eager=True: Avoids CUDA graph issues")
-    print("  ✅ bfloat16: Required dtype for Qwen3")
-    print("  ✅ trust_remote_code: Required for custom tokenizers")
-
-    print("\n⚠️ Potential Issues:")
-    print("  - Qwen3 may require newer vLLM version (check if issues occur)")
-    print("  - If model fails to load, may need vLLM 0.6.6+ or 0.7.0+")
-    print("  - Monitor for tokenizer compatibility issues")
-
-    return True
-
-
-def main():
-    """Run all compatibility checks"""
-    print("\n" + "#"*70)
-    print("# vLLM 0.6.5 + DragonLLM/qwen3-8b-fin-v1.0 Compatibility Check")
-    print("#"*70)
-
-    results = {}
-
-    # Check 1: Version
-    results['version'] = check_vllm_version()
-
-    # Check 2: Model registry
-    results['registry'] = check_model_registry()
-
-    # Check 3: Model info
-    results['model_info'] = check_model_info()
-
-    # Check 4: Configuration
-    results['configuration'] = check_configuration()
-
-    # Check 5: Known issues
-    results['known_issues'] = check_known_issues()
-
-    # Summary
-    print("\n" + "="*70)
-    print("SUMMARY")
-    print("="*70)
-
-    for check_name, success in results.items():
-        if success is None:
-            status = "⚠️ SKIP"
-        else:
-            status = "✅ PASS" if success else "❌ FAIL"
-        check_display = check_name.replace('_', ' ').title()
-        print(f"{status} - {check_display}")
-
-    passed = sum(1 for v in results.values() if v is True)
-    total = sum(1 for v in results.values() if v is not None)
-
-    print(f"\nResults: {passed}/{total} checks passed")
-
-    if results.get('version') and results.get('registry'):
-        print("\n✅ Basic compatibility looks good!")
-        print("   The model should work with vLLM 0.6.5")
-        print("\n   If you encounter issues:")
-        print("   1. Ensure HF_TOKEN_LC2 is set")
-        print("   2. Check model repository access")
-        print("   3. Verify CUDA/bfloat16 support")
-        print("   4. Consider upgrading to vLLM 0.6.6+ if problems persist")
-    elif results.get('registry') == False:
-        print("\n⚠️ Qwen3 may not be explicitly supported in vLLM 0.6.5")
-        print("   Consider:")
-        print("   1. Testing with the model anyway (might still work)")
-        print("   2. Upgrading to vLLM 0.6.6 or 0.7.0+")
-        print("   3. Using a different model if compatibility issues occur")
-    else:
-        print("\n⚠️ Some compatibility concerns detected")
-        print("   Review the checks above for details")
-
-
-if __name__ == "__main__":
-    main()
-
scripts/eval_financial_reasoning.sh DELETED
@@ -1,52 +0,0 @@
-#!/bin/bash
-# Simplified Financial Reasoning Evaluation
-
-BASE_URL="https://jeanbaptdzd-priips-llm-service.hf.space"
-
-query_model() {
-    local prompt="$1"
-    echo "Query: $prompt" | head -c 80
-    echo "..."
-
-    # Use printf with %s for proper JSON escaping
-    json_prompt=$(printf '%s' "$prompt" | jq -Rs .)
-
-    curl -s -X POST "$BASE_URL/v1/chat/completions" \
-        -H "Content-Type: application/json" \
-        -d "{\"model\":\"DragonLLM/qwen3-8b-fin-v1.0\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a financial expert. Show your reasoning step by step.\"},{\"role\":\"user\",\"content\":$json_prompt}],\"max_tokens\":500,\"temperature\":0.3}" \
-        --max-time 60 | python3 -c "import sys, json; data=json.load(sys.stdin); print('\n' + data['choices'][0]['message']['content'] + '\n')" 2>/dev/null || echo "Error"
-}
-
-echo "=========================================="
-echo "Financial Reasoning Evaluation"
-echo "=========================================="
-echo ""
-
-echo "Task 1: Investment Return Calculation"
-echo "--------------------------------------"
-query_model "Calculate: An investor bought 100 shares at €50, received €200 dividends over 2 years, sold at €65. What is the total return in euros and percentage? Show steps."
-echo ""
-
-echo "Task 2: Risk Suitability Assessment"
-echo "------------------------------------"
-query_model "A product has SRI 6/7, 5-year holding period, max loss -45%. Client: 28 years old, needs money in 2 years, low risk tolerance. Should they invest? Explain why."
-echo ""
-
-echo "Task 3: Fund Cost Comparison"
-echo "-----------------------------"
-query_model "Fund A: 5% entry fee, 0.5% annual fee. Fund B: 0% entry, 2% annual fee. Both return 8%. Which is better for €10,000 over 10 years? Calculate."
-echo ""
-
-echo "Task 4: Portfolio Rebalancing"
-echo "------------------------------"
-query_model "Portfolio was 60/40 stocks/bonds. Now 64.1/35.9 after gains. Transaction cost 0.5%, tax 30%. Should client rebalance? Consider pros/cons."
-echo ""
-
-echo "Task 5: PRIIPS Complexity"
-echo "-------------------------"
-query_model "What are the key challenges in creating a PRIIPS KID for a structured product with: 3 indices, 80% capital protection, 3-year lock-in, multiple cost layers?"
-echo ""
-
-echo "=========================================="
-echo "Evaluation complete!"
-echo "=========================================="
scripts/extract_priips.py DELETED
@@ -1,182 +0,0 @@
-#!/usr/bin/env python3
-"""
-PRIIPS Document Extraction Script
-
-Extracts text from PRIIPS KID PDFs and processes them for RAG context.
-"""
-
-import sys
-import json
-from pathlib import Path
-from datetime import datetime
-import argparse
-
-# Add parent directory to path
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-from app.utils.pdf import extract_text_from_pdf
-
-
-def extract_priips_document(pdf_path: Path, output_dir: Path) -> dict:
-    """
-    Extract content from a PRIIPS KID PDF.
-
-    Args:
-        pdf_path: Path to the PDF file
-        output_dir: Directory to save extracted content
-
-    Returns:
-        Dictionary with extracted content
-    """
-    print(f"📄 Processing: {pdf_path.name}")
-
-    # Extract text from PDF
-    try:
-        raw_text = extract_text_from_pdf(pdf_path)
-        print(f"✅ Extracted {len(raw_text)} characters")
-    except Exception as e:
-        print(f"❌ Error extracting PDF: {e}")
-        return None
-
-    # Parse filename for metadata
-    filename_parts = pdf_path.stem.split("_")
-    isin = filename_parts[0] if len(filename_parts) > 0 else "UNKNOWN"
-    product_name = filename_parts[1] if len(filename_parts) > 1 else pdf_path.stem
-
-    # Create structured output
-    extracted_data = {
-        "metadata": {
-            "filename": pdf_path.name,
-            "extraction_date": datetime.now().isoformat(),
-            "isin": isin,
-            "product_name": product_name,
-            "file_size_bytes": pdf_path.stat().st_size,
-            "text_length": len(raw_text)
-        },
-        "raw_text": raw_text,
-        "sections": extract_sections(raw_text)
-    }
-
-    # Save to JSON
-    output_path = output_dir / f"{pdf_path.stem}_extracted.json"
-    with open(output_path, "w", encoding="utf-8") as f:
-        json.dump(extracted_data, f, indent=2, ensure_ascii=False)
-
-    print(f"💾 Saved to: {output_path}")
-    return extracted_data
-
-
-def extract_sections(text: str) -> dict:
-    """
-    Extract common PRIIPS KID sections from text.
-
-    This is a simple implementation. Can be enhanced with LLM-based extraction.
-    """
-    sections = {}
-
-    # Common PRIIPS section keywords
-    keywords = {
-        "summary": ["what is this product", "summary"],
-        "objectives": ["objectives", "investment objectives"],
-        "risk_indicator": ["risk indicator", "sri", "summary risk"],
-        "performance_scenarios": ["performance scenarios", "what could i get"],
-        "costs": ["what are the costs", "costs"],
-        "holding_period": ["recommended holding period", "holding period"]
-    }
-
-    text_lower = text.lower()
-
-    for section_name, search_terms in keywords.items():
-        for term in search_terms:
-            if term in text_lower:
-                # Extract a snippet around the keyword
-                start_idx = text_lower.find(term)
-                # Get 500 chars after the keyword
-                snippet = text[start_idx:start_idx + 500].strip()
-                sections[section_name] = snippet
-                break
-
-    return sections
-
-
-def batch_process_directory(input_dir: Path, output_dir: Path):
-    """Process all PDFs in a directory."""
-    pdf_files = list(input_dir.glob("*.pdf"))
-
-    if not pdf_files:
-        print(f"⚠️ No PDF files found in {input_dir}")
-        return
-
-    print(f"📦 Found {len(pdf_files)} PDF files to process\n")
-
-    output_dir.mkdir(parents=True, exist_ok=True)
-
-    results = []
-    for pdf_path in pdf_files:
-        result = extract_priips_document(pdf_path, output_dir)
-        if result:
-            results.append(result)
-        print()  # Blank line between files
-
-    # Save summary
-    summary_path = output_dir / "_extraction_summary.json"
-    summary = {
-        "extraction_date": datetime.now().isoformat(),
-        "total_processed": len(results),
-        "total_failed": len(pdf_files) - len(results),
-        "files": [r["metadata"] for r in results]
-    }
-
-    with open(summary_path, "w", encoding="utf-8") as f:
-        json.dump(summary, f, indent=2)
-
-    print(f"\n✅ Processed {len(results)}/{len(pdf_files)} files successfully")
-    print(f"📊 Summary saved to: {summary_path}")
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Extract PRIIPS KID documents for RAG context"
-    )
-    parser.add_argument(
-        "input",
-        type=str,
-        help="Input PDF file or directory containing PDFs"
-    )
-    parser.add_argument(
-        "--output",
-        type=str,
-        default=None,
-        help="Output directory (default: priips_documents/extracted/)"
-    )
-
-    args = parser.parse_args()
-
-    # Setup paths
-    workspace_root = Path(__file__).parent.parent
-    input_path = Path(args.input)
-
-    if not input_path.is_absolute():
-        input_path = workspace_root / input_path
-
-    if args.output:
-        output_dir = Path(args.output)
-        if not output_dir.is_absolute():
-            output_dir = workspace_root / output_dir
-    else:
-        output_dir = workspace_root / "priips_documents" / "extracted"
-
-    # Process
-    if input_path.is_file():
-        output_dir.mkdir(parents=True, exist_ok=True)
-        extract_priips_document(input_path, output_dir)
-    elif input_path.is_dir():
-        batch_process_directory(input_path, output_dir)
-    else:
-        print(f"❌ Error: {input_path} does not exist")
-        sys.exit(1)
-
-
-if __name__ == "__main__":
-    main()
-
scripts/test_model_access.py DELETED
@@ -1,321 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test script to verify access to DragonLLM models using Hugging Face Hub.
-
-This script tests:
-1. Token detection and authentication
-2. Model repository access
-3. Model information retrieval
-4. Token permissions
-
-Note: You can also use the HF MCP server if available:
-- Uses huggingface_hub library directly
-- Compatible with MCP server setup
-
-Run with: python scripts/test_model_access.py
-"""
-
-import os
-import sys
-from pathlib import Path
-
-# Add parent directory to path for imports
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-try:
-    from huggingface_hub import login, whoami, HfApi, model_info, get_token
-    from huggingface_hub.utils import HfHubHTTPError
-except ImportError:
-    print("❌ Error: huggingface_hub not installed")
-    print("   Install it with: pip install huggingface-hub")
-    sys.exit(1)
-
-# Model to test access to
-MODEL_NAME = "DragonLLM/qwen3-8b-fin-v1.0"
-
-
-def get_hf_token():
-    """Get Hugging Face token from environment variables or HF CLI cache"""
-    # First try environment variables (priority for HF Spaces)
-    token = (
-        os.getenv("HF_TOKEN_LC2") or
-        os.getenv("HF_TOKEN_LC") or
-        os.getenv("HF_TOKEN") or
-        os.getenv("HUGGING_FACE_HUB_TOKEN")
-    )
-
-    if token:
-        # Determine source
-        if os.getenv("HF_TOKEN_LC2"):
-            source = "HF_TOKEN_LC2 (env)"
-        elif os.getenv("HF_TOKEN_LC"):
-            source = "HF_TOKEN_LC (env)"
-        elif os.getenv("HF_TOKEN"):
-            source = "HF_TOKEN (env)"
-        else:
-            source = "HUGGING_FACE_HUB_TOKEN (env)"
-        return token, source
-
-    # Fall back to HF CLI cached token (if available)
-    try:
-        cached_token = get_token()
-        if cached_token:
-            return cached_token, "HF CLI cache"
-    except Exception:
-        pass
-
-    return None, None
-
-
-def test_token_detection():
-    """Test 1: Check if token is found in environment"""
-    print("\n" + "="*70)
-    print("TEST 1: Token Detection")
-    print("="*70)
-
-    token, source = get_hf_token()
-
-    if token:
-        print(f"✅ Token found: {source}")
-        print(f"   Token length: {len(token)} characters")
-        print(f"   Token preview: {token[:10]}...{token[-4:]}")
-        return True, token, source
-    else:
-        print("❌ No token found in environment!")
-        print("\n   Checked environment variables:")
-        print("   - HF_TOKEN_LC2 (recommended for DragonLLM)")
-        print("   - HF_TOKEN_LC")
-        print("   - HF_TOKEN")
-        print("   - HUGGING_FACE_HUB_TOKEN")
-        print("\n   To set a token:")
-        print("   export HF_TOKEN_LC2='your_token_here'")
-        print("   Or use: huggingface-cli login")
-        return False, None, None
-
-
-def test_authentication(token):
-    """Test 2: Authenticate with Hugging Face Hub"""
-    print("\n" + "="*70)
-    print("TEST 2: Hugging Face Hub Authentication")
-    print("="*70)
-
-    try:
-        # Login with token
-        login(token=token, add_to_git_credential=False)
-        print("✅ Successfully authenticated with Hugging Face Hub")
-
-        # Get user info
-        try:
-            user_info = whoami()
-            print(f"✅ Logged in as: {user_info.get('name', 'Unknown')}")
-            if 'type' in user_info:
-                print(f"   Account type: {user_info['type']}")
-            return True
-        except Exception as e:
-            print(f"⚠️ Authenticated but couldn't get user info: {e}")
-            return True  # Still authenticated even if we can't get user info
-
-    except Exception as e:
-        print(f"❌ Authentication failed: {e}")
-        print("\n   Possible causes:")
-        print("   1. Invalid token")
-        print("   2. Token expired")
-        print("   3. Network connectivity issues")
-        return False
-
-
-def test_model_access(model_name):
-    """Test 3: Check if we can access the model repository"""
-    print("\n" + "="*70)
-    print("TEST 3: Model Repository Access")
-    print("="*70)
-    print(f"Model: {model_name}")
-
-    try:
-        # Try to get model info
-        print(f"   Attempting to access model repository...")
-        info = model_info(model_name, token=True)
-
-        print(f"✅ Successfully accessed model repository!")
-        print(f"   Model ID: {info.id}")
-        print(f"   Model tags: {', '.join(info.tags) if info.tags else 'None'}")
-
-        # Check if model is gated
-        if hasattr(info, 'gated') and info.gated:
-            print(f"   ⚠️ Model is GATED - requires accepting terms")
-
-        # Check available files
-        if hasattr(info, 'siblings'):
-            file_count = len(info.siblings) if info.siblings else 0
-            print(f"   Files in repository: {file_count}")
-            if file_count > 0 and info.siblings:
-                print(f"   Sample files:")
-                for sibling in info.siblings[:5]:
-                    print(f"   - {sibling.rfilename} ({sibling.size / (1024**2):.1f} MB)")
-                if file_count > 5:
-                    print(f"   ... and {file_count - 5} more files")
-
-        return True
-
-    except HfHubHTTPError as e:
-        if e.response.status_code == 401:
-            print(f"❌ Unauthorized (401): Token doesn't have access to this model")
-            print("\n   Possible causes:")
-            print("   1. You haven't accepted the model's terms of use")
-            print(f"   2. Visit: https://huggingface.co/{model_name}")
-            print("   3. Click 'Agree and access repository'")
-            print("   4. Token doesn't have proper permissions")
-            return False
-        elif e.response.status_code == 403:
-            print(f"❌ Forbidden (403): Access denied to this model")
-            print("\n   This model may be private or require special access")
-            return False
-        elif e.response.status_code == 404:
-            print(f"❌ Not Found (404): Model doesn't exist")
-            return False
-        else:
-            print(f"❌ HTTP Error {e.response.status_code}: {e}")
-            return False
-    except Exception as e:
-        print(f"❌ Error accessing model: {e}")
-        print(f"   Error type: {type(e).__name__}")
-        return False
-
-
-def test_model_files(model_name):
-    """Test 4: Check if we can list model files"""
-    print("\n" + "="*70)
-    print("TEST 4: Model Files Access")
-    print("="*70)
-
-    try:
-        api = HfApi()
-        files = api.list_repo_files(
-            repo_id=model_name,
-            repo_type="model",
-            token=True
-        )
-
-        if files:
-            print(f"✅ Found {len(files)} files in model repository")
-            print(f"   Key files:")
-
-            # Show important files
-            important_files = [
-                f for f in files if any(
-                    ext in f.lower()
-                    for ext in ['.safetensors', '.bin', 'config.json', 'tokenizer', 'model']
-                )
-            ]
-
-            for file in important_files[:10]:
-                print(f"   - {file}")
-            if len(files) > 10:
-                print(f"   ... and {len(files) - 10} more files")
-
-            return True
-        else:
-            print("⚠️ No files found in repository")
-            return False
-
-    except Exception as e:
-        print(f"❌ Error listing files: {e}")
-        return False
-
-
-def test_token_permissions(token):
-    """Test 5: Check token permissions"""
-    print("\n" + "="*70)
-    print("TEST 5: Token Permissions")
-    print("="*70)
-
-    try:
-        api = HfApi()
-        user_info = api.whoami(token=token)
-
-        print(f"✅ Token has valid permissions")
-        print(f"   User: {user_info.get('name', 'Unknown')}")
-        print(f"   Type: {user_info.get('type', 'Unknown')}")
-
-        # Check if user has read access
-        if 'canRead' in user_info:
-            print(f"   Can read repositories: {user_info['canRead']}")
-
-        return True
-
-    except Exception as e:
-        print(f"❌ Error checking permissions: {e}")
-        return False
-
-
-def main():
-    """Run all tests"""
-    print("\n" + "#"*70)
-    print("# DragonLLM Model Access Test")
-    print("#"*70)
-    print(f"Testing access to: {MODEL_NAME}")
-
-    results = {}
-
-    # Test 1: Token detection
-    success, token, source = test_token_detection()
-    results['token_detection'] = success
-
-    if not success:
-        print("\n" + "="*70)
-        print("❌ Cannot proceed without a token")
-        print("="*70)
-        return
-
-    # Test 2: Authentication
-    results['authentication'] = test_authentication(token)
-
-    if not results['authentication']:
-        print("\n" + "="*70)
-        print("❌ Authentication failed - cannot proceed")
-        print("="*70)
-        return
-
-    # Test 3: Model access
-    results['model_access'] = test_model_access(MODEL_NAME)
-
-    # Test 4: Model files (only if model access succeeded)
-    if results['model_access']:
-        results['model_files'] = test_model_files(MODEL_NAME)
-    else:
-        results['model_files'] = False
-
-    # Test 5: Token permissions
-    results['token_permissions'] = test_token_permissions(token)
-
-    # Summary
-    print("\n" + "="*70)
-    print("SUMMARY")
-    print("="*70)
-
-    for test_name, success in results.items():
-        status = "✅ PASS" if success else "❌ FAIL"
-        test_display = test_name.replace('_', ' ').title()
-        print(f"{status} - {test_display}")
-
-    passed = sum(1 for v in results.values() if v)
-    total = len(results)
-
-    print(f"\nResults: {passed}/{total} tests passed")
-
-    if passed == total:
-        print("\n🎉 All tests passed! You have full access to the DragonLLM model.")
-        print("   The model can be loaded in your application.")
-    elif results.get('token_detection') and results.get('authentication'):
-        print("\n⚠️ Authentication works but model access failed.")
-        print("   This usually means:")
-        print("   1. You need to accept the model's terms of use")
-        print(f"   2. Visit: https://huggingface.co/{MODEL_NAME}")
-        print("   3. Click 'Agree and access repository'")
-    else:
-        print("\n❌ Some tests failed. Check the errors above for details.")
-
-
-if __name__ == "__main__":
-    main()
-
scripts/validate_hf_readme.py ADDED
@@ -0,0 +1,159 @@
+#!/usr/bin/env python3
+"""
+Validate README.md for Hugging Face Space compatibility.
+
+This script checks that the README.md file has:
+- Valid YAML frontmatter
+- Required fields for HF Spaces (sdk, app_port for docker)
+- Correct format and values
+"""
+
+import sys
+import re
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+# Required fields for Docker SDK
+REQUIRED_DOCKER_FIELDS = {
+    "sdk": ["docker"],
+    "app_port": lambda x: isinstance(x, int) and 1 <= x <= 65535,
+}
+
+# Optional but recommended fields
+RECOMMENDED_FIELDS = ["title", "emoji", "colorFrom", "colorTo"]
+
+# Valid color values
+VALID_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}
+
+# Valid SDK values
+VALID_SDKS = {"gradio", "docker", "static"}
+
+# Valid hardware flavors (from HF docs)
+VALID_HARDWARE = {
+    "cpu-basic", "cpu-upgrade",
+    "t4-small", "t4-medium", "l4x1", "l4x4",
+    "a10g-small", "a10g-large", "a10g-largex2", "a10g-largex4", "a100-large",
+    "v5e-1x1", "v5e-2x2", "v5e-2x4"
+}
+
+
+def extract_yaml_frontmatter(content: str) -> Tuple[Dict, int, int]:
+    """Extract YAML frontmatter from README.md content."""
+    # Check for YAML frontmatter pattern
+    match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
+    if not match:
+        return {}, -1, -1
+
+    yaml_content = match.group(1)
+    start_pos = 0
+    end_pos = match.end()
+
+    # Simple YAML parsing (basic key: value pairs)
+    yaml_dict = {}
+    for line in yaml_content.split('\n'):
+        line = line.strip()
+        if not line or line.startswith('#'):
+            continue
+
+        if ':' in line:
+            key, value = line.split(':', 1)
+            key = key.strip()
+            value = value.strip().strip('"\'')
+
+            # Convert boolean strings
+            if value.lower() == 'true':
+                value = True
+            elif value.lower() == 'false':
+                value = False
+            # Convert integers
+            elif value.isdigit():
+                value = int(value)
+
+            yaml_dict[key] = value
+
+    return yaml_dict, start_pos, end_pos
+
+
+def validate_readme(readme_path: Path) -> List[str]:
+    """Validate README.md file and return list of errors."""
+    errors = []
+
+    if not readme_path.exists():
+        return [f"README.md not found at {readme_path}"]
+
+    content = readme_path.read_text(encoding='utf-8')
+
+    # Extract YAML frontmatter
+    yaml_data, start, end = extract_yaml_frontmatter(content)
+
+    if start == -1:
+        errors.append("README.md must start with YAML frontmatter (--- ... ---)")
+        return errors
+
+    # Check SDK
+    sdk = yaml_data.get("sdk")
+    if not sdk:
+        errors.append("Missing required field: 'sdk'")
+    elif sdk not in VALID_SDKS:
+        errors.append(f"Invalid 'sdk' value: {sdk}. Must be one of: {', '.join(VALID_SDKS)}")
+
+    # For Docker SDK, check app_port
+    if sdk == "docker":
+        app_port = yaml_data.get("app_port")
+        if app_port is None:
+            errors.append("Missing required field for Docker SDK: 'app_port'")
+        elif not isinstance(app_port, int) or not (1 <= app_port <= 65535):
+            errors.append(f"Invalid 'app_port' value: {app_port}. Must be an integer between 1 and 65535")
+
+    # Check colors if present
+    color_from = yaml_data.get("colorFrom")
+    color_to = yaml_data.get("colorTo")
+    if color_from and color_from not in VALID_COLORS:
+        errors.append(f"Invalid 'colorFrom' value: {color_from}. Must be one of: {', '.join(VALID_COLORS)}")
+    if color_to and color_to not in VALID_COLORS:
+        errors.append(f"Invalid 'colorTo' value: {color_to}. Must be one of: {', '.join(VALID_COLORS)}")
+
+    # Check suggested_hardware if present
+    hardware = yaml_data.get("suggested_hardware")
+    if hardware and hardware not in VALID_HARDWARE:
+        errors.append(f"Invalid 'suggested_hardware' value: {hardware}. Must be one of: {', '.join(sorted(VALID_HARDWARE))}")
+
+    # Warn about deprecated 'hardware' field
+    if "hardware" in yaml_data:
+        errors.append("Deprecated field 'hardware' found. Use 'suggested_hardware' instead (per HF Spaces docs)")
+
+    # Check for emoji (recommended)
+    if "emoji" not in yaml_data:
+        errors.append("Warning: 'emoji' field is recommended for better Space appearance")
+
+    # Check for title (recommended)
+    if "title" not in yaml_data:
+        errors.append("Warning: 'title' field is recommended")
+
+    # Check that pinned is boolean if present
+    if "pinned" in yaml_data and not isinstance(yaml_data["pinned"], bool):
+        errors.append(f"Invalid 'pinned' value: {yaml_data['pinned']}. Must be boolean (true/false)")
+
+    return errors
+
+
+def main():
+    """Main entry point."""
+    repo_root = Path(__file__).parent.parent
+    readme_path = repo_root / "README.md"
+
+    errors = validate_readme(readme_path)
+
+    if errors:
+        print("❌ README.md validation failed:", file=sys.stderr)
+        for error in errors:
+            print(f"  - {error}", file=sys.stderr)
+        sys.exit(1)
+    else:
+        print("✅ README.md is valid for Hugging Face Spaces")
+        sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
+
test_service.py CHANGED
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Quick test script to verify the PRIIPs LLM Service is working
+Quick test script to verify the LLM Pro Finance API is working
 Run with: python test_service.py
 """
 import httpx
@@ -59,7 +59,7 @@ def test_endpoint(name, method, url, json_data=None, timeout=10):
 
 def main():
     print(f"\n{'#'*60}")
-    print("PRIIPs LLM Service - Quick Test Script")
+    print("LLM Pro Finance API - Quick Test Script")
     print(f"Service: {BASE_URL}")
     print(f"{'#'*60}")
 
@@ -94,7 +94,7 @@ def main():
     print(" Please wait...")
 
    chat_payload = {
-        "model": "DragonLLM/gemma3-12b-fin-v0.3",
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "messages": [
             {"role": "user", "content": "What is 2+2?"}
         ],
tests/performance/README.md DELETED
@@ -1,277 +0,0 @@
-# Performance Test Suite
-
-Comprehensive performance and compatibility tests for the PRIIPs LLM Service.
-
-## Quick Start
-
-```bash
-# Install additional test dependencies
-pip install pytest pytest-asyncio openai
-
-# Run all performance tests
-pytest tests/performance/ -v -s
-
-# Run specific test suites
-pytest tests/performance/test_inference_speed.py -v -s
-pytest tests/performance/test_openai_compatibility.py -v -s
-
-# Run comprehensive benchmark
-python tests/performance/benchmark.py
-```
-
-## Test Suites
-
-### 1. Inference Speed Tests (`test_inference_speed.py`)
-
-Tests various performance metrics:
-
-- **Single Request Latency**: Measures end-to-end latency for individual requests
-- **Token Throughput**: Measures tokens generated per second at different lengths
-- **Concurrent Requests**: Tests performance under concurrent load
-- **Time to First Token (TTFT)**: Measures latency to first generated token
-- **Prompt Processing Speed**: Tests how quickly different prompt lengths are processed
-- **Temperature Variance**: Tests response generation with different temperatures
-
-#### Key Metrics:
-- Latency (seconds)
-- Tokens per second
-- Concurrent request handling
-- TTFT (Time to First Token)
-
-### 2. OpenAI Compatibility Tests (`test_openai_compatibility.py`)
-
-Validates OpenAI API compatibility:
-
-**Endpoint Compatibility:**
-- `GET /v1/models` - Model listing
-- `POST /v1/chat/completions` - Chat completions
-
-**Message Format Tests:**
-- System messages
-- Conversation history
-- Multi-turn conversations
-
-**Parameter Tests:**
-- `temperature`
-- `max_tokens`
-- `top_p`
-- `stream`
-
-**Client Library Tests:**
-- Official OpenAI Python client compatibility
-- Streaming support
-
-**Error Handling:**
-- Invalid models
-- Missing required fields
-- Empty messages
-
-**Response Schema:**
-- Full OpenAI response format validation
-- Proper usage statistics
-- Correct finish reasons
-
-### 3. Comprehensive Benchmark (`benchmark.py`)
-
-All-in-one benchmark script that:
-- Runs all performance tests
-- Validates OpenAI compatibility
-- Generates detailed report
-- Saves results to JSON
-
-## Configuration
-
-### Change Target URL
-
-Edit the `BASE_URL` in each test file:
-
-```python
-# For production
-BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"
-
-# For local testing
-BASE_URL = "http://localhost:7860"
-```
-
-### Adjust Test Parameters
-
-Modify test parameters in each test:
-
-```python
-# Number of concurrent requests
-num_concurrent = 10
-
-# Number of test runs
-num_runs = 10
-
-# Max tokens for generation
-max_tokens = 100
-```
-
-## Expected Results
-
-### Good Performance Metrics (on L40 GPU):
-
-- **Latency**: < 2 seconds for 100 tokens
-- **Token Throughput**: > 50 tokens/second
-- **TTFT**: < 500ms
-- **Concurrent Handling**: > 5 requests/second
-
-### OpenAI Compatibility:
-
-Should pass all compatibility tests (100% score)
-
-## Test Output Examples
-
-### Inference Speed Test Output:
-```
-=== Single Request Performance ===
-Latency: 1.45s
-Prompt tokens: 12
-Completion tokens: 89
-Total tokens: 101
-Tokens per second: 61.38
-Response: Artificial intelligence (AI) refers to...
-```
-
-### Concurrent Load Test Output:
-```
-=== Concurrent Requests Test (10 requests) ===
-Total time: 3.21s
-Successful requests: 10/10
-Average latency: 2.15s
-Requests per second: 3.12
-```
-
-### OpenAI Compatibility Output:
-```
-=== OpenAI API Compatibility ===
-✓ List models endpoint
-✓ Chat completions endpoint
-✓ System message support
-✓ Conversation history
-✓ Temperature parameter
-✓ Max tokens parameter
-
-Compatibility Score: 6/7 (86%)
-```
-
-## Troubleshooting
-
-### Tests Timeout
-- Increase timeout in `httpx.AsyncClient(timeout=120.0)`
-- Check if service is running with health check
-
-### Connection Errors
-- Verify BASE_URL is correct
-- Check network connectivity
-- Ensure service is deployed and running
-
-### Performance Lower Than Expected
-- Check GPU utilization on server
-- Verify vLLM configuration
-- Look for model loading issues in logs
-
-## Integration with CI/CD
-
-Add to your CI pipeline:
-
-```yaml
-# .github/workflows/performance.yml
-name: Performance Tests
-
-on: [push, pull_request]
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v2
-      - name: Set up Python
-        uses: actions/setup-python@v2
-        with:
-          python-version: 3.11
-      - name: Install dependencies
-        run: |
-          pip install -r requirements.txt
-          pip install pytest pytest-asyncio openai
-      - name: Run performance tests
-        run: pytest tests/performance/ -v
-```
-
-## Benchmark Results
-
-Results are saved to `benchmark_results.json` with structure:
-
-```json
-{
-  "single_request": {
-    "avg_latency": 1.45,
-    "avg_tokens_per_sec": 61.38
-  },
-  "concurrent_load": {
-    "requests_per_sec": 3.12,
-    "successful": 10
-  },
-  "openai_compatibility": {
-    "score": "6/7"
-  }
-}
-```
-
-## Advanced Usage
-
-### Custom Test Scenarios
-
-Create custom test scenarios:
-
-```python
-@pytest.mark.asyncio
-async def test_custom_scenario(client):
-    # Your custom test here
-    payload = {
-        "model": "DragonLLM/LLM-Pro-Finance-Small",
-        "messages": [{"role": "user", "content": "Custom prompt"}],
-        "max_tokens": 200
-    }
-    response = await client.post(f"{BASE_URL}/v1/chat/completions", json=payload)
-    assert response.status_code == 200
-```
-
-### Stress Testing
-
-For stress testing, increase concurrent requests:
-
-```python
-await benchmark_concurrent_load(num_concurrent=50)
-```
-
-## Monitoring
-
-Metrics to monitor during tests:
-
-- **Server-side**:
-  - GPU utilization
-  - Memory usage
-  - Request queue length
-  - Model loading time
-
-- **Client-side**:
-  - Response times
-  - Error rates
-  - Token throughput
-  - Network latency
-
-## Support
-
-For issues or questions:
-- Check service logs at Hugging Face Spaces dashboard
-- Review DEPLOYMENT.md for configuration details
-- Verify vLLM is properly initialized with model
-
-
-
-
-
-
-
tests/performance/benchmark.py CHANGED
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Comprehensive benchmark suite for PRIIPs LLM Service
+Comprehensive benchmark suite for LLM Pro Finance API
 Run with: python tests/performance/benchmark.py
 """
 import asyncio
@@ -39,7 +39,7 @@ class Benchmark:
         tokens_per_sec = []
 
         payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
             "messages": [
                 {"role": "user", "content": "What is artificial intelligence?"}
             ],
@@ -91,7 +91,7 @@ class Benchmark:
 
         async def make_request(request_id: int):
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [
                     {"role": "user", "content": f"Request {request_id}: Explain machine learning."}
                 ],
@@ -155,7 +155,7 @@ class Benchmark:
 
         for test_case in test_cases:
            payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [
                     {"role": "user", "content": "Write about the history of computing."}
                 ],
@@ -231,7 +231,7 @@ class Benchmark:
         # Test 3: System message
         try:
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [
                     {"role": "system", "content": "Be helpful."},
                     {"role": "user", "content": "Hi"}
@@ -247,7 +247,7 @@ class Benchmark:
         # Test 4: Conversation history
         try:
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [
                     {"role": "user", "content": "My name is Alice"},
                     {"role": "assistant", "content": "Hello Alice"},
@@ -264,7 +264,7 @@ class Benchmark:
         # Test 5: Temperature parameter
         try:
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [{"role": "user", "content": "Hi"}],
                 "temperature": 0.5
             }
@@ -278,7 +278,7 @@ class Benchmark:
         # Test 6: Max tokens parameter
         try:
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [{"role": "user", "content": "Hi"}],
                 "max_tokens": 10
             }
@@ -299,7 +299,7 @@ class Benchmark:
     async def run_all_benchmarks(self):
         """Run all benchmarks"""
         print(f"\n{'#'*60}")
-        print("PRIIPs LLM Service - Comprehensive Benchmark Suite")
+        print("LLM Pro Finance API - Comprehensive Benchmark Suite")
         print(f"Service: {self.base_url}")
         print(f"{'#'*60}")
 
tests/performance/test_inference_speed.py CHANGED
@@ -20,7 +20,7 @@ def client():
 async def test_single_request_latency(client):
     """Test latency for a single chat completion request"""
     payload = {
-        "model": "DragonLLM/LLM-Pro-Finance-Small",
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "messages": [
             {"role": "user", "content": "What is the capital of France?"}
         ],
@@ -66,7 +66,7 @@ async def test_token_throughput_various_lengths(client):
 
     for test_case in test_cases:
         payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
             "messages": [{"role": "user", "content": test_case["prompt"]}],
             "max_tokens": test_case["max_tokens"],
             "temperature": 0.7
@@ -98,7 +98,7 @@ async def test_concurrent_requests(client):
 
    async def make_request(request_id: int):
        payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
            "messages": [
                {"role": "user", "content": f"Request {request_id}: What is 2+2?"}
            ],
@@ -142,7 +142,7 @@ async def test_concurrent_requests(client):
 async def test_time_to_first_token(client):
     """Test time to first token (TTFT) using streaming"""
     payload = {
-        "model": "DragonLLM/LLM-Pro-Finance-Small",
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "messages": [
             {"role": "user", "content": "Count from 1 to 10."}
         ],
@@ -190,7 +190,7 @@ async def test_prompt_processing_speed(client):
 
     for i, prompt in enumerate(prompts):
         payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
             "messages": [{"role": "user", "content": prompt}],
             "max_tokens": 50,
             "temperature": 0.7
@@ -221,7 +221,7 @@ async def test_temperature_variance(client):
 
     for temp in temperatures:
         payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 50,
            "temperature": temp
tests/performance/test_openai_compatibility.py CHANGED
@@ -58,7 +58,7 @@ class TestEndpointCompatibility:
      async def test_chat_completions_endpoint(self, httpx_client):
          """Test POST /v1/chat/completions endpoint"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [
                  {"role": "user", "content": "Say hello"}
              ]
@@ -109,7 +109,7 @@ class TestOpenAIClientLibrary:
          """Test chat completion using official OpenAI client"""
          try:
              response = openai_client.chat.completions.create(
-                 model="DragonLLM/LLM-Pro-Finance-Small",
+                 model="DragonLLM/qwen3-8b-fin-v1.0",
                  messages=[
                      {"role": "user", "content": "What is 2+2?"}
                  ],
@@ -133,7 +133,7 @@ class TestOpenAIClientLibrary:
          """Test streaming with official OpenAI client"""
          try:
              stream = openai_client.chat.completions.create(
-                 model="DragonLLM/LLM-Pro-Finance-Small",
+                 model="DragonLLM/qwen3-8b-fin-v1.0",
                  messages=[
                      {"role": "user", "content": "Count to 5"}
                  ],
@@ -162,7 +162,7 @@ class TestMessageFormats:
      async def test_system_message(self, httpx_client):
          """Test with system message"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [
                  {"role": "system", "content": "You are a helpful assistant."},
                  {"role": "user", "content": "Hello"}
@@ -185,7 +185,7 @@ class TestMessageFormats:
      async def test_conversation_history(self, httpx_client):
          """Test with conversation history"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [
                  {"role": "user", "content": "My name is Alice."},
                  {"role": "assistant", "content": "Hello Alice! Nice to meet you."},
@@ -220,7 +220,7 @@ class TestMessageFormats:
          for params in parameters:
              payload = {
-                 "model": "DragonLLM/LLM-Pro-Finance-Small",
+                 "model": "DragonLLM/qwen3-8b-fin-v1.0",
                  "messages": [{"role": "user", "content": "Hello"}],
                  **params
              }
@@ -276,7 +276,7 @@ class TestErrorHandling:
      async def test_empty_message(self, httpx_client):
          """Test with empty message content"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [{"role": "user", "content": ""}],
              "max_tokens": 50
          }
@@ -297,7 +297,7 @@ class TestResponseFormat:
      async def test_response_schema(self, httpx_client):
          """Validate complete response schema"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [{"role": "user", "content": "Test"}],
              "max_tokens": 50
          }
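The `test_response_schema` hunk above validates that the server returns an OpenAI-shaped chat completion. The core of that check can be sketched as a small predicate over the response dict; field names follow the OpenAI chat-completions response format, and the function name is illustrative, not from the repo.

```python
# Minimal shape check mirroring what a response-schema test validates.
def looks_like_chat_completion(resp: dict) -> bool:
    """Return True if resp has the basic OpenAI chat.completion shape."""
    try:
        choice = resp["choices"][0]
        return (
            resp["object"] == "chat.completion"
            and isinstance(resp["id"], str)
            and choice["message"]["role"] == "assistant"
            and isinstance(choice["message"]["content"], str)
        )
    except (KeyError, IndexError, TypeError):
        return False
```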
tests/test_config.py CHANGED
@@ -10,7 +10,7 @@ def test_settings_defaults():
      """Test that settings have correct default values."""
      settings = Settings()
      assert settings.vllm_base_url == "http://localhost:8000/v1"
-     assert settings.model == "DragonLLM/LLM-Pro-Finance-Small"
+     assert settings.model == "DragonLLM/qwen3-8b-fin-v1.0"
      assert settings.service_api_key is None
      assert settings.log_level == "info"
 
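The config test above pins four defaults. A sketch of the `Settings` object those assertions imply is below; the real app most likely uses pydantic-settings, but a plain dataclass keeps the example dependency-free, and the field set here is only what the test exercises.

```python
# Sketch of the defaults asserted by test_settings_defaults; this is an
# assumption about the app's config shape, not the actual Settings class.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    vllm_base_url: str = "http://localhost:8000/v1"
    model: str = "DragonLLM/qwen3-8b-fin-v1.0"
    service_api_key: Optional[str] = None
    log_level: str = "info"
```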
tests/test_extract_route.py DELETED
@@ -1,50 +0,0 @@
- from fastapi.testclient import TestClient
-
- from app.main import app
-
-
- client = TestClient(app)
-
-
- def test_extract_priips(monkeypatch, tmp_path):
-     # Fake PDF extraction
-     from app.services import extract_service
-
-     def fake_extract_text_from_pdf(path):
-         return "Product: Test Fund ISIN: TEST1234567 SRI: 3"
-
-     monkeypatch.setattr(extract_service, "extract_text_from_pdf", fake_extract_text_from_pdf)
-
-     # Fake vLLM chat returning JSON
-     from app.providers import vllm
-
-     async def fake_chat(payload, stream=False):
-         return {
-             "id": "cmpl-2",
-             "object": "chat.completion",
-             "created": 0,
-             "model": payload["model"],
-             "choices": [
-                 {
-                     "index": 0,
-                     "message": {
-                         "role": "assistant",
-                         "content": "{\"product_name\":\"Test Fund\",\"isin\":\"TEST1234567\",\"sri\":3}",
-                     },
-                     "finish_reason": "stop",
-                 }
-             ],
-         }
-
-     monkeypatch.setattr(vllm, "chat", fake_chat)
-
-     r = client.post(
-         "/extract-priips",
-         json={"sources": ["/path/to/local.pdf"]},
-     )
-     assert r.status_code == 200
-     j = r.json()
-     assert j[0]["success"] is True
-     assert j[0]["data"]["isin"] == "TEST1234567"
tests/test_extract_service.py DELETED
@@ -1,125 +0,0 @@
- import pytest
- from unittest.mock import AsyncMock, patch
-
- from app.services.extract_service import build_prompt, process_source, extract
- from app.models.priips import ExtractRequest, ExtractResult, PriipsFields
-
-
- def test_build_prompt():
-     """Test prompt building with schema instructions."""
-     text = "Test document content"
-     prompt = build_prompt(text)
-
-     assert "expert financial document parser" in prompt
-     assert "STRICT JSON only" in prompt
-     assert "product_name" in prompt
-     assert "manufacturer" in prompt
-     assert "isin" in prompt
-     assert "sri" in prompt
-     assert "Test document content" in prompt
-
-
- def test_build_prompt_long_text():
-     """Test prompt building with very long text (should be truncated)."""
-     long_text = "x" * 20000
-     prompt = build_prompt(long_text)
-
-     # Should be truncated to 15000 chars
-     assert len(prompt) < 20000
-     assert "Document:\n" in prompt
-
-
- @pytest.mark.asyncio
- async def test_process_source_local_file():
-     """Test processing a local PDF file."""
-     with patch('app.services.extract_service.extract_text_from_pdf') as mock_extract, \
-          patch('app.services.extract_service.vllm.chat') as mock_chat, \
-          patch('app.services.extract_service.settings') as mock_settings:
-
-         mock_extract.return_value = "Product: Test Fund ISIN: TEST1234567"
-         mock_settings.model = "test-model"
-         mock_chat.return_value = {
-             "choices": [{"message": {"content": '{"product_name": "Test Fund", "isin": "TEST1234567"}'}}]
-         }
-
-         result = await process_source("/path/to/local.pdf")
-
-         assert isinstance(result, ExtractResult)
-         assert result.success is True
-         assert result.source == "/path/to/local.pdf"
-         assert result.data.product_name == "Test Fund"
-         assert result.data.isin == "TEST1234567"
-         assert result.data.source_url == "/path/to/local.pdf"
-
-
- @pytest.mark.asyncio
- async def test_process_source_url():
-     """Test processing a PDF URL."""
-     with patch('app.services.extract_service.download_to_tmp') as mock_download, \
-          patch('app.services.extract_service.extract_text_from_pdf') as mock_extract, \
-          patch('app.services.extract_service.vllm.chat') as mock_chat, \
-          patch('app.services.extract_service.settings') as mock_settings:
-
-         mock_download.return_value = "/tmp/downloaded.pdf"
-         mock_extract.return_value = "Product: Test Fund"
-         mock_settings.model = "test-model"
-         mock_chat.return_value = {
-             "choices": [{"message": {"content": '{"product_name": "Test Fund"}'}}]
-         }
-
-         result = await process_source("https://example.com/doc.pdf")
-
-         assert isinstance(result, ExtractResult)
-         assert result.success is True
-         assert result.source == "https://example.com/doc.pdf"
-         assert result.data.source_url == "https://example.com/doc.pdf"
-
-
- @pytest.mark.asyncio
- async def test_process_source_invalid_json():
-     """Test processing with invalid JSON response."""
-     with patch('app.services.extract_service.extract_text_from_pdf') as mock_extract, \
-          patch('app.services.extract_service.vllm.chat') as mock_chat, \
-          patch('app.services.extract_service.settings') as mock_settings:
-
-         mock_extract.return_value = "Test content"
-         mock_settings.model = "test-model"
-         mock_chat.return_value = {
-             "choices": [{"message": {"content": "invalid json response"}}]
-         }
-
-         result = await process_source("/path/to/file.pdf")
-
-         assert isinstance(result, ExtractResult)
-         assert result.success is False
-         assert result.error is not None
-
-
- @pytest.mark.asyncio
- async def test_process_source_exception():
-     """Test processing with exception during PDF extraction."""
-     with patch('app.services.extract_service.extract_text_from_pdf') as mock_extract:
-         mock_extract.side_effect = Exception("PDF read error")
-
-         result = await process_source("/path/to/file.pdf")
-
-         assert isinstance(result, ExtractResult)
-         assert result.success is False
-         assert "PDF read error" in result.error
-
-
- @pytest.mark.asyncio
- async def test_extract_multiple_sources():
-     """Test extracting from multiple sources."""
-     with patch('app.services.extract_service.process_source') as mock_process:
-         mock_process.side_effect = [
-             ExtractResult(source="file1.pdf", success=True, data=PriipsFields(product_name="Fund 1")),
-             ExtractResult(source="file2.pdf", success=False, error="Failed to read")
-         ]
-
-         request = ExtractRequest(sources=["file1.pdf", "file2.pdf"])
-         results = await extract(request)
-
-         assert len(results) == 2
-         assert results[0].success is True
-         assert results[1].success is False
tests/test_json_guard.py DELETED
@@ -1,56 +0,0 @@
- import pytest
- from unittest.mock import patch
-
- from app.utils.json_guard import try_parse_json
-
-
- def test_try_parse_json_valid():
-     """Test parsing valid JSON."""
-     valid_json = '{"name": "test", "value": 123}'
-     success, result = try_parse_json(valid_json)
-
-     assert success is True
-     assert result == {"name": "test", "value": 123}
-
-
- def test_try_parse_json_invalid():
-     """Test parsing invalid JSON."""
-     invalid_json = '{"name": "test", "value": 123'  # Missing closing brace
-     success, result = try_parse_json(invalid_json)
-
-     assert success is False
-     assert isinstance(result, str)  # Error message
-
-
- def test_try_parse_json_with_markdown_fences():
-     """Test parsing JSON wrapped in markdown code fences."""
-     json_with_fences = '```\n{"name": "test"}\n```'
-     success, result = try_parse_json(json_with_fences)
-
-     assert success is True
-     assert result == {"name": "test"}
-
-
- def test_try_parse_json_with_markdown_fences_invalid():
-     """Test parsing invalid JSON with markdown fences."""
-     invalid_json_with_fences = '```json\n{"name": "test"\n```'  # Missing closing brace
-     success, result = try_parse_json(invalid_json_with_fences)
-
-     assert success is False
-     assert isinstance(result, str)
-
-
- def test_try_parse_json_empty_string():
-     """Test parsing empty string."""
-     success, result = try_parse_json("")
-
-     assert success is False
-     assert isinstance(result, str)
-
-
- def test_try_parse_json_none():
-     """Test parsing None input."""
-     success, result = try_parse_json(None)
-
-     assert success is False
-     assert isinstance(result, str)
tests/test_pdf_utils.py DELETED
@@ -1,105 +0,0 @@
- import pytest
- from unittest.mock import patch, AsyncMock
- from pathlib import Path
-
- from app.utils.pdf import download_to_tmp, extract_text_from_pdf
-
-
- @pytest.mark.asyncio
- async def test_download_to_tmp_success():
-     """Test successful PDF download."""
-     url = "https://example.com/document.pdf"
-     tmp_dir = Path("/tmp")
-     mock_content = b"PDF content here"
-
-     with patch('httpx.AsyncClient') as mock_client:
-         mock_response = AsyncMock()
-         mock_response.content = mock_content
-         mock_response.raise_for_status.return_value = None
-         mock_client.return_value.__aenter__.return_value.get.return_value = mock_response
-
-         result = await download_to_tmp(url, tmp_dir)
-
-         assert isinstance(result, Path)
-         assert result.name == "document.pdf"
-         assert result.parent == tmp_dir
-
-
- @pytest.mark.asyncio
- async def test_download_to_tmp_no_filename():
-     """Test download with URL that has no filename."""
-     url = "https://example.com/"
-     tmp_dir = Path("/tmp")
-     mock_content = b"PDF content"
-
-     with patch('httpx.AsyncClient') as mock_client:
-         mock_response = AsyncMock()
-         mock_response.content = mock_content
-         mock_response.raise_for_status.return_value = None
-         mock_client.return_value.__aenter__.return_value.get.return_value = mock_response
-
-         result = await download_to_tmp(url, tmp_dir)
-
-         assert isinstance(result, Path)
-         assert result.name == "document.pdf"  # Default filename
-         assert result.parent == tmp_dir
-
-
- @pytest.mark.asyncio
- async def test_download_to_tmp_http_error():
-     """Test download with HTTP error."""
-     url = "https://example.com/document.pdf"
-     tmp_dir = Path("/tmp")
-
-     with patch('httpx.AsyncClient') as mock_client:
-         mock_response = AsyncMock()
-         mock_response.content = b"PDF content"
-         mock_response.raise_for_status.side_effect = Exception("HTTP 404")
-         mock_client.return_value.__aenter__.return_value.get.return_value = mock_response
-
-         with pytest.raises(Exception):
-             await download_to_tmp(url, tmp_dir)
-
-
- def test_extract_text_from_pdf_success():
-     """Test successful PDF text extraction."""
-     pdf_path = Path("/tmp/test.pdf")
-     expected_text = "Sample PDF content"
-
-     with patch('app.utils.pdf.extract_text_from_pdf') as mock_extract:
-         mock_extract.return_value = expected_text
-
-         result = extract_text_from_pdf(pdf_path)
-
-         assert result == expected_text
-
-
- def test_extract_text_from_pdf_multiple_pages():
-     """Test PDF text extraction from multiple pages."""
-     pdf_path = Path("/tmp/test.pdf")
-     expected_text = "Page 1 content\nPage 2 content\nPage 3 content"
-
-     with patch('app.utils.pdf.extract_text_from_pdf') as mock_extract:
-         mock_extract.return_value = expected_text
-
-         result = extract_text_from_pdf(pdf_path)
-
-         assert result == expected_text
-
-
- def test_extract_text_from_pdf_import_error():
-     """Test PDF extraction when PyMuPDF is not available."""
-     pdf_path = Path("/tmp/test.pdf")
-
-     with patch('app.utils.pdf.extract_text_from_pdf', side_effect=RuntimeError("PyMuPDF (fitz) is required")):
-         with pytest.raises(RuntimeError, match="PyMuPDF.*required"):
-             extract_text_from_pdf(pdf_path)
-
-
- def test_extract_text_from_pdf_file_error():
-     """Test PDF extraction with file read error."""
-     pdf_path = Path("/tmp/test.pdf")
-
-     with patch('app.utils.pdf.extract_text_from_pdf', side_effect=RuntimeError("PyMuPDF (fitz) is required")):
-         with pytest.raises(RuntimeError, match="PyMuPDF.*required"):
-             extract_text_from_pdf(pdf_path)
tests/test_priips_models.py DELETED
@@ -1,163 +0,0 @@
- import pytest
- from unittest.mock import patch
-
- from app.models.priips import (
-     PerformanceScenario, Costs, PriipsFields,
-     ExtractRequest, ExtractResult
- )
-
-
- def test_performance_scenario_model():
-     """Test PerformanceScenario Pydantic model."""
-     scenario = PerformanceScenario(
-         name="Bull Market",
-         description="Optimistic scenario",
-         return_pct=15.5
-     )
-
-     assert scenario.name == "Bull Market"
-     assert scenario.description == "Optimistic scenario"
-     assert scenario.return_pct == 15.5
-
-
- def test_performance_scenario_optional_fields():
-     """Test PerformanceScenario with optional fields."""
-     scenario = PerformanceScenario(name="Bear Market")
-
-     assert scenario.name == "Bear Market"
-     assert scenario.description is None
-     assert scenario.return_pct is None
-
-
- def test_costs_model():
-     """Test Costs Pydantic model."""
-     costs = Costs(
-         entry_cost_pct=2.5,
-         ongoing_cost_pct=1.2,
-         exit_cost_pct=0.5
-     )
-
-     assert costs.entry_cost_pct == 2.5
-     assert costs.ongoing_cost_pct == 1.2
-     assert costs.exit_cost_pct == 0.5
-
-
- def test_costs_optional_fields():
-     """Test Costs with optional fields."""
-     costs = Costs()
-
-     assert costs.entry_cost_pct is None
-     assert costs.ongoing_cost_pct is None
-     assert costs.exit_cost_pct is None
-
-
- def test_priips_fields_model():
-     """Test PriipsFields Pydantic model."""
-     performance_scenarios = [
-         PerformanceScenario(name="Bull", return_pct=10.0),
-         PerformanceScenario(name="Bear", return_pct=-5.0)
-     ]
-     costs = Costs(entry_cost_pct=1.0, ongoing_cost_pct=0.5)
-
-     priips = PriipsFields(
-         product_name="Test Fund",
-         manufacturer="Test Company",
-         isin="TEST123456789",
-         sri=3,
-         recommended_holding_period="5 years",
-         costs=costs,
-         performance_scenarios=performance_scenarios,
-         date="2024-01-01",
-         language="en",
-         source_url="https://example.com/doc.pdf"
-     )
-
-     assert priips.product_name == "Test Fund"
-     assert priips.manufacturer == "Test Company"
-     assert priips.isin == "TEST123456789"
-     assert priips.sri == 3
-     assert priips.recommended_holding_period == "5 years"
-     assert priips.costs == costs
-     assert len(priips.performance_scenarios) == 2
-     assert priips.date == "2024-01-01"
-     assert priips.language == "en"
-     assert priips.source_url == "https://example.com/doc.pdf"
-
-
- def test_priips_fields_optional_fields():
-     """Test PriipsFields with minimal required fields."""
-     priips = PriipsFields()
-
-     assert priips.product_name is None
-     assert priips.manufacturer is None
-     assert priips.isin is None
-     assert priips.sri is None
-     assert priips.recommended_holding_period is None
-     assert priips.costs is None
-     assert priips.performance_scenarios is None
-     assert priips.date is None
-     assert priips.language is None
-     assert priips.source_url is None
-
-
- def test_extract_request_model():
-     """Test ExtractRequest Pydantic model."""
-     request = ExtractRequest(
-         sources=["https://example.com/doc1.pdf", "/path/to/doc2.pdf"],
-         options={"language": "en", "ocr": False}
-     )
-
-     assert len(request.sources) == 2
-     assert request.sources[0] == "https://example.com/doc1.pdf"
-     assert request.sources[1] == "/path/to/doc2.pdf"
-     assert request.options["language"] == "en"
-     assert request.options["ocr"] is False
-
-
- def test_extract_request_minimal():
-     """Test ExtractRequest with minimal fields."""
-     request = ExtractRequest(sources=["https://example.com/doc.pdf"])
-
-     assert len(request.sources) == 1
-     assert request.options is None
-
-
- def test_extract_result_success():
-     """Test ExtractResult for successful extraction."""
-     priips_data = PriipsFields(product_name="Test Fund", isin="TEST123")
-     result = ExtractResult(
-         source="https://example.com/doc.pdf",
-         success=True,
-         data=priips_data
-     )
-
-     assert result.source == "https://example.com/doc.pdf"
-     assert result.success is True
-     assert result.data == priips_data
-     assert result.error is None
-
-
- def test_extract_result_failure():
-     """Test ExtractResult for failed extraction."""
-     result = ExtractResult(
-         source="https://example.com/doc.pdf",
-         success=False,
-         error="Failed to parse PDF"
-     )
-
-     assert result.source == "https://example.com/doc.pdf"
-     assert result.success is False
-     assert result.error == "Failed to parse PDF"
-     assert result.data is None
-
-
- def test_model_validation():
-     """Test Pydantic model validation."""
-     # Test valid SRI values (1-7)
-     for sri in range(1, 8):
-         priips = PriipsFields(sri=sri)
-         assert priips.sri == sri
-
-     # Test that SRI can be None (optional field)
-     priips = PriipsFields()
-     assert priips.sri is None