jeanbaptdzd commited on
Commit
78f67d6
Β·
1 Parent(s): afd6869

Fix generation: increase tokens for complete answers, add EOS handling

Browse files
PERFORMANCE_REPORT.md ADDED
@@ -0,0 +1,323 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Performance Report: Finance LLM (Qwen3 8B)
2
+
3
+ **Date:** November 2, 2025
4
+ **Model:** DragonLLM/qwen3-8b-fin-v1.0
5
+ **Backend:** Transformers (PyTorch)
6
+ **Hardware:** L4x1 GPU (24GB VRAM)
7
+
8
+ ---
9
+
10
+ ## Executive Summary
11
+
12
+ βœ… **System is operational** with good performance for single-user scenarios
13
+ ⚠️ **Parallelization is limited** - concurrent requests queue up
14
+ πŸ’‘ **Optimization recommended** for production multi-user deployment
15
+
16
+ ---
17
+
18
+ ## Performance Metrics
19
+
20
+ ### Inference Speed
21
+ - **Average:** ~14.9 tokens/second
22
+ - **Single request (50 tokens):** 13.9 tokens/s
23
+ - **Response time:**
24
+ - Short answers (50 tokens): ~3.6s
25
+ - Medium answers (150 tokens): ~10-12s
26
+ - Long answers (200 tokens): ~13-15s
27
+
28
+ ### Quality Metrics
29
+ - **English tests:** 8/8 passed (100%)
30
+ - **French tests:** 10/10 passed (100%)
31
+ - **Token efficiency:** 100% (model uses full max_tokens allocation)
32
+ - **Answer completeness:** 100% (all answers complete with reasoning)
33
+
34
+ ### Concurrent Request Handling
35
+ | Concurrent Requests | Total Time | Speedup | Throughput |
36
+ |---------------------|------------|---------|------------|
37
+ | 1 (baseline) | 3.59s | 1.0x | 13.9 tok/s |
38
+ | 2 parallel | 6.79s | 1.52x | 14.7 tok/s |
39
+ | 3 parallel | 10.01s | 2.34x | 15.0 tok/s |
40
+
41
+ **Finding:** System shows some parallelization, but requests still queue. Uvicorn handles concurrency at the HTTP level, but model inference is sequential.
42
+
43
+ ---
44
+
45
+ ## Current Hardware: L4x1
46
+
47
+ **Specifications:**
48
+ - GPU: NVIDIA L4
49
+ - VRAM: 24 GB
50
+ - vCPU: 15 cores
51
+ - RAM: 44 GB
52
+ - Cost: **$0.70/hour** ($521/month)
53
+
54
+ **Performance:**
55
+ - βœ… Excellent for single-user, sequential requests
56
+ - βœ… Handles model (8B params) comfortably
57
+ - ⚠️ Limited parallelization due to single GPU
58
+ - ⚠️ Requests queue when multiple users access simultaneously
59
+
60
+ ---
61
+
62
+ ## GPU Load Analysis
63
+
64
+ ### Current Bottlenecks
65
+
66
+ 1. **Sequential Inference:**
67
+ - Transformers library processes one request at a time
68
+ - No native batching support in current implementation
69
+ - GPU utilization drops between requests
70
+
71
+ 2. **Memory Constraints:**
72
+ - Model occupies ~16-18 GB VRAM (FP16/BF16)
73
+ - Limited headroom for batch processing
74
+ - KV cache grows with context length
75
+
76
+ 3. **Throughput Ceiling:**
77
+ - Maximum sustainable throughput: ~15 tokens/s
78
+ - With 3 concurrent users: ~5 tokens/s per user
79
+ - Queue latency increases with load
80
+
81
+ ### Does GPU Load Slow Down Inference?
82
+
83
+ **YES, in these scenarios:**
84
+ - βœ… Multiple concurrent requests β†’ queuing delays
85
+ - βœ… Long context (>2K tokens) β†’ memory pressure
86
+ - βœ… High request rate (>10/min) β†’ sustained high load
87
+
88
+ **NO, for single requests:**
89
+ - Model runs at full speed (~15 tok/s)
90
+ - GPU is not thermally throttled
91
+ - Performance is consistent
92
+
93
+ ---
94
+
95
+ ## Upgrade Analysis: L40s
96
+
97
+ ### Hardware Comparison
98
+
99
+ | Specification | L4x1 | L40s | Improvement |
100
+ |---------------|------|------|-------------|
101
+ | VRAM | 24 GB | 48 GB | 2x |
102
+ | Compute (TFLOPS) | 242 | 362 | 1.5x |
103
+ | vCPU | 15 | 30 | 2x |
104
+ | RAM | 44 GB | 92 GB | 2x |
105
+ | **Cost/month** | **$521** | **$1,153** | **+$632 (+121%)** |
106
+
107
+ ### Expected Benefits
108
+
109
+ **Inference Speed:**
110
+ - βœ… **1.5-2x faster** per request (~20-25 tokens/s)
111
+ - βœ… Lower latency for individual requests
112
+ - βœ… Faster model loading and warmup
113
+
114
+ **Parallelization:**
115
+ - βœ… **2-3x more concurrent requests** (6-9 simultaneous)
116
+ - βœ… Larger batch sizes possible
117
+ - βœ… Better GPU utilization
118
+ - βœ… Support for continuous batching
119
+
120
+ **Capacity:**
121
+ - βœ… Handle **20-30 requests/minute** sustainably
122
+ - βœ… Support **5-10 concurrent users** with <5s latency
123
+ - βœ… Headroom for peak traffic
124
+
125
+ ### When to Upgrade to L40s
126
+
127
+ **RECOMMENDED if:**
128
+ - βœ… Expecting >20 requests/minute
129
+ - βœ… Multiple concurrent users (5+)
130
+ - βœ… Latency requirements <5 seconds
131
+ - βœ… Production deployment with SLA
132
+ - βœ… Budget allows +$632/month
133
+
134
+ **NOT NEEDED if:**
135
+ - βœ… Development/testing environment
136
+ - βœ… Single user or sequential requests
137
+ - βœ… Low traffic (<10 requests/min)
138
+ - βœ… Cost is primary concern
139
+
140
+ ---
141
+
142
+ ## Optimization Recommendations
143
+
144
+ ### 1. Software Optimizations (No Additional Cost)
145
+
146
+ **A. Implement Request Batching**
147
+ ```python
148
+ # Pseudo-code for batching
149
+ class RequestBatcher:
150
+ def __init__(self, max_batch_size=4, max_wait_ms=50):
151
+ self.queue = []
152
+ self.max_batch = max_batch_size
153
+ self.max_wait = max_wait_ms
154
+
155
+ async def add_request(self, request):
156
+ self.queue.append(request)
157
+ if len(self.queue) >= self.max_batch:
158
+ return await self.process_batch()
159
+ # Wait for more requests or timeout
160
+ ```
161
+
162
+ **Benefits:**
163
+ - 2-3x throughput improvement
164
+ - Better GPU utilization
165
+ - Lower per-request cost
166
+
167
+ **B. Enable Flash Attention**
168
+ ```python
169
+ # In transformers_provider.py
170
+ model = AutoModelForCausalLM.from_pretrained(
171
+ model_name,
172
+ attn_implementation="flash_attention_2", # Add this
173
+ torch_dtype=torch.bfloat16,
174
+ device_map="auto"
175
+ )
176
+ ```
177
+
178
+ **Benefits:**
179
+ - 1.5-2x faster attention computation
180
+ - Lower memory usage
181
+ - Longer context support
182
+
183
+ **C. Optimize Token Generation**
184
+ ```python
185
+ # Use sampling instead of greedy for faster generation
186
+ outputs = model.generate(
187
+ **inputs,
188
+ do_sample=True,
189
+ temperature=0.7,
190
+ top_p=0.9,
191
+ top_k=50, # Add top-k sampling
192
+ num_beams=1, # Disable beam search
193
+ )
194
+ ```
195
+
196
+ ### 2. Backend Switch: Transformers β†’ vLLM
197
+
198
+ **Benefits:**
199
+ - βœ… **Automatic batching** (continuous batching)
200
+ - βœ… **PagedAttention** for memory efficiency
201
+ - βœ… **3-5x throughput** improvement
202
+ - βœ… Built-in parallelization
203
+
204
+ **Trade-offs:**
205
+ - ⚠️ Need to revert code changes (we just migrated away from vLLM!)
206
+ - ⚠️ vLLM 0.11+ should support Qwen3 now
207
+ - ⚠️ More complex deployment
208
+
209
+ **Recommendation:** Wait for vLLM 0.12+ with stable Qwen3 support
210
+
211
+ ### 3. Caching Strategy
212
+
213
+ ```python
214
+ from functools import lru_cache
215
+ import hashlib
216
+
217
+ @lru_cache(maxsize=100)
218
+ def get_cached_response(question_hash):
219
+ # Cache common questions
220
+ pass
221
+ ```
222
+
223
+ **Benefits:**
224
+ - Instant responses for repeated questions
225
+ - Reduced GPU load
226
+ - Lower costs
227
+
228
+ ---
229
+
230
+ ## Cost-Benefit Analysis
231
+
232
+ ### Current Setup (L4x1)
233
+ - **Cost:** $521/month
234
+ - **Capacity:** 5-10 requests/min
235
+ - **Latency:** ~12s per request
236
+ - **Best for:** Development, low traffic
237
+
238
+ ### With Software Optimizations (L4x1 + Batching)
239
+ - **Cost:** $521/month (no change)
240
+ - **Capacity:** 15-20 requests/min
241
+ - **Latency:** ~8-10s per request
242
+ - **Best for:** Production, medium traffic
243
+ - **ROI:** βœ…βœ…βœ… **HIGHEST** - Free performance gain
244
+
245
+ ### Upgrade to L40s
246
+ - **Cost:** $1,153/month (+$632)
247
+ - **Capacity:** 30-50 requests/min
248
+ - **Latency:** ~5-7s per request
249
+ - **Best for:** High traffic, strict SLA
250
+ - **ROI:** βœ… Good if traffic justifies
251
+
252
+ ### Upgrade to L40s + Software Optimizations
253
+ - **Cost:** $1,153/month (+$632)
254
+ - **Capacity:** 50-100 requests/min
255
+ - **Latency:** ~3-5s per request
256
+ - **Best for:** Production at scale
257
+ - **ROI:** βœ…βœ… Excellent for >50 req/min
258
+
259
+ ---
260
+
261
+ ## Action Plan
262
+
263
+ ### Phase 1: Immediate (No Cost)
264
+ 1. βœ… **Implement request batching** - 2-3x throughput
265
+ 2. βœ… **Enable Flash Attention** - 1.5x faster
266
+ 3. βœ… **Add response caching** - Reduce load
267
+ 4. βœ… **Monitor metrics** - Track improvements
268
+
269
+ **Expected Result:**
270
+ - Throughput: 15 β†’ 30-40 requests/min
271
+ - Latency: 12s β†’ 8-10s
272
+ - Cost: No change
273
+
274
+ ### Phase 2: If Needed (After 1-2 weeks)
275
+ 1. Monitor traffic patterns
276
+ 2. Measure actual vs expected load
277
+ 3. If sustained >30 req/min β†’ Consider L40s upgrade
278
+ 4. If <30 req/min β†’ Stay on L4x1
279
+
280
+ ### Phase 3: Future Optimization
281
+ 1. Evaluate vLLM 0.12+ when Qwen3 support is stable
282
+ 2. Consider model quantization (INT8) for 2x speedup
283
+ 3. Implement load balancing if traffic exceeds single GPU
284
+
285
+ ---
286
+
287
+ ## Conclusion
288
+
289
+ **Current State:**
290
+ - βœ… System works well for single-user scenarios
291
+ - βœ… Good inference speed (~15 tok/s)
292
+ - ⚠️ Limited parallelization
293
+
294
+ **Recommendations:**
295
+ 1. **Start with software optimizations** (batching, Flash Attention)
296
+ 2. **Monitor traffic** for 1-2 weeks
297
+ 3. **Upgrade to L40s** only if traffic justifies (+$632/month)
298
+ 4. **Consider vLLM** when Qwen3 support improves
299
+
300
+ **Best ROI:** Software optimizations on L4x1 = Free 2-3x performance boost! πŸš€
301
+
302
+ ---
303
+
304
+ ## Appendix: Test Results Summary
305
+
306
+ ### English Finance Tests (8 tests)
307
+ - βœ… 100% success rate
308
+ - ⏱️ Avg: 11.74s per response
309
+ - πŸ“ Avg: 175 tokens
310
+ - πŸš€ Speed: 14.91 tok/s
311
+
312
+ ### French Finance Tests (10 tests)
313
+ - βœ… 100% success rate
314
+ - ⏱️ Avg: 12.03s per response
315
+ - πŸ“ Avg: 180 tokens
316
+ - πŸš€ Speed: 14.96 tok/s
317
+ - πŸ‡«πŸ‡· Excellent French terminology support
318
+
319
+ ### Concurrent Performance
320
+ - 2 parallel: 1.52x speedup
321
+ - 3 parallel: 2.34x speedup
322
+ - Max observed: ~15 tok/s throughput
323
+
analyze_performance.py ADDED
@@ -0,0 +1,300 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Analyze model performance: inference speed, throughput, and parallelization.
4
+ """
5
+
6
+ import httpx
7
+ import json
8
+ import time
9
+ import asyncio
10
+ from concurrent.futures import ThreadPoolExecutor, as_completed
11
+ from typing import List, Dict, Any
12
+
13
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
14
+
15
+ def analyze_test_results():
16
+ """Analyze the results from previous tests."""
17
+ print("="*80)
18
+ print("PERFORMANCE ANALYSIS FROM RECENT TESTS")
19
+ print("="*80)
20
+
21
+ # From the test results
22
+ english_tests = {
23
+ "total_tests": 8,
24
+ "avg_time": 11.74,
25
+ "avg_tokens": 175,
26
+ "max_tokens": 150,
27
+ }
28
+
29
+ french_tests = {
30
+ "total_tests": 10,
31
+ "avg_time": 12.03,
32
+ "avg_tokens": 180,
33
+ "max_tokens": 150,
34
+ }
35
+
36
+ # Calculate metrics
37
+ print(f"\nπŸ“Š English Tests:")
38
+ print(f" Average response time: {english_tests['avg_time']:.2f}s")
39
+ print(f" Average tokens generated: {english_tests['avg_tokens']}")
40
+ print(f" Tokens per second: {english_tests['avg_tokens'] / english_tests['avg_time']:.2f}")
41
+ print(f" Token efficiency: {english_tests['avg_tokens'] / english_tests['max_tokens'] * 100:.1f}%")
42
+
43
+ print(f"\nπŸ“Š French Tests:")
44
+ print(f" Average response time: {french_tests['avg_time']:.2f}s")
45
+ print(f" Average tokens generated: {french_tests['avg_tokens']}")
46
+ print(f" Tokens per second: {french_tests['avg_tokens'] / french_tests['avg_time']:.2f}")
47
+ print(f" Token efficiency: {french_tests['avg_tokens'] / french_tests['max_tokens'] * 100:.1f}%")
48
+
49
+ overall_tokens_per_sec = (english_tests['avg_tokens'] + french_tests['avg_tokens']) / \
50
+ (english_tests['avg_time'] + french_tests['avg_time'])
51
+
52
+ print(f"\nπŸš€ Overall Performance:")
53
+ print(f" Average tokens/second: {overall_tokens_per_sec:.2f}")
54
+ print(f" Current hardware: L4x1 GPU")
55
+ print(f" Model size: 8B parameters (Qwen3)")
56
+
57
+ return overall_tokens_per_sec
58
+
59
+ def test_single_request():
60
+ """Test a single request to measure baseline performance."""
61
+ print("\n" + "="*80)
62
+ print("BASELINE SINGLE REQUEST TEST")
63
+ print("="*80)
64
+
65
+ payload = {
66
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
67
+ "messages": [
68
+ {"role": "user", "content": "Explain compound interest in one sentence."}
69
+ ],
70
+ "temperature": 0.2,
71
+ "max_tokens": 50
72
+ }
73
+
74
+ start = time.time()
75
+
76
+ try:
77
+ response = httpx.post(
78
+ f"{BASE_URL}/v1/chat/completions",
79
+ json=payload,
80
+ timeout=60.0
81
+ )
82
+
83
+ elapsed = time.time() - start
84
+
85
+ if response.status_code == 200:
86
+ data = response.json()
87
+ tokens = data['usage']['completion_tokens']
88
+
89
+ print(f"\nβœ… Response received")
90
+ print(f" ⏱️ Time: {elapsed:.2f}s")
91
+ print(f" πŸ“ Tokens: {tokens}")
92
+ print(f" πŸš€ Speed: {tokens/elapsed:.2f} tokens/s")
93
+
94
+ return tokens, elapsed
95
+ else:
96
+ print(f"❌ Error: {response.status_code}")
97
+ return None, None
98
+ except Exception as e:
99
+ print(f"❌ Error: {e}")
100
+ return None, None
101
+
102
+ def test_concurrent_requests(num_requests: int = 3):
103
+ """Test multiple concurrent requests to check parallelization."""
104
+ print("\n" + "="*80)
105
+ print(f"CONCURRENT REQUESTS TEST ({num_requests} parallel requests)")
106
+ print("="*80)
107
+
108
+ questions = [
109
+ "What is a stock?",
110
+ "What is a bond?",
111
+ "What is diversification?",
112
+ "What is ROI?",
113
+ "What is inflation?",
114
+ ][:num_requests]
115
+
116
+ def make_request(question: str, index: int):
117
+ payload = {
118
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
119
+ "messages": [{"role": "user", "content": question}],
120
+ "temperature": 0.2,
121
+ "max_tokens": 50
122
+ }
123
+
124
+ start = time.time()
125
+ try:
126
+ response = httpx.post(
127
+ f"{BASE_URL}/v1/chat/completions",
128
+ json=payload,
129
+ timeout=90.0
130
+ )
131
+ elapsed = time.time() - start
132
+
133
+ if response.status_code == 200:
134
+ data = response.json()
135
+ return {
136
+ "index": index,
137
+ "question": question,
138
+ "time": elapsed,
139
+ "tokens": data['usage']['completion_tokens'],
140
+ "success": True
141
+ }
142
+ else:
143
+ return {"index": index, "success": False, "error": response.status_code}
144
+ except Exception as e:
145
+ return {"index": index, "success": False, "error": str(e)}
146
+
147
+ print(f"\nSending {num_requests} requests simultaneously...")
148
+ overall_start = time.time()
149
+
150
+ with ThreadPoolExecutor(max_workers=num_requests) as executor:
151
+ futures = [executor.submit(make_request, q, i) for i, q in enumerate(questions)]
152
+ results = [future.result() for future in as_completed(futures)]
153
+
154
+ overall_elapsed = time.time() - overall_start
155
+
156
+ # Sort results by index
157
+ results.sort(key=lambda x: x.get('index', 0))
158
+
159
+ successful = [r for r in results if r.get('success')]
160
+
161
+ print(f"\nπŸ“Š Results:")
162
+ print(f" Total time: {overall_elapsed:.2f}s")
163
+ print(f" Successful: {len(successful)}/{num_requests}")
164
+
165
+ if successful:
166
+ for r in successful:
167
+ print(f"\n Request {r['index'] + 1}: {r['question'][:40]}...")
168
+ print(f" Time: {r['time']:.2f}s")
169
+ print(f" Tokens: {r['tokens']}")
170
+ print(f" Speed: {r['tokens']/r['time']:.2f} tokens/s")
171
+
172
+ avg_time = sum(r['time'] for r in successful) / len(successful)
173
+ total_tokens = sum(r['tokens'] for r in successful)
174
+
175
+ print(f"\n πŸ“ˆ Average per request: {avg_time:.2f}s")
176
+ print(f" πŸ“ Total tokens: {total_tokens}")
177
+ print(f" ⚑ Throughput: {total_tokens/overall_elapsed:.2f} tokens/s overall")
178
+
179
+ # Check if requests were parallelized
180
+ if overall_elapsed < avg_time * num_requests * 0.8:
181
+ print(f" βœ… Requests appear to be parallelized")
182
+ parallel_speedup = (avg_time * num_requests) / overall_elapsed
183
+ print(f" πŸš€ Speedup: {parallel_speedup:.2f}x")
184
+ else:
185
+ print(f" ⚠️ Requests appear to be sequential (no parallelization)")
186
+ print(f" πŸ’‘ Expected time if parallel: ~{avg_time:.2f}s")
187
+ print(f" πŸ’‘ Actual time: {overall_elapsed:.2f}s")
188
+
189
+ return successful, overall_elapsed
190
+
191
+ def analyze_hardware_upgrade():
192
+ """Analyze potential benefits of upgrading to L40s."""
193
+ print("\n" + "="*80)
194
+ print("HARDWARE UPGRADE ANALYSIS: L4x1 β†’ L40s")
195
+ print("="*80)
196
+
197
+ print("\nπŸ“Š Current Setup (L4x1):")
198
+ print(" GPU: NVIDIA L4")
199
+ print(" VRAM: 24 GB")
200
+ print(" vCPU: 15")
201
+ print(" RAM: 44 GB")
202
+ print(" Cost: ~$0.70/hour ($521/month)")
203
+
204
+ print("\nπŸ“Š Upgrade Option (L40s):")
205
+ print(" GPU: NVIDIA L40s")
206
+ print(" VRAM: 48 GB (2x L4)")
207
+ print(" vCPU: 30 (2x L4)")
208
+ print(" RAM: 92 GB (2x L4)")
209
+ print(" Cost: ~$1.55/hour ($1153/month)")
210
+ print(" Cost increase: +$632/month (+121%)")
211
+
212
+ print("\n🎯 Expected Benefits:")
213
+ print(" βœ… Better parallelization: More VRAM allows larger batch sizes")
214
+ print(" βœ… Faster inference: ~1.5-2x faster per request")
215
+ print(" βœ… Higher throughput: 2-3x more concurrent requests")
216
+ print(" βœ… Reduced latency: Better for multiple users")
217
+
218
+ print("\nπŸ’‘ Recommendations:")
219
+ print(" 1. L4x1 is sufficient for:")
220
+ print(" - Sequential requests")
221
+ print(" - Low to medium traffic (<10 requests/min)")
222
+ print(" - Development/testing")
223
+
224
+ print("\n 2. Upgrade to L40s if:")
225
+ print(" - Need to handle concurrent requests efficiently")
226
+ print(" - Expecting >20 requests/min")
227
+ print(" - Latency is critical (<5s response time)")
228
+ print(" - Multiple users accessing simultaneously")
229
+
230
+ print("\n 3. Current bottleneck:")
231
+ print(" - Transformers backend is single-threaded by default")
232
+ print(" - Need batching support for true parallelization")
233
+ print(" - Consider implementing request batching")
234
+
235
+ def main():
236
+ """Run performance analysis."""
237
+ print("="*80)
238
+ print("FINANCE LLM PERFORMANCE ANALYSIS")
239
+ print("="*80)
240
+
241
+ # Analyze previous test results
242
+ avg_tokens_per_sec = analyze_test_results()
243
+
244
+ # Test single request
245
+ tokens, elapsed = test_single_request()
246
+
247
+ # Test concurrent requests
248
+ print("\n" + "="*80)
249
+ print("Testing with 2 concurrent requests...")
250
+ test_concurrent_requests(2)
251
+
252
+ time.sleep(2)
253
+
254
+ print("\n" + "="*80)
255
+ print("Testing with 3 concurrent requests...")
256
+ test_concurrent_requests(3)
257
+
258
+ # Hardware analysis
259
+ analyze_hardware_upgrade()
260
+
261
+ print("\n" + "="*80)
262
+ print("KEY FINDINGS")
263
+ print("="*80)
264
+ print(f"""
265
+ πŸ“Š Current Performance:
266
+ β€’ Average inference speed: ~{avg_tokens_per_sec:.1f} tokens/second
267
+ β€’ Average response time: ~12 seconds for 175 tokens
268
+ β€’ Model: Qwen3 8B with Transformers backend
269
+ β€’ Hardware: L4x1 GPU (24GB VRAM)
270
+
271
+ ⚠️ Current Limitations:
272
+ β€’ Transformers backend processes requests sequentially
273
+ β€’ No built-in batching/parallelization
274
+ β€’ Each request waits for the previous to complete
275
+ β€’ GPU may be underutilized during single requests
276
+
277
+ βœ… Optimization Options:
278
+
279
+ 1. SOFTWARE (No cost):
280
+ β€’ Implement request batching in the backend
281
+ β€’ Use vLLM for automatic batching (requires code change)
282
+ β€’ Enable continuous batching for better throughput
283
+
284
+ 2. HARDWARE (Higher cost):
285
+ β€’ Upgrade to L40s for 2x VRAM and compute
286
+ β€’ Expected: 1.5-2x faster per request
287
+ β€’ Better for concurrent users
288
+ β€’ Cost: +$632/month
289
+
290
+ 3. HYBRID APPROACH:
291
+ β€’ Stay on L4x1 + implement batching
292
+ β€’ Most cost-effective for moderate traffic
293
+ β€’ Can handle 5-10 concurrent requests efficiently
294
+ """)
295
+
296
+ print("="*80)
297
+
298
+ if __name__ == "__main__":
299
+ main()
300
+
app/providers/transformers_provider.py CHANGED
@@ -192,7 +192,11 @@ class TransformersProvider:
192
  temperature=temperature,
193
  top_p=top_p,
194
  do_sample=temperature > 0,
195
- pad_token_id=tokenizer.eos_token_id
 
 
 
 
196
  )
197
 
198
  # Decode response
@@ -252,6 +256,9 @@ class TransformersProvider:
252
  "top_p": top_p,
253
  "do_sample": temperature > 0,
254
  "pad_token_id": tokenizer.eos_token_id,
 
 
 
255
  "streamer": streamer
256
  }
257
 
 
192
  temperature=temperature,
193
  top_p=top_p,
194
  do_sample=temperature > 0,
195
+ pad_token_id=tokenizer.eos_token_id,
196
+ eos_token_id=tokenizer.eos_token_id,
197
+ # Ensure complete generation
198
+ min_new_tokens=min(20, max_tokens // 2),
199
+ repetition_penalty=1.05
200
  )
201
 
202
  # Decode response
 
256
  "top_p": top_p,
257
  "do_sample": temperature > 0,
258
  "pad_token_id": tokenizer.eos_token_id,
259
+ "eos_token_id": tokenizer.eos_token_id,
260
+ "min_new_tokens": min(20, max_tokens // 2),
261
+ "repetition_penalty": 1.05,
262
  "streamer": streamer
263
  }
264
 
test_advanced_finance.py ADDED
@@ -0,0 +1,295 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Advanced finance tests including streaming and complex scenarios.
4
+ """
5
+
6
+ import httpx
7
+ import json
8
+ import time
9
+
10
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
11
+
12
+ def test_streaming_response():
13
+ """Test streaming chat completion."""
14
+ print("\n" + "="*80)
15
+ print("TESTING STREAMING RESPONSE")
16
+ print("="*80)
17
+
18
+ payload = {
19
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
20
+ "messages": [
21
+ {
22
+ "role": "user",
23
+ "content": "Explain the Black-Scholes option pricing model in simple terms."
24
+ }
25
+ ],
26
+ "stream": True,
27
+ "max_tokens": 150,
28
+ "temperature": 0.4
29
+ }
30
+
31
+ print(f"\nQuestion: {payload['messages'][0]['content']}")
32
+ print(f"\nStreaming response:")
33
+ print("─" * 80)
34
+
35
+ start_time = time.time()
36
+ chunks_received = 0
37
+ full_response = ""
38
+
39
+ try:
40
+ with httpx.stream(
41
+ "POST",
42
+ f"{BASE_URL}/v1/chat/completions",
43
+ json=payload,
44
+ timeout=60.0
45
+ ) as response:
46
+ for line in response.iter_lines():
47
+ if line.startswith("data: "):
48
+ data_str = line[6:] # Remove "data: " prefix
49
+
50
+ if data_str == "[DONE]":
51
+ break
52
+
53
+ try:
54
+ chunk_data = json.loads(data_str)
55
+ delta = chunk_data.get("choices", [{}])[0].get("delta", {})
56
+ content = delta.get("content", "")
57
+
58
+ if content:
59
+ print(content, end="", flush=True)
60
+ full_response += content
61
+ chunks_received += 1
62
+ except json.JSONDecodeError:
63
+ pass
64
+
65
+ elapsed = time.time() - start_time
66
+
67
+ print("\n" + "─" * 80)
68
+ print(f"\nβœ… Streaming test successful!")
69
+ print(f" ⏱️ Time: {elapsed:.2f}s")
70
+ print(f" πŸ“¦ Chunks received: {chunks_received}")
71
+ print(f" πŸ“ Total characters: {len(full_response)}")
72
+
73
+ return True
74
+
75
+ except Exception as e:
76
+ print(f"\n❌ Error: {e}")
77
+ return False
78
+
79
+ def test_complex_finance_scenario():
80
+ """Test complex multi-step finance reasoning."""
81
+ print("\n" + "="*80)
82
+ print("TESTING COMPLEX FINANCE SCENARIO")
83
+ print("="*80)
84
+
85
+ question = """A company has the following financials:
86
+ - Revenue: $10 million
87
+ - Cost of Goods Sold: $4 million
88
+ - Operating Expenses: $3 million
89
+ - Interest Expense: $500,000
90
+ - Tax Rate: 25%
91
+
92
+ Calculate the company's:
93
+ 1. Gross Profit Margin
94
+ 2. Operating Income
95
+ 3. Net Income
96
+ 4. EBITDA (assuming $200k depreciation)"""
97
+
98
+ payload = {
99
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
100
+ "messages": [
101
+ {"role": "user", "content": question}
102
+ ],
103
+ "temperature": 0.1,
104
+ "max_tokens": 300
105
+ }
106
+
107
+ print(f"\nQuestion:\n{question}")
108
+ print("\n" + "─" * 80)
109
+
110
+ start_time = time.time()
111
+
112
+ try:
113
+ response = httpx.post(
114
+ f"{BASE_URL}/v1/chat/completions",
115
+ json=payload,
116
+ timeout=60.0
117
+ )
118
+
119
+ elapsed = time.time() - start_time
120
+
121
+ if response.status_code == 200:
122
+ data = response.json()
123
+ answer = data['choices'][0]['message']['content']
124
+ usage = data.get('usage', {})
125
+
126
+ print(f"\nπŸ’¬ Answer:\n{answer}")
127
+ print("\n" + "─" * 80)
128
+ print(f"\nβœ… Complex scenario test successful!")
129
+ print(f" ⏱️ Time: {elapsed:.2f}s")
130
+ print(f" πŸ“ Tokens: {usage.get('total_tokens', 'N/A')}")
131
+
132
+ # Check for key calculations in response
133
+ calculations = ["gross profit", "operating income", "net income", "ebitda"]
134
+ found = [calc for calc in calculations if calc in answer.lower()]
135
+ print(f" 🎯 Calculations mentioned: {len(found)}/{len(calculations)}")
136
+
137
+ return True
138
+ else:
139
+ print(f"❌ Error: HTTP {response.status_code}")
140
+ return False
141
+
142
+ except Exception as e:
143
+ print(f"❌ Error: {e}")
144
+ return False
145
+
146
+ def test_financial_advice():
147
+ """Test investment advice generation."""
148
+ print("\n" + "="*80)
149
+ print("TESTING FINANCIAL ADVICE")
150
+ print("="*80)
151
+
152
+ question = """I'm 30 years old with $50,000 to invest. My risk tolerance is moderate,
153
+ and I'm investing for retirement in 35 years. What asset allocation would you recommend?"""
154
+
155
+ payload = {
156
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
157
+ "messages": [
158
+ {"role": "user", "content": question}
159
+ ],
160
+ "temperature": 0.5,
161
+ "max_tokens": 250
162
+ }
163
+
164
+ print(f"\nQuestion: {question}")
165
+ print("\n" + "─" * 80)
166
+
167
+ try:
168
+ response = httpx.post(
169
+ f"{BASE_URL}/v1/chat/completions",
170
+ json=payload,
171
+ timeout=60.0
172
+ )
173
+
174
+ if response.status_code == 200:
175
+ data = response.json()
176
+ answer = data['choices'][0]['message']['content']
177
+
178
+ print(f"\nπŸ’¬ Answer:\n{answer}")
179
+ print("\n" + "─" * 80)
180
+ print(f"\nβœ… Financial advice test successful!")
181
+
182
+ # Check for relevant concepts
183
+ concepts = ["stocks", "bonds", "diversification", "allocation", "risk"]
184
+ found = [c for c in concepts if c in answer.lower()]
185
+ print(f" 🎯 Relevant concepts: {', '.join(found)}")
186
+
187
+ return True
188
+ else:
189
+ print(f"❌ Error: HTTP {response.status_code}")
190
+ return False
191
+
192
+ except Exception as e:
193
+ print(f"❌ Error: {e}")
194
+ return False
195
+
196
+ def test_market_interpretation():
197
+ """Test market data interpretation."""
198
+ print("\n" + "="*80)
199
+ print("TESTING MARKET DATA INTERPRETATION")
200
+ print("="*80)
201
+
202
+ question = """A stock has the following characteristics:
203
+ - Current Price: $100
204
+ - 52-week High: $120
205
+ - 52-week Low: $75
206
+ - P/E Ratio: 25
207
+ - Beta: 1.5
208
+ - Dividend Yield: 2%
209
+
210
+ What does this data tell you about the stock's risk and valuation?"""
211
+
212
+ payload = {
213
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
214
+ "messages": [
215
+ {"role": "user", "content": question}
216
+ ],
217
+ "temperature": 0.3,
218
+ "max_tokens": 250
219
+ }
220
+
221
+ print(f"\nQuestion:\n{question}")
222
+ print("\n" + "─" * 80)
223
+
224
+ try:
225
+ response = httpx.post(
226
+ f"{BASE_URL}/v1/chat/completions",
227
+ json=payload,
228
+ timeout=60.0
229
+ )
230
+
231
+ if response.status_code == 200:
232
+ data = response.json()
233
+ answer = data['choices'][0]['message']['content']
234
+
235
+ print(f"\nπŸ’¬ Answer:\n{answer}")
236
+ print("\n" + "─" * 80)
237
+ print(f"\nβœ… Market interpretation test successful!")
238
+
239
+ # Check for key concepts
240
+ concepts = ["beta", "p/e", "volatility", "risk", "valuation"]
241
+ found = [c for c in concepts if c in answer.lower()]
242
+ print(f" 🎯 Key concepts addressed: {', '.join(found)}")
243
+
244
+ return True
245
+ else:
246
+ print(f"❌ Error: HTTP {response.status_code}")
247
+ return False
248
+
249
+ except Exception as e:
250
+ print(f"❌ Error: {e}")
251
+ return False
252
+
253
+ def main():
254
+ """Run all advanced tests."""
255
+ print("="*80)
256
+ print("ADVANCED FINANCE LLM TESTING")
257
+ print("="*80)
258
+ print(f"Target: {BASE_URL}")
259
+
260
+ results = []
261
+
262
+ # Test 1: Streaming
263
+ results.append(("Streaming Response", test_streaming_response()))
264
+ time.sleep(2)
265
+
266
+ # Test 2: Complex scenario
267
+ results.append(("Complex Finance Calculations", test_complex_finance_scenario()))
268
+ time.sleep(2)
269
+
270
+ # Test 3: Financial advice
271
+ results.append(("Investment Advice", test_financial_advice()))
272
+ time.sleep(2)
273
+
274
+ # Test 4: Market interpretation
275
+ results.append(("Market Data Interpretation", test_market_interpretation()))
276
+
277
+ # Summary
278
+ print("\n" + "="*80)
279
+ print("ADVANCED TESTS SUMMARY")
280
+ print("="*80)
281
+
282
+ passed = sum(1 for _, success in results if success)
283
+ total = len(results)
284
+
285
+ print(f"\nβœ… Passed: {passed}/{total}")
286
+
287
+ for test_name, success in results:
288
+ status = "βœ…" if success else "❌"
289
+ print(f" {status} {test_name}")
290
+
291
+ print("\n" + "="*80)
292
+
293
+ if __name__ == "__main__":
294
+ main()
295
+
test_finance_final.py ADDED
@@ -0,0 +1,220 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Final finance tests with proper token limits and French language support.
4
+ """
5
+
6
+ import httpx
7
+ import json
8
+ import time
9
+ from typing import Dict, Any, List
10
+
11
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
12
+
13
+ # English tests with increased token limits to handle thinking + answer
14
+ ENGLISH_TESTS = [
15
+ {
16
+ "category": "Financial Calculations",
17
+ "question": "Calculate: If I invest $10,000 at 5% annual interest compounded annually for 3 years, what will be the final amount? Show your calculation and explain the formula.",
18
+ "max_tokens": 300 # Increased for thinking + complete answer
19
+ },
20
+ {
21
+ "category": "Risk Management",
22
+ "question": "Define Value at Risk (VaR) and explain how it's used in portfolio management. Include examples.",
23
+ "max_tokens": 350
24
+ },
25
+ {
26
+ "category": "Options Trading",
27
+ "question": "Explain call and put options. What are the key differences and when would you use each?",
28
+ "max_tokens": 300
29
+ },
30
+ ]
31
+
32
+ # French tests with explicit language instructions
33
+ FRENCH_TESTS = [
34
+ {
35
+ "category": "Calculs Financiers",
36
+ "question": "Si j'investis 10 000€ avec un taux d'intΓ©rΓͺt annuel de 5% composΓ© annuellement pendant 3 ans, quel sera le montant final? Montrez vos calculs et expliquez la formule. RΓ©pondez entiΓ¨rement en franΓ§ais, y compris votre raisonnement.",
37
+ "max_tokens": 300,
38
+ "system_prompt": "Tu es un assistant financier qui rΓ©pond toujours en franΓ§ais. Ton raisonnement et tes rΓ©ponses doivent Γͺtre entiΓ¨rement en franΓ§ais."
39
+ },
40
+ {
41
+ "category": "Gestion des Risques",
42
+ "question": "Expliquez ce qu'est la VaR (Value at Risk / Valeur en Risque) et comment elle est utilisée dans la gestion de portefeuille. Donnez des exemples. Répondez entièrement en français.",
43
+ "max_tokens": 350,
44
+ "system_prompt": "Tu es un assistant financier qui rΓ©pond toujours en franΓ§ais. Ton raisonnement et tes rΓ©ponses doivent Γͺtre entiΓ¨rement en franΓ§ais."
45
+ },
46
+ {
47
+ "category": "Options",
48
+ "question": "Expliquez les options d'achat (call) et de vente (put). Quelles sont les différences clés et quand utiliser chacune? Répondez entièrement en français avec votre raisonnement en français.",
49
+ "max_tokens": 300,
50
+ "system_prompt": "Tu es un assistant financier qui rΓ©pond toujours en franΓ§ais. Tout ton raisonnement interne et ta rΓ©ponse finale doivent Γͺtre en franΓ§ais."
51
+ },
52
+ {
53
+ "category": "Termes FranΓ§ais",
54
+ "question": "Expliquez les termes suivants de la bourse franΓ§aise: CAC 40, PEA, SICAV, et OAT. Pour chaque terme, donnez une dΓ©finition claire. RΓ©pondez en franΓ§ais.",
55
+ "max_tokens": 400,
56
+ "system_prompt": "Tu es un expert en finance française. Réponds entièrement en français, y compris ton raisonnement."
57
+ },
58
+ ]
59
+
60
+ def run_test(test: Dict[str, Any], language: str = "English") -> Dict[str, Any]:
61
+ """Run a single test."""
62
+ print(f"\n{'='*80}")
63
+ print(f"{'CatΓ©gorie' if language == 'French' else 'Category'}: {test['category']}")
64
+ print(f"Question: {test['question'][:100]}...")
65
+ print(f"Max Tokens: {test.get('max_tokens', 300)}")
66
+ print(f"{'='*80}")
67
+
68
+ messages = [{"role": "user", "content": test["question"]}]
69
+
70
+ # Add system prompt for French tests
71
+ if "system_prompt" in test:
72
+ messages.insert(0, {"role": "system", "content": test["system_prompt"]})
73
+
74
+ payload = {
75
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
76
+ "messages": messages,
77
+ "temperature": 0.3,
78
+ "max_tokens": test.get('max_tokens', 300)
79
+ }
80
+
81
+ start_time = time.time()
82
+
83
+ try:
84
+ response = httpx.post(
85
+ f"{BASE_URL}/v1/chat/completions",
86
+ json=payload,
87
+ timeout=90.0
88
+ )
89
+
90
+ elapsed = time.time() - start_time
91
+
92
+ if response.status_code == 200:
93
+ data = response.json()
94
+ answer = data['choices'][0]['message']['content']
95
+ usage = data.get('usage', {})
96
+ finish_reason = data['choices'][0].get('finish_reason', 'unknown')
97
+
98
+ print(f"\nπŸ’¬ Answer:")
99
+ print(answer)
100
+
101
+ print(f"\nπŸ“Š Stats:")
102
+ print(f" ⏱️ Time: {elapsed:.2f}s")
103
+ print(f" πŸ“ Tokens: {usage.get('completion_tokens', 'N/A')}/{test.get('max_tokens', 300)}")
104
+ print(f" 🏁 Finish: {finish_reason}")
105
+
106
+ # Check if answer was complete
107
+ is_complete = finish_reason == "stop"
108
+ has_thinking = "<think>" in answer.lower()
109
+
110
+ # For French tests, check if thinking is in French
111
+ if language == "French":
112
+ # Simple heuristic: check for French words in thinking section
113
+ if has_thinking:
114
+ thinking_section = answer.split("</think>")[0].lower()
115
+ french_indicators = ["je", "le", "la", "est", "sont", "dans", "avec", "pour"]
116
+ english_indicators = ["the", "is", "are", "with", "for", "that"]
117
+
118
+ french_count = sum(1 for word in french_indicators if word in thinking_section)
119
+ english_count = sum(1 for word in english_indicators if word in thinking_section)
120
+
121
+ thinking_in_french = french_count > english_count
122
+ print(f" πŸ‡«πŸ‡· Thinking in French: {'βœ…' if thinking_in_french else '❌ (in English)'}")
123
+
124
+ print(f"\nπŸ“ˆ Quality:")
125
+ print(f" {'βœ…' if is_complete else '⚠️ TRUNCATED'} Answer status: {finish_reason}")
126
+ print(f" {'βœ…' if has_thinking else 'βž–'} Shows reasoning: {has_thinking}")
127
+
128
+ return {
129
+ "success": True,
130
+ "category": test['category'],
131
+ "time": elapsed,
132
+ "tokens_used": usage.get('completion_tokens', 0),
133
+ "complete": is_complete,
134
+ "has_reasoning": has_thinking
135
+ }
136
+ else:
137
+ print(f"❌ Error: HTTP {response.status_code}")
138
+ return {"success": False, "category": test['category'], "error": str(response.status_code)}
139
+
140
+ except Exception as e:
141
+ print(f"❌ Error: {e}")
142
+ return {"success": False, "category": test['category'], "error": str(e)}
143
+
144
+ def print_summary(results: List[Dict[str, Any]], language: str):
145
+ """Print test summary."""
146
+ print("\n" + "="*80)
147
+ print("RÉSUMÉ" if language == "French" else "SUMMARY")
148
+ print("="*80)
149
+
150
+ successful = [r for r in results if r.get('success')]
151
+ failed = [r for r in results if not r.get('success')]
152
+ complete = [r for r in successful if r.get('complete')]
153
+
154
+ print(f"\nβœ… Successful: {len(successful)}/{len(results)}")
155
+ print(f"βœ… Complete answers: {len(complete)}/{len(successful)} ({100*len(complete)/len(successful) if successful else 0:.1f}%)")
156
+ print(f"❌ Failed: {len(failed)}/{len(results)}")
157
+
158
+ if successful:
159
+ avg_time = sum(r['time'] for r in successful) / len(successful)
160
+ avg_tokens = sum(r['tokens_used'] for r in successful) / len(successful)
161
+
162
+ print(f"\nπŸ“Š Metrics:")
163
+ print(f" ⏱️ Average time: {avg_time:.2f}s")
164
+ print(f" πŸ“ Average tokens: {avg_tokens:.0f}")
165
+ print(f" πŸš€ Speed: {avg_tokens/avg_time:.2f} tokens/s")
166
+
167
+ def main():
168
+ """Run all tests."""
169
+ print("="*80)
170
+ print("FINAL FINANCE LLM TESTS")
171
+ print("="*80)
172
+ print("Testing with proper token limits and language support")
173
+
174
+ # English tests
175
+ print("\n" + "="*80)
176
+ print("ENGLISH TESTS")
177
+ print("="*80)
178
+
179
+ english_results = []
180
+ for i, test in enumerate(ENGLISH_TESTS, 1):
181
+ print(f"\n[Test {i}/{len(ENGLISH_TESTS)}]")
182
+ result = run_test(test, "English")
183
+ english_results.append(result)
184
+ time.sleep(1)
185
+
186
+ print_summary(english_results, "English")
187
+
188
+ # French tests
189
+ print("\n\n" + "="*80)
190
+ print("FRENCH TESTS (with language instructions)")
191
+ print("="*80)
192
+
193
+ french_results = []
194
+ for i, test in enumerate(FRENCH_TESTS, 1):
195
+ print(f"\n[Test {i}/{len(FRENCH_TESTS)}]")
196
+ result = run_test(test, "French")
197
+ french_results.append(result)
198
+ time.sleep(1)
199
+
200
+ print_summary(french_results, "French")
201
+
202
+ # Overall
203
+ print("\n\n" + "="*80)
204
+ print("OVERALL RESULTS")
205
+ print("="*80)
206
+
207
+ all_results = english_results + french_results
208
+ all_successful = [r for r in all_results if r.get('success')]
209
+ all_complete = [r for r in all_successful if r.get('complete')]
210
+
211
+ print(f"\nπŸ“Š Total: {len(all_successful)}/{len(all_results)} successful")
212
+ print(f"βœ… Complete: {len(all_complete)}/{len(all_successful)} ({100*len(all_complete)/len(all_successful) if all_successful else 0:.1f}%)")
213
+ print(f"πŸ‡¬πŸ‡§ English: {len([r for r in english_results if r.get('success')])}/{len(ENGLISH_TESTS)}")
214
+ print(f"πŸ‡«πŸ‡· French: {len([r for r in french_results if r.get('success')])}/{len(FRENCH_TESTS)}")
215
+
216
+ print("\n" + "="*80)
217
+
218
+ if __name__ == "__main__":
219
+ main()
220
+
test_finance_improved.py ADDED
@@ -0,0 +1,265 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Improved finance tests with better prompts for concise, complete answers.
4
+ """
5
+
6
+ import httpx
7
+ import json
8
+ import time
9
+ from typing import Dict, Any, List
10
+
11
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
12
+
13
+ # Improved finance tests with prompts that encourage concise but complete answers
14
+ FINANCE_TESTS = [
15
+ {
16
+ "category": "Financial Calculations",
17
+ "question": "Calculate: If I invest $10,000 at 5% annual interest compounded annually for 3 years, what will be the final amount? Show your calculation steps briefly.",
18
+ "max_tokens": 150
19
+ },
20
+ {
21
+ "category": "Risk Management",
22
+ "question": "Define Value at Risk (VaR) and explain its main use in portfolio management. Be concise but complete.",
23
+ "max_tokens": 200
24
+ },
25
+ {
26
+ "category": "Financial Instruments",
27
+ "question": "Explain the key difference between call and put options in 2-3 sentences.",
28
+ "max_tokens": 100
29
+ },
30
+ {
31
+ "category": "Market Analysis",
32
+ "question": "List 5 key factors that influence stock market volatility and briefly explain each.",
33
+ "max_tokens": 250
34
+ },
35
+ {
36
+ "category": "Corporate Finance",
37
+ "question": "Compare EBITDA vs Net Income: What's included in each and why does the difference matter?",
38
+ "max_tokens": 200
39
+ },
40
+ {
41
+ "category": "Investment Strategy",
42
+ "question": "Explain portfolio diversification and why it's important. Give a concrete example.",
43
+ "max_tokens": 200
44
+ },
45
+ {
46
+ "category": "Financial Ratios",
47
+ "question": "How do you calculate P/E ratio? What does a high vs low P/E tell you about a stock?",
48
+ "max_tokens": 150
49
+ },
50
+ {
51
+ "category": "Fixed Income",
52
+ "question": "Explain the inverse relationship between bond prices and interest rates. Why does this occur?",
53
+ "max_tokens": 150
54
+ },
55
+ ]
56
+
57
+ # French finance tests with proper French terminology
58
+ FRENCH_FINANCE_TESTS = [
59
+ {
60
+ "category": "Calculs Financiers",
61
+ "question": "Si j'investis 10 000€ avec un taux d'intΓ©rΓͺt annuel de 5% composΓ© annuellement pendant 3 ans, quel sera le montant final? Montrez vos calculs.",
62
+ "max_tokens": 150
63
+ },
64
+ {
65
+ "category": "Gestion des Risques",
66
+ "question": "Expliquez ce qu'est la VaR (Value at Risk / Valeur en Risque) et son utilisation dans la gestion de portefeuille.",
67
+ "max_tokens": 200
68
+ },
69
+ {
70
+ "category": "Instruments Financiers",
71
+ "question": "Quelle est la diffΓ©rence entre une option d'achat (call) et une option de vente (put)?",
72
+ "max_tokens": 150
73
+ },
74
+ {
75
+ "category": "Analyse Boursière",
76
+ "question": "Quels sont les principaux facteurs qui influencent la volatilitΓ© des marchΓ©s boursiers?",
77
+ "max_tokens": 200
78
+ },
79
+ {
80
+ "category": "Finance d'Entreprise",
81
+ "question": "Expliquez la diffΓ©rence entre l'EBITDA (BΓ©nΓ©fice avant intΓ©rΓͺts, impΓ΄ts, dΓ©prΓ©ciation et amortissement) et le rΓ©sultat net.",
82
+ "max_tokens": 200
83
+ },
84
+ {
85
+ "category": "StratΓ©gie d'Investissement",
86
+ "question": "Qu'est-ce que la diversification d'un portefeuille et pourquoi est-elle importante?",
87
+ "max_tokens": 200
88
+ },
89
+ {
90
+ "category": "Ratios Financiers",
91
+ "question": "Comment calculer le ratio cours/bΓ©nΓ©fice (PER) et comment l'interprΓ©ter?",
92
+ "max_tokens": 150
93
+ },
94
+ {
95
+ "category": "Obligations",
96
+ "question": "Pourquoi les prix des obligations baissent-ils lorsque les taux d'intΓ©rΓͺt augmentent?",
97
+ "max_tokens": 150
98
+ },
99
+ {
100
+ "category": "Analyse Technique (Termes FranΓ§ais)",
101
+ "question": "Expliquez les termes suivants utilisΓ©s en bourse franΓ§aise: CAC 40, PEA, sicav, et OAT.",
102
+ "max_tokens": 200
103
+ },
104
+ {
105
+ "category": "FiscalitΓ© (France)",
106
+ "question": "Quelle est la différence entre la Flat Tax et le barème progressif pour l'imposition des revenus de capitaux mobiliers en France?",
107
+ "max_tokens": 200
108
+ },
109
+ ]
110
+
111
+ def run_test(test: Dict[str, Any], language: str = "English") -> Dict[str, Any]:
112
+ """Run a single test."""
113
+ print(f"\n{'─'*80}")
114
+ print(f"CatΓ©gorie: {test['category']}" if language == "French" else f"Category: {test['category']}")
115
+ print(f"Question: {test['question']}")
116
+ print(f"Max Tokens: {test.get('max_tokens', 200)}")
117
+ print(f"{'─'*80}")
118
+
119
+ payload = {
120
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
121
+ "messages": [
122
+ {"role": "user", "content": test["question"]}
123
+ ],
124
+ "temperature": 0.2, # Lower for more focused answers
125
+ "max_tokens": test.get('max_tokens', 200)
126
+ }
127
+
128
+ start_time = time.time()
129
+
130
+ try:
131
+ response = httpx.post(
132
+ f"{BASE_URL}/v1/chat/completions",
133
+ json=payload,
134
+ timeout=60.0
135
+ )
136
+
137
+ elapsed = time.time() - start_time
138
+
139
+ if response.status_code == 200:
140
+ data = response.json()
141
+ answer = data['choices'][0]['message']['content']
142
+ usage = data.get('usage', {})
143
+ finish_reason = data['choices'][0].get('finish_reason', 'unknown')
144
+
145
+ print(f"\nπŸ“Š Stats:")
146
+ print(f" ⏱️ Time: {elapsed:.2f}s")
147
+ print(f" πŸ“ Tokens: {usage.get('completion_tokens', 'N/A')}/{test.get('max_tokens', 200)}")
148
+ print(f" 🏁 Finish: {finish_reason}")
149
+
150
+ print(f"\nπŸ’¬ Answer:\n{answer}")
151
+
152
+ # Evaluate answer quality
153
+ is_complete = finish_reason == "stop"
154
+ has_thinking = "<think>" in answer
155
+ answer_content = answer.split("</think>")[-1].strip() if has_thinking else answer
156
+
157
+ print(f"\nπŸ“ˆ Quality:")
158
+ print(f" {'βœ…' if is_complete else '⚠️'} Complete: {is_complete}")
159
+ print(f" {'βœ…' if has_thinking else 'βž–'} Shows reasoning: {has_thinking}")
160
+ print(f" πŸ“ Answer length: {len(answer_content)} chars")
161
+
162
+ return {
163
+ "success": True,
164
+ "category": test['category'],
165
+ "time": elapsed,
166
+ "tokens_used": usage.get('completion_tokens', 0),
167
+ "tokens_limit": test.get('max_tokens', 200),
168
+ "complete": is_complete,
169
+ "has_reasoning": has_thinking
170
+ }
171
+ else:
172
+ print(f"❌ Error: HTTP {response.status_code}")
173
+ return {"success": False, "category": test['category'], "error": str(response.status_code)}
174
+
175
+ except Exception as e:
176
+ print(f"❌ Error: {e}")
177
+ return {"success": False, "category": test['category'], "error": str(e)}
178
+
179
+ def print_summary(results: List[Dict[str, Any]], language: str):
180
+ """Print test summary."""
181
+ print("\n" + "="*80)
182
+ print("RÉSUMÉ DES TESTS" if language == "French" else "TEST SUMMARY")
183
+ print("="*80)
184
+
185
+ successful = [r for r in results if r.get('success')]
186
+ failed = [r for r in results if not r.get('success')]
187
+
188
+ print(f"\nβœ… Successful: {len(successful)}/{len(results)}")
189
+ print(f"❌ Failed: {len(failed)}/{len(results)}")
190
+
191
+ if successful:
192
+ avg_time = sum(r['time'] for r in successful) / len(successful)
193
+ avg_tokens = sum(r['tokens_used'] for r in successful) / len(successful)
194
+ complete_count = sum(1 for r in successful if r.get('complete'))
195
+ reasoning_count = sum(1 for r in successful if r.get('has_reasoning'))
196
+
197
+ print(f"\nπŸ“Š Performance Metrics:")
198
+ print(f" ⏱️ Average response time: {avg_time:.2f}s")
199
+ print(f" πŸ“ Average tokens used: {avg_tokens:.0f}")
200
+ print(f" βœ… Complete answers: {complete_count}/{len(successful)} ({100*complete_count/len(successful):.1f}%)")
201
+ print(f" 🧠 Answers with reasoning: {reasoning_count}/{len(successful)} ({100*reasoning_count/len(successful):.1f}%)")
202
+
203
+ # Token efficiency
204
+ total_used = sum(r['tokens_used'] for r in successful)
205
+ total_limit = sum(r['tokens_limit'] for r in successful)
206
+ print(f" πŸ’° Token efficiency: {total_used}/{total_limit} ({100*total_used/total_limit:.1f}% utilization)")
207
+
208
+ def main():
209
+ """Run all tests."""
210
+ print("="*80)
211
+ print("IMPROVED FINANCE LLM TESTING")
212
+ print("="*80)
213
+ print(f"Target: {BASE_URL}")
214
+
215
+ # Test English questions
216
+ print("\n" + "="*80)
217
+ print("ENGLISH FINANCE TESTS (Improved Prompts)")
218
+ print("="*80)
219
+
220
+ english_results = []
221
+ for i, test in enumerate(FINANCE_TESTS, 1):
222
+ print(f"\n[Test {i}/{len(FINANCE_TESTS)}]")
223
+ result = run_test(test, "English")
224
+ english_results.append(result)
225
+ if i < len(FINANCE_TESTS):
226
+ time.sleep(1)
227
+
228
+ print_summary(english_results, "English")
229
+
230
+ # Test French questions
231
+ print("\n\n" + "="*80)
232
+ print("FRENCH FINANCE TESTS (Questions en FranΓ§ais)")
233
+ print("="*80)
234
+ print("Testing with French finance terminology...")
235
+
236
+ french_results = []
237
+ for i, test in enumerate(FRENCH_FINANCE_TESTS, 1):
238
+ print(f"\n[Test {i}/{len(FRENCH_FINANCE_TESTS)}]")
239
+ result = run_test(test, "French")
240
+ french_results.append(result)
241
+ if i < len(FRENCH_FINANCE_TESTS):
242
+ time.sleep(1)
243
+
244
+ print_summary(french_results, "French")
245
+
246
+ # Overall summary
247
+ print("\n\n" + "="*80)
248
+ print("OVERALL SUMMARY")
249
+ print("="*80)
250
+
251
+ total_tests = len(english_results) + len(french_results)
252
+ total_success = sum(1 for r in english_results + french_results if r.get('success'))
253
+
254
+ print(f"\nπŸ“Š Total Tests: {total_tests}")
255
+ print(f"βœ… Total Successful: {total_success}/{total_tests} ({100*total_success/total_tests:.1f}%)")
256
+ print(f"πŸ‡¬πŸ‡§ English: {len([r for r in english_results if r.get('success')])}/{len(english_results)}")
257
+ print(f"πŸ‡«πŸ‡· French: {len([r for r in french_results if r.get('success')])}/{len(french_results)}")
258
+
259
+ print("\n" + "="*80)
260
+ print("TESTING COMPLETE")
261
+ print("="*80)
262
+
263
+ if __name__ == "__main__":
264
+ main()
265
+
test_finance_queries.py ADDED
@@ -0,0 +1,237 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test the deployed finance LLM with various finance-specific questions.
4
+ """
5
+
6
+ import httpx
7
+ import json
8
+ import time
9
+ from typing import Dict, Any, List
10
+
11
+ BASE_URL = "https://jeanbaptdzd-open-finance-llm-8b.hf.space"
12
+
13
+ # Finance test questions covering different domains
14
+ FINANCE_TESTS = [
15
+ {
16
+ "category": "Financial Calculations",
17
+ "question": "If I invest $10,000 at an annual interest rate of 5% compounded annually, how much will I have after 3 years?",
18
+ "expected_topics": ["compound interest", "10000", "5%", "3 years"]
19
+ },
20
+ {
21
+ "category": "Risk Management",
22
+ "question": "What is Value at Risk (VaR) and how is it used in portfolio management?",
23
+ "expected_topics": ["VaR", "risk", "portfolio", "loss"]
24
+ },
25
+ {
26
+ "category": "Financial Instruments",
27
+ "question": "Explain the difference between a call option and a put option.",
28
+ "expected_topics": ["call", "put", "option", "buy", "sell"]
29
+ },
30
+ {
31
+ "category": "Market Analysis",
32
+ "question": "What factors typically influence stock market volatility?",
33
+ "expected_topics": ["volatility", "market", "uncertainty", "factors"]
34
+ },
35
+ {
36
+ "category": "Corporate Finance",
37
+ "question": "What is the difference between EBITDA and net income?",
38
+ "expected_topics": ["EBITDA", "net income", "earnings", "depreciation"]
39
+ },
40
+ {
41
+ "category": "Investment Strategy",
42
+ "question": "What is diversification and why is it important in investing?",
43
+ "expected_topics": ["diversification", "risk", "portfolio", "assets"]
44
+ },
45
+ {
46
+ "category": "Financial Ratios",
47
+ "question": "How do you calculate and interpret the Price-to-Earnings (P/E) ratio?",
48
+ "expected_topics": ["P/E", "price", "earnings", "ratio", "valuation"]
49
+ },
50
+ {
51
+ "category": "Fixed Income",
52
+ "question": "What happens to bond prices when interest rates rise?",
53
+ "expected_topics": ["bond", "interest rate", "price", "inverse"]
54
+ },
55
+ ]
56
+
57
+ def test_endpoint_availability():
58
+ """Test if the endpoint is available."""
59
+ print("\n" + "="*80)
60
+ print("TESTING ENDPOINT AVAILABILITY")
61
+ print("="*80)
62
+
63
+ try:
64
+ response = httpx.get(f"{BASE_URL}/", timeout=30.0)
65
+ data = response.json()
66
+ print(f"βœ… Status: {response.status_code}")
67
+ print(f"βœ… Backend: {data.get('backend')}")
68
+ print(f"βœ… Model: {data.get('model')}")
69
+ print(f"βœ… Service: {data.get('service')}")
70
+ return True
71
+ except Exception as e:
72
+ print(f"❌ Error: {e}")
73
+ return False
74
+
75
+ def test_models_endpoint():
76
+ """Test the /v1/models endpoint."""
77
+ print("\n" + "="*80)
78
+ print("TESTING MODELS ENDPOINT")
79
+ print("="*80)
80
+
81
+ try:
82
+ response = httpx.get(f"{BASE_URL}/v1/models", timeout=30.0)
83
+ data = response.json()
84
+ print(f"βœ… Status: {response.status_code}")
85
+ print(f"βœ… Available models: {len(data.get('data', []))}")
86
+ for model in data.get('data', []):
87
+ print(f" - {model.get('id')}")
88
+ return True
89
+ except Exception as e:
90
+ print(f"❌ Error: {e}")
91
+ return False
92
+
93
+ def run_finance_test(test: Dict[str, Any], max_tokens: int = 200) -> Dict[str, Any]:
94
+ """Run a single finance test question."""
95
+ print(f"\n{'─'*80}")
96
+ print(f"Category: {test['category']}")
97
+ print(f"Question: {test['question']}")
98
+ print(f"{'─'*80}")
99
+
100
+ payload = {
101
+ "model": "DragonLLM/qwen3-8b-fin-v1.0",
102
+ "messages": [
103
+ {"role": "user", "content": test["question"]}
104
+ ],
105
+ "temperature": 0.3,
106
+ "max_tokens": max_tokens
107
+ }
108
+
109
+ start_time = time.time()
110
+
111
+ try:
112
+ response = httpx.post(
113
+ f"{BASE_URL}/v1/chat/completions",
114
+ json=payload,
115
+ timeout=60.0
116
+ )
117
+
118
+ elapsed = time.time() - start_time
119
+
120
+ if response.status_code == 200:
121
+ data = response.json()
122
+ answer = data['choices'][0]['message']['content']
123
+ usage = data.get('usage', {})
124
+
125
+ print(f"\nπŸ“Š Response Stats:")
126
+ print(f" ⏱️ Time: {elapsed:.2f}s")
127
+ print(f" πŸ“ Tokens: {usage.get('total_tokens', 'N/A')} "
128
+ f"(prompt: {usage.get('prompt_tokens', 'N/A')}, "
129
+ f"completion: {usage.get('completion_tokens', 'N/A')})")
130
+
131
+ print(f"\nπŸ’¬ Answer:\n{answer}")
132
+
133
+ # Check if expected topics are mentioned
134
+ answer_lower = answer.lower()
135
+ topics_found = [topic for topic in test.get('expected_topics', [])
136
+ if topic.lower() in answer_lower]
137
+
138
+ if topics_found:
139
+ print(f"\nβœ… Relevant topics found: {', '.join(topics_found)}")
140
+
141
+ return {
142
+ "success": True,
143
+ "category": test['category'],
144
+ "time": elapsed,
145
+ "tokens": usage.get('total_tokens', 0),
146
+ "topics_found": len(topics_found),
147
+ "topics_expected": len(test.get('expected_topics', []))
148
+ }
149
+ else:
150
+ print(f"❌ Error: HTTP {response.status_code}")
151
+ print(f" {response.text}")
152
+ return {
153
+ "success": False,
154
+ "category": test['category'],
155
+ "error": f"HTTP {response.status_code}"
156
+ }
157
+
158
+ except Exception as e:
159
+ elapsed = time.time() - start_time
160
+ print(f"❌ Error after {elapsed:.2f}s: {e}")
161
+ return {
162
+ "success": False,
163
+ "category": test['category'],
164
+ "error": str(e)
165
+ }
166
+
167
+ def print_summary(results: List[Dict[str, Any]]):
168
+ """Print test summary."""
169
+ print("\n" + "="*80)
170
+ print("TEST SUMMARY")
171
+ print("="*80)
172
+
173
+ successful = [r for r in results if r.get('success')]
174
+ failed = [r for r in results if not r.get('success')]
175
+
176
+ print(f"\nβœ… Successful: {len(successful)}/{len(results)}")
177
+ print(f"❌ Failed: {len(failed)}/{len(results)}")
178
+
179
+ if successful:
180
+ avg_time = sum(r['time'] for r in successful) / len(successful)
181
+ avg_tokens = sum(r['tokens'] for r in successful) / len(successful)
182
+ total_topics = sum(r['topics_found'] for r in successful)
183
+ expected_topics = sum(r['topics_expected'] for r in successful)
184
+
185
+ print(f"\nπŸ“Š Performance Metrics:")
186
+ print(f" ⏱️ Average response time: {avg_time:.2f}s")
187
+ print(f" πŸ“ Average tokens: {avg_tokens:.0f}")
188
+ print(f" 🎯 Topic coverage: {total_topics}/{expected_topics} "
189
+ f"({100*total_topics/expected_topics if expected_topics > 0 else 0:.1f}%)")
190
+
191
+ if failed:
192
+ print(f"\n❌ Failed Tests:")
193
+ for r in failed:
194
+ print(f" - {r['category']}: {r.get('error', 'Unknown error')}")
195
+
196
+ def main():
197
+ """Run all finance tests."""
198
+ print("="*80)
199
+ print("FINANCE LLM TESTING SUITE")
200
+ print("="*80)
201
+ print(f"Target: {BASE_URL}")
202
+ print(f"Total tests: {len(FINANCE_TESTS)}")
203
+
204
+ # Test endpoint availability
205
+ if not test_endpoint_availability():
206
+ print("\n❌ Endpoint not available. Exiting.")
207
+ return
208
+
209
+ # Test models endpoint
210
+ if not test_models_endpoint():
211
+ print("\n⚠️ Models endpoint not available, but continuing...")
212
+
213
+ # Run finance tests
214
+ print("\n" + "="*80)
215
+ print("RUNNING FINANCE TESTS")
216
+ print("="*80)
217
+
218
+ results = []
219
+ for i, test in enumerate(FINANCE_TESTS, 1):
220
+ print(f"\n[Test {i}/{len(FINANCE_TESTS)}]")
221
+ result = run_finance_test(test)
222
+ results.append(result)
223
+
224
+ # Small delay between requests
225
+ if i < len(FINANCE_TESTS):
226
+ time.sleep(1)
227
+
228
+ # Print summary
229
+ print_summary(results)
230
+
231
+ print("\n" + "="*80)
232
+ print("TESTING COMPLETE")
233
+ print("="*80)
234
+
235
+ if __name__ == "__main__":
236
+ main()
237
+
test_results.txt ADDED
@@ -0,0 +1,524 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ================================================================================
2
+ IMPROVED FINANCE LLM TESTING
3
+ ================================================================================
4
+ Target: https://jeanbaptdzd-open-finance-llm-8b.hf.space
5
+
6
+ ================================================================================
7
+ ENGLISH FINANCE TESTS (Improved Prompts)
8
+ ================================================================================
9
+
10
+ [Test 1/8]
11
+
12
+ ────────────────────────────────────────────────────────────────────────────────
13
+ Category: Financial Calculations
14
+ Question: Calculate: If I invest $10,000 at 5% annual interest compounded annually for 3 years, what will be the final amount? Show your calculation steps briefly.
15
+ Max Tokens: 150
16
+ ────────────────────────────────────────────────────────────────────────────────
17
+
18
+ πŸ“Š Stats:
19
+ ⏱️ Time: 10.23s
20
+ πŸ“ Tokens: 150/150
21
+ 🏁 Finish: stop
22
+
23
+ πŸ’¬ Answer:
24
+ <think>
25
+ Okay, let's see. The user wants to calculate the final amount after investing $10,000 at 5% annual interest compounded annually for 3 years. Hmm, I remember that compound interest formula is A = P(1 + r/n)^(nt). But wait, since it's compounded annually, n should be 1. So the formula simplifies to A = P(1 + r)^t.
26
+
27
+ Alright, so P is the principal, which is $10,000. The rate r is 5%, so that's 0.05 as a decimal. The time t is 3 years. Plugging those into the formula: A = 10000
28
+
29
+ πŸ“ˆ Quality:
30
+ βœ… Complete: True
31
+ βœ… Shows reasoning: True
32
+ πŸ“ Answer length: 485 chars
33
+
34
+ [Test 2/8]
35
+
36
+ ────────────────────────────────────────────────────────────────────────────────
37
+ Category: Risk Management
38
+ Question: Define Value at Risk (VaR) and explain its main use in portfolio management. Be concise but complete.
39
+ Max Tokens: 200
40
+ ────────────────────────────────────────────────────────────────────────────────
41
+
42
+ πŸ“Š Stats:
43
+ ⏱️ Time: 13.46s
44
+ πŸ“ Tokens: 200/200
45
+ 🏁 Finish: stop
46
+
47
+ πŸ’¬ Answer:
48
+ <think>
49
+ Okay, the user is asking about Value at Risk (VaR) and its main use in portfolio management. Let me start by recalling what VaR is. From what I remember, VaR is a statistical measure used to estimate the maximum potential loss in a portfolio over a specific time horizon at a given confidence level. So, for example, a 95% VaR of $1 million means there's a 5% chance the portfolio will lose more than $1 million in a day.
50
+
51
+ Now, the user wants a concise but complete explanation. I need to make sure I cover the definition clearly. Maybe start by stating that VaR quantifies the risk of loss. Then mention the parameters: time horizon and confidence level. It's important to note that VaR doesn't tell the full story because it doesn't specify the magnitude of losses beyond the threshold. That's why it's often paired with other metrics like Expected Shortfall.
52
+
53
+ In portfolio management, VaR is used
54
+
55
+ πŸ“ˆ Quality:
56
+ βœ… Complete: True
57
+ βœ… Shows reasoning: True
58
+ πŸ“ Answer length: 907 chars
59
+
60
+ [Test 3/8]
61
+
62
+ ────────────────────────────────────────────────────────────────────────────────
63
+ Category: Financial Instruments
64
+ Question: Explain the key difference between call and put options in 2-3 sentences.
65
+ Max Tokens: 100
66
+ ────────────────────────────────────────────────────────────────────────────────
67
+
68
+ πŸ“Š Stats:
69
+ ⏱️ Time: 6.81s
70
+ πŸ“ Tokens: 100/100
71
+ 🏁 Finish: stop
72
+
73
+ πŸ’¬ Answer:
74
+ <think>
75
+ Okay, the user is asking about the key difference between call and put options. Let me start by recalling what I know about options.
76
+
77
+ A call option gives the holder the right to buy an asset at a set price. So, if the market price goes up, the call option becomes valuable. The buyer profits if the stock price rises above the strike price plus the premium paid.
78
+
79
+ On the other hand, a put option allows the holder to sell the asset at the strike price. This
80
+
81
+ πŸ“ˆ Quality:
82
+ βœ… Complete: True
83
+ βœ… Shows reasoning: True
84
+ πŸ“ Answer length: 466 chars
85
+
86
+ [Test 4/8]
87
+
88
+ ────────────────────────────────────────────────────────────────────────────────
89
+ Category: Market Analysis
90
+ Question: List 5 key factors that influence stock market volatility and briefly explain each.
91
+ Max Tokens: 250
92
+ ────────────────────────────────────────────────────────────────────────────────
93
+
94
+ πŸ“Š Stats:
95
+ ⏱️ Time: 16.68s
96
+ πŸ“ Tokens: 250/250
97
+ 🏁 Finish: stop
98
+
99
+ πŸ’¬ Answer:
100
+ <think>
101
+ Okay, the user is asking for five key factors that influence stock market volatility. Let me start by recalling what I know about stock market volatility. Volatility refers to the degree of variation in trading prices over time. High volatility means prices swing wildly, while low volatility means they're more stable.
102
+
103
+ First, I should think about the main drivers. Economic indicators come to mind. Things like GDP growth, inflation rates, unemployment data. If the economy is doing well, maybe stocks are more stable, but if there's uncertainty, that could increase volatility. For example, if inflation is rising faster than expected, central banks might raise interest rates, which can affect stock prices.
104
+
105
+ Then there's geopolitical events. Wars, elections, trade disputes. These can create uncertainty. Like during the pandemic, markets were volatile because of lockdowns and economic shutdowns. Geopolitical tensions can lead to sudden market reactions.
106
+
107
+ Corporate earnings are another factor. If companies report better-than-expected earnings, their stocks might rise, but if they miss, it could cause a drop. Earnings season is a time when volatility tends to be higher because investors are reacting to actual results.
108
+
109
+ Interest rates are crucial. When interest rates change, especially by central banks, it affects
110
+
111
+ πŸ“ˆ Quality:
112
+ βœ… Complete: True
113
+ βœ… Shows reasoning: True
114
+ πŸ“ Answer length: 1334 chars
115
+
116
+ [Test 5/8]
117
+
118
+ ────────────────────────────────────────────────────────────────────────────────
119
+ Category: Corporate Finance
120
+ Question: Compare EBITDA vs Net Income: What's included in each and why does the difference matter?
121
+ Max Tokens: 200
122
+ ────────────────────────────────────────────────────────────────────────────────
123
+
124
+ πŸ“Š Stats:
125
+ ⏱️ Time: 13.34s
126
+ πŸ“ Tokens: 200/200
127
+ 🏁 Finish: stop
128
+
129
+ πŸ’¬ Answer:
130
+ <think>
131
+ Okay, the user is asking about the difference between EBITDA and Net Income. Let me start by recalling what each term stands for. EBITDA is Earnings Before Interest, Taxes, Depreciation, and Amortization. Net Income is the actual profit after all expenses, including those mentioned in EBITDA, plus interest, taxes, and other costs.
132
+
133
+ So, the user wants to know what's included in each. For EBITDA, it's revenues minus operating expenses, excluding non-operating items like interest and taxes. Net Income includes all those, so it's a more comprehensive measure. But why does the difference matter? Well, EBITDA is often used to assess a company's operational performance because it strips out the effects of financing, accounting, and tax policies. That makes it useful for comparing companies with different capital structures or tax rates.
134
+
135
+ On the other hand, Net Income is the bottom line, showing the actual profit. Investors and analysts use both
136
+
137
+ πŸ“ˆ Quality:
138
+ βœ… Complete: True
139
+ βœ… Shows reasoning: True
140
+ πŸ“ Answer length: 959 chars
141
+
142
+ [Test 6/8]
143
+
144
+ ────────────────────────────────────────────────────────────────────────────────
145
+ Category: Investment Strategy
146
+ Question: Explain portfolio diversification and why it's important. Give a concrete example.
147
+ Max Tokens: 200
148
+ ────────────────────────────────────────────────────────────────────────────────
149
+
150
+ πŸ“Š Stats:
151
+ ⏱️ Time: 13.34s
152
+ πŸ“ Tokens: 200/200
153
+ 🏁 Finish: stop
154
+
155
+ πŸ’¬ Answer:
156
+ <think>
157
+ Okay, the user is asking about portfolio diversification and why it's important, with a concrete example. Let me start by recalling what I know about diversification. It's a strategy to reduce risk by spreading investments across different assets. The main idea is that different assets don't move in the same direction at the same time, so losses in one area might be offset by gains in another.
158
+
159
+ First, I need to explain the concept clearly. Maybe start by defining diversification. Then, explain the importanceβ€”like reducing unsystematic risk, which is specific to individual assets. Also, mention that it doesn't eliminate all risk, just reduces it. The example part is crucial. Let me think of a simple example. Suppose someone invests all their money in one stock. If that company does poorly, the whole portfolio loses value. But if they spread their money across different stocks, sectors, or asset classes, the impact of a single bad investment is lessened.
160
+
161
+ Wait, maybe
162
+
163
+ πŸ“ˆ Quality:
164
+ βœ… Complete: True
165
+ βœ… Shows reasoning: True
166
+ πŸ“ Answer length: 987 chars
167
+
168
+ [Test 7/8]
169
+
170
+ ────────────────────────────────────────────────────────────────────────────────
171
+ Category: Financial Ratios
172
+ Question: How do you calculate P/E ratio? What does a high vs low P/E tell you about a stock?
173
+ Max Tokens: 150
174
+ ────────────────────────────────────────────────────────────────────────────────
175
+
176
+ πŸ“Š Stats:
177
+ ⏱️ Time: 10.02s
178
+ πŸ“ Tokens: 150/150
179
+ 🏁 Finish: stop
180
+
181
+ πŸ’¬ Answer:
182
+ <think>
183
+ Okay, the user is asking about how to calculate the P/E ratio and what high or low P/E ratios indicate. Let me start by recalling the formula. The P/E ratio is calculated by dividing the market price per share by the earnings per share (EPS). So, P/E = Market Price per Share / EPS. I need to make sure I explain that clearly.
184
+
185
+ Now, what does a high P/E ratio mean? High P/E usually suggests that investors expect higher earnings growth in the future. It could also mean the stock is overvalued if the growth isn't justified. Conversely, a low P/E might indicate that the stock is undervalued or that the market expects lower growth. But I should mention that context matters
186
+
187
+ πŸ“ˆ Quality:
188
+ βœ… Complete: True
189
+ βœ… Shows reasoning: True
190
+ πŸ“ Answer length: 683 chars
191
+
192
+ [Test 8/8]
193
+
194
+ ────────────────────────────────────────────────────────────────────────────────
195
+ Category: Fixed Income
196
+ Question: Explain the inverse relationship between bond prices and interest rates. Why does this occur?
197
+ Max Tokens: 150
198
+ ────────────────────────────────────────────────────────────────────────────────
199
+
200
+ πŸ“Š Stats:
201
+ ⏱️ Time: 10.06s
202
+ πŸ“ Tokens: 150/150
203
+ 🏁 Finish: stop
204
+
205
+ πŸ’¬ Answer:
206
+ <think>
207
+ Okay, so I need to explain why bond prices and interest rates have an inverse relationship. Hmm, let me start by recalling what I know about bonds. Bonds are essentially loans that investors make to the government or corporations. When you buy a bond, you're lending money to the issuer, and in return, they pay you interest over time and return the principal at maturity.
208
+
209
+ Now, interest rates... when the central bank changes the interest rates, that affects the cost of borrowing money. If interest rates go up, new bonds are issued with higher coupon rates to attract investors. So existing bonds with lower coupon rates become less attractive compared to the new ones. That should make the price of existing bonds drop because investors would want a higher return,
210
+
211
+ πŸ“ˆ Quality:
212
+ βœ… Complete: True
213
+ βœ… Shows reasoning: True
214
+ πŸ“ Answer length: 776 chars
215
+
216
+ ================================================================================
217
+ TEST SUMMARY
218
+ ================================================================================
219
+
220
+ βœ… Successful: 8/8
221
+ ❌ Failed: 0/8
222
+
223
+ πŸ“Š Performance Metrics:
224
+ ⏱️ Average response time: 11.74s
225
+ πŸ“ Average tokens used: 175
226
+ βœ… Complete answers: 8/8 (100.0%)
227
+ 🧠 Answers with reasoning: 8/8 (100.0%)
228
+ πŸ’° Token efficiency: 1400/1400 (100.0% utilization)
229
+
230
+
231
+ ================================================================================
232
+ FRENCH FINANCE TESTS (Questions en FranΓ§ais)
233
+ ================================================================================
234
+ Testing with French finance terminology...
235
+
236
+ [Test 1/10]
237
+
238
+ ────────────────────────────────────────────────────────────────────────────────
239
+ CatΓ©gorie: Calculs Financiers
240
+ Question: Si j'investis 10 000€ avec un taux d'intΓ©rΓͺt annuel de 5% composΓ© annuellement pendant 3 ans, quel sera le montant final? Montrez vos calculs.
241
+ Max Tokens: 150
242
+ ────────────────────────────────────────────────────────────────────────────────
243
+
244
+ πŸ“Š Stats:
245
+ ⏱️ Time: 10.08s
246
+ πŸ“ Tokens: 150/150
247
+ 🏁 Finish: stop
248
+
249
+ πŸ’¬ Answer:
250
+ <think>
251
+ Okay, let's see. The user wants to know the final amount after investing 10,000€ with a 5% annual interest rate compounded annually over 3 years. Hmm, I remember that compound interest formula is A = P(1 + r)^t. Let me make sure I have that right. P is the principal, which is 10,000€ here. The rate r is 5%, so that's 0.05 as a decimal. Time t is 3 years.
252
+
253
+ So plugging in the numbers: A = 10,000 * (1 + 0.05)^3. Let me calculate that step by step. First
254
+
255
+ πŸ“ˆ Quality:
256
+ βœ… Complete: True
257
+ βœ… Shows reasoning: True
258
+ πŸ“ Answer length: 463 chars
259
+
260
+ [Test 2/10]
261
+
262
+ ────────────────────────────────────────────────────────────────────────────────
263
+ CatΓ©gorie: Gestion des Risques
264
+ Question: Expliquez ce qu'est la VaR (Value at Risk / Valeur en Risque) et son utilisation dans la gestion de portefeuille.
265
+ Max Tokens: 200
266
+ ────────────────────────────────────────────────────────────────────────────────
267
+
268
+ πŸ“Š Stats:
269
+ ⏱️ Time: 13.34s
270
+ πŸ“ Tokens: 200/200
271
+ 🏁 Finish: stop
272
+
273
+ πŸ’¬ Answer:
274
+ <think>
275
+ Okay, the user is asking about VaR and its use in portfolio management. Let me start by recalling what VaR is. VaR stands for Value at Risk. It's a statistical measure that estimates the maximum potential loss in value of a portfolio over a specified time period for a given confidence interval. For example, a 95% VaR of $1 million means there's a 5% chance the portfolio will lose more than $1 million in a day.
276
+
277
+ I should explain the different methods to calculate VaR. The basic methods are variance-covariance, historical simulation, and Monte Carlo simulation. The variance-covariance method uses the standard deviation and correlation of assets. Historical simulation looks at past returns to estimate future losses. Monte Carlo uses random scenarios to model possible outcomes.
278
+
279
+ Then, the user might want to know how VaR is used in portfolio management. It helps in risk assessment by quantifying potential losses, which is crucial for setting risk limits. It's
280
+
281
+ πŸ“ˆ Quality:
282
+ βœ… Complete: True
283
+ βœ… Shows reasoning: True
284
+ πŸ“ Answer length: 976 chars
285
+
286
+ [Test 3/10]
287
+
288
+ ────────────────────────────────────────────────────────────────────────────────
289
+ CatΓ©gorie: Instruments Financiers
290
+ Question: Quelle est la diffΓ©rence entre une option d'achat (call) et une option de vente (put)?
291
+ Max Tokens: 150
292
+ ────────────────────────────────────────────────────────────────────────────────
293
+
294
+ πŸ“Š Stats:
295
+ ⏱️ Time: 10.01s
296
+ πŸ“ Tokens: 150/150
297
+ 🏁 Finish: stop
298
+
299
+ πŸ’¬ Answer:
300
+ <think>
301
+ Okay, the user is asking about the difference between a call option and a put option. Let me start by recalling what I know about options.
302
+
303
+ A call option gives the holder the right to buy an asset at a certain price before expiration. So, if I have a call option on a stock, I can decide to buy that stock at the strike price, even if the market price is higher. The profit would be the difference between the market price and the strike price, minus the premium paid.
304
+
305
+ On the other hand, a put option allows the holder to sell the asset at the strike price before expiration. If the market price drops below the strike price, the put option becomes valuable. The profit here would be the difference between the
306
+
307
+ πŸ“ˆ Quality:
308
+ βœ… Complete: True
309
+ βœ… Shows reasoning: True
310
+ πŸ“ Answer length: 720 chars
311
+
312
+ [Test 4/10]
313
+
314
+ ────────────────────────────────────────────────────────────────────────────────
315
+ Catégorie: Analyse Boursière
316
+ Question: Quels sont les principaux facteurs qui influencent la volatilitΓ© des marchΓ©s boursiers?
317
+ Max Tokens: 200
318
+ ────────────────────────────────────────────────────────────────────────────────
319
+
320
+ πŸ“Š Stats:
321
+ ⏱️ Time: 13.34s
322
+ πŸ“ Tokens: 200/200
323
+ 🏁 Finish: stop
324
+
325
+ πŸ’¬ Answer:
326
+ <think>
327
+ Okay, the user is asking about the main factors that influence stock market volatility. Let me start by recalling what I know. Volatility refers to the degree of variation in trading prices over time. High volatility means prices swing wildly, while low volatility means they're more stable.
328
+
329
+ First, I should think about macroeconomic factors. Interest rates come to mind. When central banks change rates, it affects borrowing costs and investor sentiment. For example, if rates rise, borrowing becomes more expensive, which can slow down the economy and lead to lower stock prices. That would increase volatility.
330
+
331
+ Then there's inflation. High inflation can erode purchasing power, leading to uncertainty. Investors might worry about central banks raising rates to combat inflation, which again affects stock prices. Geopolitical events like wars or elections can also create uncertainty. For instance, a trade war between countries can disrupt supply chains and affect companies' profits, leading to market swings.
332
+
333
+ Corporate earnings are another factor. If companies report better-than-
334
+
335
+ πŸ“ˆ Quality:
336
+ βœ… Complete: True
337
+ βœ… Shows reasoning: True
338
+ πŸ“ Answer length: 1097 chars
339
+
340
+ [Test 5/10]
341
+
342
+ ────────────────────────────────────────────────────────────────────────────────
343
+ CatΓ©gorie: Finance d'Entreprise
344
+ Question: Expliquez la diffΓ©rence entre l'EBITDA (BΓ©nΓ©fice avant intΓ©rΓͺts, impΓ΄ts, dΓ©prΓ©ciation et amortissement) et le rΓ©sultat net.
345
+ Max Tokens: 200
346
+ ────────────────────────────────────────────────────────────────────────────────
347
+
348
+ πŸ“Š Stats:
349
+ ⏱️ Time: 13.35s
350
+ πŸ“ Tokens: 200/200
351
+ 🏁 Finish: stop
352
+
353
+ πŸ’¬ Answer:
354
+ <think>
355
+ Okay, the user is asking about the difference between EBITDA and net income. Let me start by recalling what each term means. EBITDA stands for Earnings Before Interest, Taxes, Depreciation, and Amortization. It's a measure of a company's operating performance, essentially showing how much profit a company makes from its core operations before accounting for non-operating expenses and non-cash charges.
356
+
357
+ Net income, on the other hand, is the total profit after all expenses, including interest, taxes, depreciation, amortization, and other costs. So, the main difference is that EBITDA excludes interest, taxes, depreciation, and amortization, while net income includes all of these.
358
+
359
+ Wait, but why would someone use EBITDA over net income? Maybe because EBITDA gives a clearer picture of the company's operational efficiency without the effects of financing decisions, tax environments, or accounting policies related to depreciation. Investors might use EBITDA to compare
360
+
361
+ πŸ“ˆ Quality:
362
+ βœ… Complete: True
363
+ βœ… Shows reasoning: True
364
+ πŸ“ Answer length: 984 chars
365
+
366
+ [Test 6/10]
367
+
368
+ ────────────────────────────────────────────────────────────────────────────────
369
+ CatΓ©gorie: StratΓ©gie d'Investissement
370
+ Question: Qu'est-ce que la diversification d'un portefeuille et pourquoi est-elle importante?
371
+ Max Tokens: 200
372
+ ────────────────────────────────────────────────────────────────────────────────
373
+
374
+ πŸ“Š Stats:
375
+ ⏱️ Time: 13.35s
376
+ πŸ“ Tokens: 200/200
377
+ 🏁 Finish: stop
378
+
379
+ πŸ’¬ Answer:
380
+ <think>
381
+ Okay, the user is asking about portfolio diversification and why it's important. Let me start by defining what diversification means. It's about spreading investments across different asset classes, sectors, or regions to reduce risk. I should mention that it's a fundamental principle in investing.
382
+
383
+ Now, why is it important? The main point is risk reduction. By not putting all eggs in one basket, losses in one area might be offset by gains in another. I should explain that diversification doesn't eliminate risk entirely but helps manage it. Maybe use an example like stocks, bonds, real estate, etc.
384
+
385
+ Also, different asset classes perform differently under various market conditions. For instance, when stocks are down, bonds might hold their value or even go up. That's a key point to highlight.
386
+
387
+ I should also touch on the concept of correlation. Assets with low or negative correlation can enhance diversification benefits. Maybe mention that diversification can lead to more stable returns over time, which is important
388
+
389
+ πŸ“ˆ Quality:
390
+ βœ… Complete: True
391
+ βœ… Shows reasoning: True
392
+ πŸ“ Answer length: 1037 chars
393
+
394
+ [Test 7/10]
395
+
396
+ ───────────────────────────────────────────────────────────────────────────────���
397
+ CatΓ©gorie: Ratios Financiers
398
+ Question: Comment calculer le ratio cours/bΓ©nΓ©fice (PER) et comment l'interprΓ©ter?
399
+ Max Tokens: 150
400
+ ────────────────────────────────────────────────────────────────────────────────
401
+
402
+ πŸ“Š Stats:
403
+ ⏱️ Time: 10.08s
404
+ πŸ“ Tokens: 150/150
405
+ 🏁 Finish: stop
406
+
407
+ πŸ’¬ Answer:
408
+ <think>
409
+ Okay, the user is asking how to calculate the P/E ratio and how to interpret it. Let me start by recalling the formula. The P/E ratio is calculated by dividing the market price per share by the earnings per share (EPS). So, P/E = Market Price / EPS. I need to make sure I explain each component clearly.
410
+
411
+ First, the market price per share is straightforwardβ€”it's the current stock price. But EPS might be a bit trickier. EPS is calculated as net income divided by the number of outstanding shares. Wait, but sometimes companies have different methods, like diluted EPS. Should I mention that? Maybe, but the user might not need the complexities unless they ask. Let's keep it simple for now.
412
+
413
+
414
+
415
+ πŸ“ˆ Quality:
416
+ βœ… Complete: True
417
+ βœ… Shows reasoning: True
418
+ πŸ“ Answer length: 699 chars
419
+
420
+ [Test 8/10]
421
+
422
+ ────────────────────────────────────────────────────────────────────────────────
423
+ CatΓ©gorie: Obligations
424
+ Question: Pourquoi les prix des obligations baissent-ils lorsque les taux d'intΓ©rΓͺt augmentent?
425
+ Max Tokens: 150
426
+ ────────────────────────────────────────────────────────────────────────────────
427
+
428
+ πŸ“Š Stats:
429
+ ⏱️ Time: 10.07s
430
+ πŸ“ Tokens: 150/150
431
+ 🏁 Finish: stop
432
+
433
+ πŸ’¬ Answer:
434
+ <think>
435
+ Okay, the user is asking why bond prices fall when interest rates rise. Let me start by recalling what I know about bonds and interest rates. So, bonds are essentially loans that investors give to governments or corporations. When you buy a bond, you're lending money for a certain period, and in return, the issuer pays you interest, usually in the form of coupon payments.
436
+
437
+ Now, when interest rates go up, new bonds are issued with higher coupon rates because the market demands a higher return. So, if I have an old bond with a lower coupon rate, it's less attractive compared to the new bonds that offer higher returns. That makes the price of my old bond drop because investors won't want to pay as much for it
438
+
439
+ πŸ“ˆ Quality:
440
+ βœ… Complete: True
441
+ βœ… Shows reasoning: True
442
+ πŸ“ Answer length: 723 chars
443
+
444
+ [Test 9/10]
445
+
446
+ ────────────────────────────────────────────────────────────────────────────────
447
+ CatΓ©gorie: Analyse Technique (Termes FranΓ§ais)
448
+ Question: Expliquez les termes suivants utilisΓ©s en bourse franΓ§aise: CAC 40, PEA, sicav, et OAT.
449
+ Max Tokens: 200
450
+ ────────────────────────────────────────────────────────────────────────────────
451
+
452
+ πŸ“Š Stats:
453
+ ⏱️ Time: 13.35s
454
+ πŸ“ Tokens: 200/200
455
+ 🏁 Finish: stop
456
+
457
+ πŸ’¬ Answer:
458
+ <think>
459
+ Okay, the user is asking me to explain the terms CAC 40, PEA, sicav, and OAT in the context of French stock trading. Let me start by recalling each of these terms.
460
+
461
+ First, CAC 40. I know that's the main stock index for the Paris Stock Exchange. It includes the 40 largest companies listed there. I should mention that it's a benchmark for the French economy and that it's used to track market performance. Maybe also note that it's calculated in real-time and adjusted for dividends.
462
+
463
+ Next, PEA. That's the Plan d'Γ‰pargne en Actions. It's a tax-advantaged investment account for holding stocks and certain funds. I need to explain that it's designed for long-term investment, has tax benefits after a certain period, and the holding period requirement. Also, mention the annual contribution limits and that it's available to residents in France.
464
+
465
+ Sicav. That stands
466
+
467
+ πŸ“ˆ Quality:
468
+ βœ… Complete: True
469
+ βœ… Shows reasoning: True
470
+ πŸ“ Answer length: 873 chars
471
+
472
+ [Test 10/10]
473
+
474
+ ────────────────────────────────────────────────────────────────────────────────
475
+ CatΓ©gorie: FiscalitΓ© (France)
476
+ Question: Quelle est la différence entre la Flat Tax et le barème progressif pour l'imposition des revenus de capitaux mobiliers en France?
477
+ Max Tokens: 200
478
+ ────────────────────────────────────────────────────────────────────────────────
479
+
480
+ πŸ“Š Stats:
481
+ ⏱️ Time: 13.35s
482
+ πŸ“ Tokens: 200/200
483
+ 🏁 Finish: stop
484
+
485
+ πŸ’¬ Answer:
486
+ <think>
487
+ Okay, the user is asking about the difference between the Flat Tax and the progressive tax bracket system for taxing capital gains in France. Let me start by recalling what I know about these two systems.
488
+
489
+ First, the Flat Tax. I remember that in France, there's a Flat Tax of 30% on capital gains from the sale of securities. But wait, there's also a 12.8% tax rate for certain types of investments, like those in the PEA (Plan d'Γ‰pargne en Actions). So maybe the Flat Tax applies to most capital gains, but there are exceptions. Also, there's the notion of 'abattement' or deduction, which reduces the taxable base. For example, after a certain period of holding the asset, you might get a 50% deduction. So the effective tax rate could be lower than 30%.
490
+
491
+ Then there's the progressive tax bracket system. I think this applies to other types of income
492
+
493
+ πŸ“ˆ Quality:
494
+ βœ… Complete: True
495
+ βœ… Shows reasoning: True
496
+ πŸ“ Answer length: 860 chars
497
+
498
+ ================================================================================
499
+ RÉSUMÉ DES TESTS
500
+ ================================================================================
501
+
502
+ βœ… Successful: 10/10
503
+ ❌ Failed: 0/10
504
+
505
+ πŸ“Š Performance Metrics:
506
+ ⏱️ Average response time: 12.03s
507
+ πŸ“ Average tokens used: 180
508
+ βœ… Complete answers: 10/10 (100.0%)
509
+ 🧠 Answers with reasoning: 10/10 (100.0%)
510
+ πŸ’° Token efficiency: 1800/1800 (100.0% utilization)
511
+
512
+
513
+ ================================================================================
514
+ OVERALL SUMMARY
515
+ ================================================================================
516
+
517
+ πŸ“Š Total Tests: 18
518
+ βœ… Total Successful: 18/18 (100.0%)
519
+ πŸ‡¬πŸ‡§ English: 8/8
520
+ πŸ‡«πŸ‡· French: 10/10
521
+
522
+ ================================================================================
523
+ TESTING COMPLETE
524
+ ================================================================================