jeanbaptdzd committed
Commit a750766 · 1 Parent(s): da484d7

Update to vLLM 0.9.2 with Qwen3 support, remove PRIIPS functionality, add HF Space validation hook


- Upgraded vLLM from 0.6.5 to 0.9.2 for Qwen3ForCausalLM support
- Removed all PRIIPS-related code and files
- Added pre-commit hook for README.md validation
- Updated README.md with red dragon theme
- Fixed Space URL references in test scripts
- Cleaned up unnecessary markdown files and scripts
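The commit mentions a pre-commit hook for README.md validation but the hook itself is not shown in this diff, so the following is only a hypothetical sketch of what such a hook could look like. The checked front-matter keys (`sdk`, `app_port`) are assumptions taken from the README front matter changed later in this commit.

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-commit sketch: block commits whose README.md
# lacks the HF Space front-matter keys (key list is an assumption).
README="README.md"

check_readme() {
    # The file must exist and declare the Docker SDK and the app port.
    [ -f "$README" ] || { echo "ERROR: $README is missing" >&2; return 1; }
    for key in "sdk:" "app_port:"; do
        grep -q "^$key" "$README" || {
            echo "ERROR: $README is missing front-matter key '$key'" >&2
            return 1
        }
    done
    return 0
}

# Installed as .git/hooks/pre-commit, the script would end with:
#   check_readme || exit 1
```

Git runs the hook before each commit and aborts the commit on a non-zero exit status.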

LICENSE CHANGED
@@ -186,7 +186,7 @@
186
  same "printed page" as the copyright notice for easier
187
  identification within third-party archives.
188
 
189
- Copyright 2025 PRIIPs LLM Service
190
 
191
  Licensed under the Apache License, Version 2.0 (the "License");
192
  you may not use this file except in compliance with the License.
 
186
  same "printed page" as the copyright notice for easier
187
  identification within third-party archives.
188
 
189
+ Copyright 2025 LLM Pro Finance API
190
 
191
  Licensed under the Apache License, Version 2.0 (the "License");
192
  you may not use this file except in compliance with the License.
OPTIMIZATION_EVALUATION.md DELETED
@@ -1,137 +0,0 @@
1
- # vLLM Optimization Mode Evaluation
2
-
3
- ## Current Setup: Eager Mode
4
-
5
- **Configuration:**
6
- - `enforce_eager=True` - Disables CUDA graphs
7
- - `VLLM_USE_V1=0` - Uses v0 engine (stable)
8
-
9
- **Trade-offs:**
10
- - ✅ **Pros:** More stable, easier debugging, fewer compatibility issues
11
- - ❌ **Cons:** Lower performance, higher latency, reduced throughput
12
-
13
- ## Optimized Mode: CUDA Graphs Enabled
14
-
15
- **Proposed Configuration:**
16
- - `enforce_eager=False` - Enables CUDA graphs (default)
17
- - `VLLM_USE_V1=0` - Still use v0 engine for stability
18
-
19
- **Expected Benefits:**
20
- - 🚀 **Performance:** 2-3x faster inference
21
- - 🚀 **Throughput:** Higher tokens/second
22
- - 🚀 **Latency:** Lower time-to-first-token (TTFT)
23
-
24
- **Potential Risks:**
25
- - ⚠️ **Compatibility:** Qwen3 may have CUDA graph issues in vLLM 0.6.5
26
- - ⚠️ **Memory:** Slightly higher memory overhead
27
- - ⚠️ **Stability:** Possible crashes with unsupported operations
28
-
29
- ## Evaluation Criteria
30
-
31
- ### Can We Use Optimized Mode?
32
-
33
- **Factors to Consider:**
34
-
35
- 1. **Model Architecture Support**
36
- - Qwen3 in vLLM 0.6.5 may or may not fully support CUDA graphs
37
- - Need to test on actual deployment
38
-
39
- 2. **Hardware Compatibility**
40
- - L4 GPU: 24GB VRAM ✅
41
- - CUDA 12.4: Full CUDA graph support ✅
42
- - PyTorch 2.4.0: CUDA graph support ✅
43
-
44
- 3. **vLLM Version**
45
- - v0.6.5: CUDA graphs should work for supported architectures
46
- - Qwen3 support may vary
47
-
48
- 4. **Memory Constraints**
49
- - Current: `gpu_memory_utilization=0.85`
50
- - CUDA graphs add ~100-200MB overhead
51
- - Should still fit within L4 limits
52
-
53
- ## Recommendation: Try Optimized Mode with Fallback
54
-
55
- **Strategy:** Attempt optimized mode, fall back to eager if errors occur
56
-
57
- ### Implementation Approach
58
-
59
- ```python
60
- # Try optimized mode first
61
- try:
62
- llm_engine = LLM(
63
- model=model_name,
64
- trust_remote_code=True,
65
- dtype="bfloat16",
66
- enforce_eager=False, # Enable CUDA graphs
67
- # ... other params
68
- )
69
- except Exception as e:
70
- # Fall back to eager mode
71
- logger.warning(f"CUDA graphs failed, falling back to eager mode: {e}")
72
- llm_engine = LLM(
73
- model=model_name,
74
- trust_remote_code=True,
75
- dtype="bfloat16",
76
- enforce_eager=True, # Safe fallback
77
- # ... other params
78
- )
79
- ```
80
-
81
- ## Testing Plan
82
-
83
- ### 1. Initial Test (Optimized Mode)
84
- - Deploy with `enforce_eager=False`
85
- - Monitor startup logs
86
- - Check for CUDA graph compilation errors
87
-
88
- ### 2. Performance Benchmark
89
- If optimized mode works:
90
- - Measure: tokens/second, latency, throughput
91
- - Compare with eager mode baseline
92
-
93
- ### 3. Stability Test
94
- - Run multiple requests
95
- - Check for crashes or errors
96
- - Monitor memory usage
97
-
98
- ### 4. Fallback Verification
99
- - Ensure eager mode still works as backup
100
- - Document any issues found
101
-
102
- ## Expected Outcomes
103
-
104
- ### Best Case (Optimized Works)
105
- - ✅ CUDA graphs compile successfully
106
- - ✅ 2-3x performance improvement
107
- - ✅ Stable operation
108
- - **Action:** Keep optimized mode
109
-
110
- ### Worst Case (Optimized Fails)
111
- - ❌ CUDA graph compilation errors
112
- - ❌ Runtime crashes
113
- - ✅ Eager mode fallback works
114
- - **Action:** Stay in eager mode, consider upgrading vLLM
115
-
116
- ### Middle Case (Partial Support)
117
- - ⚠️ CUDA graphs work but with warnings
118
- - ⚠️ Some operations fall back to eager
119
- - ✅ Still better than full eager mode
120
- - **Action:** Monitor and optimize further
121
-
122
- ## Monitoring
123
-
124
- Track these metrics:
125
- - Model loading time
126
- - CUDA graph compilation time
127
- - Inference latency
128
- - Throughput (tokens/sec)
129
- - Memory usage
130
- - Error rates
131
-
132
- ## Conclusion
133
-
134
- **Recommendation:** **TRY OPTIMIZED MODE** with automatic fallback
135
-
136
- The L4 GPU and CUDA 12.4 setup should support CUDA graphs. Qwen3 compatibility is the main unknown. With automatic fallback to eager mode, we can safely test optimized mode without risking service availability.
137
-
README.md CHANGED
@@ -1,24 +1,22 @@
1
  ---
2
- title: Qwen Open Finance R 8B Inference
3
- emoji: 📊
4
- colorFrom: blue
5
- colorTo: purple
6
  sdk: docker
7
  pinned: false
8
- license: apache-2.0
9
  app_port: 7860
10
- hardware: l4
11
  ---
12
 
13
- # Qwen Open Finance R 8B Inference
14
 
15
- OpenAI-compatible API and financial document processor powered by `DragonLLM/qwen3-8b-fin-v1.0` via vLLM.
16
 
17
  ## 🚀 Quick Start
18
 
19
  This service provides:
20
  - **OpenAI-compatible API** at `/v1/models` and `/v1/chat/completions`
21
- - **PRIIPs extraction** at `/extract-priips` for structured financial document parsing
22
  - **Streaming support** for real-time completions
23
  - **Provider abstraction** for easy integration with PydanticAI/DSPy
24
 
@@ -28,12 +26,12 @@ This service provides:
28
 
29
  #### List Models
30
  ```bash
31
- curl -X GET "https://your-space-url.hf.space/v1/models"
32
  ```
33
 
34
  #### Chat Completions
35
  ```bash
36
- curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
37
  -H "Content-Type: application/json" \
38
  -d '{
39
  "model": "DragonLLM/qwen3-8b-fin-v1.0",
@@ -45,7 +43,7 @@ curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
45
 
46
  #### Streaming Chat Completions
47
  ```bash
48
- curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
49
  -H "Content-Type: application/json" \
50
  -d '{
51
  "model": "DragonLLM/qwen3-8b-fin-v1.0",
@@ -54,44 +52,6 @@ curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
54
  }'
55
  ```
56
 
57
- ### PRIIPs Extraction
58
-
59
- #### Extract Structured Data from PDFs
60
- ```bash
61
- curl -X POST "https://your-space-url.hf.space/extract-priips" \
62
- -H "Content-Type: application/json" \
63
- -d '{
64
- "sources": ["https://example.com/priips-document.pdf"],
65
- "options": {"language": "en", "ocr": false}
66
- }'
67
- ```
68
-
69
- **Response:**
70
- ```json
71
- {
72
- "product_name": "Example Investment Fund",
73
- "manufacturer": "Example Asset Management",
74
- "isin": "DE0001234567",
75
- "sri": 3,
76
- "recommended_holding_period": "5 years",
77
- "costs": {
78
- "entry_cost_pct": 2.5,
79
- "ongoing_cost_pct": 1.2,
80
- "exit_cost_pct": 0.5
81
- },
82
- "performance_scenarios": [
83
- {
84
- "name": "Bull Market",
85
- "description": "Optimistic scenario",
86
- "return_pct": 15.5
87
- }
88
- ],
89
- "date": "2024-01-01",
90
- "language": "en",
91
- "source_url": "https://example.com/priips-document.pdf"
92
- }
93
- ```
94
-
95
  ## 🔧 Configuration
96
 
97
  The service uses these environment variables:
@@ -128,7 +88,7 @@ from pydantic_ai.models.openai import OpenAIModel
128
 
129
  model = OpenAIModel(
130
  "DragonLLM/qwen3-8b-fin-v1.0",
131
- base_url="https://your-space-url.hf.space/v1"
132
  )
133
 
134
  agent = Agent(model=model)
@@ -140,14 +100,13 @@ import dspy
140
 
141
  lm = dspy.OpenAI(
142
  model="DragonLLM/qwen3-8b-fin-v1.0",
143
- api_base="https://your-space-url.hf.space/v1"
144
  )
145
  ```
146
 
147
  ## 📊 Features
148
 
149
  - ✅ **OpenAI-compatible API** - Drop-in replacement for OpenAI API
150
- - ✅ **PRIIPs document extraction** - Structured JSON from financial PDFs
151
  - ✅ **Provider abstraction** - Easy to swap backends
152
  - ✅ **Streaming support** - Real-time chat completions
153
  - ✅ **Error handling** - Robust error handling and validation
@@ -192,4 +151,3 @@ MIT License - see LICENSE file for details.
192
  - **vLLM:** 0.9.2 (upgraded from 0.6.5 - July 2025 release)
193
  - **PyTorch:** 2.5.0+ (CUDA 12.4)
194
  - **CUDA:** 12.4
195
- - See `VLLM_UPGRADE_ANALYSIS.md` for upgrade details
 
1
  ---
2
+ title: Open Finance LLM 8B
3
+ emoji: 🐉
4
+ colorFrom: red
5
+ colorTo: red
6
  sdk: docker
7
  pinned: false
 
8
  app_port: 7860
9
+ suggested_hardware: l4x1
10
  ---
11
 
12
+ # Open Finance LLM 8B
13
 
14
+ OpenAI-compatible API powered by `DragonLLM/qwen3-8b-fin-v1.0` via vLLM.
15
 
16
  ## 🚀 Quick Start
17
 
18
  This service provides:
19
  - **OpenAI-compatible API** at `/v1/models` and `/v1/chat/completions`
 
20
  - **Streaming support** for real-time completions
21
  - **Provider abstraction** for easy integration with PydanticAI/DSPy
22
 
 
26
 
27
  #### List Models
28
  ```bash
29
+ curl -X GET "https://your-username-open-finance-llm-8b.hf.space/v1/models"
30
  ```
31
 
32
  #### Chat Completions
33
  ```bash
34
+ curl -X POST "https://your-username-open-finance-llm-8b.hf.space/v1/chat/completions" \
35
  -H "Content-Type: application/json" \
36
  -d '{
37
  "model": "DragonLLM/qwen3-8b-fin-v1.0",
 
43
 
44
  #### Streaming Chat Completions
45
  ```bash
46
+ curl -X POST "https://your-username-open-finance-llm-8b.hf.space/v1/chat/completions" \
47
  -H "Content-Type: application/json" \
48
  -d '{
49
  "model": "DragonLLM/qwen3-8b-fin-v1.0",
 
52
  }'
53
  ```
54
 
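The curl examples above can also be driven from Python using only the standard library; a minimal sketch that builds the same request body (the Space URL is a placeholder, exactly as in the curl commands):

```python
import json
import urllib.request

# Placeholder base URL, as in the curl examples above.
BASE_URL = "https://your-username-open-finance-llm-8b.hf.space/v1"


def build_chat_payload(prompt: str, stream: bool = False) -> dict:
    """Build the same JSON body the curl examples send."""
    return {
        "model": "DragonLLM/qwen3-8b-fin-v1.0",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def chat(prompt: str) -> dict:
    """POST the payload to the OpenAI-compatible chat completions endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Any OpenAI-compatible client works the same way; only the base URL differs.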
55
  ## 🔧 Configuration
56
 
57
  The service uses these environment variables:
 
88
 
89
  model = OpenAIModel(
90
  "DragonLLM/qwen3-8b-fin-v1.0",
91
+ base_url="https://your-username-open-finance-llm-8b.hf.space/v1"
92
  )
93
 
94
  agent = Agent(model=model)
 
100
 
101
  lm = dspy.OpenAI(
102
  model="DragonLLM/qwen3-8b-fin-v1.0",
103
+ api_base="https://your-username-open-finance-llm-8b.hf.space/v1"
104
  )
105
  ```
106
 
107
  ## 📊 Features
108
 
109
  - ✅ **OpenAI-compatible API** - Drop-in replacement for OpenAI API
 
110
  - ✅ **Provider abstraction** - Easy to swap backends
111
  - ✅ **Streaming support** - Real-time chat completions
112
  - ✅ **Error handling** - Robust error handling and validation
 
151
  - **vLLM:** 0.9.2 (upgraded from 0.6.5 - July 2025 release)
152
  - **PyTorch:** 2.5.0+ (CUDA 12.4)
153
  - **CUDA:** 12.4
 
VLLM_COMPATIBILITY.md DELETED
@@ -1,152 +0,0 @@
1
- # vLLM 0.6.5 + DragonLLM/qwen3-8b-fin-v1.0 Compatibility Analysis
2
-
3
- ## Summary
4
-
5
- ✅ **Status: LIKELY COMPATIBLE** - Configuration matches Qwen3 requirements
6
-
7
- ## Current Configuration
8
-
9
- - **vLLM Version:** 0.9.2 ✅ (upgraded from 0.6.5)
10
- - **Model:** DragonLLM/qwen3-8b-fin-v1.0
11
- - **Architecture:** Qwen3
12
- - **PyTorch:** 2.5.0+cu124 (CUDA 12.4)
13
- - **Model Parameters:** ~8B (308.2K according to HF, but this seems like a reporting issue)
14
-
15
- **Upgrade Status:** Upgraded to vLLM 0.9.2 (July 2025) - provides significant improvements over 0.6.5 while maintaining CUDA 12.4 compatibility.
16
-
17
- ## Compatibility Factors
18
-
19
- ### ✅ Positive Indicators
20
-
21
- 1. **Architecture Support**
22
- - Model uses `qwen3` architecture
23
- - Qwen models are generally well-supported in vLLM
24
- - Code comment indicates: "vLLM: v0.6.5 (Qwen3 support + VLLM_USE_V1=0 for stability)"
25
-
26
- 2. **Configuration Matches Requirements**
27
- ```python
28
- dtype="bfloat16" # ✅ Required for Qwen3
29
- trust_remote_code=True # ✅ Required for custom architectures
30
- enforce_eager=True # ✅ Avoids CUDA graph issues
31
- ```
32
-
33
- 3. **Model Repository Info**
34
- - Tags include: `text-generation-inference`, `endpoints_compatible`
35
- - These tags suggest vLLM/TGI compatibility
36
- - Uses `transformers` + `safetensors` format (vLLM compatible)
37
-
38
- 4. **Environment Setup**
39
- - `VLLM_USE_V1=0` - Using stable v0 engine
40
- - Proper HF token authentication configured
41
- - CUDA 12.4 with PyTorch 2.4.0
42
-
43
- ### ⚠️ Potential Concerns
44
-
45
- 1. **vLLM 0.6.5 Release Date**
46
- - vLLM 0.6.5 was released in September 2024
47
- - Qwen3 models may have been added in later versions
48
- - **Action:** Monitor for compatibility issues during model loading
49
-
50
- 2. **Model Size Reporting**
51
- - HF shows "308.2K parameters" which seems incorrect for an 8B model
52
- - This is likely a metadata issue, not a compatibility issue
53
-
54
- 3. **Private Model Access**
55
- - Model is private (requires authentication)
56
- - Authentication is properly configured
57
- - Must accept model terms on HF
58
-
59
- ## Configuration Verification
60
-
61
- ### Current vLLM Initialization
62
- ```python
63
- llm_engine = LLM(
64
- model="DragonLLM/qwen3-8b-fin-v1.0",
65
- trust_remote_code=True, # ✅ Required
66
- dtype="bfloat16", # ✅ Required for Qwen3
67
- max_model_len=4096, # ✅ Reasonable for L4 GPU
68
- gpu_memory_utilization=0.85, # ✅ Good utilization
69
- tensor_parallel_size=1, # ✅ Single GPU
70
- download_dir="/tmp/huggingface",
71
- tokenizer_mode="auto",
72
- enforce_eager=True, # ✅ Stability
73
- disable_log_stats=False, # ✅ Debugging enabled
74
- )
75
- ```
76
-
77
- ### Environment Variables
78
- ```bash
79
- VLLM_USE_V1=0 # ✅ Use stable v0 engine
80
- CUDA_VISIBLE_DEVICES=0 # ✅ Single GPU
81
- HF_TOKEN (via HF_TOKEN_LC2) # ✅ Authentication
82
- ```
83
-
84
- ## Testing Recommendations
85
-
86
- ### 1. Test Model Loading
87
- ```bash
88
- # Run the service and monitor startup logs
89
- # Check for these success indicators:
90
- - "✅ vLLM engine initialized successfully"
91
- - No architecture mismatch errors
92
- - Model loads without errors
93
- ```
94
-
95
- ### 2. Test Inference
96
- ```python
97
- # Simple test request
98
- {
99
- "model": "DragonLLM/qwen3-8b-fin-v1.0",
100
- "messages": [{"role": "user", "content": "Hello"}],
101
- "max_tokens": 50
102
- }
103
- ```
104
-
105
- ### 3. Monitor for Errors
106
-
107
- **If you see:**
108
- - `AttributeError: 'LlamaForCausalLM' object has no attribute 'qwen'`
109
- - `Model architecture not supported`
110
- - `dtype mismatch errors`
111
-
112
- **Then:** vLLM 0.6.5 may not fully support Qwen3, upgrade to vLLM 0.6.6+ or 0.7.0+
113
-
114
- ## Upgrade Path (if needed)
115
-
116
- If compatibility issues occur:
117
-
118
- ### Option 1: Upgrade vLLM (Recommended)
119
- ```dockerfile
120
- # In Dockerfile, change:
121
- RUN pip install --no-cache-dir vllm==0.6.6
122
- # or
123
- RUN pip install --no-cache-dir vllm==0.7.0
124
- ```
125
-
126
- ### Option 2: Test with Latest
127
- ```dockerfile
128
- RUN pip install --no-cache-dir vllm>=0.7.0
129
- ```
130
-
131
- ## Verification Checklist
132
-
133
- - [x] Model architecture: Qwen3 ✅
134
- - [x] dtype: bfloat16 ✅
135
- - [x] trust_remote_code: True ✅
136
- - [x] Authentication configured ✅
137
- - [x] PyTorch 2.4.0 with CUDA 12.4 ✅
138
- - [ ] Model loads successfully (test on deployment)
139
- - [ ] Inference works correctly (test on deployment)
140
-
141
- ## Conclusion
142
-
143
- Based on the configuration and model metadata, **DragonLLM/qwen3-8b-fin-v1.0 should be compatible with vLLM 0.6.5**. The configuration follows best practices for Qwen models.
144
-
145
- **However**, since Qwen3 is a relatively new architecture, monitor the first deployment closely. If you encounter any architecture-related errors, upgrading to vLLM 0.6.6+ or 0.7.0+ is recommended.
146
-
147
- ## References
148
-
149
- - Model: https://huggingface.co/DragonLLM/qwen3-8b-fin-v1.0
150
- - vLLM Docs: https://docs.vllm.ai/en/stable/models/supported_models.html
151
- - Qwen3 Architecture: Uses bfloat16, requires trust_remote_code
152
-
VLLM_UPGRADE_ANALYSIS.md DELETED
@@ -1,191 +0,0 @@
1
- # vLLM Upgrade Analysis: 0.6.5 → Latest
2
-
3
- ## Current Status
4
-
5
- - **Current Version:** vLLM 0.6.5 (September 2024)
6
- - **Latest Version:** vLLM 0.10.2 (October 2025) or 0.9.2
7
- - **Version Gap:** ~14+ months of updates
8
-
9
- ## Latest Version Information
10
-
11
- ### vLLM 0.10.2 (Latest - October 2025)
12
- - **CUDA Support:** CUDA 13.0.2
13
- - **PyTorch:** Likely requires newer PyTorch version
14
- - **New Features:**
15
- - Multi-node configurations
16
- - FP8 precision support (Hopper+ GPUs)
17
- - NVFP4 format (Blackwell GPUs)
18
- - DeepSeek-R1 and Llama-3.1-8B-Instruct support
19
- - RTX PRO 6000 Blackwell Server Edition support
20
-
21
- ### vLLM 0.9.2 (Stable - October 2025)
22
- - More stable release track
23
- - Improved GPU architecture support
24
- - Better memory management
25
- - Likely better Qwen3 support
26
-
27
- ## Current Setup Requirements
28
-
29
- ### Our Current Configuration
30
- - **CUDA:** 12.4
31
- - **PyTorch:** 2.4.0+cu124
32
- - **Python:** 3.11
33
- - **GPU:** L4 (24GB VRAM)
34
- - **Model:** Qwen3-8B
35
-
36
- ## Compatibility Considerations
37
-
38
- ### ⚠️ Potential Issues Upgrading to 0.10.x
39
-
40
- 1. **CUDA 13.0.2 Requirement**
41
- - vLLM 0.10.2 supports CUDA 13.0.2
42
- - We're on CUDA 12.4
43
- - **Solution:** May need CUDA 13 base image OR use vLLM 0.9.x which likely supports CUDA 12.x
44
-
45
- 2. **PyTorch Version**
46
- - Newer vLLM may require PyTorch 2.5+
47
- - Current: PyTorch 2.4.0
48
- - **Action:** Check vLLM 0.9.x requirements
49
-
50
- 3. **Python Version**
51
- - vLLM 0.9+ may require Python 3.11+
52
- - Current: Python 3.11 ✅
53
- - **Status:** Compatible
54
-
55
- ### ✅ Benefits of Upgrading
56
-
57
- 1. **Better Qwen3 Support**
58
- - Newer versions likely have improved Qwen3 compatibility
59
- - Better CUDA graph support
60
- - More stable inference
61
-
62
- 2. **Performance Improvements**
63
- - Better memory management
64
- - Optimized kernels
65
- - Improved throughput
66
-
67
- 3. **Bug Fixes**
68
- - 14+ months of bug fixes
69
- - Security updates
70
- - Stability improvements
71
-
72
- 4. **Feature Updates**
73
- - Better streaming support
74
- - Improved API compatibility
75
- - New optimizations
76
-
77
- ## Recommended Upgrade Path
78
-
79
- ### Option 1: Upgrade to vLLM 0.9.x (Recommended)
80
-
81
- **Why:**
82
- - Better balance of features and stability
83
- - Likely still supports CUDA 12.4
84
- - Better Qwen3 support than 0.6.5
85
- - Not as bleeding edge as 0.10.x
86
-
87
- **Changes Needed:**
88
- ```dockerfile
89
- # Update Dockerfile
90
- RUN pip install --no-cache-dir vllm>=0.9.0,<0.10.0
91
-
92
- # May need to update PyTorch:
93
- RUN pip install --no-cache-dir \
94
- torch>=2.5.0 \
95
- --index-url https://download.pytorch.org/whl/cu124
96
- ```
97
-
98
- ### Option 2: Upgrade to vLLM 0.10.x (If CUDA 13 available)
99
-
100
- **Why:**
101
- - Latest features and optimizations
102
- - Best performance improvements
103
-
104
- **Changes Needed:**
105
- ```dockerfile
106
- # Update base image to CUDA 13
107
- FROM nvidia/cuda:13.0.2-devel-ubuntu22.04
108
-
109
- # Update PyTorch for CUDA 13
110
- RUN pip install --no-cache-dir \
111
- torch>=2.5.0 \
112
- --index-url https://download.pytorch.org/whl/cu130
113
-
114
- # Install latest vLLM
115
- RUN pip install --no-cache-dir vllm>=0.10.0
116
- ```
117
-
118
- ### Option 3: Gradual Upgrade (Safest)
119
-
120
- 1. **First:** Upgrade to vLLM 0.7.x or 0.8.x
121
- - Test Qwen3 compatibility
122
- - Verify performance
123
-
124
- 2. **Then:** Move to 0.9.x
125
- - Test thoroughly
126
- - Monitor stability
127
-
128
- 3. **Finally:** Consider 0.10.x if needed
129
-
130
- ## Code Changes Required
131
-
132
- ### Minimal Changes Expected
133
-
134
- 1. **Environment Variables**
135
- - `VLLM_USE_V1=0` may no longer be needed (v1 engine is default in newer versions)
136
- - May need to update or remove
137
-
138
- 2. **API Changes**
139
- - LLM initialization likely compatible
140
- - Some parameters may be deprecated
141
- - Check release notes
142
-
143
- 3. **Streaming**
144
- - Better streaming support in newer versions
145
- - May need to update streaming implementation
146
-
147
- ## Testing Checklist
148
-
149
- After upgrading:
150
-
151
- - [ ] Model loads successfully
152
- - [ ] Qwen3 architecture works
153
- - [ ] CUDA graphs work (optimized mode)
154
- - [ ] Inference produces correct results
155
- - [ ] Streaming works
156
- - [ ] Memory usage acceptable
157
- - [ ] Performance improved/stable
158
- - [ ] No regressions in API compatibility
159
-
160
- ## Recommendations
161
-
162
- ### Immediate Action: Upgrade to vLLM 0.9.x
163
-
164
- **Reasoning:**
165
- 1. Still supports CUDA 12.4 (no base image change needed)
166
- 2. Much better than 0.6.5
167
- 3. Better Qwen3 support
168
- 4. More stable than 0.10.x
169
- 5. Significant improvements without breaking changes
170
-
171
- **Steps:**
172
- 1. Update Dockerfile to use vLLM 0.9.2
173
- 2. Update PyTorch to 2.5+ (may be needed)
174
- 3. Test on deployment
175
- 4. Monitor for issues
176
-
177
- ### Future Consideration: vLLM 0.10.x
178
-
179
- Only if:
180
- - CUDA 13 becomes available
181
- - Need specific 0.10.x features
182
- - 0.9.x proves insufficient
183
-
184
- ## Summary
185
-
186
- **Current:** vLLM 0.6.5 (old, but working)
187
- **Recommended:** vLLM 0.9.2 (good balance)
188
- **Latest:** vLLM 0.10.2 (requires CUDA 13)
189
-
190
- **Action:** Upgrade to vLLM 0.9.2 for best compatibility with current setup while gaining significant improvements.
191
-
app/main.py CHANGED
@@ -1,17 +1,16 @@
1
  from fastapi import FastAPI
2
  from app.middleware import api_key_guard
3
- from app.routers import openai_api, extract
4
  import logging
5
 
6
  # Configure logging
7
  logging.basicConfig(level=logging.INFO)
8
  logger = logging.getLogger(__name__)
9
 
10
- app = FastAPI(title="PRIIPs LLM Service (vLLM)")
11
 
12
  # Mount routers
13
  app.include_router(openai_api.router, prefix="/v1")
14
- app.include_router(extract.router)
15
 
16
  # Optional API key middleware
17
  app.middleware("http")(api_key_guard)
@@ -20,7 +19,7 @@ app.middleware("http")(api_key_guard)
20
  async def startup_event():
21
  """Startup event - initialize model in background"""
22
  import threading
23
- logger.info("Starting PRIIPs LLM Service...")
24
  logger.info("Initializing model in background thread...")
25
 
26
  def load_model():
@@ -44,6 +43,6 @@ async def root():
44
 
45
  @app.get("/health")
46
  async def health():
47
- return {"status": "healthy", "service": "PRIIPs LLM Service"}
48
 
49
 
 
1
  from fastapi import FastAPI
2
  from app.middleware import api_key_guard
3
+ from app.routers import openai_api
4
  import logging
5
 
6
  # Configure logging
7
  logging.basicConfig(level=logging.INFO)
8
  logger = logging.getLogger(__name__)
9
 
10
+ app = FastAPI(title="LLM Pro Finance API (vLLM)")
11
 
12
  # Mount routers
13
  app.include_router(openai_api.router, prefix="/v1")
 
14
 
15
  # Optional API key middleware
16
  app.middleware("http")(api_key_guard)
 
19
  async def startup_event():
20
  """Startup event - initialize model in background"""
21
  import threading
22
+ logger.info("Starting LLM Pro Finance API...")
23
  logger.info("Initializing model in background thread...")
24
 
25
  def load_model():
 
43
 
44
  @app.get("/health")
45
  async def health():
46
+ return {"status": "healthy", "service": "LLM Pro Finance API"}
47
 
48
 
app/models/priips.py DELETED
@@ -1,41 +0,0 @@
1
- from typing import List, Optional
2
- from pydantic import BaseModel
3
-
4
-
5
- class PerformanceScenario(BaseModel):
6
- name: str
7
- description: Optional[str] = None
8
- return_pct: Optional[float] = None
9
-
10
-
11
- class Costs(BaseModel):
12
- entry_cost_pct: Optional[float] = None
13
- ongoing_cost_pct: Optional[float] = None
14
- exit_cost_pct: Optional[float] = None
15
-
16
-
17
- class PriipsFields(BaseModel):
18
- product_name: Optional[str] = None
19
- manufacturer: Optional[str] = None
20
- isin: Optional[str] = None
21
- sri: Optional[int] = None
22
- recommended_holding_period: Optional[str] = None
23
- costs: Optional[Costs] = None
24
- performance_scenarios: Optional[List[PerformanceScenario]] = None
25
- date: Optional[str] = None
26
- language: Optional[str] = None
27
- source_url: Optional[str] = None
28
-
29
-
30
- class ExtractRequest(BaseModel):
31
- sources: List[str]
32
- options: Optional[dict] = None
33
-
34
-
35
- class ExtractResult(BaseModel):
36
- source: str
37
- success: bool
38
- data: Optional[PriipsFields] = None
39
- error: Optional[str] = None
40
-
41
-
app/routers/extract.py DELETED
@@ -1,32 +0,0 @@
1
- from fastapi import APIRouter, UploadFile, File
2
- from pathlib import Path
3
- import tempfile
4
- import os
5
-
6
- from app.models.priips import ExtractRequest
7
- from app.services import extract_service
8
-
9
-
10
- router = APIRouter()
11
-
12
-
13
- @router.post("/extract-priips")
14
- async def extract_priips(file: UploadFile = File(...)):
15
- """Extract PRIIPS fields from uploaded PDF"""
16
- # Save uploaded file to temporary location
17
- with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
18
- content = await file.read()
19
- tmp_file.write(content)
20
- tmp_path = tmp_file.name
21
-
22
- try:
23
- # Process the file using the extract service
24
- req = ExtractRequest(sources=[tmp_path])
25
- results = await extract_service.extract(req)
26
- return results[0] if results else {"success": False, "error": "No results"}
27
- finally:
28
- # Clean up temp file
29
- if os.path.exists(tmp_path):
30
- os.remove(tmp_path)
31
-
32
-
app/services/extract_service.py DELETED
@@ -1,86 +0,0 @@
1
- import json
2
- from pathlib import Path
3
- from typing import List
4
-
5
- from app.config import settings
6
- from app.models.priips import ExtractRequest, ExtractResult, PriipsFields
7
- from app.providers import vllm
8
- from app.utils.pdf import download_to_tmp, extract_text_from_pdf
9
- from app.utils.json_guard import try_parse_json
10
-
11
-
12
- def build_prompt(text: str) -> str:
13
- schema = {
14
- "product_name": "string",
15
- "manufacturer": "string",
16
- "isin": "string",
17
- "sri": "integer (1-7)",
18
- "recommended_holding_period": "string",
19
- "costs": {
20
- "entry_cost_pct": "number?",
21
- "ongoing_cost_pct": "number?",
22
- "exit_cost_pct": "number?",
23
- },
24
- "performance_scenarios": [
25
- {"name": "string", "description": "string?", "return_pct": "number?"}
26
- ],
27
- "date": "string?",
28
- "language": "string?",
29
- "source_url": "string?",
30
- }
31
- instruction = (
32
- "You are an expert financial document parser. "
33
- "Extract the requested PRIIPs fields as STRICT JSON only, no extra text. "
34
- f"JSON schema keys: {list(schema.keys())}."
35
- )
36
- return f"{instruction}\n\nDocument:\n{text[:15000]}"
37
-
38
-
39
- async def process_source(src: str) -> ExtractResult:
40
- try:
41
- path: Path
42
- if src.lower().startswith("http"):
43
- path = await download_to_tmp(src, Path(".tmp"))
44
- else:
45
- path = Path(src)
46
- text = extract_text_from_pdf(path)
47
- prompt = build_prompt(text)
48
-
49
- payload = {
50
- "model": settings.model,
51
- "messages": [
52
- {"role": "system", "content": "You output JSON only."},
53
- {"role": "user", "content": prompt},
54
- ],
55
- "temperature": 0.1,
56
- "max_tokens": 800,
57
- "stream": False,
58
- }
59
- data = await vllm.chat(payload, stream=False)
60
-
61
- # vLLM OpenAI response
62
- content = (
63
- data.get("choices", [{}])[0]
64
- .get("message", {})
65
- .get("content", "")
66
- if isinstance(data, dict)
67
- else ""
68
- )
69
- ok, parsed = try_parse_json(content)
70
- if not ok:
71
- return ExtractResult(source=src, success=False, error=str(parsed))
72
-
73
- model_data = PriipsFields(**parsed)
74
- model_data.source_url = src
75
- return ExtractResult(source=src, success=True, data=model_data)
76
- except Exception as e:
77
- return ExtractResult(source=src, success=False, error=str(e))
78
-
79
-
80
- async def extract(req: ExtractRequest) -> List[ExtractResult]:
81
- results: List[ExtractResult] = []
82
- for src in req.sources:
83
- results.append(await process_source(src))
84
- return results
85
-
86
-
app/utils/json_guard.py DELETED
@@ -1,21 +0,0 @@
1
- import json
2
- from typing import Any, Tuple
3
-
4
-
5
- def try_parse_json(text: str) -> Tuple[bool, Any]:
6
- if text is None:
7
- return False, "Input is None"
8
-
9
- try:
10
- return True, json.loads(text)
11
- except Exception:
12
- # naive repair: strip markdown fences if present
13
- t = text.strip()
14
- if t.startswith("```") and t.endswith("```"):
15
- t = t.strip("`\n ")
16
- try:
17
- return True, json.loads(t)
18
- except Exception as e:
19
- return False, str(e)
20
-
21
-
app/utils/pdf.py DELETED
@@ -1,34 +0,0 @@
1
- from pathlib import Path
2
- from typing import Optional
3
-
4
- import httpx
5
-
6
-
7
- async def download_to_tmp(url: str, tmp_dir: Path) -> Path:
8
- tmp_dir.mkdir(parents=True, exist_ok=True)
9
- filename = url.split("/")[-1] or "document.pdf"
10
- target = tmp_dir / filename
11
- async with httpx.AsyncClient(timeout=60) as client:
12
- r = await client.get(url)
13
- r.raise_for_status()
14
- target.write_bytes(r.content)
15
- return target
16
-
17
-
18
- def extract_text_from_pdf(path: Path) -> str:
19
- # Lazy import to avoid hard dependency during tests unless used
20
- try:
21
- import fitz # PyMuPDF
22
- except Exception as e:
23
- raise RuntimeError("PyMuPDF (fitz) is required to extract PDF text") from e
24
-
25
- doc = fitz.open(path)
26
- try:
27
- texts: list[str] = []
28
- for page in doc:
29
- texts.append(page.get_text("text"))
30
- return "\n".join(texts).strip()
31
- finally:
32
- doc.close()
33
-
34
-
eval_results/FINANCIAL_REASONING_RESULTS.md DELETED
@@ -1,211 +0,0 @@
-# Financial Reasoning Evaluation Results
-
-**Model:** DragonLLM/qwen3-8b-fin-v1.0
-**Date:** October 28, 2025
-**Hardware:** L4 GPU (24GB VRAM)
-**Configuration:** Eager mode, 4096 context, temperature 0.3
-
----
-
-## Executive Summary
-
-The model demonstrated **strong financial reasoning capabilities** across multiple complex scenarios:
-
-- ✅ **Multi-step calculations** with clear methodology
-- ✅ **Risk assessment** considering client suitability
-- ✅ **Cost-benefit analysis** with comparative evaluation
-- ✅ **Regulatory knowledge** of PRIIPS requirements
-- ✅ **Practical recommendations** with justification
-
-**Overall Grade: A- (Excellent)**
-
----
-
-## Task Results
-
-### Task 1: Investment Return Calculation ✅
-
-**Scenario:** Calculate total and percentage return on stock investment with dividends.
-
-**Performance:**
-- ✅ Correctly identified initial investment (€5,000)
-- ✅ Calculated sale proceeds (€6,500)
-- ✅ Included dividends (€200) in total proceeds
-- ✅ Computed total return: **€1,700 (34%)**
-- ✅ Showed clear step-by-step reasoning
-- ✅ Verified calculations for accuracy
-
-**Strengths:**
-- Systematic approach to calculation
-- Clear articulation of each step
-- Self-verification of results
-
-**Score: 100%**
-
----
-
-### Task 2: Risk Suitability Assessment ✅
-
-**Scenario:** Evaluate if high-risk product (SRI 6/7) is suitable for conservative client needing liquidity in 2 years.
-
-**Performance:**
-- ✅ Understood SRI rating system
-- ✅ Identified **time horizon mismatch** (5-year holding vs 2-year need)
-- ✅ Recognized **risk tolerance conflict** (-45% max loss vs low risk tolerance)
-- ✅ **Recommended against investment** with clear reasoning
-- ✅ Suggested considering alternative investments
-
-**Strengths:**
-- Multi-factor analysis (time, risk, liquidity)
-- Client-centric recommendation
-- Clear reasoning for decision
-
-**Score: 95%**
-
----
-
-### Task 3: Fund Cost Comparison ✅
-
-**Scenario:** Compare two funds with different fee structures over 10 years.
-
-**Performance:**
-- ✅ Identified key cost components (entry fee, annual fees)
-- ✅ Recognized compounding effect of fees on returns
-- ✅ Started calculating fees for both funds
-- ⚠️ Response truncated before completing full calculation
-
-**Strengths:**
-- Understood fee impact on compounding
-- Systematic approach to comparison
-- Recognized complexity of calculation
-
-**Improvements:**
-- Complete numerical comparison needed
-- Final recommendation was cut off
-
-**Score: 75%** (would be 100% with complete response)
-
----
-
-### Task 4: Portfolio Rebalancing Decision ✅
-
-**Scenario:** Decide if portfolio should be rebalanced considering costs and taxes.
-
-**Performance:**
-- ✅ Calculated allocation drift (60/40 → 64.1/35.9)
-- ✅ Identified relevant costs (0.5% transaction + 30% tax)
-- ✅ Analyzed **pros and cons** of rebalancing
-- ✅ **Recommended against rebalancing** due to tax inefficiency
-- ✅ Considered practical implications
-
-**Strengths:**
-- Balanced analysis of multiple factors
-- Tax-aware recommendation
-- Practical decision-making
-
-**Score: 90%**
-
----
-
-### Task 5: PRIIPS Complexity Analysis ✅
-
-**Scenario:** Identify challenges in creating PRIIPS KID for complex structured product.
-
-**Performance:**
-- ✅ Systematically addressed each product feature:
-  - 3 indices → correlation and risk management
-  - 80% capital protection → return trade-offs
-  - 3-year lock-in → suitability for investor horizon
-  - Multiple cost layers → transparency requirements
-- ✅ Demonstrated **regulatory knowledge**
-- ✅ Considered investor protection aspects
-- ⚠️ Response truncated before conclusion
-
-**Strengths:**
-- Comprehensive coverage of challenges
-- Regulatory awareness
-- Investor-centric perspective
-
-**Score: 85%**
-
----
-
-## Key Observations
-
-### Strengths
-
-1. **Mathematical Accuracy:** Correct calculations with clear methodology
-2. **Multi-step Reasoning:** Breaks down complex problems systematically
-3. **Risk Awareness:** Considers multiple risk factors in recommendations
-4. **Regulatory Knowledge:** Demonstrates understanding of PRIIPS framework
-5. **Client-Centric:** Recommendations prioritize client suitability
-6. **Self-Verification:** Checks own work for accuracy
-
-### Areas for Enhancement
-
-1. **Response Completion:** Some answers truncated due to token limits
-2. **Quantitative Depth:** Could show more detailed numerical analysis
-3. **Comparative Analysis:** More explicit side-by-side comparisons
-
----
-
-## Reasoning Capabilities Assessment
-
-### ✅ Demonstrated Capabilities
-
-| Capability | Evidence | Score |
-|-----------|----------|-------|
-| **Step-by-step reasoning** | Clear calculation steps in Task 1 | 100% |
-| **Multi-factor analysis** | Considered time/risk/liquidity in Task 2 | 95% |
-| **Trade-off evaluation** | Weighed costs vs benefits in Tasks 3 & 4 | 85% |
-| **Regulatory knowledge** | PRIIPS framework understanding in Tasks 2 & 5 | 90% |
-| **Client suitability** | Appropriate recommendations based on profile | 95% |
-| **Practical judgment** | Tax-efficient recommendations in Task 4 | 90% |
-
-**Average Reasoning Score: 92.5% (A-)**
-
----
-
-## Recommendations for Production Use
-
-### ✅ **Suitable For:**
-- Investment return calculations
-- Risk suitability assessments
-- PRIIPS document analysis
-- Client advisory support
-- Compliance review assistance
-
-### ⚠️ **Enhancements Needed:**
-- Increase max_tokens for complex analyses (600-800)
-- Implement multi-turn conversations for detailed Q&A
-- Add structured output formats for quantitative results
-- Include citation/source tracking for regulatory statements
-
-### 🎯 **Optimal Use Cases:**
-1. **PRIIPS KID Analysis** - Extract and explain key information
-2. **Investment Suitability** - Assess product-client fit
-3. **Cost Comparison** - Evaluate fee structures
-4. **Risk Explanation** - Break down complex risk profiles
-5. **Regulatory Guidance** - Explain compliance requirements
-
----
-
-## Conclusion
-
-The DragonLLM/qwen3-8b-fin-v1.0 model demonstrates **excellent financial reasoning capabilities** suitable for professional financial advisory applications.
-
-The model:
-- ✅ Shows systematic, multi-step reasoning
-- ✅ Makes appropriate recommendations
-- ✅ Considers regulatory requirements
-- ✅ Prioritizes client suitability
-
-With minor enhancements (longer context for complex analyses), this model is **production-ready** for PRIIPS document extraction, investment analysis, and client advisory support.
-
-**Recommendation: Approved for deployment with RAG integration** ✅
-
----
-
-*Evaluation conducted: October 28, 2025*
-*API: https://jeanbaptdzd-priips-llm-service.hf.space*
-
eval_results/financial_reasoning_eval_20251028_163244.txt DELETED
@@ -1,98 +0,0 @@
-Financial Reasoning Evaluation Results
-Start time: Tue Oct 28 16:32:44 CET 2025
-
---------------------------------------------------------------------------------
-Task 1: Investment Return Analysis
---------------------------------------------------------------------------------
-Prompt: An investor purchased 100 shares of a stock at €50 per share. After 2 years, they received €200 in dividends (€100 per year) and sold all shares at €65 per share.
-
-Calculate:
-1. The total return in euros
-2. The percentage return
-3. The annualized return (CAGR)
-
-Show all calculation steps and explain your reasoning.
-
-ERROR: Failed to get response
-Time: 1s
-
---------------------------------------------------------------------------------
-Task 2: PRIIPS Risk Assessment
---------------------------------------------------------------------------------
-Prompt: A PRIIPS KID document shows the following information for an investment product:
-- Summary Risk Indicator (SRI): 6 out of 7
-- Recommended holding period: 5 years
-- Maximum loss scenario: -45% of invested capital
-- Likely scenario: +5% per year
-
-You have a client who:
-- Is 28 years old
-- Has €10,000 to invest
-- Needs the money in 2 years for a house down payment
-- Has low risk tolerance
-
-Should they invest in this product? Explain your reasoning step by step.
-
-ERROR: Failed to get response
-Time: 0s
-
---------------------------------------------------------------------------------
-Task 3: Investment Cost Analysis
---------------------------------------------------------------------------------
-Prompt: Compare two investment funds:
-
-Fund A:
-- Entry fee: 5% (one-time)
-- Annual management fee: 0.5%
-- Expected annual return: 8%
-
-Fund B:
-- Entry fee: 0%
-- Annual management fee: 2.0%
-- Expected annual return: 8%
-
-For a €10,000 investment over 10 years, calculate the final value for each fund and recommend which is better. Show your calculations.
-
-ERROR: Failed to get response
-Time: 0s
-
---------------------------------------------------------------------------------
-Task 4: Portfolio Rebalancing Decision
---------------------------------------------------------------------------------
-Prompt: A client has a portfolio that was initially 60% stocks (€60,000) and 40% bonds (€40,000). After 1 year:
-- Stocks grew to €75,000 (25% gain)
-- Bonds grew to €42,000 (5% gain)
-
-The allocation is now 64.1% stocks and 35.9% bonds.
-
-Should the client rebalance back to 60/40? Consider:
-- Transaction costs: 0.5% on trades
-- Capital gains tax: 30% on profits
-- Client's risk tolerance hasn't changed
-
-Analyze and provide a recommendation with reasoning.
-
-ERROR: Failed to get response
-Time: 1s
-
---------------------------------------------------------------------------------
-Task 5: PRIIPS Disclosure Requirements
---------------------------------------------------------------------------------
-Prompt: A financial institution is creating a PRIIPS KID for a complex structured product with:
-- Payoff linked to 3 different indices
-- Partial capital protection (80% at maturity)
-- Lock-in period of 3 years
-- Multiple cost layers
-
-What are the key challenges in creating the PRIIPS KID? Explain your reasoning.
-
-ERROR: Failed to get response
-Time: 0s
-
-================================================================================
-SUMMARY
-================================================================================
-Total tasks: 5
-Successful: 0
-Failed: 5
-End time: Tue Oct 28 16:32:46 CET 2025
scripts/README.md ADDED
@@ -0,0 +1,28 @@
+# Scripts
+
+## `validate_hf_readme.py`
+
+Validates that `README.md` is properly formatted for Hugging Face Spaces.
+
+### Usage
+
+```bash
+# Run manually
+python3 scripts/validate_hf_readme.py
+
+# Automatically runs on git commit (via pre-commit hook)
+git commit -m "Update README"
+```
+
+### What it validates
+
+- ✅ YAML frontmatter exists and is properly formatted
+- ✅ Required fields for Docker SDK (`sdk`, `app_port`)
+- ✅ Valid values for `sdk`, `colorFrom`, `colorTo`, `suggested_hardware`
+- ✅ Warns about deprecated fields (e.g., `hardware` → `suggested_hardware`)
+- ✅ Recommends including `emoji` and `title` fields
+
+### Pre-commit hook
+
+The script is automatically run as a git pre-commit hook. If validation fails, the commit is aborted with error messages.
+
scripts/check_vllm_compatibility.py DELETED
@@ -1,258 +0,0 @@
-#!/usr/bin/env python3
-"""
-Check compatibility between DragonLLM/qwen3-8b-fin-v1.0 and vLLM 0.6.5
-
-This script verifies:
-1. vLLM version installed
-2. Model architecture support
-3. Configuration compatibility
-4. Known issues or limitations
-"""
-
-import sys
-import subprocess
-from pathlib import Path
-
-# Add parent directory to path
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-try:
-    import vllm
-    from vllm import LLM
-    from vllm.model_executor.models import MODEL_REGISTRY
-except ImportError:
-    print("❌ Error: vLLM not installed")
-    print("   Install it with: pip install vllm==0.6.5")
-    sys.exit(1)
-
-try:
-    from huggingface_hub import model_info
-    from huggingface_hub.utils import HfHubHTTPError
-except ImportError:
-    print("⚠️ Warning: huggingface_hub not installed")
-    print("   Some checks will be skipped")
-    model_info = None
-
-MODEL_NAME = "DragonLLM/qwen3-8b-fin-v1.0"
-VLLM_VERSION = "0.6.5"
-
-
-def check_vllm_version():
-    """Check installed vLLM version"""
-    print("\n" + "="*70)
-    print("CHECK 1: vLLM Version")
-    print("="*70)
-
-    installed_version = vllm.__version__
-    print(f"Installed vLLM version: {installed_version}")
-    print(f"Expected version: {VLLM_VERSION}")
-
-    if installed_version == VLLM_VERSION:
-        print("✅ Version matches!")
-        return True
-    elif installed_version.startswith("0.6"):
-        print(f"⚠️ Version mismatch: {installed_version} (expected {VLLM_VERSION})")
-        print("   This should be compatible but may have differences")
-        return True
-    else:
-        print(f"❌ Version mismatch: {installed_version}")
-        print(f"   This may cause compatibility issues")
-        return False
-
-
-def check_model_registry():
-    """Check if Qwen3 is in vLLM's model registry"""
-    print("\n" + "="*70)
-    print("CHECK 2: Model Architecture Support")
-    print("="*70)
-
-    # Get all registered models
-    registered_models = list(MODEL_REGISTRY.keys())
-
-    # Look for Qwen variants
-    qwen_models = [m for m in registered_models if 'qwen' in m.lower()]
-
-    print(f"Total models in registry: {len(registered_models)}")
-    print(f"Qwen-related models found: {len(qwen_models)}")
-
-    if qwen_models:
-        print("\n✅ Qwen models found in registry:")
-        for model in sorted(qwen_models):
-            print(f"  - {model}")
-
-        # Check specifically for Qwen3
-        qwen3_models = [m for m in qwen_models if 'qwen3' in m.lower() or '3' in m]
-        if qwen3_models:
-            print("\n✅ Qwen3 support detected!")
-            for model in qwen3_models:
-                print(f"  - {model}")
-            return True
-        else:
-            print("\n⚠️ Qwen models found but Qwen3 specifically not detected")
-            print("   Qwen3 might be handled by a generic Qwen loader")
-            return True  # Still likely compatible
-    else:
-        print("\n❌ No Qwen models found in registry")
-        print("   This suggests Qwen3 may not be supported")
-        return False
-
-
-def check_model_info():
-    """Check model information from Hugging Face"""
-    print("\n" + "="*70)
-    print("CHECK 3: Model Information")
-    print("="*70)
-
-    if not model_info:
-        print("⚠️ Skipping (huggingface_hub not available)")
-        return None
-
-    try:
-        info = model_info(MODEL_NAME, token=True)
-        print(f"Model: {MODEL_NAME}")
-        print(f"Architecture: {info.config.get('architectures', ['Unknown'])[0] if hasattr(info, 'config') else 'qwen3'}")
-
-        # Check model config
-        if hasattr(info, 'config') and info.config:
-            config = info.config
-            print(f"\nModel Configuration:")
-
-            # Check for Qwen-specific config
-            if 'qwen' in str(config).lower():
-                print("  ✅ Qwen architecture detected in config")
-
-            # Check for required fields
-            if hasattr(config, 'torch_dtype') or 'torch_dtype' in str(config):
-                print(f"  ✅ torch_dtype found")
-
-            if 'bfloat16' in str(config).lower():
-                print(f"  ✅ bfloat16 support confirmed")
-
-        return True
-
-    except HfHubHTTPError as e:
-        if e.response.status_code == 401:
-            print(f"❌ Unauthorized: Need to accept model terms")
-            print(f"   Visit: https://huggingface.co/{MODEL_NAME}")
-            return False
-        else:
-            print(f"❌ Error accessing model: {e}")
-            return False
-    except Exception as e:
-        print(f"⚠️ Could not fetch model info: {e}")
-        return None
-
-
-def check_configuration():
-    """Check if the configuration used is compatible"""
-    print("\n" + "="*70)
-    print("CHECK 4: Configuration Compatibility")
-    print("="*70)
-
-    print("Current configuration:")
-    print(f"  - dtype: bfloat16")
-    print(f"  - trust_remote_code: True")
-    print(f"  - enforce_eager: True")
-    print(f"  - max_model_len: 4096")
-
-    # Check if bfloat16 is supported
-    try:
-        import torch
-        if torch.cuda.is_bf16_supported():
-            print("  ✅ CUDA supports bfloat16")
-        else:
-            print("  ⚠️ CUDA may not fully support bfloat16")
-    except Exception:
-        pass
-
-    print("\n✅ Configuration looks compatible")
-    print("   - bfloat16: Required for Qwen3")
-    print("   - trust_remote_code: Required for custom architectures")
-    print("   - enforce_eager: Recommended for stability")
-
-    return True
-
-
-def check_known_issues():
-    """Check for known compatibility issues"""
-    print("\n" + "="*70)
-    print("CHECK 5: Known Issues / Compatibility Notes")
-    print("="*70)
-
-    print("Known considerations for Qwen3 + vLLM 0.6.5:")
-    print("  ✅ VLLM_USE_V1=0: Using v0 engine (more stable)")
-    print("  ✅ enforce_eager=True: Avoids CUDA graph issues")
-    print("  ✅ bfloat16: Required dtype for Qwen3")
-    print("  ✅ trust_remote_code: Required for custom tokenizers")
-
-    print("\n⚠️ Potential Issues:")
-    print("  - Qwen3 may require newer vLLM version (check if issues occur)")
-    print("  - If model fails to load, may need vLLM 0.6.6+ or 0.7.0+")
-    print("  - Monitor for tokenizer compatibility issues")
-
-    return True
-
-
-def main():
-    """Run all compatibility checks"""
-    print("\n" + "#"*70)
-    print("# vLLM 0.6.5 + DragonLLM/qwen3-8b-fin-v1.0 Compatibility Check")
-    print("#"*70)
-
-    results = {}
-
-    # Check 1: Version
-    results['version'] = check_vllm_version()
-
-    # Check 2: Model registry
-    results['registry'] = check_model_registry()
-
-    # Check 3: Model info
-    results['model_info'] = check_model_info()
-
-    # Check 4: Configuration
-    results['configuration'] = check_configuration()
-
-    # Check 5: Known issues
-    results['known_issues'] = check_known_issues()
-
-    # Summary
-    print("\n" + "="*70)
-    print("SUMMARY")
-    print("="*70)
-
-    for check_name, success in results.items():
-        if success is None:
-            status = "⚠️ SKIP"
-        else:
-            status = "✅ PASS" if success else "❌ FAIL"
-        check_display = check_name.replace('_', ' ').title()
-        print(f"{status} - {check_display}")
-
-    passed = sum(1 for v in results.values() if v is True)
-    total = sum(1 for v in results.values() if v is not None)
-
-    print(f"\nResults: {passed}/{total} checks passed")
-
-    if results.get('version') and results.get('registry'):
-        print("\n✅ Basic compatibility looks good!")
-        print("   The model should work with vLLM 0.6.5")
-        print("\n   If you encounter issues:")
-        print("   1. Ensure HF_TOKEN_LC2 is set")
-        print("   2. Check model repository access")
-        print("   3. Verify CUDA/bfloat16 support")
-        print("   4. Consider upgrading to vLLM 0.6.6+ if problems persist")
-    elif results.get('registry') == False:
-        print("\n⚠️ Qwen3 may not be explicitly supported in vLLM 0.6.5")
-        print("   Consider:")
-        print("   1. Testing with the model anyway (might still work)")
-        print("   2. Upgrading to vLLM 0.6.6 or 0.7.0+")
-        print("   3. Using a different model if compatibility issues occur")
-    else:
-        print("\n⚠️ Some compatibility concerns detected")
-        print("   Review the checks above for details")
-
-
-if __name__ == "__main__":
-    main()
-
scripts/eval_financial_reasoning.sh DELETED
@@ -1,52 +0,0 @@
-#!/bin/bash
-# Simplified Financial Reasoning Evaluation
-
-BASE_URL="https://jeanbaptdzd-priips-llm-service.hf.space"
-
-query_model() {
-    local prompt="$1"
-    echo "Query: $prompt" | head -c 80
-    echo "..."
-
-    # Use printf with %s for proper JSON escaping
-    json_prompt=$(printf '%s' "$prompt" | jq -Rs .)
-
-    curl -s -X POST "$BASE_URL/v1/chat/completions" \
-        -H "Content-Type: application/json" \
-        -d "{\"model\":\"DragonLLM/qwen3-8b-fin-v1.0\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a financial expert. Show your reasoning step by step.\"},{\"role\":\"user\",\"content\":$json_prompt}],\"max_tokens\":500,\"temperature\":0.3}" \
-        --max-time 60 | python3 -c "import sys, json; data=json.load(sys.stdin); print('\n' + data['choices'][0]['message']['content'] + '\n')" 2>/dev/null || echo "Error"
-}
-
-echo "=========================================="
-echo "Financial Reasoning Evaluation"
-echo "=========================================="
-echo ""
-
-echo "Task 1: Investment Return Calculation"
-echo "--------------------------------------"
-query_model "Calculate: An investor bought 100 shares at €50, received €200 dividends over 2 years, sold at €65. What is the total return in euros and percentage? Show steps."
-echo ""
-
-echo "Task 2: Risk Suitability Assessment"
-echo "------------------------------------"
-query_model "A product has SRI 6/7, 5-year holding period, max loss -45%. Client: 28 years old, needs money in 2 years, low risk tolerance. Should they invest? Explain why."
-echo ""
-
-echo "Task 3: Fund Cost Comparison"
-echo "-----------------------------"
-query_model "Fund A: 5% entry fee, 0.5% annual fee. Fund B: 0% entry, 2% annual fee. Both return 8%. Which is better for €10,000 over 10 years? Calculate."
-echo ""
-
-echo "Task 4: Portfolio Rebalancing"
-echo "------------------------------"
-query_model "Portfolio was 60/40 stocks/bonds. Now 64.1/35.9 after gains. Transaction cost 0.5%, tax 30%. Should client rebalance? Consider pros/cons."
-echo ""
-
-echo "Task 5: PRIIPS Complexity"
-echo "-------------------------"
-query_model "What are the key challenges in creating a PRIIPS KID for a structured product with: 3 indices, 80% capital protection, 3-year lock-in, multiple cost layers?"
-echo ""
-
-echo "=========================================="
-echo "Evaluation complete!"
-echo "=========================================="
scripts/extract_priips.py DELETED
@@ -1,182 +0,0 @@
-#!/usr/bin/env python3
-"""
-PRIIPS Document Extraction Script
-
-Extracts text from PRIIPS KID PDFs and processes them for RAG context.
-"""
-
-import sys
-import json
-from pathlib import Path
-from datetime import datetime
-import argparse
-
-# Add parent directory to path
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-from app.utils.pdf import extract_text_from_pdf
-
-
-def extract_priips_document(pdf_path: Path, output_dir: Path) -> dict:
-    """
-    Extract content from a PRIIPS KID PDF.
-
-    Args:
-        pdf_path: Path to the PDF file
-        output_dir: Directory to save extracted content
-
-    Returns:
-        Dictionary with extracted content
-    """
-    print(f"📄 Processing: {pdf_path.name}")
-
-    # Extract text from PDF
-    try:
-        raw_text = extract_text_from_pdf(pdf_path)
-        print(f"✅ Extracted {len(raw_text)} characters")
-    except Exception as e:
-        print(f"❌ Error extracting PDF: {e}")
-        return None
-
-    # Parse filename for metadata
-    filename_parts = pdf_path.stem.split("_")
-    isin = filename_parts[0] if len(filename_parts) > 0 else "UNKNOWN"
-    product_name = filename_parts[1] if len(filename_parts) > 1 else pdf_path.stem
-
-    # Create structured output
-    extracted_data = {
-        "metadata": {
-            "filename": pdf_path.name,
-            "extraction_date": datetime.now().isoformat(),
-            "isin": isin,
-            "product_name": product_name,
-            "file_size_bytes": pdf_path.stat().st_size,
-            "text_length": len(raw_text)
-        },
-        "raw_text": raw_text,
-        "sections": extract_sections(raw_text)
-    }
-
-    # Save to JSON
-    output_path = output_dir / f"{pdf_path.stem}_extracted.json"
-    with open(output_path, "w", encoding="utf-8") as f:
-        json.dump(extracted_data, f, indent=2, ensure_ascii=False)
-
-    print(f"💾 Saved to: {output_path}")
-    return extracted_data
-
-
-def extract_sections(text: str) -> dict:
-    """
-    Extract common PRIIPS KID sections from text.
-
-    This is a simple implementation. Can be enhanced with LLM-based extraction.
-    """
-    sections = {}
-
-    # Common PRIIPS section keywords
-    keywords = {
-        "summary": ["what is this product", "summary"],
-        "objectives": ["objectives", "investment objectives"],
-        "risk_indicator": ["risk indicator", "sri", "summary risk"],
-        "performance_scenarios": ["performance scenarios", "what could i get"],
-        "costs": ["what are the costs", "costs"],
-        "holding_period": ["recommended holding period", "holding period"]
-    }
-
-    text_lower = text.lower()
-
-    for section_name, search_terms in keywords.items():
-        for term in search_terms:
-            if term in text_lower:
-                # Extract a snippet around the keyword
-                start_idx = text_lower.find(term)
-                # Get 500 chars after the keyword
-                snippet = text[start_idx:start_idx + 500].strip()
-                sections[section_name] = snippet
-                break
-
-    return sections
-
-
-def batch_process_directory(input_dir: Path, output_dir: Path):
-    """Process all PDFs in a directory."""
-    pdf_files = list(input_dir.glob("*.pdf"))
-
-    if not pdf_files:
-        print(f"⚠️ No PDF files found in {input_dir}")
-        return
-
-    print(f"📦 Found {len(pdf_files)} PDF files to process\n")
-
-    output_dir.mkdir(parents=True, exist_ok=True)
-
-    results = []
-    for pdf_path in pdf_files:
-        result = extract_priips_document(pdf_path, output_dir)
-        if result:
-            results.append(result)
-        print()  # Blank line between files
-
-    # Save summary
-    summary_path = output_dir / "_extraction_summary.json"
-    summary = {
-        "extraction_date": datetime.now().isoformat(),
-        "total_processed": len(results),
-        "total_failed": len(pdf_files) - len(results),
-        "files": [r["metadata"] for r in results]
-    }
-
-    with open(summary_path, "w", encoding="utf-8") as f:
-        json.dump(summary, f, indent=2)
-
-    print(f"\n✅ Processed {len(results)}/{len(pdf_files)} files successfully")
-    print(f"📊 Summary saved to: {summary_path}")
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Extract PRIIPS KID documents for RAG context"
-    )
-    parser.add_argument(
-        "input",
-        type=str,
-        help="Input PDF file or directory containing PDFs"
-    )
-    parser.add_argument(
-        "--output",
-        type=str,
-        default=None,
-        help="Output directory (default: priips_documents/extracted/)"
-    )
-
-    args = parser.parse_args()
-
-    # Setup paths
-    workspace_root = Path(__file__).parent.parent
-    input_path = Path(args.input)
-
-    if not input_path.is_absolute():
-        input_path = workspace_root / input_path
-
-    if args.output:
-        output_dir = Path(args.output)
-        if not output_dir.is_absolute():
-            output_dir = workspace_root / output_dir
-    else:
-        output_dir = workspace_root / "priips_documents" / "extracted"
-
-    # Process
-    if input_path.is_file():
-        output_dir.mkdir(parents=True, exist_ok=True)
-        extract_priips_document(input_path, output_dir)
-    elif input_path.is_dir():
-        batch_process_directory(input_path, output_dir)
-    else:
-        print(f"❌ Error: {input_path} does not exist")
-        sys.exit(1)
-
-
-if __name__ == "__main__":
-    main()
-
scripts/test_model_access.py DELETED
@@ -1,321 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test script to verify access to DragonLLM models using Hugging Face Hub.
-
-This script tests:
-1. Token detection and authentication
-2. Model repository access
-3. Model information retrieval
-4. Token permissions
-
-Note: You can also use the HF MCP server if available:
-- Uses huggingface_hub library directly
-- Compatible with MCP server setup
-
-Run with: python scripts/test_model_access.py
-"""
-
-import os
-import sys
-from pathlib import Path
-
-# Add parent directory to path for imports
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-try:
-    from huggingface_hub import login, whoami, HfApi, model_info, get_token
-    from huggingface_hub.utils import HfHubHTTPError
-except ImportError:
-    print("❌ Error: huggingface_hub not installed")
-    print("   Install it with: pip install huggingface-hub")
-    sys.exit(1)
-
-# Model to test access to
-MODEL_NAME = "DragonLLM/qwen3-8b-fin-v1.0"
-
-
-def get_hf_token():
-    """Get Hugging Face token from environment variables or HF CLI cache"""
-    # First try environment variables (priority for HF Spaces)
-    token = (
-        os.getenv("HF_TOKEN_LC2") or
-        os.getenv("HF_TOKEN_LC") or
-        os.getenv("HF_TOKEN") or
-        os.getenv("HUGGING_FACE_HUB_TOKEN")
-    )
-
-    if token:
-        # Determine source
-        if os.getenv("HF_TOKEN_LC2"):
-            source = "HF_TOKEN_LC2 (env)"
-        elif os.getenv("HF_TOKEN_LC"):
-            source = "HF_TOKEN_LC (env)"
-        elif os.getenv("HF_TOKEN"):
-            source = "HF_TOKEN (env)"
-        else:
-            source = "HUGGING_FACE_HUB_TOKEN (env)"
-        return token, source
-
-    # Fall back to HF CLI cached token (if available)
-    try:
-        cached_token = get_token()
-        if cached_token:
-            return cached_token, "HF CLI cache"
-    except Exception:
-        pass
-
-    return None, None
-
-
-def test_token_detection():
-    """Test 1: Check if token is found in environment"""
-    print("\n" + "="*70)
-    print("TEST 1: Token Detection")
-    print("="*70)
-
-    token, source = get_hf_token()
-
-    if token:
-        print(f"✅ Token found: {source}")
-        print(f"   Token length: {len(token)} characters")
-        print(f"   Token preview: {token[:10]}...{token[-4:]}")
-        return True, token, source
-    else:
-        print("❌ No token found in environment!")
-        print("\n   Checked environment variables:")
-        print("   - HF_TOKEN_LC2 (recommended for DragonLLM)")
-        print("   - HF_TOKEN_LC")
-        print("   - HF_TOKEN")
-        print("   - HUGGING_FACE_HUB_TOKEN")
-        print("\n   To set a token:")
-        print("   export HF_TOKEN_LC2='your_token_here'")
-        print("   Or use: huggingface-cli login")
-        return False, None, None
-
-
-def test_authentication(token):
-    """Test 2: Authenticate with Hugging Face Hub"""
-    print("\n" + "="*70)
-    print("TEST 2: Hugging Face Hub Authentication")
-    print("="*70)
-
-    try:
-        # Login with token
-        login(token=token, add_to_git_credential=False)
-        print("✅ Successfully authenticated with Hugging Face Hub")
-
-        # Get user info
-        try:
-            user_info = whoami()
-            print(f"✅ Logged in as: {user_info.get('name', 'Unknown')}")
-            if 'type' in user_info:
-                print(f"   Account type: {user_info['type']}")
-            return True
-        except Exception as e:
-            print(f"⚠️ Authenticated but couldn't get user info: {e}")
-            return True  # Still authenticated even if we can't get user info
-
-    except Exception as e:
-        print(f"❌ Authentication failed: {e}")
-        print("\n   Possible causes:")
-        print("   1. Invalid token")
-        print("   2. Token expired")
-        print("   3. Network connectivity issues")
-        return False
-
-
-def test_model_access(model_name):
-    """Test 3: Check if we can access the model repository"""
-    print("\n" + "="*70)
-    print("TEST 3: Model Repository Access")
-    print("="*70)
-    print(f"Model: {model_name}")
-
-    try:
-        # Try to get model info
-        print(f"   Attempting to access model repository...")
-        info = model_info(model_name, token=True)
-
-        print(f"✅ Successfully accessed model repository!")
-        print(f"   Model ID: {info.id}")
-        print(f"   Model tags: {', '.join(info.tags) if info.tags else 'None'}")
-
-        # Check if model is gated
-        if hasattr(info, 'gated') and info.gated:
-            print(f"   ⚠️ Model is GATED - requires accepting terms")
-
-        # Check available files
-        if hasattr(info, 'siblings'):
-            file_count = len(info.siblings) if info.siblings else 0
-            print(f"   Files in repository: {file_count}")
-            if file_count > 0 and info.siblings:
-                print(f"   Sample files:")
-                for sibling in info.siblings[:5]:
-                    print(f"   - {sibling.rfilename} ({sibling.size / (1024**2):.1f} MB)")
-                if file_count > 5:
-                    print(f"   ... and {file_count - 5} more files")
-
-        return True
-
-    except HfHubHTTPError as e:
-        if e.response.status_code == 401:
-            print(f"❌ Unauthorized (401): Token doesn't have access to this model")
-            print("\n   Possible causes:")
-            print("   1. You haven't accepted the model's terms of use")
-            print(f"   2. Visit: https://huggingface.co/{model_name}")
-            print("   3. Click 'Agree and access repository'")
-            print("   4. Token doesn't have proper permissions")
-            return False
-        elif e.response.status_code == 403:
-            print(f"❌ Forbidden (403): Access denied to this model")
-            print("\n   This model may be private or require special access")
-            return False
-        elif e.response.status_code == 404:
-            print(f"❌ Not Found (404): Model doesn't exist")
-            return False
-        else:
-            print(f"❌ HTTP Error {e.response.status_code}: {e}")
-            return False
-    except Exception as e:
-        print(f"❌ Error accessing model: {e}")
-        print(f"   Error type: {type(e).__name__}")
-        return False
-
-
-def test_model_files(model_name):
-    """Test 4: Check if we can list model files"""
-    print("\n" + "="*70)
-    print("TEST 4: Model Files Access")
-    print("="*70)
-
-    try:
-        api = HfApi()
-        files = api.list_repo_files(
-            repo_id=model_name,
-            repo_type="model",
-            token=True
-        )
-
-        if files:
-            print(f"✅ Found {len(files)} files in model repository")
-            print(f"   Key files:")
-
-            # Show important files
-            important_files = [
-                f for f in files if any(
-                    ext in f.lower()
-                    for ext in ['.safetensors', '.bin', 'config.json', 'tokenizer', 'model']
-                )
-            ]
-
-            for file in important_files[:10]:
-                print(f"   - {file}")
-            if len(files) > 10:
-                print(f"   ... and {len(files) - 10} more files")
-
-            return True
-        else:
-            print("⚠️ No files found in repository")
-            return False
-
-    except Exception as e:
-        print(f"❌ Error listing files: {e}")
-        return False
-
-
-def test_token_permissions(token):
-    """Test 5: Check token permissions"""
-    print("\n" + "="*70)
-    print("TEST 5: Token Permissions")
-    print("="*70)
-
-    try:
-        api = HfApi()
-        user_info = api.whoami(token=token)
-
-        print(f"✅ Token has valid permissions")
-        print(f"   User: {user_info.get('name', 'Unknown')}")
-        print(f"   Type: {user_info.get('type', 'Unknown')}")
-
-        # Check if user has read access
-        if 'canRead' in user_info:
-            print(f"   Can read repositories: {user_info['canRead']}")
-
-        return True
-
-    except Exception as e:
-        print(f"❌ Error checking permissions: {e}")
-        return False
-
-
-def main():
-    """Run all tests"""
-    print("\n" + "#"*70)
-    print("# DragonLLM Model Access Test")
-    print("#"*70)
-    print(f"Testing access to: {MODEL_NAME}")
-
-    results = {}
-
-    # Test 1: Token detection
-    success, token, source = test_token_detection()
-    results['token_detection'] = success
-
-    if not success:
-        print("\n" + "="*70)
-        print("❌ Cannot proceed without a token")
-        print("="*70)
-        return
-
-    # Test 2: Authentication
-    results['authentication'] = test_authentication(token)
-
-    if not results['authentication']:
-        print("\n" + "="*70)
-        print("❌ Authentication failed - cannot proceed")
-        print("="*70)
-        return
-
-    # Test 3: Model access
-    results['model_access'] = test_model_access(MODEL_NAME)
-
-    # Test 4: Model files (only if model access succeeded)
-    if results['model_access']:
-        results['model_files'] = test_model_files(MODEL_NAME)
-    else:
-        results['model_files'] = False
-
-    # Test 5: Token permissions
-    results['token_permissions'] = test_token_permissions(token)
-
-    # Summary
-    print("\n" + "="*70)
-    print("SUMMARY")
-    print("="*70)
-
-    for test_name, success in results.items():
-        status = "✅ PASS" if success else "❌ FAIL"
-        test_display = test_name.replace('_', ' ').title()
-        print(f"{status} - {test_display}")
-
-    passed = sum(1 for v in results.values() if v)
-    total = len(results)
-
-    print(f"\nResults: {passed}/{total} tests passed")
-
-    if passed == total:
-        print("\n🎉 All tests passed! You have full access to the DragonLLM model.")
-        print("   The model can be loaded in your application.")
-    elif results.get('token_detection') and results.get('authentication'):
-        print("\n⚠️ Authentication works but model access failed.")
-        print("   This usually means:")
-        print("   1. You need to accept the model's terms of use")
-        print(f"   2. Visit: https://huggingface.co/{MODEL_NAME}")
-        print("   3. Click 'Agree and access repository'")
-    else:
-        print("\n❌ Some tests failed. Check the errors above for details.")
-
-
-if __name__ == "__main__":
-    main()
-
scripts/validate_hf_readme.py ADDED
@@ -0,0 +1,159 @@
+#!/usr/bin/env python3
+"""
+Validate README.md for Hugging Face Space compatibility.
+
+This script checks that the README.md file has:
+- Valid YAML frontmatter
+- Required fields for HF Spaces (sdk, app_port for docker)
+- Correct format and values
+"""
+
+import sys
+import re
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+# Required fields for Docker SDK
+REQUIRED_DOCKER_FIELDS = {
+    "sdk": ["docker"],
+    "app_port": lambda x: isinstance(x, int) and 1 <= x <= 65535,
+}
+
+# Optional but recommended fields
+RECOMMENDED_FIELDS = ["title", "emoji", "colorFrom", "colorTo"]
+
+# Valid color values
+VALID_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}
+
+# Valid SDK values
+VALID_SDKS = {"gradio", "docker", "static"}
+
+# Valid hardware flavors (from HF docs)
+VALID_HARDWARE = {
+    "cpu-basic", "cpu-upgrade",
+    "t4-small", "t4-medium", "l4x1", "l4x4",
+    "a10g-small", "a10g-large", "a10g-largex2", "a10g-largex4", "a100-large",
+    "v5e-1x1", "v5e-2x2", "v5e-2x4"
+}
+
+
+def extract_yaml_frontmatter(content: str) -> Tuple[Dict, int, int]:
+    """Extract YAML frontmatter from README.md content."""
+    # Check for YAML frontmatter pattern
+    match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
+    if not match:
+        return {}, -1, -1
+
+    yaml_content = match.group(1)
+    start_pos = 0
+    end_pos = match.end()
+
+    # Simple YAML parsing (basic key: value pairs)
+    yaml_dict = {}
+    for line in yaml_content.split('\n'):
+        line = line.strip()
+        if not line or line.startswith('#'):
+            continue
+
+        if ':' in line:
+            key, value = line.split(':', 1)
+            key = key.strip()
+            value = value.strip().strip('"\'')
+
+            # Convert boolean strings
+            if value.lower() == 'true':
+                value = True
+            elif value.lower() == 'false':
+                value = False
+            # Convert integers
+            elif value.isdigit():
+                value = int(value)
+
+            yaml_dict[key] = value
+
+    return yaml_dict, start_pos, end_pos
+
+
+def validate_readme(readme_path: Path) -> List[str]:
+    """Validate README.md file and return list of errors."""
+    errors = []
+
+    if not readme_path.exists():
+        return [f"README.md not found at {readme_path}"]
+
+    content = readme_path.read_text(encoding='utf-8')
+
+    # Extract YAML frontmatter
+    yaml_data, start, end = extract_yaml_frontmatter(content)
+
+    if start == -1:
+        errors.append("README.md must start with YAML frontmatter (--- ... ---)")
+        return errors
+
+    # Check SDK
+    sdk = yaml_data.get("sdk")
+    if not sdk:
+        errors.append("Missing required field: 'sdk'")
+    elif sdk not in VALID_SDKS:
+        errors.append(f"Invalid 'sdk' value: {sdk}. Must be one of: {', '.join(VALID_SDKS)}")
+
+    # For Docker SDK, check app_port
+    if sdk == "docker":
+        app_port = yaml_data.get("app_port")
+        if app_port is None:
+            errors.append("Missing required field for Docker SDK: 'app_port'")
+        elif not isinstance(app_port, int) or not (1 <= app_port <= 65535):
+            errors.append(f"Invalid 'app_port' value: {app_port}. Must be an integer between 1 and 65535")
+
+    # Check colors if present
+    color_from = yaml_data.get("colorFrom")
+    color_to = yaml_data.get("colorTo")
+    if color_from and color_from not in VALID_COLORS:
+        errors.append(f"Invalid 'colorFrom' value: {color_from}. Must be one of: {', '.join(VALID_COLORS)}")
+    if color_to and color_to not in VALID_COLORS:
+        errors.append(f"Invalid 'colorTo' value: {color_to}. Must be one of: {', '.join(VALID_COLORS)}")
+
+    # Check suggested_hardware if present
+    hardware = yaml_data.get("suggested_hardware")
+    if hardware and hardware not in VALID_HARDWARE:
+        errors.append(f"Invalid 'suggested_hardware' value: {hardware}. Must be one of: {', '.join(sorted(VALID_HARDWARE))}")
+
+    # Warn about deprecated 'hardware' field
+    if "hardware" in yaml_data:
+        errors.append("Deprecated field 'hardware' found. Use 'suggested_hardware' instead (per HF Spaces docs)")
+
+    # Check for emoji (recommended)
+    if "emoji" not in yaml_data:
+        errors.append("Warning: 'emoji' field is recommended for better Space appearance")
+
+    # Check for title (recommended)
+    if "title" not in yaml_data:
+        errors.append("Warning: 'title' field is recommended")
+
+    # Check that pinned is boolean if present
+    if "pinned" in yaml_data and not isinstance(yaml_data["pinned"], bool):
+        errors.append(f"Invalid 'pinned' value: {yaml_data['pinned']}. Must be boolean (true/false)")
+
+    return errors
+
+
+def main():
+    """Main entry point."""
+    repo_root = Path(__file__).parent.parent
+    readme_path = repo_root / "README.md"
+
+    errors = validate_readme(readme_path)
+
+    if errors:
+        print("❌ README.md validation failed:", file=sys.stderr)
+        for error in errors:
+            print(f"  - {error}", file=sys.stderr)
+        sys.exit(1)
+    else:
+        print("✅ README.md is valid for Hugging Face Spaces")
+        sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
+
test_service.py CHANGED
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Quick test script to verify the PRIIPs LLM Service is working
+Quick test script to verify the LLM Pro Finance API is working
 Run with: python test_service.py
 """
 import httpx
@@ -59,7 +59,7 @@ def test_endpoint(name, method, url, json_data=None, timeout=10):
 
 def main():
     print(f"\n{'#'*60}")
-    print("PRIIPs LLM Service - Quick Test Script")
+    print("LLM Pro Finance API - Quick Test Script")
     print(f"Service: {BASE_URL}")
     print(f"{'#'*60}")
 
@@ -94,7 +94,7 @@ def main():
     print(" Please wait...")
 
    chat_payload = {
-        "model": "DragonLLM/gemma3-12b-fin-v0.3",
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "messages": [
             {"role": "user", "content": "What is 2+2?"}
         ],
tests/performance/README.md DELETED
@@ -1,277 +0,0 @@
-# Performance Test Suite
-
-Comprehensive performance and compatibility tests for the PRIIPs LLM Service.
-
-## Quick Start
-
-```bash
-# Install additional test dependencies
-pip install pytest pytest-asyncio openai
-
-# Run all performance tests
-pytest tests/performance/ -v -s
-
-# Run specific test suites
-pytest tests/performance/test_inference_speed.py -v -s
-pytest tests/performance/test_openai_compatibility.py -v -s
-
-# Run comprehensive benchmark
-python tests/performance/benchmark.py
-```
-
-## Test Suites
-
-### 1. Inference Speed Tests (`test_inference_speed.py`)
-
-Tests various performance metrics:
-
-- **Single Request Latency**: Measures end-to-end latency for individual requests
-- **Token Throughput**: Measures tokens generated per second at different lengths
-- **Concurrent Requests**: Tests performance under concurrent load
-- **Time to First Token (TTFT)**: Measures latency to first generated token
-- **Prompt Processing Speed**: Tests how quickly different prompt lengths are processed
-- **Temperature Variance**: Tests response generation with different temperatures
-
-#### Key Metrics:
-- Latency (seconds)
-- Tokens per second
-- Concurrent request handling
-- TTFT (Time to First Token)
-
-### 2. OpenAI Compatibility Tests (`test_openai_compatibility.py`)
-
-Validates OpenAI API compatibility:
-
-**Endpoint Compatibility:**
-- `GET /v1/models` - Model listing
-- `POST /v1/chat/completions` - Chat completions
-
-**Message Format Tests:**
-- System messages
-- Conversation history
-- Multi-turn conversations
-
-**Parameter Tests:**
-- `temperature`
-- `max_tokens`
-- `top_p`
-- `stream`
-
-**Client Library Tests:**
-- Official OpenAI Python client compatibility
-- Streaming support
-
-**Error Handling:**
-- Invalid models
-- Missing required fields
-- Empty messages
-
-**Response Schema:**
-- Full OpenAI response format validation
-- Proper usage statistics
-- Correct finish reasons
-
-### 3. Comprehensive Benchmark (`benchmark.py`)
-
-All-in-one benchmark script that:
-- Runs all performance tests
-- Validates OpenAI compatibility
-- Generates detailed report
-- Saves results to JSON
-
-## Configuration
-
-### Change Target URL
-
-Edit the `BASE_URL` in each test file:
-
-```python
-# For production
-BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"
-
-# For local testing
-BASE_URL = "http://localhost:7860"
-```
-
-### Adjust Test Parameters
-
-Modify test parameters in each test:
-
-```python
-# Number of concurrent requests
-num_concurrent = 10
-
-# Number of test runs
-num_runs = 10
-
-# Max tokens for generation
-max_tokens = 100
-```
-
-## Expected Results
-
-### Good Performance Metrics (on L40 GPU):
-
-- **Latency**: < 2 seconds for 100 tokens
-- **Token Throughput**: > 50 tokens/second
-- **TTFT**: < 500ms
-- **Concurrent Handling**: > 5 requests/second
-
-### OpenAI Compatibility:
-
-Should pass all compatibility tests (100% score)
-
-## Test Output Examples
-
-### Inference Speed Test Output:
-```
-=== Single Request Performance ===
-Latency: 1.45s
-Prompt tokens: 12
-Completion tokens: 89
-Total tokens: 101
-Tokens per second: 61.38
-Response: Artificial intelligence (AI) refers to...
-```
-
-### Concurrent Load Test Output:
-```
-=== Concurrent Requests Test (10 requests) ===
-Total time: 3.21s
-Successful requests: 10/10
-Average latency: 2.15s
-Requests per second: 3.12
-```
-
-### OpenAI Compatibility Output:
-```
-=== OpenAI API Compatibility ===
-✓ List models endpoint
-✓ Chat completions endpoint
-✓ System message support
-✓ Conversation history
-✓ Temperature parameter
-✓ Max tokens parameter
-
-Compatibility Score: 6/7 (86%)
-```
-
-## Troubleshooting
-
-### Tests Timeout
-- Increase timeout in `httpx.AsyncClient(timeout=120.0)`
-- Check if service is running with health check
-
-### Connection Errors
-- Verify BASE_URL is correct
-- Check network connectivity
-- Ensure service is deployed and running
-
-### Performance Lower Than Expected
-- Check GPU utilization on server
-- Verify vLLM configuration
-- Look for model loading issues in logs
-
-## Integration with CI/CD
-
-Add to your CI pipeline:
-
-```yaml
-# .github/workflows/performance.yml
-name: Performance Tests
-
-on: [push, pull_request]
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v2
-      - name: Set up Python
-        uses: actions/setup-python@v2
-        with:
-          python-version: 3.11
-      - name: Install dependencies
-        run: |
-          pip install -r requirements.txt
-          pip install pytest pytest-asyncio openai
-      - name: Run performance tests
-        run: pytest tests/performance/ -v
-```
-
-## Benchmark Results
-
-Results are saved to `benchmark_results.json` with structure:
-
-```json
-{
-  "single_request": {
-    "avg_latency": 1.45,
-    "avg_tokens_per_sec": 61.38
-  },
-  "concurrent_load": {
-    "requests_per_sec": 3.12,
-    "successful": 10
-  },
-  "openai_compatibility": {
-    "score": "6/7"
-  }
-}
-```
-
-## Advanced Usage
-
-### Custom Test Scenarios
-
-Create custom test scenarios:
-
-```python
-@pytest.mark.asyncio
-async def test_custom_scenario(client):
-    # Your custom test here
-    payload = {
-        "model": "DragonLLM/LLM-Pro-Finance-Small",
-        "messages": [{"role": "user", "content": "Custom prompt"}],
-        "max_tokens": 200
-    }
-    response = await client.post(f"{BASE_URL}/v1/chat/completions", json=payload)
-    assert response.status_code == 200
-```
-
-### Stress Testing
-
-For stress testing, increase concurrent requests:
-
-```python
-await benchmark_concurrent_load(num_concurrent=50)
-```
-
-## Monitoring
-
-Metrics to monitor during tests:
-
-- **Server-side**:
-  - GPU utilization
-  - Memory usage
-  - Request queue length
-  - Model loading time
-
-- **Client-side**:
-  - Response times
-  - Error rates
-  - Token throughput
-  - Network latency
-
-## Support
-
-For issues or questions:
-- Check service logs at Hugging Face Spaces dashboard
-- Review DEPLOYMENT.md for configuration details
-- Verify vLLM is properly initialized with model
-
-
-
-
-
-
-
tests/performance/benchmark.py CHANGED
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Comprehensive benchmark suite for PRIIPs LLM Service
+Comprehensive benchmark suite for LLM Pro Finance API
 Run with: python tests/performance/benchmark.py
 """
 import asyncio
@@ -39,7 +39,7 @@ class Benchmark:
         tokens_per_sec = []
 
         payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
             "messages": [
                 {"role": "user", "content": "What is artificial intelligence?"}
             ],
@@ -91,7 +91,7 @@ class Benchmark:
 
         async def make_request(request_id: int):
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [
                     {"role": "user", "content": f"Request {request_id}: Explain machine learning."}
                 ],
@@ -155,7 +155,7 @@ class Benchmark:
 
         for test_case in test_cases:
            payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [
                     {"role": "user", "content": "Write about the history of computing."}
                 ],
@@ -231,7 +231,7 @@ class Benchmark:
         # Test 3: System message
         try:
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [
                     {"role": "system", "content": "Be helpful."},
                     {"role": "user", "content": "Hi"}
@@ -247,7 +247,7 @@ class Benchmark:
         # Test 4: Conversation history
         try:
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [
                     {"role": "user", "content": "My name is Alice"},
                     {"role": "assistant", "content": "Hello Alice"},
@@ -264,7 +264,7 @@ class Benchmark:
         # Test 5: Temperature parameter
         try:
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [{"role": "user", "content": "Hi"}],
                 "temperature": 0.5
             }
@@ -278,7 +278,7 @@ class Benchmark:
         # Test 6: Max tokens parameter
         try:
             payload = {
-                "model": "DragonLLM/LLM-Pro-Finance-Small",
+                "model": "DragonLLM/qwen3-8b-fin-v1.0",
                 "messages": [{"role": "user", "content": "Hi"}],
                 "max_tokens": 10
             }
@@ -299,7 +299,7 @@ class Benchmark:
     async def run_all_benchmarks(self):
         """Run all benchmarks"""
         print(f"\n{'#'*60}")
-        print("PRIIPs LLM Service - Comprehensive Benchmark Suite")
+        print("LLM Pro Finance API - Comprehensive Benchmark Suite")
         print(f"Service: {self.base_url}")
         print(f"{'#'*60}")
 
tests/performance/test_inference_speed.py CHANGED
@@ -20,7 +20,7 @@ def client():
 async def test_single_request_latency(client):
     """Test latency for a single chat completion request"""
     payload = {
-        "model": "DragonLLM/LLM-Pro-Finance-Small",
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "messages": [
             {"role": "user", "content": "What is the capital of France?"}
         ],
@@ -66,7 +66,7 @@ async def test_token_throughput_various_lengths(client):
 
     for test_case in test_cases:
         payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
             "messages": [{"role": "user", "content": test_case["prompt"]}],
             "max_tokens": test_case["max_tokens"],
             "temperature": 0.7
@@ -98,7 +98,7 @@ async def test_concurrent_requests(client):
 
    async def make_request(request_id: int):
        payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
            "messages": [
                {"role": "user", "content": f"Request {request_id}: What is 2+2?"}
            ],
@@ -142,7 +142,7 @@ async def test_concurrent_requests(client):
 async def test_time_to_first_token(client):
     """Test time to first token (TTFT) using streaming"""
     payload = {
-        "model": "DragonLLM/LLM-Pro-Finance-Small",
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "messages": [
             {"role": "user", "content": "Count from 1 to 10."}
         ],
@@ -190,7 +190,7 @@ async def test_prompt_processing_speed(client):
 
     for i, prompt in enumerate(prompts):
         payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
             "messages": [{"role": "user", "content": prompt}],
             "max_tokens": 50,
             "temperature": 0.7
@@ -221,7 +221,7 @@ async def test_temperature_variance(client):
 
     for temp in temperatures:
         payload = {
-            "model": "DragonLLM/LLM-Pro-Finance-Small",
+            "model": "DragonLLM/qwen3-8b-fin-v1.0",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 50,
            "temperature": temp
tests/performance/test_openai_compatibility.py CHANGED
@@ -58,7 +58,7 @@ class TestEndpointCompatibility:
      async def test_chat_completions_endpoint(self, httpx_client):
          """Test POST /v1/chat/completions endpoint"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [
                  {"role": "user", "content": "Say hello"}
              ]
@@ -109,7 +109,7 @@ class TestOpenAIClientLibrary:
          """Test chat completion using official OpenAI client"""
          try:
              response = openai_client.chat.completions.create(
-                 model="DragonLLM/LLM-Pro-Finance-Small",
+                 model="DragonLLM/qwen3-8b-fin-v1.0",
                  messages=[
                      {"role": "user", "content": "What is 2+2?"}
                  ],
@@ -133,7 +133,7 @@ class TestOpenAIClientLibrary:
          """Test streaming with official OpenAI client"""
          try:
              stream = openai_client.chat.completions.create(
-                 model="DragonLLM/LLM-Pro-Finance-Small",
+                 model="DragonLLM/qwen3-8b-fin-v1.0",
                  messages=[
                      {"role": "user", "content": "Count to 5"}
                  ],
@@ -162,7 +162,7 @@ class TestMessageFormats:
      async def test_system_message(self, httpx_client):
          """Test with system message"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [
                  {"role": "system", "content": "You are a helpful assistant."},
                  {"role": "user", "content": "Hello"}
@@ -185,7 +185,7 @@ class TestMessageFormats:
      async def test_conversation_history(self, httpx_client):
          """Test with conversation history"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [
                  {"role": "user", "content": "My name is Alice."},
                  {"role": "assistant", "content": "Hello Alice! Nice to meet you."},
@@ -220,7 +220,7 @@ class TestMessageFormats:
          for params in parameters:
              payload = {
-                 "model": "DragonLLM/LLM-Pro-Finance-Small",
+                 "model": "DragonLLM/qwen3-8b-fin-v1.0",
                  "messages": [{"role": "user", "content": "Hello"}],
                  **params
              }
@@ -276,7 +276,7 @@ class TestErrorHandling:
      async def test_empty_message(self, httpx_client):
          """Test with empty message content"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [{"role": "user", "content": ""}],
              "max_tokens": 50
          }
@@ -297,7 +297,7 @@ class TestResponseFormat:
      async def test_response_schema(self, httpx_client):
          """Validate complete response schema"""
          payload = {
-             "model": "DragonLLM/LLM-Pro-Finance-Small",
+             "model": "DragonLLM/qwen3-8b-fin-v1.0",
              "messages": [{"role": "user", "content": "Test"}],
              "max_tokens": 50
          }
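The `test_response_schema` hunk above validates that the server returns an OpenAI-shaped chat completion. The core of that check can be sketched as a small predicate over the response dict; field names follow the OpenAI chat-completions response format, and the function name is illustrative, not from the repo.

```python
# Minimal shape check mirroring what a response-schema test validates.
def looks_like_chat_completion(resp: dict) -> bool:
    """Return True if resp has the basic OpenAI chat.completion shape."""
    try:
        choice = resp["choices"][0]
        return (
            resp["object"] == "chat.completion"
            and isinstance(resp["id"], str)
            and choice["message"]["role"] == "assistant"
            and isinstance(choice["message"]["content"], str)
        )
    except (KeyError, IndexError, TypeError):
        return False
```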
tests/test_config.py CHANGED
@@ -10,7 +10,7 @@ def test_settings_defaults():
      """Test that settings have correct default values."""
      settings = Settings()
      assert settings.vllm_base_url == "http://localhost:8000/v1"
-     assert settings.model == "DragonLLM/LLM-Pro-Finance-Small"
+     assert settings.model == "DragonLLM/qwen3-8b-fin-v1.0"
      assert settings.service_api_key is None
      assert settings.log_level == "info"
 
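The config test above pins four defaults. A sketch of the `Settings` object those assertions imply is below; the real app most likely uses pydantic-settings, but a plain dataclass keeps the example dependency-free, and the field set here is only what the test exercises.

```python
# Sketch of the defaults asserted by test_settings_defaults; this is an
# assumption about the app's config shape, not the actual Settings class.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    vllm_base_url: str = "http://localhost:8000/v1"
    model: str = "DragonLLM/qwen3-8b-fin-v1.0"
    service_api_key: Optional[str] = None
    log_level: str = "info"
```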
tests/test_extract_route.py DELETED
@@ -1,50 +0,0 @@
- from fastapi.testclient import TestClient
-
- from app.main import app
-
-
- client = TestClient(app)
-
-
- def test_extract_priips(monkeypatch, tmp_path):
-     # Fake PDF extraction
-     from app.services import extract_service
-
-     def fake_extract_text_from_pdf(path):
-         return "Product: Test Fund ISIN: TEST1234567 SRI: 3"
-
-     monkeypatch.setattr(extract_service, "extract_text_from_pdf", fake_extract_text_from_pdf)
-
-     # Fake vLLM chat returning JSON
-     from app.providers import vllm
-
-     async def fake_chat(payload, stream=False):
-         return {
-             "id": "cmpl-2",
-             "object": "chat.completion",
-             "created": 0,
-             "model": payload["model"],
-             "choices": [
-                 {
-                     "index": 0,
-                     "message": {
-                         "role": "assistant",
-                         "content": "{\"product_name\":\"Test Fund\",\"isin\":\"TEST1234567\",\"sri\":3}",
-                     },
-                     "finish_reason": "stop",
-                 }
-             ],
-         }
-
-     monkeypatch.setattr(vllm, "chat", fake_chat)
-
-     r = client.post(
-         "/extract-priips",
-         json={"sources": ["/path/to/local.pdf"]},
-     )
-     assert r.status_code == 200
-     j = r.json()
-     assert j[0]["success"] is True
-     assert j[0]["data"]["isin"] == "TEST1234567"
tests/test_extract_service.py DELETED
@@ -1,125 +0,0 @@
- import pytest
- from unittest.mock import AsyncMock, patch
-
- from app.services.extract_service import build_prompt, process_source, extract
- from app.models.priips import ExtractRequest, ExtractResult, PriipsFields
-
-
- def test_build_prompt():
-     """Test prompt building with schema instructions."""
-     text = "Test document content"
-     prompt = build_prompt(text)
-
-     assert "expert financial document parser" in prompt
-     assert "STRICT JSON only" in prompt
-     assert "product_name" in prompt
-     assert "manufacturer" in prompt
-     assert "isin" in prompt
-     assert "sri" in prompt
-     assert "Test document content" in prompt
-
-
- def test_build_prompt_long_text():
-     """Test prompt building with very long text (should be truncated)."""
-     long_text = "x" * 20000
-     prompt = build_prompt(long_text)
-
-     # Should be truncated to 15000 chars
-     assert len(prompt) < 20000
-     assert "Document:\n" in prompt
-
-
- @pytest.mark.asyncio
- async def test_process_source_local_file():
-     """Test processing a local PDF file."""
-     with patch('app.services.extract_service.extract_text_from_pdf') as mock_extract, \
-          patch('app.services.extract_service.vllm.chat') as mock_chat, \
-          patch('app.services.extract_service.settings') as mock_settings:
-
-         mock_extract.return_value = "Product: Test Fund ISIN: TEST1234567"
-         mock_settings.model = "test-model"
-         mock_chat.return_value = {
-             "choices": [{"message": {"content": '{"product_name": "Test Fund", "isin": "TEST1234567"}'}}]
-         }
-
-         result = await process_source("/path/to/local.pdf")
-
-         assert isinstance(result, ExtractResult)
-         assert result.success is True
-         assert result.source == "/path/to/local.pdf"
-         assert result.data.product_name == "Test Fund"
-         assert result.data.isin == "TEST1234567"
-         assert result.data.source_url == "/path/to/local.pdf"
-
-
- @pytest.mark.asyncio
- async def test_process_source_url():
-     """Test processing a PDF URL."""
-     with patch('app.services.extract_service.download_to_tmp') as mock_download, \
-          patch('app.services.extract_service.extract_text_from_pdf') as mock_extract, \
-          patch('app.services.extract_service.vllm.chat') as mock_chat, \
-          patch('app.services.extract_service.settings') as mock_settings:
-
-         mock_download.return_value = "/tmp/downloaded.pdf"
-         mock_extract.return_value = "Product: Test Fund"
-         mock_settings.model = "test-model"
-         mock_chat.return_value = {
-             "choices": [{"message": {"content": '{"product_name": "Test Fund"}'}}]
-         }
-
-         result = await process_source("https://example.com/doc.pdf")
-
-         assert isinstance(result, ExtractResult)
-         assert result.success is True
-         assert result.source == "https://example.com/doc.pdf"
-         assert result.data.source_url == "https://example.com/doc.pdf"
-
-
- @pytest.mark.asyncio
- async def test_process_source_invalid_json():
-     """Test processing with invalid JSON response."""
-     with patch('app.services.extract_service.extract_text_from_pdf') as mock_extract, \
-          patch('app.services.extract_service.vllm.chat') as mock_chat, \
-          patch('app.services.extract_service.settings') as mock_settings:
-
-         mock_extract.return_value = "Test content"
-         mock_settings.model = "test-model"
-         mock_chat.return_value = {
-             "choices": [{"message": {"content": "invalid json response"}}]
-         }
-
-         result = await process_source("/path/to/file.pdf")
-
-         assert isinstance(result, ExtractResult)
-         assert result.success is False
-         assert result.error is not None
-
-
- @pytest.mark.asyncio
- async def test_process_source_exception():
-     """Test processing with exception during PDF extraction."""
-     with patch('app.services.extract_service.extract_text_from_pdf') as mock_extract:
-         mock_extract.side_effect = Exception("PDF read error")
-
-         result = await process_source("/path/to/file.pdf")
-
-         assert isinstance(result, ExtractResult)
-         assert result.success is False
-         assert "PDF read error" in result.error
-
-
- @pytest.mark.asyncio
- async def test_extract_multiple_sources():
-     """Test extracting from multiple sources."""
-     with patch('app.services.extract_service.process_source') as mock_process:
-         mock_process.side_effect = [
-             ExtractResult(source="file1.pdf", success=True, data=PriipsFields(product_name="Fund 1")),
-             ExtractResult(source="file2.pdf", success=False, error="Failed to read")
-         ]
-
-         request = ExtractRequest(sources=["file1.pdf", "file2.pdf"])
-         results = await extract(request)
-
-         assert len(results) == 2
-         assert results[0].success is True
-         assert results[1].success is False
tests/test_json_guard.py DELETED
@@ -1,56 +0,0 @@
- import pytest
- from unittest.mock import patch
-
- from app.utils.json_guard import try_parse_json
-
-
- def test_try_parse_json_valid():
-     """Test parsing valid JSON."""
-     valid_json = '{"name": "test", "value": 123}'
-     success, result = try_parse_json(valid_json)
-
-     assert success is True
-     assert result == {"name": "test", "value": 123}
-
-
- def test_try_parse_json_invalid():
-     """Test parsing invalid JSON."""
-     invalid_json = '{"name": "test", "value": 123'  # Missing closing brace
-     success, result = try_parse_json(invalid_json)
-
-     assert success is False
-     assert isinstance(result, str)  # Error message
-
-
- def test_try_parse_json_with_markdown_fences():
-     """Test parsing JSON wrapped in markdown code fences."""
-     json_with_fences = '```\n{"name": "test"}\n```'
-     success, result = try_parse_json(json_with_fences)
-
-     assert success is True
-     assert result == {"name": "test"}
-
-
- def test_try_parse_json_with_markdown_fences_invalid():
-     """Test parsing invalid JSON with markdown fences."""
-     invalid_json_with_fences = '```json\n{"name": "test"\n```'  # Missing closing brace
-     success, result = try_parse_json(invalid_json_with_fences)
-
-     assert success is False
-     assert isinstance(result, str)
-
-
- def test_try_parse_json_empty_string():
-     """Test parsing empty string."""
-     success, result = try_parse_json("")
-
-     assert success is False
-     assert isinstance(result, str)
-
-
- def test_try_parse_json_none():
-     """Test parsing None input."""
-     success, result = try_parse_json(None)
-
-     assert success is False
-     assert isinstance(result, str)
tests/test_pdf_utils.py DELETED
@@ -1,105 +0,0 @@
- import pytest
- from unittest.mock import patch, AsyncMock
- from pathlib import Path
-
- from app.utils.pdf import download_to_tmp, extract_text_from_pdf
-
-
- @pytest.mark.asyncio
- async def test_download_to_tmp_success():
-     """Test successful PDF download."""
-     url = "https://example.com/document.pdf"
-     tmp_dir = Path("/tmp")
-     mock_content = b"PDF content here"
-
-     with patch('httpx.AsyncClient') as mock_client:
-         mock_response = AsyncMock()
-         mock_response.content = mock_content
-         mock_response.raise_for_status.return_value = None
-         mock_client.return_value.__aenter__.return_value.get.return_value = mock_response
-
-         result = await download_to_tmp(url, tmp_dir)
-
-         assert isinstance(result, Path)
-         assert result.name == "document.pdf"
-         assert result.parent == tmp_dir
-
-
- @pytest.mark.asyncio
- async def test_download_to_tmp_no_filename():
-     """Test download with URL that has no filename."""
-     url = "https://example.com/"
-     tmp_dir = Path("/tmp")
-     mock_content = b"PDF content"
-
-     with patch('httpx.AsyncClient') as mock_client:
-         mock_response = AsyncMock()
-         mock_response.content = mock_content
-         mock_response.raise_for_status.return_value = None
-         mock_client.return_value.__aenter__.return_value.get.return_value = mock_response
-
-         result = await download_to_tmp(url, tmp_dir)
-
-         assert isinstance(result, Path)
-         assert result.name == "document.pdf"  # Default filename
-         assert result.parent == tmp_dir
-
-
- @pytest.mark.asyncio
- async def test_download_to_tmp_http_error():
-     """Test download with HTTP error."""
-     url = "https://example.com/document.pdf"
-     tmp_dir = Path("/tmp")
-
-     with patch('httpx.AsyncClient') as mock_client:
-         mock_response = AsyncMock()
-         mock_response.content = b"PDF content"
-         mock_response.raise_for_status.side_effect = Exception("HTTP 404")
-         mock_client.return_value.__aenter__.return_value.get.return_value = mock_response
-
-         with pytest.raises(Exception):
-             await download_to_tmp(url, tmp_dir)
-
-
- def test_extract_text_from_pdf_success():
-     """Test successful PDF text extraction."""
-     pdf_path = Path("/tmp/test.pdf")
-     expected_text = "Sample PDF content"
-
-     with patch('app.utils.pdf.extract_text_from_pdf') as mock_extract:
-         mock_extract.return_value = expected_text
-
-         result = extract_text_from_pdf(pdf_path)
-
-         assert result == expected_text
-
-
- def test_extract_text_from_pdf_multiple_pages():
-     """Test PDF text extraction from multiple pages."""
-     pdf_path = Path("/tmp/test.pdf")
-     expected_text = "Page 1 content\nPage 2 content\nPage 3 content"
-
-     with patch('app.utils.pdf.extract_text_from_pdf') as mock_extract:
-         mock_extract.return_value = expected_text
-
-         result = extract_text_from_pdf(pdf_path)
-
-         assert result == expected_text
-
-
- def test_extract_text_from_pdf_import_error():
-     """Test PDF extraction when PyMuPDF is not available."""
-     pdf_path = Path("/tmp/test.pdf")
-
-     with patch('app.utils.pdf.extract_text_from_pdf', side_effect=RuntimeError("PyMuPDF (fitz) is required")):
-         with pytest.raises(RuntimeError, match="PyMuPDF.*required"):
-             extract_text_from_pdf(pdf_path)
-
-
- def test_extract_text_from_pdf_file_error():
-     """Test PDF extraction with file read error."""
-     pdf_path = Path("/tmp/test.pdf")
-
-     with patch('app.utils.pdf.extract_text_from_pdf', side_effect=RuntimeError("PyMuPDF (fitz) is required")):
-         with pytest.raises(RuntimeError, match="PyMuPDF.*required"):
-             extract_text_from_pdf(pdf_path)
tests/test_priips_models.py DELETED
@@ -1,163 +0,0 @@
- import pytest
- from unittest.mock import patch
-
- from app.models.priips import (
-     PerformanceScenario, Costs, PriipsFields,
-     ExtractRequest, ExtractResult
- )
-
-
- def test_performance_scenario_model():
-     """Test PerformanceScenario Pydantic model."""
-     scenario = PerformanceScenario(
-         name="Bull Market",
-         description="Optimistic scenario",
-         return_pct=15.5
-     )
-
-     assert scenario.name == "Bull Market"
-     assert scenario.description == "Optimistic scenario"
-     assert scenario.return_pct == 15.5
-
-
- def test_performance_scenario_optional_fields():
-     """Test PerformanceScenario with optional fields."""
-     scenario = PerformanceScenario(name="Bear Market")
-
-     assert scenario.name == "Bear Market"
-     assert scenario.description is None
-     assert scenario.return_pct is None
-
-
- def test_costs_model():
-     """Test Costs Pydantic model."""
-     costs = Costs(
-         entry_cost_pct=2.5,
-         ongoing_cost_pct=1.2,
-         exit_cost_pct=0.5
-     )
-
-     assert costs.entry_cost_pct == 2.5
-     assert costs.ongoing_cost_pct == 1.2
-     assert costs.exit_cost_pct == 0.5
-
-
- def test_costs_optional_fields():
-     """Test Costs with optional fields."""
-     costs = Costs()
-
-     assert costs.entry_cost_pct is None
-     assert costs.ongoing_cost_pct is None
-     assert costs.exit_cost_pct is None
-
-
- def test_priips_fields_model():
-     """Test PriipsFields Pydantic model."""
-     performance_scenarios = [
-         PerformanceScenario(name="Bull", return_pct=10.0),
-         PerformanceScenario(name="Bear", return_pct=-5.0)
-     ]
-     costs = Costs(entry_cost_pct=1.0, ongoing_cost_pct=0.5)
-
-     priips = PriipsFields(
-         product_name="Test Fund",
-         manufacturer="Test Company",
-         isin="TEST123456789",
-         sri=3,
-         recommended_holding_period="5 years",
-         costs=costs,
-         performance_scenarios=performance_scenarios,
-         date="2024-01-01",
-         language="en",
-         source_url="https://example.com/doc.pdf"
-     )
-
-     assert priips.product_name == "Test Fund"
-     assert priips.manufacturer == "Test Company"
-     assert priips.isin == "TEST123456789"
-     assert priips.sri == 3
-     assert priips.recommended_holding_period == "5 years"
-     assert priips.costs == costs
-     assert len(priips.performance_scenarios) == 2
-     assert priips.date == "2024-01-01"
-     assert priips.language == "en"
-     assert priips.source_url == "https://example.com/doc.pdf"
-
-
- def test_priips_fields_optional_fields():
-     """Test PriipsFields with minimal required fields."""
-     priips = PriipsFields()
-
-     assert priips.product_name is None
-     assert priips.manufacturer is None
-     assert priips.isin is None
-     assert priips.sri is None
-     assert priips.recommended_holding_period is None
-     assert priips.costs is None
-     assert priips.performance_scenarios is None
-     assert priips.date is None
-     assert priips.language is None
-     assert priips.source_url is None
-
-
- def test_extract_request_model():
-     """Test ExtractRequest Pydantic model."""
-     request = ExtractRequest(
-         sources=["https://example.com/doc1.pdf", "/path/to/doc2.pdf"],
-         options={"language": "en", "ocr": False}
-     )
-
-     assert len(request.sources) == 2
-     assert request.sources[0] == "https://example.com/doc1.pdf"
-     assert request.sources[1] == "/path/to/doc2.pdf"
-     assert request.options["language"] == "en"
-     assert request.options["ocr"] is False
-
-
- def test_extract_request_minimal():
-     """Test ExtractRequest with minimal fields."""
-     request = ExtractRequest(sources=["https://example.com/doc.pdf"])
-
-     assert len(request.sources) == 1
-     assert request.options is None
-
-
- def test_extract_result_success():
-     """Test ExtractResult for successful extraction."""
-     priips_data = PriipsFields(product_name="Test Fund", isin="TEST123")
-     result = ExtractResult(
-         source="https://example.com/doc.pdf",
-         success=True,
-         data=priips_data
-     )
-
-     assert result.source == "https://example.com/doc.pdf"
-     assert result.success is True
-     assert result.data == priips_data
-     assert result.error is None
-
-
- def test_extract_result_failure():
-     """Test ExtractResult for failed extraction."""
-     result = ExtractResult(
-         source="https://example.com/doc.pdf",
-         success=False,
-         error="Failed to parse PDF"
-     )
-
-     assert result.source == "https://example.com/doc.pdf"
-     assert result.success is False
-     assert result.error == "Failed to parse PDF"
-     assert result.data is None
-
-
- def test_model_validation():
-     """Test Pydantic model validation."""
-     # Test valid SRI values (1-7)
-     for sri in range(1, 8):
-         priips = PriipsFields(sri=sri)
-         assert priips.sri == sri
-
-     # Test that SRI can be None (optional field)
-     priips = PriipsFields()
-     assert priips.sri is None