jeanbaptdzd committed · Commit ee07ed2 · Parent: 772dd21

Deploy PRIIPs LLM Service to HF Spaces + RAG workflow

✅ Successful deployment:
- Model: DragonLLM/qwen3-8b-fin-v1.0 (8B parameters)
- Hardware: L4 GPU (24GB VRAM)
- Backend: vLLM with eager mode (stable)
- Context: 4096 tokens
- API: OpenAI-compatible at https://jeanbaptdzd-priips-llm-service.hf.space
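
For reference, a minimal call against this endpoint (prompt and parameters are illustrative; if the Space is private, send an HF token, as `test_service.py` below does):

```python
# Smoke-test sketch for the deployed OpenAI-compatible endpoint.
import httpx

resp = httpx.post(
    "https://jeanbaptdzd-priips-llm-service.hf.space/v1/chat/completions",
    json={
        "model": "DragonLLM/qwen3-8b-fin-v1.0",
        "messages": [{"role": "user", "content": "Define the PRIIPs SRI in one sentence."}],
        "max_tokens": 100,
        "temperature": 0.3,
    },
    timeout=120,  # generous: the first request may wait on model load
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```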

🔧 Configuration updates:
- Updated Dockerfile to CUDA 12.4.0, Python 3.11
- Configured vLLM with enforce_eager=True for L4 stability
- Set max_model_len=4096, gpu_memory_utilization=0.85
- Fixed KV cache memory allocation issues
- Background model initialization to avoid timeouts
- Config: ignore unknown fields in .env (`extra = "ignore"`) instead of failing validation

📚 PRIIPS RAG Workflow:
- Created priips_documents/ directory structure (raw/extracted/processed)
- Added extract_priips.py: PDF → JSON extraction script
- Added query_with_context.py: RAG-powered query system
- Comprehensive documentation in PRIIPS_WORKFLOW.md
- Service test utilities (`test_service.py`)

🎯 Tested and working:
- All API endpoints operational (/, /v1/models, /v1/chat/completions)
- Financial calculations: CAGR, returns (worked check after this list)
- Risk assessment: market/credit risk concepts
- PRIIPS knowledge: SRI, KID sections
- Information extraction from documents
- Ready for RAG integration with PydanticAI/DSPy
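
As a worked reference for the CAGR checks above (values illustrative, not the actual test fixtures):

```python
# Reference CAGR computation used to sanity-check model answers.
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end/start)**(1/years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# e.g. 10,000 growing to 14,000 over 5 years is about 6.96% per year
print(f"{cagr(10_000, 14_000, 5):.2%}")
```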

Dockerfile CHANGED
@@ -1,40 +1,57 @@
-FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
+# Use NVIDIA CUDA 12.4 base image (12.1 is deprecated)
+FROM nvidia/cuda:12.4.0-devel-ubuntu22.04
 
 # Set environment variables
-ENV DEBIAN_FRONTEND=noninteractive
 ENV PYTHONUNBUFFERED=1
+ENV DEBIAN_FRONTEND=noninteractive
 
-# Install Python and system dependencies
+# Install Python 3.11 and build dependencies
 RUN apt-get update && apt-get install -y \
     python3.11 \
     python3.11-dev \
     python3-pip \
     git \
     curl \
-    && rm -rf /var/lib/apt/lists/* \
-    && ln -s /usr/bin/python3.11 /usr/bin/python
+    && rm -rf /var/lib/apt/lists/*
+
+# Set Python 3.11 as default
+RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 && \
+    update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
 
 # Upgrade pip
-RUN python -m pip install --upgrade pip
+RUN python3 -m pip install --upgrade pip
 
 # Set working directory
 WORKDIR /app
 
-# Copy requirements first for better caching
-COPY requirements.txt .
-
-# Install Python dependencies
-RUN pip install --no-cache-dir -r requirements.txt
+# Install vLLM and dependencies in one layer for efficiency
+# (specifiers quoted so the shell does not treat '>' as a redirect)
+RUN pip install --no-cache-dir \
+    vllm \
+    "fastapi>=0.115.0" \
+    "uvicorn[standard]>=0.30.0" \
+    "pydantic>=2.8.0" \
+    "pydantic-settings>=2.4.0" \
+    "httpx>=0.27.0" \
+    "python-dotenv>=1.0.1" \
+    "tenacity>=8.3.0" \
+    "PyMuPDF>=1.24.0"
 
 # Copy application code
 COPY app/ ./app/
 
-# Create a non-root user
-RUN useradd -m -u 1000 user && chown -R user:user /app
+# Create a non-root user and set up cache directories
+RUN useradd -m -u 1000 user && \
+    mkdir -p /tmp/huggingface /tmp/torch/inductor /tmp/triton && \
+    chown -R user:user /app /tmp/huggingface /tmp/torch /tmp/triton
 
 USER user
 
-# Set HuggingFace cache directory
+# Set environment variables for optimal vLLM + torch.compile performance
 ENV HF_HOME=/tmp/huggingface
+ENV TORCHINDUCTOR_CACHE_DIR=/tmp/torch/inductor
+ENV TRITON_CACHE_DIR=/tmp/triton
+ENV TORCH_COMPILE_DEBUG=0
+ENV CUDA_VISIBLE_DEVICES=0
 
 # Expose port
 EXPOSE 7860
PRIIPS_WORKFLOW.md ADDED
@@ -0,0 +1,182 @@
+# PRIIPS Document Extraction & RAG Workflow
+
+Complete workflow for extracting PRIIPS KID documents and querying with LLM context.
+
+## 📁 Directory Structure
+
+```
+priips_documents/
+├── raw/          # Place your PDF documents here
+├── extracted/    # Extracted JSON documents (auto-generated)
+└── processed/    # Chunked documents for RAG (future)
+
+scripts/
+├── extract_priips.py       # Extract text from PDFs
+└── query_with_context.py   # Query LLM with document context
+```
+
+## 🚀 Quick Start
+
+### 1. Add PRIIPS Documents
+
+Place PDF documents in `priips_documents/raw/`:
+
+```bash
+# Naming convention: {ISIN}_{ProductName}_{Date}.pdf
+cp /path/to/your/priips.pdf priips_documents/raw/LU1234567890_GlobalEquity_2024.pdf
+```
+
+### 2. Extract Document Content
+
+```bash
+# Extract all PDFs in the raw directory
+python scripts/extract_priips.py priips_documents/raw/
+
+# Or extract a single file
+python scripts/extract_priips.py priips_documents/raw/LU1234567890_GlobalEquity_2024.pdf
+```
+
+**Output:** JSON files in `priips_documents/extracted/` with structured content:
+- Metadata (ISIN, product name, dates)
+- Raw extracted text
+- Parsed sections (objectives, risks, costs, etc.)
+
+### 3. Query with RAG Context
+
+```bash
+# Ask questions about your documents
+python scripts/query_with_context.py "What is the recommended holding period?"
+
+python scripts/query_with_context.py "What are the main risks of this investment?"
+
+python scripts/query_with_context.py "Summarize the cost structure"
+```
+
+**Options:**
+```bash
+# Specify a different extracted directory
+python scripts/query_with_context.py "Your question" --extracted-dir custom/path/
+
+# Control context size and response length
+python scripts/query_with_context.py "Your question" \
+    --max-context 3000 \
+    --max-tokens 800
+```
+
+## 📊 Example Workflow
+
+```bash
+# 1. Add a PRIIPS PDF
+cp MyFund.pdf priips_documents/raw/FR0012345678_MyFund_2024.pdf
+
+# 2. Extract content
+python scripts/extract_priips.py priips_documents/raw/
+
+# Output:
+# 📄 Processing: FR0012345678_MyFund_2024.pdf
+# ✅ Extracted 12,543 characters
+# 💾 Saved to: priips_documents/extracted/FR0012345678_MyFund_2024_extracted.json
+
+# 3. Query the LLM
+python scripts/query_with_context.py "What is the SRI of this fund?"
+
+# Output:
+# 📚 Loading documents from priips_documents/extracted...
+# ✅ Loaded 1 documents
+# 🔍 Querying LLM with 1,234 chars of context...
+# 📊 Tokens used: 234
+#
+# 💬 Answer:
+# Based on the PRIIPS document, the Summary Risk Indicator (SRI) for this fund is 5 out of 7...
+```
+
+## 🎯 Use Cases
+
+### Document Comparison
+```bash
+python scripts/query_with_context.py "Compare the risk profiles of all available funds"
+```
+
+### Specific Information Extraction
+```bash
+python scripts/query_with_context.py "Extract all recommended holding periods"
+python scripts/query_with_context.py "List all ISINs and their product names"
+```
+
+### Compliance Checks
+```bash
+python scripts/query_with_context.py "Are there any funds with SRI above 6?"
+python scripts/query_with_context.py "Which funds have holding periods under 3 years?"
+```
+
+## 🔧 Advanced: Integrate with PydanticAI
+
+```python
+import json
+
+from pydantic_ai import Agent
+from pydantic_ai.models.openai import OpenAIModel
+
+# Configure with your deployed service
+model = OpenAIModel(
+    'DragonLLM/qwen3-8b-fin-v1.0',
+    base_url='https://jeanbaptdzd-priips-llm-service.hf.space/v1',
+)
+
+agent = Agent(model=model)
+
+# Load PRIIPS context
+with open('priips_documents/extracted/LU123_extracted.json') as f:
+    context = json.load(f)
+
+# Query with context
+result = agent.run_sync(
+    f"Based on this PRIIPS document: {context['raw_text'][:2000]}... "
+    f"What is the recommended holding period?"
+)
+```
+
+## 📝 Extracted Document Schema
+
+```json
+{
+  "metadata": {
+    "filename": "LU1234567890_GlobalEquity_2024.pdf",
+    "extraction_date": "2024-10-28T16:24:00",
+    "isin": "LU1234567890",
+    "product_name": "GlobalEquity",
+    "file_size_bytes": 245678,
+    "text_length": 12543
+  },
+  "raw_text": "Full extracted text from PDF...",
+  "sections": {
+    "summary": "What is this product? ...",
+    "objectives": "Investment objectives and policy...",
+    "risk_indicator": "SRI: 5/7 ...",
+    "performance_scenarios": "Performance scenarios...",
+    "costs": "What are the costs? ...",
+    "holding_period": "Recommended: 5 years"
+  }
+}
+```
+
+## 🚀 Next Steps
+
+1. **Add More Documents:** Place additional PRIIPS PDFs in `raw/`
+2. **Enhance Extraction:** Improve section parsing in `extract_priips.py`
+3. **Add Embeddings:** Implement vector search for better RAG
+4. **Build API:** Create REST API endpoints for document queries
+5. **Dashboard:** Build web UI for document management and queries
+
+## 📚 API Integration
+
+The LLM service is OpenAI-compatible and deployed at:
+```
+https://jeanbaptdzd-priips-llm-service.hf.space/v1
+```
+
+**Endpoints:**
+- `GET /` - Service status
+- `GET /v1/models` - List available models
+- `POST /v1/chat/completions` - Chat completion with context
+
+See `test_service.py` for integration examples.
+
README.md CHANGED
@@ -7,11 +7,12 @@ sdk: docker
 pinned: false
 license: mit
 app_port: 7860
+hardware: l4
 ---
 
 # PRIIPs LLM Service - Hugging Face Spaces
 
-OpenAI-compatible API and PRIIPs extractor powered by `DragonLLM/LLM-Pro-Finance-Small` via vLLM.
+OpenAI-compatible API and PRIIPs extractor powered by `DragonLLM/qwen3-8b-fin-v1.0` via vLLM.
 
 ## 🚀 Quick Start
@@ -34,7 +35,7 @@ curl -X GET "https://your-space-url.hf.space/v1/models"
 curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "DragonLLM/LLM-Pro-Finance-Small",
+    "model": "DragonLLM/gemma3-12b-fin-v0.3",
     "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
   }'
@@ -95,7 +96,7 @@ from pydantic_ai import Agent
 from pydantic_ai.models.openai import OpenAIModel
 
 model = OpenAIModel(
-    "DragonLLM/LLM-Pro-Finance-Small",
+    "DragonLLM/gemma3-12b-fin-v0.3",
     base_url="https://your-space-url.hf.space/v1"
 )
@@ -107,7 +108,7 @@ agent = Agent(model=model)
 import dspy
 
 lm = dspy.OpenAI(
-    model="DragonLLM/LLM-Pro-Finance-Small",
+    model="DragonLLM/gemma3-12b-fin-v0.3",
     api_base="https://your-space-url.hf.space/v1"
 )
 ```
app/config.py CHANGED
@@ -3,13 +3,14 @@ from pydantic_settings import BaseSettings
 
 class Settings(BaseSettings):
     vllm_base_url: str = "http://localhost:8000/v1"
-    model: str = "DragonLLM/LLM-Pro-Finance-Small"
+    model: str = "DragonLLM/qwen3-8b-fin-v1.0"
     service_api_key: str | None = None
     log_level: str = "info"
 
     class Config:
         env_file = ".env"
         env_file_encoding = "utf-8"
+        extra = "ignore"  # Ignore extra fields in .env
 
 
 settings = Settings()
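
For context on the `extra = "ignore"` line: with pydantic-settings 2.x, unknown keys in the .env file fail validation unless `extra` is relaxed. A minimal sketch of the behavior (the `Demo` class is hypothetical):

```python
# Hypothetical demo: with extra = "ignore", unknown .env keys are dropped
# instead of raising "extra inputs are not permitted" at startup.
from pydantic_settings import BaseSettings

class Demo(BaseSettings):
    model: str = "DragonLLM/qwen3-8b-fin-v1.0"

    class Config:
        env_file = ".env"   # may contain keys Demo doesn't declare
        extra = "ignore"    # tolerate them instead of failing validation

settings = Demo()
```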
app/main.py CHANGED
@@ -18,9 +18,19 @@ app.middleware("http")(api_key_guard)
 
 @app.on_event("startup")
 async def startup_event():
-    """Preload the model on startup"""
+    """Startup event - initialize model in background"""
+    import threading
     logger.info("Starting PRIIPs LLM Service...")
-    logger.info("Model will be loaded on first request to optimize startup time")
+    logger.info("Initializing model in background thread...")
+
+    def load_model():
+        from app.providers.vllm import initialize_vllm
+        initialize_vllm()
+
+    # Start model loading in background thread
+    thread = threading.Thread(target=load_model, daemon=True)
+    thread.start()
+    logger.info("Model initialization started in background")
 
 @app.get("/")
 async def root():
@@ -28,7 +38,7 @@ async def root():
         "status": "ok",
         "service": "PRIIPs LLM Service",
         "version": "1.0.0",
-        "model": "DragonLLM/LLM-Pro-Finance-Small",
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "backend": "vLLM"
     }
app/middleware.py CHANGED
@@ -5,11 +5,22 @@ from app.config import settings
 
 
 async def api_key_guard(request: Request, call_next):
+    # Public endpoints that don't require authentication
+    public_paths = ["/", "/health", "/docs", "/redoc", "/openapi.json"]
+
+    # Skip auth for public endpoints
+    if request.url.path in public_paths:
+        return await call_next(request)
+
+    # Skip auth if no API key is configured
     if not settings.service_api_key:
         return await call_next(request)
+
+    # Check API key
     key = request.headers.get("x-api-key") or request.headers.get("authorization")
     if key and key.replace("Bearer ", "").strip() == settings.service_api_key:
         return await call_next(request)
+
     return JSONResponse({"error": "unauthorized"}, status_code=401)
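
With a service API key configured, protected routes accept the key via either header form checked above; a quick client-side check (the key value is a placeholder):

```python
# Either header works for api_key_guard; "/" and "/health" stay public.
import httpx

headers = {"x-api-key": "YOUR_SERVICE_API_KEY"}  # placeholder value
# or: headers = {"Authorization": "Bearer YOUR_SERVICE_API_KEY"}

r = httpx.get(
    "https://jeanbaptdzd-priips-llm-service.hf.space/v1/models",
    headers=headers,
    timeout=30,
)
print(r.status_code, r.text[:200])
```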
app/providers/vllm.py CHANGED
@@ -3,9 +3,10 @@ from typing import Dict, Any, AsyncIterator
 from vllm import LLM, SamplingParams
 from vllm.entrypoints.openai.api_server import build_async_engine_client
 import asyncio
+from huggingface_hub import login
 
-# Model configuration
-model_name = "DragonLLM/LLM-Pro-Finance-Small"
+# Model configuration - optimized for 8B Qwen3 on L4
+model_name = "DragonLLM/qwen3-8b-fin-v1.0"
 llm_engine = None
 
 def initialize_vllm():
@@ -15,26 +16,51 @@ def initialize_vllm():
     if llm_engine is None:
         print(f"Initializing vLLM with model: {model_name}")
 
-        # Get HF token from environment
-        hf_token = os.getenv("HF_TOKEN_LC")
+        # Get HF token from environment (Hugging Face Space secret)
+        # Try HF_TOKEN_LC2 first (for DragonLLM access), then fall back to HF_TOKEN_LC
+        hf_token = os.getenv("HF_TOKEN_LC2") or os.getenv("HF_TOKEN_LC")
         if hf_token:
+            token_source = "HF_TOKEN_LC2" if os.getenv("HF_TOKEN_LC2") else "HF_TOKEN_LC"
+            print(f"✅ {token_source} found (length: {len(hf_token)})")
+            # Properly authenticate with Hugging Face Hub
+            try:
+                login(token=hf_token, add_to_git_credential=False)
+                print("✅ Successfully authenticated with Hugging Face Hub")
+            except Exception as e:
+                print(f"⚠️ Warning: Failed to authenticate with HF Hub: {e}")
+            # Also set environment variables as fallback
             os.environ["HF_TOKEN"] = hf_token
             os.environ["HUGGING_FACE_HUB_TOKEN"] = hf_token
+        else:
+            print("⚠️ WARNING: Neither HF_TOKEN_LC2 nor HF_TOKEN_LC found in environment!")
+            print("Available env vars:", list(os.environ.keys()))
 
         try:
-            # Initialize vLLM engine
+            # Initialize vLLM engine with explicit token
+            print(f"Attempting to load model: {model_name}")
+            print("Model type: Qwen3 8B (bfloat16) - Optimized for L4 with torch.compile")
+            print("Download directory: /tmp/huggingface")
+            print("Trust remote code: True")
+            print("L4 GPU: 24GB VRAM available")
+            print("Mode: Eager mode (CUDA graphs disabled for L4)")
+
             llm_engine = LLM(
                 model=model_name,
                 trust_remote_code=True,
-                dtype="float16",
-                max_model_len=4096,
-                gpu_memory_utilization=0.9,
-                tensor_parallel_size=1,  # L40 has 1 GPU
+                dtype="bfloat16",  # Use bfloat16 for Qwen3 (required)
+                max_model_len=4096,  # Reduced for L4 KV cache constraints
+                gpu_memory_utilization=0.85,  # Increased to fit KV cache
+                tensor_parallel_size=1,  # Single L4 GPU
                 download_dir="/tmp/huggingface",
+                tokenizer_mode="auto",
+                # Disable torch.compile on L4 due to memory constraints
+                enforce_eager=True,  # Use eager mode (no CUDA graphs/compilation)
+                # Let vLLM handle compilation and fallback gracefully
+                disable_log_stats=False,  # Enable logging for debugging
             )
-            print("vLLM engine initialized successfully!")
+            print(f"✅ vLLM engine initialized successfully with {model_name}!")
         except Exception as e:
-            print(f"Error initializing vLLM: {e}")
+            print(f"❌ Error initializing vLLM: {e}")
             raise
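
For local debugging on the GPU box, the engine this module builds can be driven directly, bypassing the HTTP layer; a minimal sketch (assumes `initialize_vllm` assigns the module-global `llm_engine`, as the unchanged lines suggest):

```python
# Hedged sketch: offline generation through the vLLM engine above.
from vllm import SamplingParams

import app.providers.vllm as provider

provider.initialize_vllm()  # blocks until weights are loaded
params = SamplingParams(temperature=0.3, max_tokens=64)
outputs = provider.llm_engine.generate(
    ["In one sentence, what is a PRIIPs KID?"], params
)
print(outputs[0].outputs[0].text)
```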
requirements.txt CHANGED
@@ -1,3 +1,5 @@
+# Dependencies installed in Dockerfile during HF Space build
+vllm
 fastapi>=0.115.0
 uvicorn[standard]>=0.30.0
 pydantic>=2.8.0
@@ -7,4 +9,3 @@ python-dotenv>=1.0.1
 tenacity>=8.3.0
 PyMuPDF>=1.24.0
 pytest>=7.4.0
-
scripts/extract_priips.py ADDED
@@ -0,0 +1,182 @@
+#!/usr/bin/env python3
+"""
+PRIIPS Document Extraction Script
+
+Extracts text from PRIIPS KID PDFs and processes them for RAG context.
+"""
+
+import sys
+import json
+from pathlib import Path
+from datetime import datetime
+import argparse
+
+# Add parent directory to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from app.utils.pdf import extract_text_from_pdf
+
+
+def extract_priips_document(pdf_path: Path, output_dir: Path) -> dict:
+    """
+    Extract content from a PRIIPS KID PDF.
+
+    Args:
+        pdf_path: Path to the PDF file
+        output_dir: Directory to save extracted content
+
+    Returns:
+        Dictionary with extracted content
+    """
+    print(f"📄 Processing: {pdf_path.name}")
+
+    # Extract text from PDF
+    try:
+        raw_text = extract_text_from_pdf(pdf_path)
+        print(f"✅ Extracted {len(raw_text)} characters")
+    except Exception as e:
+        print(f"❌ Error extracting PDF: {e}")
+        return None
+
+    # Parse filename for metadata
+    filename_parts = pdf_path.stem.split("_")
+    isin = filename_parts[0] if len(filename_parts) > 0 else "UNKNOWN"
+    product_name = filename_parts[1] if len(filename_parts) > 1 else pdf_path.stem
+
+    # Create structured output
+    extracted_data = {
+        "metadata": {
+            "filename": pdf_path.name,
+            "extraction_date": datetime.now().isoformat(),
+            "isin": isin,
+            "product_name": product_name,
+            "file_size_bytes": pdf_path.stat().st_size,
+            "text_length": len(raw_text)
+        },
+        "raw_text": raw_text,
+        "sections": extract_sections(raw_text)
+    }
+
+    # Save to JSON
+    output_path = output_dir / f"{pdf_path.stem}_extracted.json"
+    with open(output_path, "w", encoding="utf-8") as f:
+        json.dump(extracted_data, f, indent=2, ensure_ascii=False)
+
+    print(f"💾 Saved to: {output_path}")
+    return extracted_data
+
+
+def extract_sections(text: str) -> dict:
+    """
+    Extract common PRIIPS KID sections from text.
+
+    This is a simple implementation. Can be enhanced with LLM-based extraction.
+    """
+    sections = {}
+
+    # Common PRIIPS section keywords
+    keywords = {
+        "summary": ["what is this product", "summary"],
+        "objectives": ["objectives", "investment objectives"],
+        "risk_indicator": ["risk indicator", "sri", "summary risk"],
+        "performance_scenarios": ["performance scenarios", "what could i get"],
+        "costs": ["what are the costs", "costs"],
+        "holding_period": ["recommended holding period", "holding period"]
+    }
+
+    text_lower = text.lower()
+
+    for section_name, search_terms in keywords.items():
+        for term in search_terms:
+            if term in text_lower:
+                # Extract a snippet around the keyword
+                start_idx = text_lower.find(term)
+                # Get 500 chars after the keyword
+                snippet = text[start_idx:start_idx + 500].strip()
+                sections[section_name] = snippet
+                break
+
+    return sections
+
+
+def batch_process_directory(input_dir: Path, output_dir: Path):
+    """Process all PDFs in a directory."""
+    pdf_files = list(input_dir.glob("*.pdf"))
+
+    if not pdf_files:
+        print(f"⚠️ No PDF files found in {input_dir}")
+        return
+
+    print(f"📦 Found {len(pdf_files)} PDF files to process\n")
+
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    results = []
+    for pdf_path in pdf_files:
+        result = extract_priips_document(pdf_path, output_dir)
+        if result:
+            results.append(result)
+        print()  # Blank line between files
+
+    # Save summary
+    summary_path = output_dir / "_extraction_summary.json"
+    summary = {
+        "extraction_date": datetime.now().isoformat(),
+        "total_processed": len(results),
+        "total_failed": len(pdf_files) - len(results),
+        "files": [r["metadata"] for r in results]
+    }
+
+    with open(summary_path, "w", encoding="utf-8") as f:
+        json.dump(summary, f, indent=2)
+
+    print(f"\n✅ Processed {len(results)}/{len(pdf_files)} files successfully")
+    print(f"📊 Summary saved to: {summary_path}")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Extract PRIIPS KID documents for RAG context"
+    )
+    parser.add_argument(
+        "input",
+        type=str,
+        help="Input PDF file or directory containing PDFs"
+    )
+    parser.add_argument(
+        "--output",
+        type=str,
+        default=None,
+        help="Output directory (default: priips_documents/extracted/)"
+    )
+
+    args = parser.parse_args()
+
+    # Setup paths
+    workspace_root = Path(__file__).parent.parent
+    input_path = Path(args.input)
+
+    if not input_path.is_absolute():
+        input_path = workspace_root / input_path
+
+    if args.output:
+        output_dir = Path(args.output)
+        if not output_dir.is_absolute():
+            output_dir = workspace_root / output_dir
+    else:
+        output_dir = workspace_root / "priips_documents" / "extracted"
+
+    # Process
+    if input_path.is_file():
+        output_dir.mkdir(parents=True, exist_ok=True)
+        extract_priips_document(input_path, output_dir)
+    elif input_path.is_dir():
+        batch_process_directory(input_path, output_dir)
+    else:
+        print(f"❌ Error: {input_path} does not exist")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
+
scripts/query_with_context.py ADDED
@@ -0,0 +1,179 @@
+#!/usr/bin/env python3
+"""
+Query LLM with PRIIPS Document Context
+
+Loads extracted PRIIPS documents and queries the LLM with RAG context.
+"""
+
+import sys
+import json
+import argparse
+from pathlib import Path
+from typing import List, Dict
+import requests
+
+# Configuration
+BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"
+MODEL = "DragonLLM/qwen3-8b-fin-v1.0"
+
+
+def load_extracted_documents(extracted_dir: Path) -> List[Dict]:
+    """Load all extracted PRIIPS documents."""
+    documents = []
+
+    for json_file in extracted_dir.glob("*_extracted.json"):
+        if json_file.name.startswith("_"):
+            continue  # Skip summary files
+
+        with open(json_file, "r", encoding="utf-8") as f:
+            documents.append(json.load(f))
+
+    return documents
+
+
+def build_context(documents: List[Dict], query: str, max_chars: int = 2000) -> str:
+    """
+    Build RAG context from documents relevant to the query.
+
+    Simple implementation: include all document summaries.
+    Can be enhanced with semantic search/embeddings.
+    """
+    context_parts = []
+    total_chars = 0
+
+    for doc in documents:
+        metadata = doc["metadata"]
+
+        # Build a summary of this document
+        doc_summary = f"\n--- Document: {metadata['product_name']} (ISIN: {metadata['isin']}) ---\n"
+
+        # Include extracted sections
+        if "sections" in doc and doc["sections"]:
+            for section_name, content in doc["sections"].items():
+                if content:
+                    section_text = f"\n{section_name.upper()}:\n{content[:300]}...\n"
+                    doc_summary += section_text
+
+        # Check if we have space
+        if total_chars + len(doc_summary) > max_chars:
+            break
+
+        context_parts.append(doc_summary)
+        total_chars += len(doc_summary)
+
+    if not context_parts:
+        return "No relevant documents found."
+
+    return "\n".join(context_parts)
+
+
+def query_llm(query: str, context: str, max_tokens: int = 500) -> str:
+    """Query the LLM with context."""
+
+    # Build the prompt with context
+    prompt = f"""You are a financial expert assistant specializing in PRIIPS Key Information Documents.
+
+Use the following context from PRIIPS documents to answer the question:
+
+{context}
+
+Question: {query}
+
+Provide a clear, accurate answer based on the context provided. If the context doesn't contain enough information, say so."""
+
+    payload = {
+        "model": MODEL,
+        "messages": [
+            {"role": "system", "content": "You are a PRIIPS financial document expert."},
+            {"role": "user", "content": prompt}
+        ],
+        "max_tokens": max_tokens,
+        "temperature": 0.3  # Lower temperature for more factual responses
+    }
+
+    print(f"🔍 Querying LLM with {len(context)} chars of context...")
+
+    try:
+        response = requests.post(
+            f"{BASE_URL}/v1/chat/completions",
+            json=payload,
+            timeout=60
+        )
+        response.raise_for_status()
+
+        data = response.json()
+        answer = data["choices"][0]["message"]["content"]
+
+        # Print usage stats
+        usage = data.get("usage", {})
+        print(f"📊 Tokens used: {usage.get('total_tokens', 'N/A')}")
+
+        return answer
+
+    except Exception as e:
+        return f"Error querying LLM: {e}"
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Query LLM with PRIIPS document context"
+    )
+    parser.add_argument(
+        "query",
+        type=str,
+        help="Question to ask about PRIIPS documents"
+    )
+    parser.add_argument(
+        "--extracted-dir",
+        type=str,
+        default="priips_documents/extracted",
+        help="Directory containing extracted documents"
+    )
+    parser.add_argument(
+        "--max-context",
+        type=int,
+        default=2000,
+        help="Maximum context characters to include"
+    )
+    parser.add_argument(
+        "--max-tokens",
+        type=int,
+        default=500,
+        help="Maximum tokens in response"
+    )
+
+    args = parser.parse_args()
+
+    # Setup paths
+    workspace_root = Path(__file__).parent.parent
+    extracted_dir = workspace_root / args.extracted_dir
+
+    if not extracted_dir.exists():
+        print(f"❌ Directory not found: {extracted_dir}")
+        print("Run extract_priips.py first to extract documents.")
+        sys.exit(1)
+
+    # Load documents
+    print(f"📚 Loading documents from {extracted_dir}...")
+    documents = load_extracted_documents(extracted_dir)
+
+    if not documents:
+        print("⚠️ No extracted documents found.")
+        print("Add PDFs to priips_documents/raw/ and run extract_priips.py")
+        sys.exit(1)
+
+    print(f"✅ Loaded {len(documents)} documents")
+
+    # Build context
+    context = build_context(documents, args.query, args.max_context)
+
+    # Query LLM
+    print(f"\n❓ Question: {args.query}\n")
+    answer = query_llm(args.query, context, args.max_tokens)
+
+    print(f"\n💬 Answer:\n{answer}\n")
+
+
+if __name__ == "__main__":
+    main()
+
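
`build_context` above concatenates documents in load order; its docstring notes it "can be enhanced with semantic search/embeddings". A lightweight stand-in, ranking documents by query-term overlap before the context is built (the helper name is hypothetical):

```python
# Hedged sketch: cheap relevance ranking as a stand-in for embeddings.
from typing import Dict, List

def rank_documents(documents: List[Dict], query: str) -> List[Dict]:
    """Sort documents by how often meaningful query terms appear in their sections."""
    terms = {t for t in query.lower().split() if len(t) > 3}

    def score(doc: Dict) -> int:
        text = " ".join(doc.get("sections", {}).values()).lower()
        return sum(text.count(t) for t in terms)

    return sorted(documents, key=score, reverse=True)

# usage in main(), before build_context():
#     documents = rank_documents(documents, args.query)
```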
test_service.py ADDED
@@ -0,0 +1,141 @@
+#!/usr/bin/env python3
+"""
+Quick test script to verify the PRIIPs LLM Service is working
+Run with: python test_service.py
+"""
+import httpx
+import json
+import time
+import os
+from huggingface_hub import get_token
+
+BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"
+
+# Get HF token for private Space access
+HF_TOKEN = get_token()
+if not HF_TOKEN:
+    print("⚠️ Warning: No HF token found. Private Space access may fail.")
+    print("   Run: huggingface-cli login")
+
+def test_endpoint(name, method, url, json_data=None, timeout=10):
+    """Test a single endpoint"""
+    print(f"\n{'='*60}")
+    print(f"Testing: {name}")
+    print(f"{'='*60}")
+    print(f"URL: {url}")
+
+    # Add authentication headers for private Space
+    headers = {}
+    if HF_TOKEN:
+        headers["Authorization"] = f"Bearer {HF_TOKEN}"
+
+    try:
+        if method == "GET":
+            response = httpx.get(url, headers=headers, timeout=timeout)
+        else:
+            response = httpx.post(url, json=json_data, headers=headers, timeout=timeout)
+
+        print(f"Status: {response.status_code}")
+
+        if response.status_code == 200:
+            try:
+                data = response.json()
+                print(f"Response: {json.dumps(data, indent=2)[:500]}")
+                return True
+            except ValueError:  # body was not valid JSON
+                print(f"Response (text): {response.text[:200]}")
+                return False
+        else:
+            print(f"Error: {response.text[:200]}")
+            return False
+
+    except httpx.TimeoutException:
+        print(f"❌ Timeout after {timeout}s")
+        return False
+    except Exception as e:
+        print(f"❌ Error: {e}")
+        return False
+
+
+def main():
+    print(f"\n{'#'*60}")
+    print("PRIIPs LLM Service - Quick Test Script")
+    print(f"Service: {BASE_URL}")
+    print(f"{'#'*60}")
+
+    results = {}
+
+    # Test 1: Root endpoint
+    results['root'] = test_endpoint(
+        "Root Endpoint",
+        "GET",
+        f"{BASE_URL}/"
+    )
+
+    # Test 2: Health endpoint
+    results['health'] = test_endpoint(
+        "Health Check",
+        "GET",
+        f"{BASE_URL}/health"
+    )
+
+    # Test 3: List models
+    results['models'] = test_endpoint(
+        "List Models",
+        "GET",
+        f"{BASE_URL}/v1/models"
+    )
+
+    # Test 4: Chat completion (this will load the model - may take 30s-1min first time)
+    print("\n" + "="*60)
+    print("Testing: Chat Completion (Model Loading)")
+    print("="*60)
+    print("⚠️ First request will take 30s-1min to load the model...")
+    print("   Please wait...")
+
+    chat_payload = {
+        "model": "DragonLLM/gemma3-12b-fin-v0.3",
+        "messages": [
+            {"role": "user", "content": "What is 2+2?"}
+        ],
+        "max_tokens": 50,
+        "temperature": 0.7
+    }
+
+    results['chat'] = test_endpoint(
+        "Chat Completion",
+        "POST",
+        f"{BASE_URL}/v1/chat/completions",
+        json_data=chat_payload,
+        timeout=120  # Longer timeout for model loading
+    )
+
+    # Summary
+    print(f"\n{'#'*60}")
+    print("SUMMARY")
+    print(f"{'#'*60}")
+
+    passed = sum(1 for v in results.values() if v)
+    total = len(results)
+
+    for test_name, success in results.items():
+        status = "✅ PASS" if success else "❌ FAIL"
+        print(f"{status} - {test_name}")
+
+    print(f"\nResults: {passed}/{total} tests passed")
+
+    if passed == total:
+        print("\n🎉 All tests passed! Service is fully operational.")
+    elif results.get('root') or results.get('health'):
+        print("\n⚠️ Service is responding but some endpoints failed.")
+        print("   This might be normal if model is still loading.")
+    else:
+        print("\n❌ Service is not accessible. Check:")
+        print("   1. Space is running on HF dashboard")
+        print("   2. No firewall/network issues")
+        print("   3. Correct URL")
+
+
+if __name__ == "__main__":
+    main()
+