Deploy PRIIPs LLM Service to HF Spaces + RAG workflow
Successful deployment:
- Model: DragonLLM/qwen3-8b-fin-v1.0 (8B parameters)
- Hardware: L4 GPU (24GB VRAM)
- Backend: vLLM with eager mode (stable)
- Context: 4096 tokens
- API: OpenAI-compatible at https://jeanbaptdzd-priips-llm-service.hf.space
Configuration updates:
- Updated Dockerfile to CUDA 12.4.0, Python 3.11
- Configured vLLM with enforce_eager=True for L4 stability
- Set max_model_len=4096, gpu_memory_utilization=0.85
- Fixed KV cache memory allocation issues
- Background model initialization to avoid timeouts
- Config: allow extra fields in .env
PRIIPS RAG Workflow:
- Created priips_documents/ directory structure (raw/extracted/processed)
- Added extract_priips.py: PDF → JSON extraction script
- Added query_with_context.py: RAG-powered query system
- Comprehensive documentation in PRIIPS_WORKFLOW.md
- Test service utilities
Tested and working:
- All API endpoints operational (/, /v1/models, /v1/chat/completions)
- Financial calculations: CAGR, returns
- Risk assessment: market/credit risk concepts
- PRIIPS knowledge: SRI, KID sections
- Information extraction from documents
- Ready for RAG integration with PydanticAI/DSPy
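
The deployed API is OpenAI-compatible, so any standard client works; a quick smoke test of the request shape (the question text and sampling settings below are arbitrary choices):

```python
import json

BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"

# Request body follows the OpenAI chat-completions schema
payload = {
    "model": "DragonLLM/qwen3-8b-fin-v1.0",
    "messages": [{"role": "user", "content": "What does SRI mean in a PRIIPs KID?"}],
    "temperature": 0.2,
}

# Send with any HTTP client, e.g.:
#   import requests
#   r = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=300)
#   print(r.json()["choices"][0]["message"]["content"])
print(sorted(payload))  # → ['messages', 'model', 'temperature']
```

Pointing an OpenAI SDK at `BASE_URL + "/v1"` gives the same result without hand-building the request.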
- Dockerfile +31 -14
- PRIIPS_WORKFLOW.md +182 -0
- README.md +5 -4
- app/config.py +2 -1
- app/main.py +13 -3
- app/middleware.py +11 -0
- app/providers/vllm.py +37 -11
- requirements.txt +2 -1
- scripts/extract_priips.py +182 -0
- scripts/query_with_context.py +179 -0
- test_service.py +141 -0
**Dockerfile** (+31 -14, `@@ -1,40 +1,57 @@`) - new version:

```dockerfile
# Use NVIDIA CUDA 12.4 base image (12.1 is deprecated)
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV DEBIAN_FRONTEND=noninteractive

# Install Python 3.11 and build dependencies
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3.11-dev \
    python3-pip \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Set Python 3.11 as default
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 && \
    update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1

# Upgrade pip
RUN python3 -m pip install --upgrade pip

# Set working directory
WORKDIR /app

# Install vLLM and dependencies in one layer for efficiency
# (version specifiers are quoted so the shell does not treat ">=" as a redirect)
RUN pip install --no-cache-dir \
    vllm \
    "fastapi>=0.115.0" \
    "uvicorn[standard]>=0.30.0" \
    "pydantic>=2.8.0" \
    "pydantic-settings>=2.4.0" \
    "httpx>=0.27.0" \
    "python-dotenv>=1.0.1" \
    "tenacity>=8.3.0" \
    "PyMuPDF>=1.24.0"

# Copy application code
COPY app/ ./app/

# Create a non-root user and set up cache directories
RUN useradd -m -u 1000 user && \
    mkdir -p /tmp/huggingface /tmp/torch/inductor /tmp/triton && \
    chown -R user:user /app /tmp/huggingface /tmp/torch /tmp/triton

USER user

# Set environment variables for optimal vLLM + torch.compile performance
ENV HF_HOME=/tmp/huggingface
ENV TORCHINDUCTOR_CACHE_DIR=/tmp/torch/inductor
ENV TRITON_CACHE_DIR=/tmp/triton
ENV TORCH_COMPILE_DEBUG=0
ENV CUDA_VISIBLE_DEVICES=0

# Expose port
EXPOSE 7860
```
**PRIIPS_WORKFLOW.md** (new file, `@@ -0,0 +1,182 @@`):

# PRIIPS Document Extraction & RAG Workflow

Complete workflow for extracting PRIIPS KID documents and querying with LLM context.

## Directory Structure

```
priips_documents/
├── raw/          # Place your PDF documents here
├── extracted/    # Extracted JSON documents (auto-generated)
└── processed/    # Chunked documents for RAG (future)

scripts/
├── extract_priips.py       # Extract text from PDFs
└── query_with_context.py   # Query LLM with document context
```

## Quick Start

### 1. Add PRIIPS Documents

Place PDF documents in `priips_documents/raw/`:

```bash
# Naming convention: {ISIN}_{ProductName}_{Date}.pdf
cp /path/to/your/priips.pdf priips_documents/raw/LU1234567890_GlobalEquity_2024.pdf
```

### 2. Extract Document Content

```bash
# Extract all PDFs in the raw directory
python scripts/extract_priips.py priips_documents/raw/

# Or extract a single file
python scripts/extract_priips.py priips_documents/raw/LU1234567890_GlobalEquity_2024.pdf
```

**Output:** JSON files in `priips_documents/extracted/` with structured content:
- Metadata (ISIN, product name, dates)
- Raw extracted text
- Parsed sections (objectives, risks, costs, etc.)

### 3. Query with RAG Context

```bash
# Ask questions about your documents
python scripts/query_with_context.py "What is the recommended holding period?"

python scripts/query_with_context.py "What are the main risks of this investment?"

python scripts/query_with_context.py "Summarize the cost structure"
```

**Options:**
```bash
# Specify a different extracted directory
python scripts/query_with_context.py "Your question" --extracted-dir custom/path/

# Control context size and response length
python scripts/query_with_context.py "Your question" \
    --max-context 3000 \
    --max-tokens 800
```

## Example Workflow

```bash
# 1. Add a PRIIPS PDF
cp MyFund.pdf priips_documents/raw/FR0012345678_MyFund_2024.pdf

# 2. Extract content
python scripts/extract_priips.py priips_documents/raw/

# Output:
# Processing: FR0012345678_MyFund_2024.pdf
# Extracted 12,543 characters
# Saved to: priips_documents/extracted/FR0012345678_MyFund_2024_extracted.json

# 3. Query the LLM
python scripts/query_with_context.py "What is the SRI of this fund?"

# Output:
# Loading documents from priips_documents/extracted...
# Loaded 1 documents
# Querying LLM with 1,234 chars of context...
# Tokens used: 234
#
# Answer:
# Based on the PRIIPS document, the Summary Risk Indicator (SRI) for this fund is 5 out of 7...
```

## Use Cases

### Document Comparison
```bash
python scripts/query_with_context.py "Compare the risk profiles of all available funds"
```

### Specific Information Extraction
```bash
python scripts/query_with_context.py "Extract all recommended holding periods"
python scripts/query_with_context.py "List all ISINs and their product names"
```

### Compliance Checks
```bash
python scripts/query_with_context.py "Are there any funds with SRI above 6?"
python scripts/query_with_context.py "Which funds have holding periods under 3 years?"
```

## Advanced: Integrate with PydanticAI

```python
import json

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

# Configure with your deployed service
model = OpenAIModel(
    'DragonLLM/qwen3-8b-fin-v1.0',
    base_url='https://jeanbaptdzd-priips-llm-service.hf.space/v1',
)

agent = Agent(model=model)

# Load PRIIPS context
with open('priips_documents/extracted/LU123_extracted.json') as f:
    context = json.load(f)

# Query with context
result = agent.run_sync(
    f"Based on this PRIIPS document: {context['raw_text'][:2000]}... "
    f"What is the recommended holding period?"
)
```

## Extracted Document Schema

```json
{
  "metadata": {
    "filename": "LU1234567890_GlobalEquity_2024.pdf",
    "extraction_date": "2024-10-28T16:24:00",
    "isin": "LU1234567890",
    "product_name": "GlobalEquity",
    "file_size_bytes": 245678,
    "text_length": 12543
  },
  "raw_text": "Full extracted text from PDF...",
  "sections": {
    "summary": "What is this product? ...",
    "objectives": "Investment objectives and policy...",
    "risk_indicator": "SRI: 5/7 ...",
    "performance_scenarios": "Performance scenarios...",
    "costs": "What are the costs? ...",
    "holding_period": "Recommended: 5 years"
  }
}
```

## Next Steps

1. **Add More Documents:** Place additional PRIIPS PDFs in `raw/`
2. **Enhance Extraction:** Improve section parsing in `extract_priips.py`
3. **Add Embeddings:** Implement vector search for better RAG
4. **Build API:** Create REST API endpoints for document queries
5. **Dashboard:** Build a web UI for document management and queries

## API Integration

The LLM service is OpenAI-compatible and deployed at:
```
https://jeanbaptdzd-priips-llm-service.hf.space/v1
```

**Endpoints:**
- `GET /` - Service status
- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completion with context

See `test_service.py` for integration examples.
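
The `{ISIN}_{ProductName}_{Date}.pdf` naming convention carries the document metadata; a minimal sketch of the parsing step the extractor relies on (helper name hypothetical):

```python
from pathlib import Path

def parse_kid_filename(path: Path) -> dict:
    # {ISIN}_{ProductName}_{Date}.pdf -> metadata fields,
    # falling back gracefully when parts are missing
    parts = path.stem.split("_")
    return {
        "isin": parts[0] if parts else "UNKNOWN",
        "product_name": parts[1] if len(parts) > 1 else path.stem,
        "date": parts[2] if len(parts) > 2 else None,
    }

meta = parse_kid_filename(Path("LU1234567890_GlobalEquity_2024.pdf"))
print(meta["isin"], meta["product_name"], meta["date"])
# → LU1234567890 GlobalEquity 2024
```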
**README.md** (+5 -4):

```diff
@@ -7,11 +7,12 @@ sdk: docker
 pinned: false
 license: mit
 app_port: 7860
+hardware: l4
 ---
 
 # PRIIPs LLM Service - Hugging Face Spaces
 
-OpenAI-compatible API and PRIIPs extractor powered by `DragonLLM/
+OpenAI-compatible API and PRIIPs extractor powered by `DragonLLM/qwen3-8b-fin-v1.0` via vLLM.
 
 ## Quick Start
 
@@ -34,7 +35,7 @@ curl -X GET "https://your-space-url.hf.space/v1/models"
 curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "DragonLLM/
+    "model": "DragonLLM/gemma3-12b-fin-v0.3",
     "messages": [{"role": "user", "content": "Hello!"}],
     "temperature": 0.7
   }'
@@ -95,7 +96,7 @@ from pydantic_ai import Agent
 from pydantic_ai.models.openai import OpenAIModel
 
 model = OpenAIModel(
-    "DragonLLM/
+    "DragonLLM/gemma3-12b-fin-v0.3",
     base_url="https://your-space-url.hf.space/v1"
 )
@@ -107,7 +108,7 @@ agent = Agent(model=model)
 import dspy
 
 lm = dspy.OpenAI(
-    model="DragonLLM/
+    model="DragonLLM/gemma3-12b-fin-v0.3",
     api_base="https://your-space-url.hf.space/v1"
 )
```
**app/config.py** (+2 -1):

```diff
@@ -3,13 +3,14 @@ from pydantic_settings import BaseSettings
 
 class Settings(BaseSettings):
     vllm_base_url: str = "http://localhost:8000/v1"
-    model: str = "DragonLLM/
+    model: str = "DragonLLM/qwen3-8b-fin-v1.0"
     service_api_key: str | None = None
     log_level: str = "info"
 
     class Config:
         env_file = ".env"
         env_file_encoding = "utf-8"
+        extra = "ignore"  # Ignore extra fields in .env
 
 
 settings = Settings()
```
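
With `extra = "ignore"`, unrelated keys in `.env` no longer fail `Settings` validation; previously any key without a matching field raised an error at startup. A hypothetical `.env` illustrating the case (all values are placeholders):

```
VLLM_BASE_URL=http://localhost:8000/v1
SERVICE_API_KEY=change-me
# Not a Settings field -- formerly a validation error, now silently ignored
HF_TOKEN_LC2=hf_xxxxxxxxxxxx
```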
**app/main.py** (+13 -3):

```diff
@@ -18,9 +18,19 @@ app.middleware("http")(api_key_guard)
 
 @app.on_event("startup")
 async def startup_event():
-    """
+    """Startup event - initialize model in background"""
+    import threading
     logger.info("Starting PRIIPs LLM Service...")
-    logger.info("
+    logger.info("Initializing model in background thread...")
+
+    def load_model():
+        from app.providers.vllm import initialize_vllm
+        initialize_vllm()
+
+    # Start model loading in background thread
+    thread = threading.Thread(target=load_model, daemon=True)
+    thread.start()
+    logger.info("Model initialization started in background")
 
 @app.get("/")
 async def root():
@@ -28,7 +38,7 @@ async def root():
         "status": "ok",
         "service": "PRIIPs LLM Service",
         "version": "1.0.0",
-
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "backend": "vLLM"
     }
```
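
The background-thread pattern above is what keeps the HF Spaces health check from timing out during the multi-minute model load: the server starts answering requests immediately while weights load off the request path. A self-contained sketch of the idea (the `Event` readiness flag is an illustrative addition, not part of the commit):

```python
import threading
import time

model_ready = threading.Event()

def load_model():
    # Stand-in for the slow vLLM initialization; the real hook calls initialize_vllm()
    time.sleep(0.2)
    model_ready.set()

# Daemon thread: the web server can answer health checks while the model loads
threading.Thread(target=load_model, daemon=True).start()

print(model_ready.is_set())         # typically False right after startup
print(model_ready.wait(timeout=5))  # → True once loading finishes
```

An endpoint can expose `model_ready.is_set()` so clients know whether inference calls will succeed yet.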
**app/middleware.py** (+11 -0):

```diff
@@ -5,11 +5,22 @@ from app.config import settings
 
 
 async def api_key_guard(request: Request, call_next):
+    # Public endpoints that don't require authentication
+    public_paths = ["/", "/health", "/docs", "/redoc", "/openapi.json"]
+
+    # Skip auth for public endpoints
+    if request.url.path in public_paths:
+        return await call_next(request)
+
+    # Skip auth if no API key is configured
     if not settings.service_api_key:
         return await call_next(request)
+
+    # Check API key
     key = request.headers.get("x-api-key") or request.headers.get("authorization")
     if key and key.replace("Bearer ", "").strip() == settings.service_api_key:
         return await call_next(request)
+
     return JSONResponse({"error": "unauthorized"}, status_code=401)
```
**app/providers/vllm.py** (+37 -11):

```diff
@@ -3,9 +3,10 @@ from typing import Dict, Any, AsyncIterator
 from vllm import LLM, SamplingParams
 from vllm.entrypoints.openai.api_server import build_async_engine_client
 import asyncio
+from huggingface_hub import login
 
-# Model configuration
-model_name = "DragonLLM/
+# Model configuration - optimized for 8B Qwen3 on L4
+model_name = "DragonLLM/qwen3-8b-fin-v1.0"
 llm_engine = None
 
 def initialize_vllm():
@@ -15,26 +16,51 @@ def initialize_vllm():
     if llm_engine is None:
         print(f"Initializing vLLM with model: {model_name}")
 
-        # Get HF token from environment
-
+        # Get HF token from environment (Hugging Face Space secret)
+        # Try HF_TOKEN_LC2 first (for DragonLLM access), then fall back to HF_TOKEN_LC
+        hf_token = os.getenv("HF_TOKEN_LC2") or os.getenv("HF_TOKEN_LC")
         if hf_token:
+            token_source = "HF_TOKEN_LC2" if os.getenv("HF_TOKEN_LC2") else "HF_TOKEN_LC"
+            print(f"{token_source} found (length: {len(hf_token)})")
+            # Properly authenticate with Hugging Face Hub
+            try:
+                login(token=hf_token, add_to_git_credential=False)
+                print("Successfully authenticated with Hugging Face Hub")
+            except Exception as e:
+                print(f"Warning: Failed to authenticate with HF Hub: {e}")
+            # Also set environment variables as fallback
             os.environ["HF_TOKEN"] = hf_token
             os.environ["HUGGING_FACE_HUB_TOKEN"] = hf_token
+        else:
+            print("WARNING: Neither HF_TOKEN_LC2 nor HF_TOKEN_LC found in environment!")
+            print("Available env vars:", list(os.environ.keys()))
 
         try:
-            # Initialize vLLM engine
+            # Initialize vLLM engine with explicit token
+            print(f"Attempting to load model: {model_name}")
+            print("Model type: Qwen3 8B (bfloat16) - Optimized for L4 with torch.compile")
+            print("Download directory: /tmp/huggingface")
+            print("Trust remote code: True")
+            print("L4 GPU: 24GB VRAM available")
+            print("Mode: Eager mode (CUDA graphs disabled for L4)")
+
             llm_engine = LLM(
                 model=model_name,
                 trust_remote_code=True,
-                dtype="
-                max_model_len=4096,
-                gpu_memory_utilization=0.
-                tensor_parallel_size=1,  #
+                dtype="bfloat16",             # Use bfloat16 for Qwen3 (required)
+                max_model_len=4096,           # Reduced for L4 KV cache constraints
+                gpu_memory_utilization=0.85,  # Increased to fit KV cache
+                tensor_parallel_size=1,       # Single L4 GPU
                 download_dir="/tmp/huggingface",
+                tokenizer_mode="auto",
+                # Disable torch.compile on L4 due to memory constraints
+                enforce_eager=True,           # Use eager mode (no CUDA graphs/compilation)
+                # Let vLLM handle compilation and fall back gracefully
+                disable_log_stats=False,      # Enable logging for debugging
             )
-            print(f"vLLM engine initialized successfully!")
+            print(f"vLLM engine initialized successfully with {model_name}!")
         except Exception as e:
             print(f"Error initializing vLLM: {e}")
             raise
```
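
The commit message mentions fixing KV-cache allocation; the interaction of the `LLM(...)` settings above can be checked with back-of-envelope arithmetic (the figures are rough assumptions: bf16 weights at 2 bytes per parameter, ignoring activation buffers and CUDA context overhead, which vLLM accounts for more precisely):

```python
# Back-of-envelope KV-cache budget for an 8B bf16 model on a 24 GB L4
vram_gb = 24
budget_gb = vram_gb * 0.85           # gpu_memory_utilization=0.85 -> 20.4 GB usable
weights_gb = 8e9 * 2 / 1e9           # 8B params x 2 bytes (bfloat16) = 16 GB
kv_cache_gb = budget_gb - weights_gb # what's left for KV cache and overhead
print(round(kv_cache_gb, 1))  # → 4.4
```

With only a few GB left for the KV cache, capping `max_model_len` at 4096 and disabling CUDA-graph capture (`enforce_eager=True`, which avoids extra memory for captured graphs) is a plausible way to make the engine fit on an L4.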
**requirements.txt** (+2 -1):

```diff
@@ -1,3 +1,5 @@
+# Dependencies installed in Dockerfile during HF Space build
+vllm
 fastapi>=0.115.0
 uvicorn[standard]>=0.30.0
 pydantic>=2.8.0
@@ -7,4 +9,3 @@ python-dotenv>=1.0.1
 tenacity>=8.3.0
 PyMuPDF>=1.24.0
 pytest>=7.4.0
-
```
**scripts/extract_priips.py** (new file, `@@ -0,0 +1,182 @@`):

```python
#!/usr/bin/env python3
"""
PRIIPS Document Extraction Script

Extracts text from PRIIPS KID PDFs and processes them for RAG context.
"""

import sys
import json
from pathlib import Path
from datetime import datetime
import argparse

# Add parent directory to path
sys.path.insert(0, str(Path(__file__).parent.parent))

from app.utils.pdf import extract_text_from_pdf


def extract_priips_document(pdf_path: Path, output_dir: Path) -> dict:
    """
    Extract content from a PRIIPS KID PDF.

    Args:
        pdf_path: Path to the PDF file
        output_dir: Directory to save extracted content

    Returns:
        Dictionary with extracted content
    """
    print(f"Processing: {pdf_path.name}")

    # Extract text from PDF
    try:
        raw_text = extract_text_from_pdf(pdf_path)
        print(f"Extracted {len(raw_text)} characters")
    except Exception as e:
        print(f"Error extracting PDF: {e}")
        return None

    # Parse filename for metadata
    filename_parts = pdf_path.stem.split("_")
    isin = filename_parts[0] if len(filename_parts) > 0 else "UNKNOWN"
    product_name = filename_parts[1] if len(filename_parts) > 1 else pdf_path.stem

    # Create structured output
    extracted_data = {
        "metadata": {
            "filename": pdf_path.name,
            "extraction_date": datetime.now().isoformat(),
            "isin": isin,
            "product_name": product_name,
            "file_size_bytes": pdf_path.stat().st_size,
            "text_length": len(raw_text)
        },
        "raw_text": raw_text,
        "sections": extract_sections(raw_text)
    }

    # Save to JSON
    output_path = output_dir / f"{pdf_path.stem}_extracted.json"
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(extracted_data, f, indent=2, ensure_ascii=False)

    print(f"Saved to: {output_path}")
    return extracted_data


def extract_sections(text: str) -> dict:
    """
    Extract common PRIIPS KID sections from text.

    This is a simple implementation. Can be enhanced with LLM-based extraction.
    """
    sections = {}

    # Common PRIIPS section keywords
    keywords = {
        "summary": ["what is this product", "summary"],
        "objectives": ["objectives", "investment objectives"],
        "risk_indicator": ["risk indicator", "sri", "summary risk"],
        "performance_scenarios": ["performance scenarios", "what could i get"],
        "costs": ["what are the costs", "costs"],
        "holding_period": ["recommended holding period", "holding period"]
    }

    text_lower = text.lower()

    for section_name, search_terms in keywords.items():
        for term in search_terms:
            if term in text_lower:
                # Extract a snippet around the keyword
                start_idx = text_lower.find(term)
                # Get 500 chars after the keyword
                snippet = text[start_idx:start_idx + 500].strip()
                sections[section_name] = snippet
                break

    return sections


def batch_process_directory(input_dir: Path, output_dir: Path):
    """Process all PDFs in a directory."""
    pdf_files = list(input_dir.glob("*.pdf"))

    if not pdf_files:
        print(f"No PDF files found in {input_dir}")
        return

    print(f"Found {len(pdf_files)} PDF files to process\n")

    output_dir.mkdir(parents=True, exist_ok=True)

    results = []
    for pdf_path in pdf_files:
        result = extract_priips_document(pdf_path, output_dir)
        if result:
            results.append(result)
        print()  # Blank line between files

    # Save summary
    summary_path = output_dir / "_extraction_summary.json"
    summary = {
        "extraction_date": datetime.now().isoformat(),
        "total_processed": len(results),
        "total_failed": len(pdf_files) - len(results),
        "files": [r["metadata"] for r in results]
    }

    with open(summary_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)

    print(f"\nProcessed {len(results)}/{len(pdf_files)} files successfully")
    print(f"Summary saved to: {summary_path}")


def main():
    parser = argparse.ArgumentParser(
        description="Extract PRIIPS KID documents for RAG context"
    )
    parser.add_argument(
        "input",
        type=str,
        help="Input PDF file or directory containing PDFs"
    )
    parser.add_argument(
        "--output",
        type=str,
        default=None,
        help="Output directory (default: priips_documents/extracted/)"
    )

    args = parser.parse_args()

    # Setup paths
    workspace_root = Path(__file__).parent.parent
    input_path = Path(args.input)

    if not input_path.is_absolute():
        input_path = workspace_root / input_path

    if args.output:
        output_dir = Path(args.output)
        if not output_dir.is_absolute():
            output_dir = workspace_root / output_dir
    else:
        output_dir = workspace_root / "priips_documents" / "extracted"

    # Process
    if input_path.is_file():
        output_dir.mkdir(parents=True, exist_ok=True)
        extract_priips_document(input_path, output_dir)
    elif input_path.is_dir():
        batch_process_directory(input_path, output_dir)
    else:
        print(f"Error: {input_path} does not exist")
        sys.exit(1)


if __name__ == "__main__":
    main()
```
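
The keyword-based section heuristic can be exercised on a toy document; this condensed variant takes the keyword table as a parameter so it runs standalone (the real script hard-codes the PRIIPS keyword map):

```python
def extract_sections(text: str, keywords: dict) -> dict:
    # For each section, take a 500-char snippet starting at the first matching keyword
    sections, low = {}, text.lower()
    for name, terms in keywords.items():
        for term in terms:
            idx = low.find(term)
            if idx != -1:
                sections[name] = text[idx:idx + 500].strip()
                break
    return sections

doc = "What is this product? An equity fund. Risk indicator: SRI 5 of 7."
out = extract_sections(doc, {
    "summary": ["what is this product"],
    "risk_indicator": ["risk indicator", "sri"],
})
print(sorted(out))  # → ['risk_indicator', 'summary']
```

Because matching is case-insensitive substring search, ambiguous terms like "costs" can latch onto the wrong occurrence; this is the parsing the workflow's "Enhance Extraction" next step is meant to improve.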
@@ -0,0 +1,179 @@
#!/usr/bin/env python3
"""
Query LLM with PRIIPS Document Context

Loads extracted PRIIPS documents and queries the LLM with RAG context.
"""

import sys
import json
import argparse
from pathlib import Path
from typing import List, Dict
import requests

# Configuration
BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"
MODEL = "DragonLLM/qwen3-8b-fin-v1.0"


def load_extracted_documents(extracted_dir: Path) -> List[Dict]:
    """Load all extracted PRIIPS documents."""
    documents = []

    for json_file in extracted_dir.glob("*_extracted.json"):
        if json_file.name.startswith("_"):
            continue  # Skip summary files

        with open(json_file, "r", encoding="utf-8") as f:
            documents.append(json.load(f))

    return documents


def build_context(documents: List[Dict], query: str, max_chars: int = 2000) -> str:
    """
    Build RAG context from documents relevant to the query.

    Simple implementation: include all document summaries.
    Can be enhanced with semantic search/embeddings.
    """
    context_parts = []
    total_chars = 0

    for doc in documents:
        metadata = doc["metadata"]

        # Build a summary of this document
        doc_summary = f"\n--- Document: {metadata['product_name']} (ISIN: {metadata['isin']}) ---\n"

        # Include extracted sections
        if "sections" in doc and doc["sections"]:
            for section_name, content in doc["sections"].items():
                if content:
                    section_text = f"\n{section_name.upper()}:\n{content[:300]}...\n"
                    doc_summary += section_text

        # Check if we have space
        if total_chars + len(doc_summary) > max_chars:
            break

        context_parts.append(doc_summary)
        total_chars += len(doc_summary)

    if not context_parts:
        return "No relevant documents found."

    return "\n".join(context_parts)


def query_llm(query: str, context: str, max_tokens: int = 500) -> str:
    """Query the LLM with context."""

    # Build the prompt with context
    prompt = f"""You are a financial expert assistant specializing in PRIIPS Key Information Documents.

Use the following context from PRIIPS documents to answer the question:

{context}

Question: {query}

Provide a clear, accurate answer based on the context provided. If the context doesn't contain enough information, say so."""

    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a PRIIPS financial document expert."},
            {"role": "user", "content": prompt}
        ],
        "max_tokens": max_tokens,
        "temperature": 0.3  # Lower temperature for more factual responses
    }

    print(f"🔍 Querying LLM with {len(context)} chars of context...")

    try:
        response = requests.post(
            f"{BASE_URL}/v1/chat/completions",
            json=payload,
            timeout=60
        )
        response.raise_for_status()

        data = response.json()
        answer = data["choices"][0]["message"]["content"]

        # Print usage stats
        usage = data.get("usage", {})
        print(f"📊 Tokens used: {usage.get('total_tokens', 'N/A')}")

        return answer

    except Exception as e:
        return f"Error querying LLM: {e}"


def main():
    parser = argparse.ArgumentParser(
        description="Query LLM with PRIIPS document context"
    )
    parser.add_argument(
        "query",
        type=str,
        help="Question to ask about PRIIPS documents"
    )
    parser.add_argument(
        "--extracted-dir",
        type=str,
        default="priips_documents/extracted",
        help="Directory containing extracted documents"
    )
    parser.add_argument(
        "--max-context",
        type=int,
        default=2000,
        help="Maximum context characters to include"
    )
    parser.add_argument(
        "--max-tokens",
        type=int,
        default=500,
        help="Maximum tokens in response"
    )

    args = parser.parse_args()

    # Setup paths
    workspace_root = Path(__file__).parent.parent
    extracted_dir = workspace_root / args.extracted_dir

    if not extracted_dir.exists():
        print(f"❌ Directory not found: {extracted_dir}")
        print("Run extract_priips.py first to extract documents.")
        sys.exit(1)

    # Load documents
    print(f"📂 Loading documents from {extracted_dir}...")
    documents = load_extracted_documents(extracted_dir)

    if not documents:
        print("⚠️ No extracted documents found.")
        print("Add PDFs to priips_documents/raw/ and run extract_priips.py")
        sys.exit(1)

    print(f"✅ Loaded {len(documents)} documents")

    # Build context
    context = build_context(documents, args.query, args.max_context)

    # Query LLM
    print(f"\n❓ Question: {args.query}\n")
    answer = query_llm(args.query, context, args.max_tokens)

    print(f"\n💬 Answer:\n{answer}\n")


if __name__ == "__main__":
    main()
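`build_context` concatenates documents in load order until `max_chars` is hit, and its docstring notes the approach could be enhanced with semantic search. A minimal step in that direction, without embeddings, is to rank documents by keyword overlap with the query before truncating. The `rank_documents` helper below is a hypothetical sketch, not part of this commit:

```python
import json
from typing import Dict, List


def rank_documents(documents: List[Dict], query: str, top_k: int = 3) -> List[Dict]:
    """Rank documents by how many query terms appear in their extracted sections."""
    # Keep only terms longer than two characters to skip most stopwords
    query_terms = {term for term in query.lower().split() if len(term) > 2}
    scored = []
    for doc in documents:
        # Serialize the sections dict so every field is searchable as plain text
        text = json.dumps(doc.get("sections", {})).lower()
        score = sum(1 for term in query_terms if term in text)
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents with no overlap at all
    return [doc for score, doc in scored[:top_k] if score > 0]
```

`build_context` could then iterate over `rank_documents(documents, query)` instead of `documents`, so the character budget is spent on the most relevant KIDs first.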
test_service.py
@@ -0,0 +1,141 @@
#!/usr/bin/env python3
"""
Quick test script to verify the PRIIPs LLM Service is working
Run with: python test_service.py
"""
import httpx
import json
from huggingface_hub import get_token

BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"

# Get HF token for private Space access
HF_TOKEN = get_token()
if not HF_TOKEN:
    print("⚠️ Warning: No HF token found. Private Space access may fail.")
    print("   Run: huggingface-cli login")


def test_endpoint(name, method, url, json_data=None, timeout=10):
    """Test a single endpoint"""
    print(f"\n{'='*60}")
    print(f"Testing: {name}")
    print(f"{'='*60}")
    print(f"URL: {url}")

    # Add authentication headers for private Space
    headers = {}
    if HF_TOKEN:
        headers["Authorization"] = f"Bearer {HF_TOKEN}"

    try:
        if method == "GET":
            response = httpx.get(url, headers=headers, timeout=timeout)
        else:
            response = httpx.post(url, json=json_data, headers=headers, timeout=timeout)

        print(f"Status: {response.status_code}")

        if response.status_code == 200:
            try:
                data = response.json()
                print(f"Response: {json.dumps(data, indent=2)[:500]}")
                return True
            except Exception:
                print(f"Response (text): {response.text[:200]}")
                return False
        else:
            print(f"Error: {response.text[:200]}")
            return False

    except httpx.TimeoutException:
        print(f"❌ Timeout after {timeout}s")
        return False
    except Exception as e:
        print(f"❌ Error: {e}")
        return False


def main():
    print(f"\n{'#'*60}")
    print("PRIIPs LLM Service - Quick Test Script")
    print(f"Service: {BASE_URL}")
    print(f"{'#'*60}")

    results = {}

    # Test 1: Root endpoint
    results['root'] = test_endpoint(
        "Root Endpoint",
        "GET",
        f"{BASE_URL}/"
    )

    # Test 2: Health endpoint
    results['health'] = test_endpoint(
        "Health Check",
        "GET",
        f"{BASE_URL}/health"
    )

    # Test 3: List models
    results['models'] = test_endpoint(
        "List Models",
        "GET",
        f"{BASE_URL}/v1/models"
    )

    # Test 4: Chat completion (this will load the model - may take 30s-1min first time)
    print("\n" + "="*60)
    print("Testing: Chat Completion (Model Loading)")
    print("="*60)
    print("⚠️ First request will take 30s-1min to load the model...")
    print("   Please wait...")

    chat_payload = {
        "model": "DragonLLM/qwen3-8b-fin-v1.0",  # Model served by this Space
        "messages": [
            {"role": "user", "content": "What is 2+2?"}
        ],
        "max_tokens": 50,
        "temperature": 0.7
    }

    results['chat'] = test_endpoint(
        "Chat Completion",
        "POST",
        f"{BASE_URL}/v1/chat/completions",
        json_data=chat_payload,
        timeout=120  # Longer timeout for model loading
    )

    # Summary
    print(f"\n{'#'*60}")
    print("SUMMARY")
    print(f"{'#'*60}")

    passed = sum(1 for v in results.values() if v)
    total = len(results)

    for test_name, success in results.items():
        status = "✅ PASS" if success else "❌ FAIL"
        print(f"{status} - {test_name}")

    print(f"\nResults: {passed}/{total} tests passed")

    if passed == total:
        print("\n🎉 All tests passed! Service is fully operational.")
    elif results.get('root') or results.get('health'):
        print("\n⚠️ Service is responding but some endpoints failed.")
        print("   This might be normal if model is still loading.")
    else:
        print("\n❌ Service is not accessible. Check:")
        print("   1. Space is running on HF dashboard")
        print("   2. No firewall/network issues")
        print("   3. Correct URL")


if __name__ == "__main__":
    main()
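test_service.py resolves the token with `huggingface_hub.get_token()`. In a minimal environment where `huggingface_hub` isn't installed (for example, a bare CI job that only exports `HF_TOKEN`), a dependency-free fallback can build the same Authorization header. This `auth_headers` helper is a hypothetical sketch, not part of this commit:

```python
import os
from typing import Dict, Optional


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build Bearer auth headers, falling back to the HF_TOKEN env var."""
    token = token or os.environ.get("HF_TOKEN")
    # Return an empty dict when no token is available (public Space access)
    return {"Authorization": f"Bearer {token}"} if token else {}
```

The resulting dict can be passed directly as the `headers=` argument to `httpx.get` / `httpx.post`, matching how `test_endpoint` sends the token.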