jeanbaptdzd committed · Commit ee07ed2 · Parent: 772dd21

Deploy PRIIPs LLM Service to HF Spaces + RAG workflow

✅ Successful deployment:
- Model: DragonLLM/qwen3-8b-fin-v1.0 (8B parameters)
- Hardware: L4 GPU (24GB VRAM)
- Backend: vLLM with eager mode (stable)
- Context: 4096 tokens
- API: OpenAI-compatible at https://jeanbaptdzd-priips-llm-service.hf.space
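
For reference, a minimal call against this endpoint (prompt and parameters are illustrative; if the Space is private, send an HF token, as `test_service.py` below does):

```python
# Smoke-test sketch for the deployed OpenAI-compatible endpoint.
import httpx

resp = httpx.post(
    "https://jeanbaptdzd-priips-llm-service.hf.space/v1/chat/completions",
    json={
        "model": "DragonLLM/qwen3-8b-fin-v1.0",
        "messages": [{"role": "user", "content": "Define the PRIIPs SRI in one sentence."}],
        "max_tokens": 100,
        "temperature": 0.3,
    },
    timeout=120,  # generous: the first request may wait on model load
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```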

🔧 Configuration updates:
- Updated Dockerfile to CUDA 12.4.0, Python 3.11
- Configured vLLM with enforce_eager=True for L4 stability
- Set max_model_len=4096, gpu_memory_utilization=0.85
- Fixed KV cache memory allocation issues
- Background model initialization to avoid timeouts
- Config: ignore unknown fields in .env (`extra = "ignore"`) instead of failing validation

📚 PRIIPS RAG Workflow:
- Created priips_documents/ directory structure (raw/extracted/processed)
- Added extract_priips.py: PDF → JSON extraction script
- Added query_with_context.py: RAG-powered query system
- Comprehensive documentation in PRIIPS_WORKFLOW.md
- Service test utilities (`test_service.py`)

🎯 Tested and working:
- All API endpoints operational (/, /v1/models, /v1/chat/completions)
- Financial calculations: CAGR, returns (worked check after this list)
- Risk assessment: market/credit risk concepts
- PRIIPS knowledge: SRI, KID sections
- Information extraction from documents
- Ready for RAG integration with PydanticAI/DSPy
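
As a worked reference for the CAGR checks above (values illustrative, not the actual test fixtures):

```python
# Reference CAGR computation used to sanity-check model answers.
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end/start)**(1/years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# e.g. 10,000 growing to 14,000 over 5 years is about 6.96% per year
print(f"{cagr(10_000, 14_000, 5):.2%}")
```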

Dockerfile CHANGED
@@ -1,40 +1,57 @@
-FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
+# Use NVIDIA CUDA 12.4 base image (12.1 is deprecated)
+FROM nvidia/cuda:12.4.0-devel-ubuntu22.04
 
 # Set environment variables
-ENV DEBIAN_FRONTEND=noninteractive
 ENV PYTHONUNBUFFERED=1
+ENV DEBIAN_FRONTEND=noninteractive
 
-# Install Python and system dependencies
+# Install Python 3.11 and build dependencies
 RUN apt-get update && apt-get install -y \
     python3.11 \
     python3.11-dev \
     python3-pip \
     git \
     curl \
-    && rm -rf /var/lib/apt/lists/* \
-    && ln -s /usr/bin/python3.11 /usr/bin/python
+    && rm -rf /var/lib/apt/lists/*
+
+# Set Python 3.11 as default
+RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 && \
+    update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
 
 # Upgrade pip
-RUN python -m pip install --upgrade pip
+RUN python3 -m pip install --upgrade pip
 
 # Set working directory
 WORKDIR /app
 
-# Copy requirements first for better caching
-COPY requirements.txt .
-
-# Install Python dependencies
-RUN pip install --no-cache-dir -r requirements.txt
+# Install vLLM and dependencies in one layer for efficiency
+# (specifiers quoted so the shell does not treat '>' as a redirect)
+RUN pip install --no-cache-dir \
+    vllm \
+    "fastapi>=0.115.0" \
+    "uvicorn[standard]>=0.30.0" \
+    "pydantic>=2.8.0" \
+    "pydantic-settings>=2.4.0" \
+    "httpx>=0.27.0" \
+    "python-dotenv>=1.0.1" \
+    "tenacity>=8.3.0" \
+    "PyMuPDF>=1.24.0"
 
 # Copy application code
 COPY app/ ./app/
 
-# Create a non-root user
-RUN useradd -m -u 1000 user && chown -R user:user /app
+# Create a non-root user and set up cache directories
+RUN useradd -m -u 1000 user && \
+    mkdir -p /tmp/huggingface /tmp/torch/inductor /tmp/triton && \
+    chown -R user:user /app /tmp/huggingface /tmp/torch /tmp/triton
 
 USER user
 
-# Set HuggingFace cache directory
+# Set environment variables for optimal vLLM + torch.compile performance
 ENV HF_HOME=/tmp/huggingface
+ENV TORCHINDUCTOR_CACHE_DIR=/tmp/torch/inductor
+ENV TRITON_CACHE_DIR=/tmp/triton
+ENV TORCH_COMPILE_DEBUG=0
+ENV CUDA_VISIBLE_DEVICES=0
 
 # Expose port
 EXPOSE 7860
PRIIPS_WORKFLOW.md ADDED
@@ -0,0 +1,182 @@
+# PRIIPS Document Extraction & RAG Workflow
+
+Complete workflow for extracting PRIIPS KID documents and querying with LLM context.
+
+## 📁 Directory Structure
+
+```
+priips_documents/
+├── raw/          # Place your PDF documents here
+├── extracted/    # Extracted JSON documents (auto-generated)
+└── processed/    # Chunked documents for RAG (future)
+
+scripts/
+├── extract_priips.py       # Extract text from PDFs
+└── query_with_context.py   # Query LLM with document context
+```
+
+## 🚀 Quick Start
+
+### 1. Add PRIIPS Documents
+
+Place PDF documents in `priips_documents/raw/`:
+
+```bash
+# Naming convention: {ISIN}_{ProductName}_{Date}.pdf
+cp /path/to/your/priips.pdf priips_documents/raw/LU1234567890_GlobalEquity_2024.pdf
+```
+
+### 2. Extract Document Content
+
+```bash
+# Extract all PDFs in the raw directory
+python scripts/extract_priips.py priips_documents/raw/
+
+# Or extract a single file
+python scripts/extract_priips.py priips_documents/raw/LU1234567890_GlobalEquity_2024.pdf
+```
+
+**Output:** JSON files in `priips_documents/extracted/` with structured content:
+- Metadata (ISIN, product name, dates)
+- Raw extracted text
+- Parsed sections (objectives, risks, costs, etc.)
+
+### 3. Query with RAG Context
+
+```bash
+# Ask questions about your documents
+python scripts/query_with_context.py "What is the recommended holding period?"
+
+python scripts/query_with_context.py "What are the main risks of this investment?"
+
+python scripts/query_with_context.py "Summarize the cost structure"
+```
+
+**Options:**
+```bash
+# Specify a different extracted directory
+python scripts/query_with_context.py "Your question" --extracted-dir custom/path/
+
+# Control context size and response length
+python scripts/query_with_context.py "Your question" \
+    --max-context 3000 \
+    --max-tokens 800
+```
+
+## 📊 Example Workflow
+
+```bash
+# 1. Add a PRIIPS PDF
+cp MyFund.pdf priips_documents/raw/FR0012345678_MyFund_2024.pdf
+
+# 2. Extract content
+python scripts/extract_priips.py priips_documents/raw/
+
+# Output:
+# 📄 Processing: FR0012345678_MyFund_2024.pdf
+# ✅ Extracted 12,543 characters
+# 💾 Saved to: priips_documents/extracted/FR0012345678_MyFund_2024_extracted.json
+
+# 3. Query the LLM
+python scripts/query_with_context.py "What is the SRI of this fund?"
+
+# Output:
+# 📚 Loading documents from priips_documents/extracted...
+# ✅ Loaded 1 documents
+# 🔍 Querying LLM with 1,234 chars of context...
+# 📊 Tokens used: 234
+#
+# 💬 Answer:
+# Based on the PRIIPS document, the Summary Risk Indicator (SRI) for this fund is 5 out of 7...
+```
+
+## 🎯 Use Cases
+
+### Document Comparison
+```bash
+python scripts/query_with_context.py "Compare the risk profiles of all available funds"
+```
+
+### Specific Information Extraction
+```bash
+python scripts/query_with_context.py "Extract all recommended holding periods"
+python scripts/query_with_context.py "List all ISINs and their product names"
+```
+
+### Compliance Checks
+```bash
+python scripts/query_with_context.py "Are there any funds with SRI above 6?"
+python scripts/query_with_context.py "Which funds have holding periods under 3 years?"
+```
+
+## 🔧 Advanced: Integrate with PydanticAI
+
+```python
+import json
+
+from pydantic_ai import Agent
+from pydantic_ai.models.openai import OpenAIModel
+
+# Configure with your deployed service
+model = OpenAIModel(
+    'DragonLLM/qwen3-8b-fin-v1.0',
+    base_url='https://jeanbaptdzd-priips-llm-service.hf.space/v1',
+)
+
+agent = Agent(model=model)
+
+# Load PRIIPS context
+with open('priips_documents/extracted/LU123_extracted.json') as f:
+    context = json.load(f)
+
+# Query with context
+result = agent.run_sync(
+    f"Based on this PRIIPS document: {context['raw_text'][:2000]}... "
+    f"What is the recommended holding period?"
+)
+```
+
+## 📝 Extracted Document Schema
+
+```json
+{
+  "metadata": {
+    "filename": "LU1234567890_GlobalEquity_2024.pdf",
+    "extraction_date": "2024-10-28T16:24:00",
+    "isin": "LU1234567890",
+    "product_name": "GlobalEquity",
+    "file_size_bytes": 245678,
+    "text_length": 12543
+  },
+  "raw_text": "Full extracted text from PDF...",
+  "sections": {
+    "summary": "What is this product? ...",
+    "objectives": "Investment objectives and policy...",
+    "risk_indicator": "SRI: 5/7 ...",
+    "performance_scenarios": "Performance scenarios...",
+    "costs": "What are the costs? ...",
+    "holding_period": "Recommended: 5 years"
+  }
+}
+```
+
+## 🚀 Next Steps
+
+1. **Add More Documents:** Place additional PRIIPS PDFs in `raw/`
+2. **Enhance Extraction:** Improve section parsing in `extract_priips.py`
+3. **Add Embeddings:** Implement vector search for better RAG
+4. **Build API:** Create REST API endpoints for document queries
+5. **Dashboard:** Build web UI for document management and queries
+
+## 📚 API Integration
+
+The LLM service is OpenAI-compatible and deployed at:
+```
+https://jeanbaptdzd-priips-llm-service.hf.space/v1
+```
+
+**Endpoints:**
+- `GET /` - Service status
+- `GET /v1/models` - List available models
+- `POST /v1/chat/completions` - Chat completion with context
+
+See `test_service.py` for integration examples.
+
README.md CHANGED
@@ -7,11 +7,12 @@ sdk: docker
 pinned: false
 license: mit
 app_port: 7860
+hardware: l4
 ---
 
 # PRIIPs LLM Service - Hugging Face Spaces
 
-OpenAI-compatible API and PRIIPs extractor powered by `DragonLLM/LLM-Pro-Finance-Small` via vLLM.
+OpenAI-compatible API and PRIIPs extractor powered by `DragonLLM/qwen3-8b-fin-v1.0` via vLLM.
 
 ## 🚀 Quick Start
@@ -34,7 +35,7 @@ curl -X GET "https://your-space-url.hf.space/v1/models"
 curl -X POST "https://your-space-url.hf.space/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "DragonLLM/LLM-Pro-Finance-Small",
+    "model": "DragonLLM/gemma3-12b-fin-v0.3",
     "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
   }'
@@ -95,7 +96,7 @@ from pydantic_ai import Agent
 from pydantic_ai.models.openai import OpenAIModel
 
 model = OpenAIModel(
-    "DragonLLM/LLM-Pro-Finance-Small",
+    "DragonLLM/gemma3-12b-fin-v0.3",
     base_url="https://your-space-url.hf.space/v1"
 )
@@ -107,7 +108,7 @@ agent = Agent(model=model)
 import dspy
 
 lm = dspy.OpenAI(
-    model="DragonLLM/LLM-Pro-Finance-Small",
+    model="DragonLLM/gemma3-12b-fin-v0.3",
     api_base="https://your-space-url.hf.space/v1"
 )
 ```
app/config.py CHANGED
@@ -3,13 +3,14 @@ from pydantic_settings import BaseSettings
 
 class Settings(BaseSettings):
     vllm_base_url: str = "http://localhost:8000/v1"
-    model: str = "DragonLLM/LLM-Pro-Finance-Small"
+    model: str = "DragonLLM/qwen3-8b-fin-v1.0"
     service_api_key: str | None = None
     log_level: str = "info"
 
     class Config:
         env_file = ".env"
         env_file_encoding = "utf-8"
+        extra = "ignore"  # Ignore extra fields in .env
 
 
 settings = Settings()
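
For context on the `extra = "ignore"` line: with pydantic-settings 2.x, unknown keys in the .env file fail validation unless `extra` is relaxed. A minimal sketch of the behavior (the `Demo` class is hypothetical):

```python
# Hypothetical demo: with extra = "ignore", unknown .env keys are dropped
# instead of raising "extra inputs are not permitted" at startup.
from pydantic_settings import BaseSettings

class Demo(BaseSettings):
    model: str = "DragonLLM/qwen3-8b-fin-v1.0"

    class Config:
        env_file = ".env"   # may contain keys Demo doesn't declare
        extra = "ignore"    # tolerate them instead of failing validation

settings = Demo()
```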
app/main.py CHANGED
@@ -18,9 +18,19 @@ app.middleware("http")(api_key_guard)
 
 @app.on_event("startup")
 async def startup_event():
-    """Preload the model on startup"""
+    """Startup event - initialize model in background"""
+    import threading
     logger.info("Starting PRIIPs LLM Service...")
-    logger.info("Model will be loaded on first request to optimize startup time")
+    logger.info("Initializing model in background thread...")
+
+    def load_model():
+        from app.providers.vllm import initialize_vllm
+        initialize_vllm()
+
+    # Start model loading in background thread
+    thread = threading.Thread(target=load_model, daemon=True)
+    thread.start()
+    logger.info("Model initialization started in background")
 
 @app.get("/")
 async def root():
@@ -28,7 +38,7 @@ async def root():
         "status": "ok",
         "service": "PRIIPs LLM Service",
         "version": "1.0.0",
-        "model": "DragonLLM/LLM-Pro-Finance-Small",
+        "model": "DragonLLM/qwen3-8b-fin-v1.0",
         "backend": "vLLM"
     }
app/middleware.py CHANGED
@@ -5,11 +5,22 @@ from app.config import settings
 
 
 async def api_key_guard(request: Request, call_next):
+    # Public endpoints that don't require authentication
+    public_paths = ["/", "/health", "/docs", "/redoc", "/openapi.json"]
+
+    # Skip auth for public endpoints
+    if request.url.path in public_paths:
+        return await call_next(request)
+
+    # Skip auth if no API key is configured
     if not settings.service_api_key:
         return await call_next(request)
+
+    # Check API key
     key = request.headers.get("x-api-key") or request.headers.get("authorization")
     if key and key.replace("Bearer ", "").strip() == settings.service_api_key:
         return await call_next(request)
+
     return JSONResponse({"error": "unauthorized"}, status_code=401)
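
With a service API key configured, protected routes accept the key via either header form checked above; a quick client-side check (the key value is a placeholder):

```python
# Either header works for api_key_guard; "/" and "/health" stay public.
import httpx

headers = {"x-api-key": "YOUR_SERVICE_API_KEY"}  # placeholder value
# or: headers = {"Authorization": "Bearer YOUR_SERVICE_API_KEY"}

r = httpx.get(
    "https://jeanbaptdzd-priips-llm-service.hf.space/v1/models",
    headers=headers,
    timeout=30,
)
print(r.status_code, r.text[:200])
```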
app/providers/vllm.py CHANGED
@@ -3,9 +3,10 @@ from typing import Dict, Any, AsyncIterator
 from vllm import LLM, SamplingParams
 from vllm.entrypoints.openai.api_server import build_async_engine_client
 import asyncio
+from huggingface_hub import login
 
-# Model configuration
-model_name = "DragonLLM/LLM-Pro-Finance-Small"
+# Model configuration - optimized for 8B Qwen3 on L4
+model_name = "DragonLLM/qwen3-8b-fin-v1.0"
 llm_engine = None
 
 def initialize_vllm():
@@ -15,26 +16,51 @@ def initialize_vllm():
     if llm_engine is None:
         print(f"Initializing vLLM with model: {model_name}")
 
-        # Get HF token from environment
-        hf_token = os.getenv("HF_TOKEN_LC")
+        # Get HF token from environment (Hugging Face Space secret)
+        # Try HF_TOKEN_LC2 first (for DragonLLM access), then fall back to HF_TOKEN_LC
+        hf_token = os.getenv("HF_TOKEN_LC2") or os.getenv("HF_TOKEN_LC")
         if hf_token:
+            token_source = "HF_TOKEN_LC2" if os.getenv("HF_TOKEN_LC2") else "HF_TOKEN_LC"
+            print(f"✅ {token_source} found (length: {len(hf_token)})")
+            # Properly authenticate with Hugging Face Hub
+            try:
+                login(token=hf_token, add_to_git_credential=False)
+                print("✅ Successfully authenticated with Hugging Face Hub")
+            except Exception as e:
+                print(f"⚠️ Warning: Failed to authenticate with HF Hub: {e}")
+            # Also set environment variables as fallback
             os.environ["HF_TOKEN"] = hf_token
             os.environ["HUGGING_FACE_HUB_TOKEN"] = hf_token
+        else:
+            print("⚠️ WARNING: Neither HF_TOKEN_LC2 nor HF_TOKEN_LC found in environment!")
+            print("Available env vars:", list(os.environ.keys()))
 
         try:
-            # Initialize vLLM engine
+            # Initialize vLLM engine with explicit token
+            print(f"Attempting to load model: {model_name}")
+            print("Model type: Qwen3 8B (bfloat16) - Optimized for L4 with torch.compile")
+            print("Download directory: /tmp/huggingface")
+            print("Trust remote code: True")
+            print("L4 GPU: 24GB VRAM available")
+            print("Mode: Eager mode (CUDA graphs disabled for L4)")
+
             llm_engine = LLM(
                 model=model_name,
                 trust_remote_code=True,
-                dtype="float16",
-                max_model_len=4096,
-                gpu_memory_utilization=0.9,
-                tensor_parallel_size=1,  # L40 has 1 GPU
+                dtype="bfloat16",  # Use bfloat16 for Qwen3 (required)
+                max_model_len=4096,  # Reduced for L4 KV cache constraints
+                gpu_memory_utilization=0.85,  # Increased to fit KV cache
+                tensor_parallel_size=1,  # Single L4 GPU
                 download_dir="/tmp/huggingface",
+                tokenizer_mode="auto",
+                # Disable torch.compile on L4 due to memory constraints
+                enforce_eager=True,  # Use eager mode (no CUDA graphs/compilation)
+                # Let vLLM handle compilation and fallback gracefully
+                disable_log_stats=False,  # Enable logging for debugging
             )
-            print("vLLM engine initialized successfully!")
+            print(f"✅ vLLM engine initialized successfully with {model_name}!")
         except Exception as e:
-            print(f"Error initializing vLLM: {e}")
+            print(f"❌ Error initializing vLLM: {e}")
             raise
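
For local debugging on the GPU box, the engine this module builds can be driven directly, bypassing the HTTP layer; a minimal sketch (assumes `initialize_vllm` assigns the module-global `llm_engine`, as the unchanged lines suggest):

```python
# Hedged sketch: offline generation through the vLLM engine above.
from vllm import SamplingParams

import app.providers.vllm as provider

provider.initialize_vllm()  # blocks until weights are loaded
params = SamplingParams(temperature=0.3, max_tokens=64)
outputs = provider.llm_engine.generate(
    ["In one sentence, what is a PRIIPs KID?"], params
)
print(outputs[0].outputs[0].text)
```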
requirements.txt CHANGED
@@ -1,3 +1,5 @@
+# Dependencies installed in Dockerfile during HF Space build
+vllm
 fastapi>=0.115.0
 uvicorn[standard]>=0.30.0
 pydantic>=2.8.0
@@ -7,4 +9,3 @@ python-dotenv>=1.0.1
 tenacity>=8.3.0
 PyMuPDF>=1.24.0
 pytest>=7.4.0
-
scripts/extract_priips.py ADDED
@@ -0,0 +1,182 @@
+#!/usr/bin/env python3
+"""
+PRIIPS Document Extraction Script
+
+Extracts text from PRIIPS KID PDFs and processes them for RAG context.
+"""
+
+import sys
+import json
+from pathlib import Path
+from datetime import datetime
+import argparse
+
+# Add parent directory to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from app.utils.pdf import extract_text_from_pdf
+
+
+def extract_priips_document(pdf_path: Path, output_dir: Path) -> dict:
+    """
+    Extract content from a PRIIPS KID PDF.
+
+    Args:
+        pdf_path: Path to the PDF file
+        output_dir: Directory to save extracted content
+
+    Returns:
+        Dictionary with extracted content
+    """
+    print(f"📄 Processing: {pdf_path.name}")
+
+    # Extract text from PDF
+    try:
+        raw_text = extract_text_from_pdf(pdf_path)
+        print(f"✅ Extracted {len(raw_text)} characters")
+    except Exception as e:
+        print(f"❌ Error extracting PDF: {e}")
+        return None
+
+    # Parse filename for metadata
+    filename_parts = pdf_path.stem.split("_")
+    isin = filename_parts[0] if len(filename_parts) > 0 else "UNKNOWN"
+    product_name = filename_parts[1] if len(filename_parts) > 1 else pdf_path.stem
+
+    # Create structured output
+    extracted_data = {
+        "metadata": {
+            "filename": pdf_path.name,
+            "extraction_date": datetime.now().isoformat(),
+            "isin": isin,
+            "product_name": product_name,
+            "file_size_bytes": pdf_path.stat().st_size,
+            "text_length": len(raw_text)
+        },
+        "raw_text": raw_text,
+        "sections": extract_sections(raw_text)
+    }
+
+    # Save to JSON
+    output_path = output_dir / f"{pdf_path.stem}_extracted.json"
+    with open(output_path, "w", encoding="utf-8") as f:
+        json.dump(extracted_data, f, indent=2, ensure_ascii=False)
+
+    print(f"💾 Saved to: {output_path}")
+    return extracted_data
+
+
+def extract_sections(text: str) -> dict:
+    """
+    Extract common PRIIPS KID sections from text.
+
+    This is a simple implementation. Can be enhanced with LLM-based extraction.
+    """
+    sections = {}
+
+    # Common PRIIPS section keywords
+    keywords = {
+        "summary": ["what is this product", "summary"],
+        "objectives": ["objectives", "investment objectives"],
+        "risk_indicator": ["risk indicator", "sri", "summary risk"],
+        "performance_scenarios": ["performance scenarios", "what could i get"],
+        "costs": ["what are the costs", "costs"],
+        "holding_period": ["recommended holding period", "holding period"]
+    }
+
+    text_lower = text.lower()
+
+    for section_name, search_terms in keywords.items():
+        for term in search_terms:
+            if term in text_lower:
+                # Extract a snippet around the keyword
+                start_idx = text_lower.find(term)
+                # Get 500 chars after the keyword
+                snippet = text[start_idx:start_idx + 500].strip()
+                sections[section_name] = snippet
+                break
+
+    return sections
+
+
+def batch_process_directory(input_dir: Path, output_dir: Path):
+    """Process all PDFs in a directory."""
+    pdf_files = list(input_dir.glob("*.pdf"))
+
+    if not pdf_files:
+        print(f"⚠️ No PDF files found in {input_dir}")
+        return
+
+    print(f"📦 Found {len(pdf_files)} PDF files to process\n")
+
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    results = []
+    for pdf_path in pdf_files:
+        result = extract_priips_document(pdf_path, output_dir)
+        if result:
+            results.append(result)
+        print()  # Blank line between files
+
+    # Save summary
+    summary_path = output_dir / "_extraction_summary.json"
+    summary = {
+        "extraction_date": datetime.now().isoformat(),
+        "total_processed": len(results),
+        "total_failed": len(pdf_files) - len(results),
+        "files": [r["metadata"] for r in results]
+    }
+
+    with open(summary_path, "w", encoding="utf-8") as f:
+        json.dump(summary, f, indent=2)
+
+    print(f"\n✅ Processed {len(results)}/{len(pdf_files)} files successfully")
+    print(f"📊 Summary saved to: {summary_path}")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Extract PRIIPS KID documents for RAG context"
+    )
+    parser.add_argument(
+        "input",
+        type=str,
+        help="Input PDF file or directory containing PDFs"
+    )
+    parser.add_argument(
+        "--output",
+        type=str,
+        default=None,
+        help="Output directory (default: priips_documents/extracted/)"
+    )
+
+    args = parser.parse_args()
+
+    # Setup paths
+    workspace_root = Path(__file__).parent.parent
+    input_path = Path(args.input)
+
+    if not input_path.is_absolute():
+        input_path = workspace_root / input_path
+
+    if args.output:
+        output_dir = Path(args.output)
+        if not output_dir.is_absolute():
+            output_dir = workspace_root / output_dir
+    else:
+        output_dir = workspace_root / "priips_documents" / "extracted"
+
+    # Process
+    if input_path.is_file():
+        output_dir.mkdir(parents=True, exist_ok=True)
+        extract_priips_document(input_path, output_dir)
+    elif input_path.is_dir():
+        batch_process_directory(input_path, output_dir)
+    else:
+        print(f"❌ Error: {input_path} does not exist")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
+
scripts/query_with_context.py ADDED
@@ -0,0 +1,179 @@
+#!/usr/bin/env python3
+"""
+Query LLM with PRIIPS Document Context
+
+Loads extracted PRIIPS documents and queries the LLM with RAG context.
+"""
+
+import sys
+import json
+import argparse
+from pathlib import Path
+from typing import List, Dict
+import requests
+
+# Configuration
+BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"
+MODEL = "DragonLLM/qwen3-8b-fin-v1.0"
+
+
+def load_extracted_documents(extracted_dir: Path) -> List[Dict]:
+    """Load all extracted PRIIPS documents."""
+    documents = []
+
+    for json_file in extracted_dir.glob("*_extracted.json"):
+        if json_file.name.startswith("_"):
+            continue  # Skip summary files
+
+        with open(json_file, "r", encoding="utf-8") as f:
+            documents.append(json.load(f))
+
+    return documents
+
+
+def build_context(documents: List[Dict], query: str, max_chars: int = 2000) -> str:
+    """
+    Build RAG context from documents relevant to the query.
+
+    Simple implementation: include all document summaries.
+    Can be enhanced with semantic search/embeddings.
+    """
+    context_parts = []
+    total_chars = 0
+
+    for doc in documents:
+        metadata = doc["metadata"]
+
+        # Build a summary of this document
+        doc_summary = f"\n--- Document: {metadata['product_name']} (ISIN: {metadata['isin']}) ---\n"
+
+        # Include extracted sections
+        if "sections" in doc and doc["sections"]:
+            for section_name, content in doc["sections"].items():
+                if content:
+                    section_text = f"\n{section_name.upper()}:\n{content[:300]}...\n"
+                    doc_summary += section_text
+
+        # Check if we have space
+        if total_chars + len(doc_summary) > max_chars:
+            break
+
+        context_parts.append(doc_summary)
+        total_chars += len(doc_summary)
+
+    if not context_parts:
+        return "No relevant documents found."
+
+    return "\n".join(context_parts)
+
+
+def query_llm(query: str, context: str, max_tokens: int = 500) -> str:
+    """Query the LLM with context."""
+
+    # Build the prompt with context
+    prompt = f"""You are a financial expert assistant specializing in PRIIPS Key Information Documents.
+
+Use the following context from PRIIPS documents to answer the question:
+
+{context}
+
+Question: {query}
+
+Provide a clear, accurate answer based on the context provided. If the context doesn't contain enough information, say so."""
+
+    payload = {
+        "model": MODEL,
+        "messages": [
+            {"role": "system", "content": "You are a PRIIPS financial document expert."},
+            {"role": "user", "content": prompt}
+        ],
+        "max_tokens": max_tokens,
+        "temperature": 0.3  # Lower temperature for more factual responses
+    }
+
+    print(f"🔍 Querying LLM with {len(context)} chars of context...")
+
+    try:
+        response = requests.post(
+            f"{BASE_URL}/v1/chat/completions",
+            json=payload,
+            timeout=60
+        )
+        response.raise_for_status()
+
+        data = response.json()
+        answer = data["choices"][0]["message"]["content"]
+
+        # Print usage stats
+        usage = data.get("usage", {})
+        print(f"📊 Tokens used: {usage.get('total_tokens', 'N/A')}")
+
+        return answer
+
+    except Exception as e:
+        return f"Error querying LLM: {e}"
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Query LLM with PRIIPS document context"
+    )
+    parser.add_argument(
+        "query",
+        type=str,
+        help="Question to ask about PRIIPS documents"
+    )
+    parser.add_argument(
+        "--extracted-dir",
+        type=str,
+        default="priips_documents/extracted",
+        help="Directory containing extracted documents"
+    )
+    parser.add_argument(
+        "--max-context",
+        type=int,
+        default=2000,
+        help="Maximum context characters to include"
+    )
+    parser.add_argument(
+        "--max-tokens",
+        type=int,
+        default=500,
+        help="Maximum tokens in response"
+    )
+
+    args = parser.parse_args()
+
+    # Setup paths
+    workspace_root = Path(__file__).parent.parent
+    extracted_dir = workspace_root / args.extracted_dir
+
+    if not extracted_dir.exists():
+        print(f"❌ Directory not found: {extracted_dir}")
+        print("Run extract_priips.py first to extract documents.")
+        sys.exit(1)
+
+    # Load documents
+    print(f"📚 Loading documents from {extracted_dir}...")
+    documents = load_extracted_documents(extracted_dir)
+
+    if not documents:
+        print("⚠️ No extracted documents found.")
+        print("Add PDFs to priips_documents/raw/ and run extract_priips.py")
+        sys.exit(1)
+
+    print(f"✅ Loaded {len(documents)} documents")
+
+    # Build context
+    context = build_context(documents, args.query, args.max_context)
+
+    # Query LLM
+    print(f"\n❓ Question: {args.query}\n")
+    answer = query_llm(args.query, context, args.max_tokens)
+
+    print(f"\n💬 Answer:\n{answer}\n")
+
+
+if __name__ == "__main__":
+    main()
+
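
`build_context` above concatenates documents in load order; its docstring notes it "can be enhanced with semantic search/embeddings". A lightweight stand-in, ranking documents by query-term overlap before the context is built (the helper name is hypothetical):

```python
# Hedged sketch: cheap relevance ranking as a stand-in for embeddings.
from typing import Dict, List

def rank_documents(documents: List[Dict], query: str) -> List[Dict]:
    """Sort documents by how often meaningful query terms appear in their sections."""
    terms = {t for t in query.lower().split() if len(t) > 3}

    def score(doc: Dict) -> int:
        text = " ".join(doc.get("sections", {}).values()).lower()
        return sum(text.count(t) for t in terms)

    return sorted(documents, key=score, reverse=True)

# usage in main(), before build_context():
#     documents = rank_documents(documents, args.query)
```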
test_service.py ADDED
@@ -0,0 +1,141 @@
+#!/usr/bin/env python3
+"""
+Quick test script to verify the PRIIPs LLM Service is working
+Run with: python test_service.py
+"""
+import httpx
+import json
+import time
+import os
+from huggingface_hub import get_token
+
+BASE_URL = "https://jeanbaptdzd-priips-llm-service.hf.space"
+
+# Get HF token for private Space access
+HF_TOKEN = get_token()
+if not HF_TOKEN:
+    print("⚠️ Warning: No HF token found. Private Space access may fail.")
+    print("   Run: huggingface-cli login")
+
+def test_endpoint(name, method, url, json_data=None, timeout=10):
+    """Test a single endpoint"""
+    print(f"\n{'='*60}")
+    print(f"Testing: {name}")
+    print(f"{'='*60}")
+    print(f"URL: {url}")
+
+    # Add authentication headers for private Space
+    headers = {}
+    if HF_TOKEN:
+        headers["Authorization"] = f"Bearer {HF_TOKEN}"
+
+    try:
+        if method == "GET":
+            response = httpx.get(url, headers=headers, timeout=timeout)
+        else:
+            response = httpx.post(url, json=json_data, headers=headers, timeout=timeout)
+
+        print(f"Status: {response.status_code}")
+
+        if response.status_code == 200:
+            try:
+                data = response.json()
+                print(f"Response: {json.dumps(data, indent=2)[:500]}")
+                return True
+            except ValueError:  # body was not valid JSON
+                print(f"Response (text): {response.text[:200]}")
+                return False
+        else:
+            print(f"Error: {response.text[:200]}")
+            return False
+
+    except httpx.TimeoutException:
+        print(f"❌ Timeout after {timeout}s")
+        return False
+    except Exception as e:
+        print(f"❌ Error: {e}")
+        return False
+
+
+def main():
+    print(f"\n{'#'*60}")
+    print("PRIIPs LLM Service - Quick Test Script")
+    print(f"Service: {BASE_URL}")
+    print(f"{'#'*60}")
+
+    results = {}
+
+    # Test 1: Root endpoint
+    results['root'] = test_endpoint(
+        "Root Endpoint",
+        "GET",
+        f"{BASE_URL}/"
+    )
+
+    # Test 2: Health endpoint
+    results['health'] = test_endpoint(
+        "Health Check",
+        "GET",
+        f"{BASE_URL}/health"
+    )
+
+    # Test 3: List models
+    results['models'] = test_endpoint(
+        "List Models",
+        "GET",
+        f"{BASE_URL}/v1/models"
+    )
+
+    # Test 4: Chat completion (this will load the model - may take 30s-1min first time)
+    print("\n" + "="*60)
+    print("Testing: Chat Completion (Model Loading)")
+    print("="*60)
+    print("⚠️ First request will take 30s-1min to load the model...")
+    print("   Please wait...")
+
+    chat_payload = {
+        "model": "DragonLLM/gemma3-12b-fin-v0.3",
+        "messages": [
+            {"role": "user", "content": "What is 2+2?"}
+        ],
+        "max_tokens": 50,
+        "temperature": 0.7
+    }
+
+    results['chat'] = test_endpoint(
+        "Chat Completion",
+        "POST",
+        f"{BASE_URL}/v1/chat/completions",
+        json_data=chat_payload,
+        timeout=120  # Longer timeout for model loading
+    )
+
+    # Summary
+    print(f"\n{'#'*60}")
+    print("SUMMARY")
+    print(f"{'#'*60}")
+
+    passed = sum(1 for v in results.values() if v)
+    total = len(results)
+
+    for test_name, success in results.items():
+        status = "✅ PASS" if success else "❌ FAIL"
+        print(f"{status} - {test_name}")
+
+    print(f"\nResults: {passed}/{total} tests passed")
+
+    if passed == total:
+        print("\n🎉 All tests passed! Service is fully operational.")
+    elif results.get('root') or results.get('health'):
+        print("\n⚠️ Service is responding but some endpoints failed.")
+        print("   This might be normal if model is still loading.")
+    else:
+        print("\n❌ Service is not accessible. Check:")
+        print("   1. Space is running on HF dashboard")
+        print("   2. No firewall/network issues")
+        print("   3. Correct URL")
+
+
+if __name__ == "__main__":
+    main()
+