Space: jeanbaptdzd/open-finance-llm-8b (Paused)

jeanbaptdzd committed 23 days ago

Commit 6d3bf74 (1 parent: c77ec91)

Reorganize tests and clean up documentation

- Move integration tests to tests/integration/ directory
- Add integration tests documentation in README
- Remove outdated docs (tool_calls_analysis, reasoning_models)
- Clean up remaining docs: remove PydanticAI references
- Fix unit tests (temperature default, conftest cleanup)
- Update project structure documentation

Files changed (11)

README.md +34 -13
docs/openai_api_verification.md +14 -15
docs/qwen3_specifications.md +2 -2
docs/reasoning_models.md +0 -94
docs/tool_calls_analysis_hf_space.md +0 -257
tests/conftest.py +0 -1
tests/integration/__init__.py +10 -0
test_space_basic.py → tests/integration/test_space_basic.py +0 -0
test_space_with_tools.py → tests/integration/test_space_with_tools.py +0 -0
test_tool_calls.py → tests/integration/test_tool_calls.py +0 -0
tests/test_openai_models.py +1 -1

README.md CHANGED

@@ -19,14 +19,14 @@ This service provides an OpenAI-compatible API for the DragonLLM Qwen3-8B financ
 ## Features
-- ✅ **OpenAI-Compatible API** - Drop-in replacement for OpenAI API
-- ✅ **French & English Support** - Automatic language detection
-- ✅ **Rate Limiting** - Built-in protection (30 req/min, 500 req/hour)
-- ✅ **Statistics Tracking** - Token usage and request metrics via `/v1/stats`
-- ✅ **Health Monitoring** - Model readiness status in `/health` endpoint
-- ✅ **Streaming Support** - Real-time response streaming
-- ✅ **Tool Calls Support** - OpenAI-compatible tool/function calling
-- ✅ **Structured Outputs** - JSON format support via response_format
 ## API Endpoints
@@ -133,7 +133,7 @@ lm = dspy.OpenAI(
 - English and French support
 **Backend:**
-- Transformers 4.40.0+
 - PyTorch 2.5.0+ (CUDA 12.4)
 - Accelerate 0.30.0+
@@ -157,14 +157,33 @@ uvicorn app.main:app --reload --port 8080
 ### Testing
 ```bash
-# Run tests
-pytest -v
-# Test deployment
-./test_deployment.sh
 ```
 ## Project Structure
 ```
@@ -177,6 +196,8 @@ pytest -v
 │   └── utils/             # Utilities, stats tracking
 ├── docs/                  # Documentation
 ├── tests/                 # Test suite
 └── scripts/               # Utility scripts
 ```

 ## Features
+- OpenAI-compatible API - Drop-in replacement for OpenAI API
+- French and English support - Automatic language detection
+- Rate limiting - Built-in protection (30 req/min, 500 req/hour)
+- Statistics tracking - Token usage and request metrics via `/v1/stats`
+- Health monitoring - Model readiness status in `/health` endpoint
+- Streaming support - Real-time response streaming
+- Tool calls support - OpenAI-compatible tool/function calling
+- Structured outputs - JSON format support via response_format
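The rate-limiting bullet states the two limits but not how they are enforced. A minimal sketch of one way to apply both at once, assuming a sliding-window log of request timestamps per client; `SlidingWindowLimiter` is a hypothetical name, not the Space's actual implementation:

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter enforcing several (max_requests, window_seconds) rules together."""

    def __init__(self, rules):
        self.rules = rules  # e.g. [(30, 60), (500, 3600)]
        self.longest = max(window for _, window in rules)
        self.timestamps = deque()  # accepted request times, oldest first

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Timestamps older than the longest window can no longer count against any rule.
        while self.timestamps and now - self.timestamps[0] >= self.longest:
            self.timestamps.popleft()
        for max_requests, window in self.rules:
            recent = sum(1 for t in self.timestamps if now - t < window)
            if recent >= max_requests:
                return False  # rejected requests are not recorded
        self.timestamps.append(now)
        return True


limiter = SlidingWindowLimiter([(30, 60), (500, 3600)])
```

In a FastAPI service this would typically be one limiter per client key (API key or IP), with HTTP 429 returned when `allow()` is False. A timestamp log is exact but keeps up to 500 entries per client; a token bucket would use constant memory at the cost of exactness.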
 ## API Endpoints
 - English and French support
 **Backend:**
+- Transformers 4.45.0+
 - PyTorch 2.5.0+ (CUDA 12.4)
 - Accelerate 0.30.0+
 ### Testing
+**Unit Tests:**
 ```bash
+pytest tests/ -v
+```
+**Integration Tests:**
+The integration tests evaluate the model's ability to produce valid JSON outputs and emit well-formed tool calls, both critical requirements for financial applications.
+```bash
+# Basic API functionality
+python tests/integration/test_space_basic.py
+# Tool calls and JSON format
+python tests/integration/test_space_with_tools.py
+# Detailed tool call validation
+python tests/integration/test_tool_calls.py
 ```
+**Test Coverage:**
+- API endpoints (health, models, chat completions)
+- Tool calls with `tool_choice` parameter
+- Structured JSON outputs via `response_format`
+- Model response parsing and validation
+These tests check whether the 8B model reliably produces valid JSON and well-formed tool calls, which financial workflows that depend on structured data and function execution require.
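The Test Coverage list names response parsing and validation, but the test files are unchanged renames, so their contents are not visible in this diff. The following hypothetical helper, `check_chat_response`, sketches the kind of structural checks such a test might run on an OpenAI-style chat completion response (as a plain dict):

```python
import json


def check_chat_response(response, declared_tools):
    """Return a list of problems found in an OpenAI-style chat completion response."""
    problems = []
    choices = response.get("choices") or []
    if not choices:
        return ["response has no choices"]
    for index, choice in enumerate(choices):
        message = choice.get("message", {})
        if choice.get("finish_reason") == "length":
            problems.append(f"choice {index}: output truncated (finish_reason='length')")
        for call in message.get("tool_calls") or []:
            function = call.get("function", {})
            name = function.get("name")
            if name not in declared_tools:
                problems.append(f"choice {index}: unknown tool {name!r}")
            try:
                # OpenAI encodes tool arguments as a JSON *string*, not a nested object.
                args = json.loads(function.get("arguments", ""))
            except (TypeError, json.JSONDecodeError):
                problems.append(f"choice {index}: arguments for {name!r} are not valid JSON")
                continue
            if not isinstance(args, dict):
                problems.append(f"choice {index}: arguments for {name!r} are not a JSON object")
    return problems
```

A test would then assert that `check_chat_response(resp.json(), {...})` returns an empty list.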
 ## Project Structure
 ```
 │   └── utils/             # Utilities, stats tracking
 ├── docs/                  # Documentation
 ├── tests/                 # Test suite
+│   ├── integration/      # Integration tests (API, tool calls, JSON)
+│   └── performance/      # Performance benchmarks
 └── scripts/               # Utility scripts
 ```

docs/openai_api_verification.md CHANGED

@@ -6,8 +6,8 @@ This document verifies that our OpenAI API wrapper implementation correctly foll
 ## Connection Flow
 ```
-PydanticAI Agent
-    ↓ (OpenAI-compatible requests)
 Hugging Face Space API (simple-llm-pro-finance)
     ↓ (FastAPI router)
 TransformersProvider
@@ -93,30 +93,29 @@ Qwen-Open-Finance-R-8B Model
 When `response_format={"type": "json_object"}` is provided:
 - ✅ System prompt is enhanced with JSON output instructions
 - ✅ Response is parsed to extract JSON from markdown code blocks
-- ✅ Clean JSON is returned for PydanticAI validation
 **Implementation**: Since Qwen doesn't have native JSON mode, we enforce it via prompt engineering and post-processing.
-## PydanticAI Integration
-### ✅ What PydanticAI Sends
-When using `output_type` parameter:
 ```python
-# PydanticAI sends:
 {
     "model": "dragon-llm-open-finance",
     "messages": [...],
     "temperature": 0.7,
     "max_tokens": 3000,
-    "response_format": {"type": "json_object"},  # ✅ Now supported
-    "tool_choice": "required",  # ✅ Now accepted (converted to "auto")
-    "tools": [...]  # ✅ If tools are defined
 }
 ```
-### ✅ Our Implementation Handles
 1. ✅ `tool_choice="required"` → Accepted and converted to `"auto"`
 2. ✅ `response_format={"type": "json_object"}` → JSON instructions added to prompt
@@ -163,10 +162,10 @@ When using `output_type` parameter:
 - [x] Streaming support implemented
 - [x] Tool calls properly formatted
-### PydanticAI Compatibility
 - [x] `tool_choice="required"` accepted
 - [x] `response_format` supported
-- [x] `output_type` requests handled correctly
 - [x] Tool definitions passed through
 - [x] Structured outputs extracted
@@ -181,7 +180,7 @@ When using `output_type` parameter:
 1. **Basic Chat**: Verify simple chat completions work
 2. **Tool Calls**: Test with tools defined, verify parsing
-3. **Structured Outputs**: Test with `output_type`, verify JSON extraction
 4. **Error Handling**: Test invalid requests return proper errors
 5. **Streaming**: Test streaming responses work correctly
@@ -197,7 +196,7 @@ When using `output_type` parameter:
 The implementation:
 - Follows OpenAI API specification
-- Handles PydanticAI-specific parameters correctly
 - Properly integrates with Qwen model via Transformers
 - Provides fallbacks for features not natively supported by Qwen

 ## Connection Flow
 ```
+OpenAI-compatible Client
+    ↓ (OpenAI API requests)
 Hugging Face Space API (simple-llm-pro-finance)
     ↓ (FastAPI router)
 TransformersProvider
 When `response_format={"type": "json_object"}` is provided:
 - ✅ System prompt is enhanced with JSON output instructions
 - ✅ Response is parsed to extract JSON from markdown code blocks
+- ✅ Clean JSON is returned for validation
 **Implementation**: Since Qwen doesn't have native JSON mode, we enforce it via prompt engineering and post-processing.
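The post-processing step ("parsed to extract JSON from markdown code blocks") can be sketched as follows. This assumes the model either wraps its JSON in a triple-backtick fence or embeds a single object in surrounding prose; `extract_json` is a hypothetical name, not the Space's function:

```python
import json
import re

# A fenced block: three backticks, optional "json" tag, newline, body, three backticks.
FENCE = re.compile(r"`{3}(?:json)?\s*\n(.*?)`{3}", re.DOTALL)


def extract_json(text):
    """Parse JSON from model output, preferring a fenced block when one is present."""
    match = FENCE.search(text)
    candidate = (match.group(1) if match else text).strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass
    # Fallback: take the span from the first '{' to the last '}'.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start != -1 and end > start:
        return json.loads(candidate[start:end + 1])  # JSONDecodeError is a ValueError
    raise ValueError("no JSON object found in model output")
```

The first-`{`-to-last-`}` fallback is deliberately crude; it handles a single object wrapped in chatty text but not several unrelated objects in one reply.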
+## Client Integration
+### ✅ Supported Parameters
+The API accepts standard OpenAI API parameters:
 ```python
 {
     "model": "dragon-llm-open-finance",
     "messages": [...],
     "temperature": 0.7,
     "max_tokens": 3000,
+    "response_format": {"type": "json_object"},  # ✅ Supported
+    "tool_choice": "required",  # ✅ Accepted (converted to "auto")
+    "tools": [...]  # ✅ Tool definitions supported
 }
 ```
+### ✅ Implementation Details
 1. ✅ `tool_choice="required"` → Accepted and converted to `"auto"`
 2. ✅ `response_format={"type": "json_object"}` → JSON instructions added to prompt
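The two conversions listed above could look like the following sketch. The `normalize_payload` name and the wording of `JSON_INSTRUCTION` are assumptions for illustration; the diff does not show the Space's actual prompt text:

```python
import copy

# Assumed wording: the Space's real JSON prompt is not shown in this diff.
JSON_INSTRUCTION = (
    "Respond with a single valid JSON object only, "
    "without explanations or markdown code fences."
)


def normalize_payload(payload):
    """Return a copy of an OpenAI-style request adapted for a model without native JSON mode."""
    result = copy.deepcopy(payload)  # never mutate the caller's request
    # The backend cannot force a tool call, so "required" is downgraded to "auto".
    if result.get("tool_choice") == "required":
        result["tool_choice"] = "auto"
    fmt = result.get("response_format") or {}
    if fmt.get("type") == "json_object":
        messages = result.setdefault("messages", [])
        if messages and messages[0].get("role") == "system":
            # Append to an existing system prompt rather than adding a second one.
            messages[0]["content"] = f"{messages[0]['content']}\n\n{JSON_INSTRUCTION}"
        else:
            messages.insert(0, {"role": "system", "content": JSON_INSTRUCTION})
    return result
```

`response_format` is left in the returned payload so a later step can still decide to run JSON extraction on the model's output. Note that downgrading `"required"` to `"auto"` means a client asking for a mandatory tool call may still receive a plain text answer.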
 - [x] Streaming support implemented
 - [x] Tool calls properly formatted
+### Client Compatibility
 - [x] `tool_choice="required"` accepted
 - [x] `response_format` supported
+- [x] Structured output requests handled correctly
 - [x] Tool definitions passed through
 - [x] Structured outputs extracted
 1. **Basic Chat**: Verify simple chat completions work
 2. **Tool Calls**: Test with tools defined, verify parsing
+3. **Structured Outputs**: Test with `response_format`, verify JSON extraction
 4. **Error Handling**: Test invalid requests return proper errors
 5. **Streaming**: Test streaming responses work correctly
 The implementation:
 - Follows OpenAI API specification
+- Handles OpenAI-compatible parameters correctly
 - Properly integrates with Qwen model via Transformers
 - Provides fallbacks for features not natively supported by Qwen

docs/qwen3_specifications.md CHANGED

@@ -45,8 +45,8 @@ Total context = System prompt + Conversation messages + Generated response
 ## Current configuration
-In our PydanticAI application:
-- `max_tokens` (generation): **1500 tokens** (configurable)
 - Input context: up to ~30K tokens (to leave headroom)
 - Total context: up to 32K tokens (base) or 128K (with YaRN)
 - Theoretical maximum: 20K output tokens (but constrained by the available context)

 ## Current configuration
+In our application:
+- `max_tokens` (generation): **1500 tokens** (configurable via the API)
 - Input context: up to ~30K tokens (to leave headroom)
 - Total context: up to 32K tokens (base) or 128K (with YaRN)
 - Theoretical maximum: 20K output tokens (but constrained by the available context)
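The budget equation (total context = system prompt + conversation messages + generated response) can be turned into a quick check. This sketch assumes "32K" and "128K" mean Qwen3's 32,768-token native window and 131,072 tokens with YaRN; `input_budget` is a hypothetical helper, not part of the application:

```python
def input_budget(max_tokens, context_window=32_768, system_tokens=0):
    """Tokens left for conversation messages after reserving the output.

    Total context = system prompt + conversation messages + generated response,
    so the messages get whatever the other two terms leave over.
    """
    remaining = context_window - system_tokens - max_tokens
    if remaining <= 0:
        raise ValueError("system prompt and max_tokens already fill the context window")
    return remaining


print(input_budget(1500))                          # → 31268
print(input_budget(1500, system_tokens=500))       # → 30768
print(input_budget(1500, context_window=131_072))  # → 129572
```

With `max_tokens=1500`, roughly 31K tokens remain for input on the base window, which is consistent with the ~30K input guideline once a system prompt is included.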

docs/reasoning_models.md DELETED

@@ -1,94 +0,0 @@
-# Handling reasoning models with PydanticAI
-## Problem: "finish on length"
-When you see `finish_reason: "length"`, it means the model reached the `max_tokens` limit before finishing its response.
-## Why is this common with reasoning models?
-Models like Qwen3 use `<think>` tags for chain-of-thought reasoning:
-```
-<think>
-1. The user is asking for a SWIFT MT103 message
-2. I need to identify the required fields
-3. Format: :20: reference, :32A: date/currency/amount...
-</think>
-Here is the generated SWIFT message:
-:20:NONREF
-:23B:CRED
-...
-```
-**Reasoning can consume 40-60% of the token budget!**
-## Solution: Increase max_tokens
-We set `max_tokens=1500` in `app/config.py` to allow:
-- ~600-900 tokens for reasoning (`<think>` tags)
-- ~600-900 tokens for the final answer
-- Total: ~1500 tokens for complete responses
-## Current configuration
-```python
-# app/config.py
-max_tokens: int = 1500  # For reasoning models
-# app/models.py
-model_settings = ModelSettings(
-    max_output_tokens=settings.max_tokens,
-)
-finance_model = OpenAIModel(
-    ...,
-    model_settings=model_settings,
-)
-```
-## Recommendations by request type
-| Request type | Recommended max_tokens |
-|--------------|------------------------|
-| Simple questions | 800-1000 |
-| SWIFT generation | 1200-1500 |
-| Complex analysis | 1500-2000 |
-| Structured extraction | 1000-1200 |
-## How do I adjust this for a specific agent?
-You can create agents with different settings:
-```python
-from pydantic_ai import ModelSettings, Agent
-# Agent for short tasks
-short_agent = Agent(
-    finance_model,
-    model_settings=ModelSettings(max_output_tokens=800),
-    system_prompt="..."
-)
-# Agent for long tasks (SWIFT, analyses)
-long_agent = Agent(
-    finance_model,
-    model_settings=ModelSettings(max_output_tokens=2000),
-    system_prompt="..."
-)
-```
-## Checking whether the response is complete
-Our `extract_answer_from_reasoning()` utility in `app/utils.py` automatically handles:
-- Extracting the answer that follows the `<think>` tags
-- Detecting whether the response was truncated
-- Stripping the reasoning tags
-- Nettoyage des balises de raisonnement

docs/tool_calls_analysis_hf_space.md DELETED

@@ -1,257 +0,0 @@
-# Analysis: Why Tool Calls Don't Work
-## 🔍 Problem Identified
-The Hugging Face Space API **does NOT support tool calls** in its current implementation.
-## 📋 Code Analysis
-### 1. Request Model (`app/models/openai.py`)
-```python
-class ChatCompletionRequest(BaseModel):
-    model: Optional[str] = None
-    messages: List[Message]
-    temperature: Optional[float] = 0.7
-    max_tokens: Optional[int] = None
-    stream: Optional[bool] = False
-    top_p: Optional[float] = 1.0
-    # ❌ No "tools" field
-    # ❌ No "tool_choice" field
-```
-**Problem:** The Pydantic model does not define the `tools` and `tool_choice` fields, so even if PydanticAI sends them, FastAPI **ignores** them.
-### 2. Response Model (`app/models/openai.py`)
-```python
-class ChoiceMessage(BaseModel):
-    role: Literal["assistant"]
-    content: Optional[str] = None
-    # ❌ No "tool_calls" field
-```
-**Problem:** The response model does not define the `tool_calls` field, so even if the model generated tool calls, they would **not be returned** in the response.
-### 3. Transformers Provider (`app/providers/transformers_provider.py`)
-```python
-async def chat(self, payload: Dict[str, Any], stream: bool = False):
-    messages = payload.get("messages", [])
-    temperature = payload.get("temperature", DEFAULT_TEMPERATURE)
-    max_tokens = payload.get("max_tokens", DEFAULT_MAX_TOKENS)
-    top_p = payload.get("top_p", DEFAULT_TOP_P)
-    # ❌ No extraction of "tools"
-    # ❌ No extraction of "tool_choice"
-    # Only generates text
-    generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True)
-    return {
-        "choices": [{
-            "message": {"role": "assistant", "content": generated_text},
-            # ❌ No "tool_calls"
-        }]
-    }
-```
-**Problem:** The provider:
-1. Does not extract `tools` from the payload
-2. Does not pass the tools to the model
-3. Does not parse tool calls from the response
-4. Does not return `tool_calls` in the response
-## 🔄 Current Flow
-```
-PydanticAI Agent
-    ↓ (sends tools in the request)
-FastAPI Router
-    ↓ (parses with ChatCompletionRequest; IGNORES tools)
-TransformersProvider
-    ↓ (does not extract tools from the payload)
-Qwen 8B Model
-    ↓ (generates text, no tool calls)
-TransformersProvider
-    ↓ (returns only content, no tool_calls)
-FastAPI Router
-    ↓ (returns ChoiceMessage without tool_calls)
-PydanticAI Agent
-    ↓ (receives tool_calls = [])
-```
-## ✅ Solution: Add Tool Call Support
-### Step 1: Update the Request Model
-```python
-# app/models/openai.py
-from typing import Any, Dict, List, Literal, Optional, Union  # Union is used by tool_choice below
-from pydantic import BaseModel, Field
-class Function(BaseModel):
-    name: str
-    description: Optional[str] = None
-    parameters: Dict[str, Any]
-class Tool(BaseModel):
-    type: Literal["function"] = "function"
-    function: Function
-class ChatCompletionRequest(BaseModel):
-    model: Optional[str] = None
-    messages: List[Message]
-    temperature: Optional[float] = 0.7
-    max_tokens: Optional[int] = None
-    stream: Optional[bool] = False
-    top_p: Optional[float] = 1.0
-    tools: Optional[List[Tool]] = None  # ✅ ADD
-    tool_choice: Optional[Union[Literal["none", "auto"], Dict[str, Any]]] = None  # ✅ ADD
-```
-### Step 2: Update the Response Model
-```python
-# app/models/openai.py
-class FunctionCall(BaseModel):
-    name: str
-    arguments: str  # JSON string
-class ToolCall(BaseModel):
-    id: str
-    type: Literal["function"] = "function"
-    function: FunctionCall
-class ChoiceMessage(BaseModel):
-    role: Literal["assistant"]
-    content: Optional[str] = None
-    tool_calls: Optional[List[ToolCall]] = None  # ✅ ADD
-```
-### Step 3: Update the Provider
-The provider must:
-1. **Extract the tools from the payload**
-2. **Include the tools in the prompt** (special format for Qwen)
-3. **Parse the response** to detect tool calls
-4. **Return the tool calls** in the response
-**Option A: Text Format (Simpler)**
-If the model generates tool calls as text, parse the response:
-```python
-def _parse_tool_calls(self, generated_text: str, tools: List[Tool]) -> List[ToolCall]:
-    """Parse tool calls from generated text."""
-    # Look for patterns such as:
-    # <tool_call>
-    # {"name": "calculer_valeur_future", "arguments": "{\"capital_initial\": 10000}"}
-    # </tool_call>
-    import re
-    import json
-    tool_calls = []
-    pattern = r'<tool_call>\s*({.*?})\s*</tool_call>'
-    matches = re.findall(pattern, generated_text, re.DOTALL)
-    for i, match in enumerate(matches):
-        try:
-            call_data = json.loads(match)
-            args = call_data.get("arguments", {})
-            tool_calls.append(ToolCall(
-                id=f"call_{i}",
-                type="function",
-                function=FunctionCall(
-                    name=call_data["name"],
-                    # Keep string arguments as-is; serialize dicts (avoids double-encoding)
-                    arguments=args if isinstance(args, str) else json.dumps(args)
-                )
-            ))
-        except Exception as e:
-            logger.warning(f"Failed to parse tool call: {e}")  # module-level logger assumed
-    return tool_calls
-```
-**Option B: Structured JSON Output**
-If the model supports JSON mode, force a structured format:
-```python
-# In the prompt, add:
-# "You must respond in JSON format with tool_calls array"
-# Then parse the JSON
-```
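Option B is only outlined in the deleted document. A minimal sketch of its parsing half, assuming the prompt asks the model to reply with an object shaped like `{"tool_calls": [{"name": ..., "arguments": {...}}]}` (a format chosen here for illustration); `parse_json_tool_calls` is hypothetical:

```python
import json
import uuid


def parse_json_tool_calls(generated_text):
    """Turn {"tool_calls": [{"name": ..., "arguments": {...}}]} into OpenAI-style tool calls.

    Returns an empty list when the text is not a JSON object, so the caller can
    fall back to treating it as a plain assistant message.
    """
    try:
        data = json.loads(generated_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    calls = []
    for item in data.get("tool_calls") or []:
        if not isinstance(item, dict) or "name" not in item:
            continue  # skip malformed entries instead of failing the whole response
        args = item.get("arguments", {})
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:24]}",
            "type": "function",
            "function": {
                "name": item["name"],
                # OpenAI encodes arguments as a JSON string, not a nested object.
                "arguments": args if isinstance(args, str) else json.dumps(args),
            },
        })
    return calls
```

Random IDs avoid the collisions that `call_{i}` would produce across turns of the same conversation.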
-### Step 4: Update the Router
-The router must pass the tools to the provider:
-```python
-# app/routers/openai_api.py
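-payload: Dict[str, Any] = {
-    "model": body.model or settings.model,
-    "messages": [m.model_dump() for m in body.messages],
-    # Explicit None checks so a legitimate 0.0 is not replaced by the default
-    "temperature": body.temperature if body.temperature is not None else 0.7,
-    "top_p": body.top_p if body.top_p is not None else 1.0,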
-    "stream": body.stream or False,
-}
-# ✅ ADD
-if body.tools:
-    payload["tools"] = [t.model_dump() for t in body.tools]
-if body.tool_choice:
-    payload["tool_choice"] = body.tool_choice
-```
-## 🎯 Implementation Strategy
-### Phase 1: Basic Support (Text)
-1. ✅ Add `tools` and `tool_choice` to the request model
-2. ✅ Add `tool_calls` to the response model
-3. ✅ Parse tool calls from the generated text
-4. ✅ Return tool calls in the response
-### Phase 2: Advanced Support (Structured Output)
-1. 🔄 Force the model to generate structured JSON
-2. 🔄 Parse the JSON to extract tool calls
-3. 🔄 Validate tool calls against the provided tools
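Step 3 of Phase 2 can be sketched as a check of each call against the request's tool definitions. This hypothetical `validate_tool_call` only inspects the JSON Schema `required` and `properties` keys; a fuller version would more likely run the whole schema through a validator such as the `jsonschema` package:

```python
import json


def validate_tool_call(call, tools):
    """Check an OpenAI-style tool call against the request's tool definitions.

    Returns a list of error strings; an empty list means the call looks valid.
    """
    specs = {t["function"]["name"]: t["function"] for t in tools}
    name = call["function"]["name"]
    if name not in specs:
        return [f"unknown tool {name!r}"]
    try:
        args = json.loads(call["function"]["arguments"])
    except json.JSONDecodeError:
        return [f"arguments for {name!r} are not valid JSON"]
    if not isinstance(args, dict):
        return [f"arguments for {name!r} must be a JSON object"]
    schema = specs[name].get("parameters", {})
    errors = [f"missing required argument {key!r}"
              for key in schema.get("required", []) if key not in args]
    allowed = schema.get("properties")
    if allowed is not None:
        # Flag hallucinated parameters that the tool does not declare.
        errors += [f"unexpected argument {key!r}" for key in args if key not in allowed]
    return errors
```

A provider could use the returned errors either to drop an invalid call or to re-prompt the model with the error text.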
-### Phase 3: Full Support (Native)
-1. 🎯 Fine-tune the model to generate native tool calls
-2. 🎯 Use a specialized output format
-3. 🎯 Full support for the OpenAI format
-## 📝 Important Notes
-### Limitations of the Qwen 8B Model
-The fine-tuned Qwen 8B model:
-- ✅ Can generate text that mentions the tools
-- ❌ May not generate tool calls in the native OpenAI format
-- ❌ May not structure the response with `tool_calls`
-### Workarounds
-1. **Parse the text**: Extract tool calls from the generated text
-2. **Specialized format**: Use a special prompt format to force tool calls
-3. **Post-processing**: Analyze the response and execute the mentioned tools
-## 🔗 Files to Modify
-1. `app/models/openai.py`: Add `tools`, `tool_choice`, `tool_calls`
-2. `app/providers/transformers_provider.py`: Handle tools and parse tool calls
-3. `app/routers/openai_api.py`: Pass tools to the provider
-4. Tests: Add tests for tool calls
-## 📚 References
-- [OpenAI Tool Calls Format](https://platform.openai.com/docs/guides/function-calling)
-- [PydanticAI Tools Documentation](https://ai.pydantic.dev/tools/)
-- [Qwen Model Documentation](https://huggingface.co/Qwen)

tests/conftest.py CHANGED

@@ -1,7 +1,6 @@
 import os
 import sys
 # Ensure project root is on sys.path so `import app` works in tests
 ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
 if ROOT not in sys.path:

 import os
 import sys
 # Ensure project root is on sys.path so `import app` works in tests
 ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
 if ROOT not in sys.path:

tests/integration/__init__.py ADDED

@@ -0,0 +1,10 @@

+"""Integration tests for the OpenAI-compatible API.
+These tests evaluate the model's ability to:
+- Produce valid JSON outputs
+- Emit correctly formatted tool calls
+- Handle structured data requirements
+Critical for financial applications where tool execution and structured outputs are mandatory.
+"""

test_space_basic.py → tests/integration/test_space_basic.py RENAMED

File without changes

test_space_with_tools.py → tests/integration/test_space_with_tools.py RENAMED

File without changes

test_tool_calls.py → tests/integration/test_tool_calls.py RENAMED

File without changes

tests/test_openai_models.py CHANGED

@@ -53,7 +53,7 @@ def test_chat_completion_request_defaults():
     )
     assert request.model == "test-model"
-    assert request.temperature == 0.2
     assert request.max_tokens is None
     assert request.stream is False

     )
     assert request.model == "test-model"
+    assert request.temperature == 0.7  # Default temperature
     assert request.max_tokens is None
     assert request.stream is False