Patryk Studzinski committed · Commit 715794e · Parent(s): 9079aae
removing old md file
Browse files: llm_app_rework.md (+0 -142) DELETED

# LLM App Rework Plan

## Goal

Transform the single-model app into a multi-model comparison platform for A/B testing open-source LLMs on car descriptions.

---

## Current State

- Single model: Bielik-1.5B (local HuggingFace)
- Single domain: cars
- No comparison capability

## Target State

- Multiple open-source LLMs via HuggingFace
- Same prompt → multiple outputs → compare results
- Support compression/decompression testing

---

## Architecture Changes

### 1. Model Registry

```
app/models/
├── registry.py             # Model registry + factory
├── base_llm.py             # Abstract base class
└── huggingface_local.py    # Refactored current service
```

### 2. Base LLM Interface

```python
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    name: str
    model_id: str

    @abstractmethod
    async def generate(self, prompt: str, **params) -> str: ...

    @abstractmethod
    async def initialize(self) -> None: ...

    @abstractmethod
    def is_initialized(self) -> bool: ...
```

### 3. Model Registry

```python
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
}
```
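
One way `registry.py` could turn these entries into cached instances. This is a sketch only: `get_model`, `list_models`, and the stub backend classes are illustrative names, not fixed by the plan.

```python
# Sketch of registry.py: config table plus a caching factory.
# HuggingFaceLocal / HuggingFaceInferenceAPI are stand-ins for the
# real service classes; get_model() and list_models() are hypothetical.
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
}

class HuggingFaceLocal:
    def __init__(self, name: str, model_id: str):
        self.name, self.model_id = name, model_id

class HuggingFaceInferenceAPI:
    def __init__(self, name: str, model_id: str):
        self.name, self.model_id = name, model_id

_BACKENDS = {"local": HuggingFaceLocal, "inference_api": HuggingFaceInferenceAPI}
_instances: dict = {}  # cache: construct (and load) each model only once

def get_model(name: str):
    if name not in MODELS:
        raise KeyError(f"Unknown model: {name}")
    if name not in _instances:
        cfg = MODELS[name]
        _instances[name] = _BACKENDS[cfg["type"]](name, cfg["id"])
    return _instances[name]

def list_models() -> list:
    """Data for the GET /models endpoint."""
    return [{"name": n, **cfg} for n, cfg in MODELS.items()]
```

Caching the instances matters for the `local` type, where constructing a model means loading weights into container memory.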

### 4. Two Model Types

| Type | Description | Use Case |
|------|-------------|----------|
| `local` | Loaded in container memory | Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (small, fits in RAM) |
| `inference_api` | HuggingFace Inference API | PLLuM-12B (too large for local) |

### 5. New Endpoints

| Endpoint | Purpose |
|----------|---------|
| `POST /enhance` | Single model (existing) |
| `POST /compare` | Multiple models, return all outputs |
| `GET /models` | List available models |

### 6. Compare Request/Response

```python
# Request
{
    "domain": "cars",
    "data": {...},
    "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}

# Response
{
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
        {"model": "qwen2.5-3b", "output": "...", "time": 1.8, "type": "local"},
        {"model": "gemma-2-2b", "output": "...", "time": 1.5, "type": "local"},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"}
    ]
}
```
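
The compare flow can be sketched as a concurrent fan-out over the requested models. `DummyModel` stands in for real `BaseLLM` implementations, and the helper names (`run_one`, `compare`) are hypothetical, not part of the plan:

```python
import asyncio
import time

class DummyModel:
    """Stand-in for a BaseLLM implementation."""
    def __init__(self, name: str, type_: str):
        self.name, self.type = name, type_

    async def generate(self, prompt: str, **params) -> str:
        await asyncio.sleep(0)  # a real model would await inference here
        return f"{self.name} output for: {prompt}"

async def run_one(model, prompt: str) -> dict:
    # Time each model individually, matching the per-result "time" field.
    start = time.perf_counter()
    output = await model.generate(prompt)
    return {
        "model": model.name,
        "output": output,
        "time": round(time.perf_counter() - start, 2),
        "type": model.type,
    }

async def compare(models, prompt: str) -> dict:
    # Fan the same prompt out to every requested model concurrently;
    # gather() preserves the request order in the results list.
    results = await asyncio.gather(*(run_one(m, prompt) for m in models))
    return {"results": list(results)}

models = [DummyModel("bielik-1.5b", "local"),
          DummyModel("pllum-12b", "inference_api")]
response = asyncio.run(compare(models, "Describe this car"))
```

Running the models concurrently means the slowest model, not the sum of all models, bounds the response time.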

---

## Implementation Steps

1. **Create base_llm.py** - abstract interface
2. **Create huggingface_inference_api.py** - HF Inference API client
3. **Refactor huggingface_service.py** → HuggingFaceLocal (implements BaseLLM)
4. **Create registry.py** - model factory + config
5. **Add /compare endpoint** in main.py
6. **Add /models endpoint** - list available models
7. **Update schemas** - CompareRequest, CompareResponse

---

## HuggingFace Inference API

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)
response = client.text_generation(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    prompt=formatted_prompt,
    max_new_tokens=150,
)
```
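
`huggingface_inference_api.py` (step 2 above) might adapt this client to the async `BaseLLM` interface roughly as follows. The `asyncio.to_thread` offload and the injected stub client are assumptions made to keep the sketch self-contained and offline:

```python
import asyncio

class HuggingFaceInferenceAPI:
    """Adapts a synchronous text_generation client to the async BaseLLM shape.
    In the real service the client would be huggingface_hub.InferenceClient;
    here it is injected so the sketch runs without network access."""

    def __init__(self, model_id: str, client):
        self.model_id = model_id
        self._client = client
        self._ready = False

    async def initialize(self) -> None:
        self._ready = True  # nothing to load locally; the model is remote

    def is_initialized(self) -> bool:
        return self._ready

    async def generate(self, prompt: str, **params) -> str:
        # text_generation blocks on HTTP, so run it off the event loop.
        return await asyncio.to_thread(
            self._client.text_generation,
            prompt=prompt, model=self.model_id, **params,
        )

class StubClient:
    """Minimal stand-in for InferenceClient."""
    def text_generation(self, prompt, model=None, **params):
        return f"[{model}] {prompt}"

llm = HuggingFaceInferenceAPI("CYFRAGOVPL/PLLuM-12B-instruct", StubClient())
output = asyncio.run(llm.generate("Describe this car", max_new_tokens=150))
```

Offloading the blocking call keeps the API-backed model from stalling the local models during a `/compare` fan-out.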

---

## Env Vars (HuggingFace Secrets)

```
HF_TOKEN=hf_...   # For Inference API access
```

---

## Models (Approved)

| Model | Size | Type | Polish Support | HuggingFace ID |
|-------|------|------|----------------|----------------|
| Bielik-1.5B | 1.5B | Local | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
| Qwen2.5-3B | 3B | Local | Good | Qwen/Qwen2.5-3B-Instruct |
| Gemma-2-2B | 2B | Local | Medium | google/gemma-2-2b-it |
| PLLuM-12B | 12B | API | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |

---

## Priority

1. HuggingFace Inference API integration
2. /compare endpoint
3. /models endpoint

---

## Notes

- All models are open source, served via HuggingFace
- 3 local models: Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (fit in 16GB RAM)
- 1 API model: PLLuM-12B (too large for the HuggingFace Spaces free tier)
- HF_TOKEN is needed for API models and gated models (Gemma)
- Focus on Polish-language output comparison