Patryk Studzinski committed
Commit 715794e · 1 Parent(s): 9079aae

removing old md file

Files changed (1)
  1. llm_app_rework.md +0 -142
llm_app_rework.md DELETED
@@ -1,142 +0,0 @@
# LLM App Rework Plan

## Goal
Transform the single-model app into a multi-model comparison platform for A/B testing open-source LLMs on car descriptions.

---

## Current State
- Single model: Bielik-1.5B (local HuggingFace)
- Single domain: cars
- No comparison capability

## Target State
- Multiple open-source LLMs via HuggingFace
- Same prompt → multiple outputs → compare results
- Support compression/decompression testing

---

## Architecture Changes

### 1. Model Package Layout
```
app/models/
├── registry.py           # Model registry + factory
├── base_llm.py           # Abstract base class
└── huggingface_local.py  # Refactored current service
```

### 2. Base LLM Interface
```python
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    name: str
    model_id: str

    @abstractmethod
    async def generate(self, prompt: str, **params) -> str: ...

    @abstractmethod
    async def initialize(self) -> None: ...

    @abstractmethod
    def is_initialized(self) -> bool: ...
```

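For orientation, a minimal sketch of how the refactored local service might satisfy this interface, assuming a standard `transformers` causal-LM setup; the constructor signature, the 150-token default, and the blocking loads (which real code would offload to a thread) are illustrative choices, not the final design:

```python
# Sketch of huggingface_local.py; assumes transformers and the BaseLLM above.
from transformers import AutoModelForCausalLM, AutoTokenizer

class HuggingFaceLocal(BaseLLM):
    def __init__(self, name: str, model_id: str):
        self.name = name
        self.model_id = model_id
        self._tokenizer = None
        self._model = None

    async def initialize(self) -> None:
        # Blocking load; real code would wrap this in asyncio.to_thread().
        self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
        self._model = AutoModelForCausalLM.from_pretrained(self.model_id)

    def is_initialized(self) -> bool:
        return self._model is not None

    async def generate(self, prompt: str, **params) -> str:
        inputs = self._tokenizer(prompt, return_tensors="pt")
        output_ids = self._model.generate(
            **inputs, max_new_tokens=params.get("max_new_tokens", 150)
        )
        # Decode only the newly generated tokens, not the echoed prompt.
        new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
        return self._tokenizer.decode(new_tokens, skip_special_tokens=True)
```
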
### 3. Model Registry
```python
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
}
```

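Since `registry.py` is also meant to hold the factory, a sketch of how entries could be resolved to instances; the instance cache and the `HuggingFaceInferenceAPI` class name (the step-2 module below) are assumptions:

```python
# Hypothetical factory for registry.py; class names follow the planned layout.
_instances: dict[str, BaseLLM] = {}

def get_model(name: str) -> BaseLLM:
    if name not in MODELS:
        raise KeyError(f"Unknown model: {name}")
    if name not in _instances:
        cfg = MODELS[name]
        cls = HuggingFaceLocal if cfg["type"] == "local" else HuggingFaceInferenceAPI
        _instances[name] = cls(name=name, model_id=cfg["id"])
    return _instances[name]
```
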
### 4. Two Model Types
| Type | Description | Use Case |
|------|-------------|----------|
| `local` | Loaded in container memory | Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (small, fit in RAM) |
| `inference_api` | HuggingFace Inference API | PLLuM-12B (too large for local) |

### 5. New Endpoints

| Endpoint | Purpose |
|----------|---------|
| `POST /enhance` | Single model (existing) |
| `POST /compare` | Multiple models, return all outputs |
| `GET /models` | List available models |

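The `/models` endpoint can be read straight off the registry. A minimal sketch, assuming the service is FastAPI-based (which `main.py` plus the schema work in step 7 suggests); `app` stands in for the existing application object:

```python
# Hypothetical /models handler; assumes MODELS from registry.py is importable.
from fastapi import FastAPI

app = FastAPI()  # in practice, the existing app in main.py

@app.get("/models")
async def list_models() -> dict:
    return {
        "models": [
            {"name": name, "id": cfg["id"], "type": cfg["type"]}
            for name, cfg in MODELS.items()
        ]
    }
```
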
### 6. Compare Request/Response
```python
# Request
{
    "domain": "cars",
    "data": {...},
    "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}

# Response
{
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
        {"model": "qwen2.5-3b", "output": "...", "time": 1.8, "type": "local"},
        {"model": "gemma-2-2b", "output": "...", "time": 1.5, "type": "local"},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"}
    ]
}
```
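
These shapes map directly onto the `CompareRequest`/`CompareResponse` schemas planned in step 7. A sketch, assuming Pydantic (v2 syntax); the `ModelResult` name is an assumption:

```python
# Hypothetical schemas; field names follow the JSON shapes above.
from typing import Any
from pydantic import BaseModel, ConfigDict

class CompareRequest(BaseModel):
    domain: str
    data: dict[str, Any]
    models: list[str]

class ModelResult(BaseModel):
    # Allow a field literally named "model" (Pydantic v2 reserves the model_ prefix).
    model_config = ConfigDict(protected_namespaces=())
    model: str
    output: str
    time: float
    type: str

class CompareResponse(BaseModel):
    results: list[ModelResult]
```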

---

## Implementation Steps

1. **Create base_llm.py** - abstract interface
2. **Create huggingface_inference_api.py** - HF Inference API client
3. **Refactor huggingface_service.py** → HuggingFaceLocal (implements BaseLLM)
4. **Create registry.py** - model factory + config
5. **Add /compare endpoint** in main.py (see the sketch after this list)
6. **Add /models endpoint** - list available models
7. **Update schemas** - CompareRequest, CompareResponse

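For step 5, one possible shape of the `/compare` handler, running the selected models concurrently and timing each. It assumes the FastAPI `app`, the registry factory, and the schemas sketched above are in scope; `build_prompt` is a hypothetical stand-in for the existing prompt-formatting code:

```python
# Hypothetical /compare handler for main.py; get_model, MODELS, and the
# Compare* schemas come from the sketches above, build_prompt is assumed.
import asyncio
import time

async def _run_one(name: str, prompt: str) -> dict:
    model = get_model(name)
    if not model.is_initialized():
        await model.initialize()
    start = time.perf_counter()
    output = await model.generate(prompt)
    return {
        "model": name,
        "output": output,
        "time": round(time.perf_counter() - start, 2),
        "type": MODELS[name]["type"],
    }

@app.post("/compare", response_model=CompareResponse)
async def compare(request: CompareRequest) -> CompareResponse:
    prompt = build_prompt(request.domain, request.data)  # assumed helper
    results = await asyncio.gather(*(_run_one(m, prompt) for m in request.models))
    return CompareResponse(results=results)
```
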
---

## HuggingFace Inference API
```python
import os

from huggingface_hub import InferenceClient

# HF_TOKEN comes from the Space secrets (see Env Vars below).
client = InferenceClient(token=os.environ["HF_TOKEN"])
response = client.text_generation(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    prompt=formatted_prompt,  # prompt string built by the existing formatting code
    max_new_tokens=150,
)
```

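Wrapped into the `BaseLLM` interface, this client becomes the step-2 module. A sketch; the class name and default parameters mirror the local sketch and are assumptions:

```python
# Hypothetical huggingface_inference_api.py; mirrors HuggingFaceLocal.
import os

from huggingface_hub import InferenceClient

class HuggingFaceInferenceAPI(BaseLLM):
    def __init__(self, name: str, model_id: str):
        self.name = name
        self.model_id = model_id
        self._client = None

    async def initialize(self) -> None:
        self._client = InferenceClient(token=os.environ["HF_TOKEN"])

    def is_initialized(self) -> bool:
        return self._client is not None

    async def generate(self, prompt: str, **params) -> str:
        # InferenceClient is synchronous; real code might use asyncio.to_thread().
        return self._client.text_generation(
            prompt,
            model=self.model_id,
            max_new_tokens=params.get("max_new_tokens", 150),
        )
```
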
---

## Env Vars (HuggingFace Secrets)
```
HF_TOKEN=hf_...  # For Inference API access
```

---

## Models (Approved)

| Model | Size | Type | Polish Support | HuggingFace ID |
|-------|------|------|----------------|----------------|
| Bielik-1.5B | 1.5B | Local | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
| Qwen2.5-3B | 3B | Local | Good | Qwen/Qwen2.5-3B-Instruct |
| Gemma-2-2B | 2B | Local | Medium | google/gemma-2-2b-it |
| PLLuM-12B | 12B | API | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |

---

## Priority
1. HuggingFace Inference API integration
2. /compare endpoint
3. /models endpoint

---

## Notes
- All models = open source via HuggingFace
- 3 local models = Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (fit in 16GB RAM)
- 1 API model = PLLuM-12B (too large for HuggingFace Spaces free tier)
- HF_TOKEN needed for API models and gated models (Gemma)
- Focus on Polish language output comparison