Patryk Studzinski committed
Commit 9079aae · Parent: cf748a3

adding md file to remove it

Files changed (1): llm_app_rework.md (+18 -17)
llm_app_rework.md CHANGED
@@ -41,17 +41,17 @@ class BaseLLM(ABC):
 ```python
 MODELS = {
     "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
+    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
+    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
     "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
-    "mistral-small-3": {"id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "type": "inference_api"},
-    "gemma-2-9b": {"id": "google/gemma-2-9b-it", "type": "inference_api"},
 }
 ```
 
 ### 4. Two Model Types
 | Type | Description | Use Case |
 |------|-------------|----------|
-| `local` | Loaded in container memory | Bielik-1.5B (small, fits in RAM) |
-| `inference_api` | HuggingFace Inference API | Larger models (7B+) via API |
+| `local` | Loaded in container memory | Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (small, fits in RAM) |
+| `inference_api` | HuggingFace Inference API | PLLuM-12B (too large for local) |
 
 ### 5. New Endpoints
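
The registry and type table above are plain data; the split only pays off with a small dispatch layer that routes each model name to the right backend. Below is a minimal sketch of such routing, assuming the `transformers` and `huggingface_hub` libraries; the `run_model` helper and the `_pipelines` cache are hypothetical names, not taken from the repo.

```python
import os

from huggingface_hub import InferenceClient
from transformers import pipeline

MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
}

_pipelines = {}  # cache so each local model is loaded into container memory once


def run_model(name: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Hypothetical dispatcher: route on the registry's `type` field."""
    cfg = MODELS[name]
    if cfg["type"] == "local":
        if name not in _pipelines:
            # token is only needed here for gated checkpoints such as Gemma
            _pipelines[name] = pipeline(
                "text-generation", model=cfg["id"], token=os.environ.get("HF_TOKEN")
            )
        out = _pipelines[name](prompt, max_new_tokens=max_new_tokens)
        return out[0]["generated_text"]
    # non-local models go through the HuggingFace Inference API
    client = InferenceClient(model=cfg["id"], token=os.environ.get("HF_TOKEN"))
    return client.text_generation(prompt, max_new_tokens=max_new_tokens)
```

With the cache above, all three local models stay resident after first use, which is consistent with the note later in the diff that they jointly fit in 16GB RAM.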
 
@@ -67,16 +67,16 @@ MODELS = {
 {
   "domain": "cars",
   "data": {...},
-  "models": ["bielik-1.5b", "pllum-12b", "mistral-small-3", "gemma-2-9b"]
+  "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
 }
 
 # Response
 {
   "results": [
     {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
-    {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"},
-    {"model": "mistral-small-3", "output": "...", "time": 0.9, "type": "inference_api"},
-    {"model": "gemma-2-9b", "output": "...", "time": 1.0, "type": "inference_api"}
+    {"model": "qwen2.5-3b", "output": "...", "time": 1.8, "type": "local"},
+    {"model": "gemma-2-2b", "output": "...", "time": 1.5, "type": "local"},
+    {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"}
   ]
 }
 ```
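
The per-entry `time` field in the response implies the handler times each model call separately. Here is a sketch of how that loop might be assembled, reusing the hypothetical `run_model` and `MODELS` from the sketch above; the `compare` name and the one-decimal rounding are assumptions, not confirmed by the doc.

```python
import time

def compare(prompt: str, model_names: list[str]) -> dict:
    # Hypothetical handler body: run each requested model in turn,
    # time it, and mirror the response schema shown in the diff.
    results = []
    for name in model_names:
        start = time.perf_counter()
        output = run_model(name, prompt)  # dispatcher sketched earlier (assumed)
        results.append({
            "model": name,
            "output": output,
            "time": round(time.perf_counter() - start, 1),
            "type": MODELS[name]["type"],
        })
    return {"results": results}
```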
@@ -118,12 +118,12 @@ HF_TOKEN=hf_... # For Inference API access
 
 ## Models (Approved)
 
-| Model | Size | Polish Support | HuggingFace ID |
-|-------|------|----------------|----------------|
-| Bielik-1.5B | 1.5B | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
-| PLLuM-12B | 12B | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |
-| Mistral-Small-3 | 24B | Good | mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
-| Gemma-2-9B | 9B | Medium | google/gemma-2-9b-it |
+| Model | Size | Type | Polish Support | HuggingFace ID |
+|-------|------|------|----------------|----------------|
+| Bielik-1.5B | 1.5B | Local | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
+| Qwen2.5-3B | 3B | Local | Good | Qwen/Qwen2.5-3B-Instruct |
+| Gemma-2-2B | 2B | Local | Medium | google/gemma-2-2b-it |
+| PLLuM-12B | 12B | API | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |
 
 ---
 
@@ -136,6 +136,7 @@ HF_TOKEN=hf_... # For Inference API access
 
 ## Notes
 - All models = open source via HuggingFace
-- Local model = Bielik-1.5B (already works)
-- Larger models = HF Inference API (no local GPU needed)
-- HF_TOKEN needed for gated models (Gemma, etc)
+- 3 local models = Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (fit in 16GB RAM)
+- 1 API model = PLLuM-12B (too large for HuggingFace Spaces free tier)
+- HF_TOKEN needed for API models and gated models (Gemma)
+- Focus on Polish language output comparison
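
On the HF_TOKEN note above: both the Inference API call for PLLuM-12B and the gated Gemma download need the token at runtime, so it is worth failing fast when it is absent. A minimal sketch, with the variable name taken from the doc's config section and the `require_hf_token` helper being hypothetical.

```python
import os

def require_hf_token() -> str:
    # HF_TOKEN is provided as an environment variable / Space secret per the doc.
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; required for the Inference API model "
            "(PLLuM-12B) and gated local models (Gemma-2-2B)."
        )
    return token
```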