Spaces: Running

Commit 9079aae by Patryk Studzinski
1 Parent(s): cf748a3

adding md file to remove it

llm_app_rework.md CHANGED (+18 -17)
@@ -41,17 +41,17 @@ class BaseLLM(ABC):

```python
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
-   "mistral-small-3": {"id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "type": "inference_api"},
-   "gemma-2-9b": {"id": "google/gemma-2-9b-it", "type": "inference_api"},
}
```

### 4. Two Model Types
| Type | Description | Use Case |
|------|-------------|----------|
- | `local` | Loaded in container memory | Bielik-1.5B (small, fits in RAM) |
- | `inference_api` | HuggingFace Inference API |

### 5. New Endpoints

@@ -67,16 +67,16 @@ MODELS = {

{
  "domain": "cars",
  "data": {...},
- "models": ["bielik-1.5b", "
}

# Response
{
  "results": [
    {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
-   {"model": "
-   {"model": "
-   {"model": "
  ]
}
```

@@ -118,12 +118,12 @@ HF_TOKEN=hf_... # For Inference API access

## Models (Approved)

- | Model | Size | Polish Support | HuggingFace ID |
-
- | Bielik-1.5B | 1.5B | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
-
-
-

---

@@ -136,6 +136,7 @@ HF_TOKEN=hf_... # For Inference API access

## Notes
- All models = open source via HuggingFace
- -
- -
- - HF_TOKEN needed for gated models (Gemma

```python
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
+   "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
+   "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
}
```

### 4. Two Model Types
| Type | Description | Use Case |
|------|-------------|----------|
+ | `local` | Loaded in container memory | Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (small, fits in RAM) |
+ | `inference_api` | HuggingFace Inference API | PLLuM-12B (too large for local) |

### 5. New Endpoints

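The `MODELS` registry added above is what makes a single dispatch point possible: every code path can look a model up by name and branch on its `type`. A minimal sketch of that lookup, assuming only the registry itself (the helper function names here are mine, not from the app):

```python
# Sketch only: dispatching on the "type" field of the MODELS registry
# added in the diff above. resolve/local_model_names are hypothetical
# helper names, not part of the actual app code.
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
}

def resolve(name: str) -> dict:
    """Look up a registry entry, failing loudly on unknown model names."""
    if name not in MODELS:
        raise ValueError(f"unknown model: {name!r}")
    return MODELS[name]

def local_model_names() -> list[str]:
    """Names of models served from container memory rather than the API."""
    return [k for k, v in MODELS.items() if v["type"] == "local"]
```

Keeping the branch in one place means adding a fifth model is a one-line registry edit, which is the point of the rework.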
{
  "domain": "cars",
  "data": {...},
+ "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}

# Response
{
  "results": [
    {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
+   {"model": "qwen2.5-3b", "output": "...", "time": 1.8, "type": "local"},
+   {"model": "gemma-2-2b", "output": "...", "time": 1.5, "type": "local"},
+   {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"}
  ]
}
```

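A client of this endpoint only needs to build the request body and fold over `results`. A usage sketch, with the JSON shape taken from the example above (the function names and the offline `sample_response` are mine, for illustration):

```python
import json

def build_compare_request(domain: str, data: dict, models: list[str]) -> str:
    """Serialize a request body shaped like the example in the diff above."""
    return json.dumps({"domain": domain, "data": data, "models": models})

def fastest_model(response: dict) -> str:
    """Name of the model with the lowest reported generation time."""
    return min(response["results"], key=lambda r: r["time"])["model"]

# Offline stand-in for a real response, mirroring the example shape.
sample_response = {
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"},
    ]
}
```

Because every result carries its own `time` and `type`, the client can compare local and API latencies without knowing how each model was served.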

## Models (Approved)

+ | Model | Size | Type | Polish Support | HuggingFace ID |
+ |-------|------|------|----------------|----------------|
+ | Bielik-1.5B | 1.5B | Local | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
+ | Qwen2.5-3B | 3B | Local | Good | Qwen/Qwen2.5-3B-Instruct |
+ | Gemma-2-2B | 2B | Local | Medium | google/gemma-2-2b-it |
+ | PLLuM-12B | 12B | API | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |

---


## Notes
- All models = open source via HuggingFace
+ - 3 local models = Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (fit in 16GB RAM)
+ - 1 API model = PLLuM-12B (too large for HuggingFace Spaces free tier)
+ - HF_TOKEN needed for API models and gated models (Gemma)
+ - Focus on Polish language output comparison
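The HF_TOKEN note above implies a startup check: the app cannot serve `inference_api` or gated models without the variable set. A minimal sketch of such a check (the function name is mine, not from the app):

```python
import os

def require_hf_token() -> str:
    """Return HF_TOKEN from the environment; fail early if it is missing.

    Per the notes above, the token is needed for inference_api models
    (PLLuM-12B) and for gated local models (Gemma).
    """
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is required for inference_api and gated models")
    return token
```

Failing at startup, rather than on the first API call, keeps a misconfigured Space from looking healthy until a user hits a gated model.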