Patryk Studzinski committed
Commit 9079aae · Parent: cf748a3

adding md file to remove it

Files changed (1): llm_app_rework.md (+18 -17)
llm_app_rework.md CHANGED
@@ -41,17 +41,17 @@ class BaseLLM(ABC):
 ```python
 MODELS = {
     "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
+    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
+    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
     "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
-    "mistral-small-3": {"id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "type": "inference_api"},
-    "gemma-2-9b": {"id": "google/gemma-2-9b-it", "type": "inference_api"},
 }
 ```
 
 ### 4. Two Model Types
 | Type | Description | Use Case |
 |------|-------------|----------|
-| `local` | Loaded in container memory | Bielik-1.5B (small, fits in RAM) |
-| `inference_api` | HuggingFace Inference API | Larger models (7B+) via API |
+| `local` | Loaded in container memory | Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (small, fits in RAM) |
+| `inference_api` | HuggingFace Inference API | PLLuM-12B (too large for local) |
 
 ### 5. New Endpoints
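
The registry and type table above are plain data; the split only pays off with a small dispatch layer that routes each model name to the right backend. Below is a minimal sketch of such routing, assuming the `transformers` and `huggingface_hub` libraries; the `run_model` helper and the `_pipelines` cache are hypothetical names, not taken from the repo.

```python
import os

from huggingface_hub import InferenceClient
from transformers import pipeline

MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
}

_pipelines = {}  # cache so each local model is loaded into container memory once


def run_model(name: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Hypothetical dispatcher: route on the registry's `type` field."""
    cfg = MODELS[name]
    if cfg["type"] == "local":
        if name not in _pipelines:
            # token is only needed here for gated checkpoints such as Gemma
            _pipelines[name] = pipeline(
                "text-generation", model=cfg["id"], token=os.environ.get("HF_TOKEN")
            )
        out = _pipelines[name](prompt, max_new_tokens=max_new_tokens)
        return out[0]["generated_text"]
    # non-local models go through the HuggingFace Inference API
    client = InferenceClient(model=cfg["id"], token=os.environ.get("HF_TOKEN"))
    return client.text_generation(prompt, max_new_tokens=max_new_tokens)
```

With the cache above, all three local models stay resident after first use, which is consistent with the note later in the diff that they jointly fit in 16GB RAM.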
 
@@ -67,16 +67,16 @@ MODELS = {
 {
   "domain": "cars",
   "data": {...},
-  "models": ["bielik-1.5b", "pllum-12b", "mistral-small-3", "gemma-2-9b"]
+  "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
 }
 
 # Response
 {
   "results": [
     {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
-    {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"},
-    {"model": "mistral-small-3", "output": "...", "time": 0.9, "type": "inference_api"},
-    {"model": "gemma-2-9b", "output": "...", "time": 1.0, "type": "inference_api"}
+    {"model": "qwen2.5-3b", "output": "...", "time": 1.8, "type": "local"},
+    {"model": "gemma-2-2b", "output": "...", "time": 1.5, "type": "local"},
+    {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"}
   ]
 }
 ```
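
The per-entry `time` field in the response implies the handler times each model call separately. Here is a sketch of how that loop might be assembled, reusing the hypothetical `run_model` and `MODELS` from the sketch above; the `compare` name and the one-decimal rounding are assumptions, not confirmed by the doc.

```python
import time

def compare(prompt: str, model_names: list[str]) -> dict:
    # Hypothetical handler body: run each requested model in turn,
    # time it, and mirror the response schema shown in the diff.
    results = []
    for name in model_names:
        start = time.perf_counter()
        output = run_model(name, prompt)  # dispatcher sketched earlier (assumed)
        results.append({
            "model": name,
            "output": output,
            "time": round(time.perf_counter() - start, 1),
            "type": MODELS[name]["type"],
        })
    return {"results": results}
```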
@@ -118,12 +118,12 @@ HF_TOKEN=hf_... # For Inference API access
 
 ## Models (Approved)
 
-| Model | Size | Polish Support | HuggingFace ID |
-|-------|------|----------------|----------------|
-| Bielik-1.5B | 1.5B | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
-| PLLuM-12B | 12B | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |
-| Mistral-Small-3 | 24B | Good | mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
-| Gemma-2-9B | 9B | Medium | google/gemma-2-9b-it |
+| Model | Size | Type | Polish Support | HuggingFace ID |
+|-------|------|------|----------------|----------------|
+| Bielik-1.5B | 1.5B | Local | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
+| Qwen2.5-3B | 3B | Local | Good | Qwen/Qwen2.5-3B-Instruct |
+| Gemma-2-2B | 2B | Local | Medium | google/gemma-2-2b-it |
+| PLLuM-12B | 12B | API | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |
 
 ---
 
@@ -136,6 +136,7 @@ HF_TOKEN=hf_... # For Inference API access
 
 ## Notes
 - All models = open source via HuggingFace
-- Local model = Bielik-1.5B (already works)
-- Larger models = HF Inference API (no local GPU needed)
-- HF_TOKEN needed for gated models (Gemma, etc)
+- 3 local models = Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (fit in 16GB RAM)
+- 1 API model = PLLuM-12B (too large for HuggingFace Spaces free tier)
+- HF_TOKEN needed for API models and gated models (Gemma)
+- Focus on Polish language output comparison
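
On the HF_TOKEN note above: both the Inference API call for PLLuM-12B and the gated Gemma download need the token at runtime, so it is worth failing fast when it is absent. A minimal sketch, with the variable name taken from the doc's config section and the `require_hf_token` helper being hypothetical.

```python
import os

def require_hf_token() -> str:
    # HF_TOKEN is provided as an environment variable / Space secret per the doc.
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; required for the Inference API model "
            "(PLLuM-12B) and gated local models (Gemma-2-2B)."
        )
    return token
```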