Patryk Studzinski committed
Commit 715794e · 1 Parent(s): 9079aae

removing old md file

Files changed (1)
  1. llm_app_rework.md +0 -142
llm_app_rework.md DELETED
@@ -1,142 +0,0 @@
# LLM App Rework Plan

## Goal
Transform the single-model app into a multi-model comparison platform for A/B testing open-source LLMs on car descriptions.

---

## Current State
- Single model: Bielik-1.5B (local HuggingFace)
- Single domain: cars
- No comparison capability

## Target State
- Multiple open-source LLMs via HuggingFace
- Same prompt → multiple outputs → compare results
- Support compression/decompression testing

---

## Architecture Changes

### 1. Model Package Layout
```
app/models/
├── registry.py           # Model registry + factory
├── base_llm.py           # Abstract base class
└── huggingface_local.py  # Refactored current service
```

### 2. Base LLM Interface
```python
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    name: str
    model_id: str

    @abstractmethod
    async def generate(self, prompt: str, **params) -> str: ...

    @abstractmethod
    async def initialize(self) -> None: ...

    @abstractmethod
    def is_initialized(self) -> bool: ...
```

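For orientation, a minimal sketch of how the refactored local service might satisfy this interface, assuming a standard `transformers` causal-LM setup; the constructor signature, the 150-token default, and the blocking loads (which real code would offload to a thread) are illustrative choices, not the final design:

```python
# Sketch of huggingface_local.py; assumes transformers and the BaseLLM above.
from transformers import AutoModelForCausalLM, AutoTokenizer

class HuggingFaceLocal(BaseLLM):
    def __init__(self, name: str, model_id: str):
        self.name = name
        self.model_id = model_id
        self._tokenizer = None
        self._model = None

    async def initialize(self) -> None:
        # Blocking load; real code would wrap this in asyncio.to_thread().
        self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
        self._model = AutoModelForCausalLM.from_pretrained(self.model_id)

    def is_initialized(self) -> bool:
        return self._model is not None

    async def generate(self, prompt: str, **params) -> str:
        inputs = self._tokenizer(prompt, return_tensors="pt")
        output_ids = self._model.generate(
            **inputs, max_new_tokens=params.get("max_new_tokens", 150)
        )
        # Decode only the newly generated tokens, not the echoed prompt.
        new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
        return self._tokenizer.decode(new_tokens, skip_special_tokens=True)
```
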
### 3. Model Registry
```python
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "qwen2.5-3b": {"id": "Qwen/Qwen2.5-3B-Instruct", "type": "local"},
    "gemma-2-2b": {"id": "google/gemma-2-2b-it", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
}
```

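Since `registry.py` is also meant to hold the factory, a sketch of how entries could be resolved to instances; the instance cache and the `HuggingFaceInferenceAPI` class name (the step-2 module below) are assumptions:

```python
# Hypothetical factory for registry.py; class names follow the planned layout.
_instances: dict[str, BaseLLM] = {}

def get_model(name: str) -> BaseLLM:
    if name not in MODELS:
        raise KeyError(f"Unknown model: {name}")
    if name not in _instances:
        cfg = MODELS[name]
        cls = HuggingFaceLocal if cfg["type"] == "local" else HuggingFaceInferenceAPI
        _instances[name] = cls(name=name, model_id=cfg["id"])
    return _instances[name]
```
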
### 4. Two Model Types
| Type | Description | Use Case |
|------|-------------|----------|
| `local` | Loaded in container memory | Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (small, fit in RAM) |
| `inference_api` | HuggingFace Inference API | PLLuM-12B (too large for local) |

### 5. New Endpoints

| Endpoint | Purpose |
|----------|---------|
| `POST /enhance` | Single model (existing) |
| `POST /compare` | Multiple models, return all outputs |
| `GET /models` | List available models |

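The `/models` endpoint can be read straight off the registry. A minimal sketch, assuming the service is FastAPI-based (which `main.py` plus the schema work in step 7 suggests); `app` stands in for the existing application object:

```python
# Hypothetical /models handler; assumes MODELS from registry.py is importable.
from fastapi import FastAPI

app = FastAPI()  # in practice, the existing app in main.py

@app.get("/models")
async def list_models() -> dict:
    return {
        "models": [
            {"name": name, "id": cfg["id"], "type": cfg["type"]}
            for name, cfg in MODELS.items()
        ]
    }
```
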
### 6. Compare Request/Response
```python
# Request
{
    "domain": "cars",
    "data": {...},
    "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}

# Response
{
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
        {"model": "qwen2.5-3b", "output": "...", "time": 1.8, "type": "local"},
        {"model": "gemma-2-2b", "output": "...", "time": 1.5, "type": "local"},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"}
    ]
}
```
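
These shapes map directly onto the `CompareRequest`/`CompareResponse` schemas planned in step 7. A sketch, assuming Pydantic (v2 syntax); the `ModelResult` name is an assumption:

```python
# Hypothetical schemas; field names follow the JSON shapes above.
from typing import Any
from pydantic import BaseModel, ConfigDict

class CompareRequest(BaseModel):
    domain: str
    data: dict[str, Any]
    models: list[str]

class ModelResult(BaseModel):
    # Allow a field literally named "model" (Pydantic v2 reserves the model_ prefix).
    model_config = ConfigDict(protected_namespaces=())
    model: str
    output: str
    time: float
    type: str

class CompareResponse(BaseModel):
    results: list[ModelResult]
```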

---

## Implementation Steps

1. **Create base_llm.py** - abstract interface
2. **Create huggingface_inference_api.py** - HF Inference API client
3. **Refactor huggingface_service.py** → HuggingFaceLocal (implements BaseLLM)
4. **Create registry.py** - model factory + config
5. **Add /compare endpoint** in main.py (see the sketch after this list)
6. **Add /models endpoint** - list available models
7. **Update schemas** - CompareRequest, CompareResponse

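For step 5, one possible shape of the `/compare` handler, running the selected models concurrently and timing each. It assumes the FastAPI `app`, the registry factory, and the schemas sketched above are in scope; `build_prompt` is a hypothetical stand-in for the existing prompt-formatting code:

```python
# Hypothetical /compare handler for main.py; get_model, MODELS, and the
# Compare* schemas come from the sketches above, build_prompt is assumed.
import asyncio
import time

async def _run_one(name: str, prompt: str) -> dict:
    model = get_model(name)
    if not model.is_initialized():
        await model.initialize()
    start = time.perf_counter()
    output = await model.generate(prompt)
    return {
        "model": name,
        "output": output,
        "time": round(time.perf_counter() - start, 2),
        "type": MODELS[name]["type"],
    }

@app.post("/compare", response_model=CompareResponse)
async def compare(request: CompareRequest) -> CompareResponse:
    prompt = build_prompt(request.domain, request.data)  # assumed helper
    results = await asyncio.gather(*(_run_one(m, prompt) for m in request.models))
    return CompareResponse(results=results)
```
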
---

## HuggingFace Inference API
```python
import os

from huggingface_hub import InferenceClient

# HF_TOKEN comes from the Space secrets (see Env Vars below).
client = InferenceClient(token=os.environ["HF_TOKEN"])
response = client.text_generation(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    prompt=formatted_prompt,  # prompt string built by the existing formatting code
    max_new_tokens=150,
)
```

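Wrapped into the `BaseLLM` interface, this client becomes the step-2 module. A sketch; the class name and default parameters mirror the local sketch and are assumptions:

```python
# Hypothetical huggingface_inference_api.py; mirrors HuggingFaceLocal.
import os

from huggingface_hub import InferenceClient

class HuggingFaceInferenceAPI(BaseLLM):
    def __init__(self, name: str, model_id: str):
        self.name = name
        self.model_id = model_id
        self._client = None

    async def initialize(self) -> None:
        self._client = InferenceClient(token=os.environ["HF_TOKEN"])

    def is_initialized(self) -> bool:
        return self._client is not None

    async def generate(self, prompt: str, **params) -> str:
        # InferenceClient is synchronous; real code might use asyncio.to_thread().
        return self._client.text_generation(
            prompt,
            model=self.model_id,
            max_new_tokens=params.get("max_new_tokens", 150),
        )
```
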
---

## Env Vars (HuggingFace Secrets)
```
HF_TOKEN=hf_...  # For Inference API access
```

---

## Models (Approved)

| Model | Size | Type | Polish Support | HuggingFace ID |
|-------|------|------|----------------|----------------|
| Bielik-1.5B | 1.5B | Local | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
| Qwen2.5-3B | 3B | Local | Good | Qwen/Qwen2.5-3B-Instruct |
| Gemma-2-2B | 2B | Local | Medium | google/gemma-2-2b-it |
| PLLuM-12B | 12B | API | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |

---

## Priority
1. HuggingFace Inference API integration
2. /compare endpoint
3. /models endpoint

---

## Notes
- All models = open source via HuggingFace
- 3 local models = Bielik-1.5B, Qwen2.5-3B, Gemma-2-2B (fit in 16GB RAM)
- 1 API model = PLLuM-12B (too large for HuggingFace Spaces free tier)
- HF_TOKEN needed for API models and gated models (Gemma)
- Focus on Polish language output comparison