---
license: mit
language:
- en
- code
library_name: gguf
pipeline_tag: text-generation
tags:
- svelte
- sveltekit
- svelte-5
- runes
- code-generation
- gguf
- qwen3
- lora
base_model: Qwen/Qwen3-8B
base_model_relation: finetune
---

# Svelte Coder 8B (v0.9.0)

A Svelte 5 / SvelteKit 2 specialist coding model (**8B variant**).
Free to use under MIT. Built by [rockypod](https://rockypod.com) on a
homelab RTX 3090 Ti using continuous retrieval-augmented fine-tuning
(RAFT) and a correction-stream methodology.

This is the **8B variant** for hardware where the 14B doesn't fit.
For best benchmark results, use the [14B variant](https://huggingface.co/rockypod/svelte-coder)
when the hardware allows.

**[14B (recommended)](https://huggingface.co/rockypod/svelte-coder)** ·
**[4B (lightweight)](https://huggingface.co/rockypod/svelte-coder-4b)** ·
**[GitHub: exam, integration guides, transparency](https://github.com/rockypod/svelte-coder)**

## Benchmark

| Instrument | Score |
|---|---|
| 30Q spot exam | **82.8%** (36.0 / 43.5 weighted) |
| 204Q in-scope (rescored) | 74.68% (145 / 190 raw) |

For comparison, the 14B variant scores 100% and 70.11%, respectively, on
the same instruments. The 30Q is the cleaner grader; the 204Q has known
keyword-matching artifacts. See the
[main README](https://huggingface.co/rockypod/svelte-coder/blob/main/README.md)
for the full two-exams discussion.

## Hardware requirements

- **VRAM:** ~5 GB (Q4_K_M GGUF), runs on most consumer GPUs
  (RTX 3060 12 GB, RTX 4060 8 GB+ with offloading, Apple Silicon 8 GB+)
- **Context length:** 8192
- **Recommended use case:** systems where the 14B variant (~8.4 GB)
  doesn't fit in available VRAM

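
The sizing guidance above can be turned into a quick check. The following is a minimal sketch, not part of the released tooling: `pick_variant` and the `headroom_gb` allowance are hypothetical names introduced here, and the weight sizes are the approximate figures quoted in this card. Real memory use also depends on context length and runtime, so treat the result as a rough first filter.

```python
from typing import Optional

# Approximate Q4_K_M weight sizes in GB, as quoted in this model card.
# KV cache and runtime overhead are NOT included in these figures.
VARIANT_WEIGHTS_GB = {"14b": 8.4, "8b": 5.0}


def pick_variant(free_vram_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest variant whose weights fit, or None if neither fits.

    headroom_gb is an assumed allowance for context and runtime overhead.
    """
    budget = free_vram_gb - headroom_gb
    # Try the largest model first so the best-scoring variant wins when it fits.
    by_size = sorted(VARIANT_WEIGHTS_GB.items(), key=lambda item: item[1], reverse=True)
    for name, size_gb in by_size:
        if size_gb <= budget:
            return name
    return None
```

For example, `pick_variant(8.0)` selects the 8B on an 8 GB card, while a result of `None` suggests looking at CPU offloading or a smaller variant.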
## Files

- `svelte-coder-v0.9.0-8b-q4_k_m.gguf`: 4-bit quantized weights (~5 GB)

## Usage

### Ollama

Use single quotes around the prompt so the shell does not expand
`$state` and `$derived` as variables:

```bash
ollama pull rockypod/svelte-coder:8b
ollama run rockypod/svelte-coder:8b 'Write a Svelte 5 counter with $state and $derived'
```
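
To call the model from a script instead of the CLI, a request to Ollama's HTTP API can carry the production parameters listed under LM Studio / llama.cpp. This is a sketch under assumptions: it presumes an Ollama server on the default `http://localhost:11434` with the model already pulled, and `build_chat_payload` and `ask` are illustrative helper names, not part of any library.

```python
import json
import urllib.request

# Production parameters quoted in this model card, under Ollama's option names.
PRODUCTION_OPTIONS = {
    "temperature": 0.2,
    "num_ctx": 8192,
    "num_predict": 1500,
    "repeat_penalty": 1.5,
}


def build_chat_payload(question: str, model: str = "rockypod/svelte-coder:8b") -> dict:
    """Build the JSON body for a non-streaming request to Ollama's /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "options": dict(PRODUCTION_OPTIONS),
    }


def ask(question: str, host: str = "http://localhost:11434") -> str:
    """Send the question to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With the server running, `print(ask("Write a Svelte 5 counter with $state and $derived"))` prints the model's answer; no shell quoting is involved because the prompt is a Python string.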
### LM Studio / llama.cpp

Download `svelte-coder-v0.9.0-8b-q4_k_m.gguf` and load with the
production parameters: temperature 0.2, num_ctx 8192, num_predict 1500,
repeat_penalty 1.5. Use the ChatML template:

```
<|im_start|>system
You are SvelteCoder, an expert Svelte 5 / SvelteKit 2 coding assistant. Answer the question with complete, production-quality code.<|im_end|>
<|im_start|>user
Your question<|im_end|>
<|im_start|>assistant
<think>
```
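
When a runtime expects a raw prompt string rather than a message list, the template above can be filled in programmatically. The helper below is a small sketch; `build_chatml_prompt` is a name introduced here for illustration, and the trailing newline after `<think>` is an assumption about how the template ends.

```python
# System prompt copied from the ChatML template in this model card.
SYSTEM_PROMPT = (
    "You are SvelteCoder, an expert Svelte 5 / SvelteKit 2 coding assistant. "
    "Answer the question with complete, production-quality code."
)


def build_chatml_prompt(question: str, system: str = SYSTEM_PROMPT) -> str:
    """Render a single-turn prompt in the ChatML layout shown above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        # The assistant turn is left open so the model continues from <think>.
        "<|im_start|>assistant\n<think>\n"
    )
```

Pass the returned string to a raw completion call and leave chat templating off in the runtime, otherwise the template may be applied twice.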
## Limitations specific to the 8B

- **Svelte 4 echo trap is more frequent than on the 14B.** The 8B has
  less capacity to override Qwen3-8B's pretrained Svelte 4 reflexes,
  particularly on T1 (Runes) and T13 (DaisyUI) fix-this-snippet
  questions. Review output for `export let`, `on:click`, `<slot>`
  patterns when modernizing Svelte 4 code.
- All other limitations from the [main README](https://huggingface.co/rockypod/svelte-coder/blob/main/README.md)
  apply.

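
The review step for Svelte 4 leftovers can be partly automated. The following is a heuristic sketch, not a parser: `find_svelte4_patterns` and its regular expressions are assumptions introduced here, they cover only the three idioms named above, and they may also flag matches inside strings or comments.

```python
import re

# Svelte 4 idioms named in the limitations list above.
SVELTE4_PATTERNS = {
    "export let": re.compile(r"\bexport\s+let\b"),
    "on:event directive": re.compile(r"\bon:[a-zA-Z]+"),
    "<slot>": re.compile(r"<slot\b"),
}


def find_svelte4_patterns(source: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for each Svelte 4 idiom found."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in SVELTE4_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings
```

Running it over generated output before committing gives a short list of lines to check by hand; an empty list does not guarantee the code is idiomatic Svelte 5.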
## Apple Silicon note

MLX builds for Apple Silicon are not included in v0.9.0 for the 8B and
4B variants. For MLX on Apple Silicon, use the 14B variant, which
includes MLX 4-bit weights.

## License & Attribution

**Fine-tuning work licensed under the MIT License.** See
[LICENSE](LICENSE) in the GitHub repo.

**Base model and teacher model are licensed under Apache 2.0.** See
[LICENSE-APACHE](LICENSE-APACHE) and [NOTICE](NOTICE):

- Base: [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), © Alibaba Cloud
- Teacher: [Qwen3-Coder-Next 80B](https://huggingface.co/Qwen/Qwen3-Coder-Next), © Alibaba Cloud

The 8B Svelte Coder weights are a derivative work of Qwen3-8B,
fine-tuned via LoRA adapters on the v1.5 Svelte 5 / SvelteKit 2
specialist dataset (1,508 entries).