Instructions to use rockypod/svelte-coder-4b with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use rockypod/svelte-coder-4b with llama-cpp-python:
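```python
# !pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Downloads the GGUF from the Hub on first use (requires huggingface-hub)
llm = Llama.from_pretrained(
    repo_id="rockypod/svelte-coder-4b",
    filename="svelte-coder-v0.9.0-4b-q4_k_m.gguf",
    n_ctx=8192,  # the library default (512 tokens) is far below the card's 8192
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Write a Svelte 5 counter with $state and $derived"}
    ],
    temperature=0.2,  # production parameters from the Usage section
    max_tokens=1500,
    repeat_penalty=1.5,
)
print(response["choices"][0]["message"]["content"])
```
- Notebooks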
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use rockypod/svelte-coder-4b with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rockypod/svelte-coder-4b:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf rockypod/svelte-coder-4b:Q4_K_M
```
Install from WinGet (Windows)
```powershell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rockypod/svelte-coder-4b:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf rockypod/svelte-coder-4b:Q4_K_M
```
Use pre-built binary
```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rockypod/svelte-coder-4b:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf rockypod/svelte-coder-4b:Q4_K_M
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rockypod/svelte-coder-4b:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf rockypod/svelte-coder-4b:Q4_K_M
```
Use Docker
```bash
docker model run hf.co/rockypod/svelte-coder-4b:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use rockypod/svelte-coder-4b with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "rockypod/svelte-coder-4b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "rockypod/svelte-coder-4b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/rockypod/svelte-coder-4b:Q4_K_M
```
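Both llama-server (which listens on port 8080 by default, as in the Pi and Hermes configs below) and vLLM (port 8000 in the curl example above) expose an OpenAI-compatible `/v1/chat/completions` endpoint, so one small client can cover both. The sketch below uses only the Python standard library; `build_request` and `chat` are hypothetical helper names. It sends the card's production temperature and output limit, using `max_tokens` as the OpenAI-style counterpart of `num_predict`. `repeat_penalty` is left out because it is not part of the standard OpenAI request schema, and the context size is a server-side setting (for example `llama-server -c 8192`).

```python
import json
import urllib.request


def build_request(base_url: str, model: str, question: str) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /v1/chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.2,  # the card's production temperature
        "max_tokens": 1500,  # OpenAI-style counterpart of num_predict
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, question: str) -> str:
    """Send one user message and return the first choice's reply text."""
    with urllib.request.urlopen(build_request(base_url, model, question)) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8000", "rockypod/svelte-coder-4b", "Write a Svelte 5 counter with $state and $derived")` targets the vLLM server started above. Point `base_url` at `http://localhost:8080` to use llama-server instead.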
- Ollama
How to use rockypod/svelte-coder-4b with Ollama:
```bash
ollama run hf.co/rockypod/svelte-coder-4b:Q4_K_M
```
- Unsloth Studio
How to use rockypod/svelte-coder-4b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for rockypod/svelte-coder-4b to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for rockypod/svelte-coder-4b to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for rockypod/svelte-coder-4b to start chatting.
- Pi
How to use rockypod/svelte-coder-4b with Pi:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf rockypod/svelte-coder-4b:Q4_K_M
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "rockypod/svelte-coder-4b:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use rockypod/svelte-coder-4b with Hermes Agent:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf rockypod/svelte-coder-4b:Q4_K_M
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default rockypod/svelte-coder-4b:Q4_K_M
```
Run Hermes
```bash
hermes
```
- Docker Model Runner
How to use rockypod/svelte-coder-4b with Docker Model Runner:
```bash
docker model run hf.co/rockypod/svelte-coder-4b:Q4_K_M
```
- Lemonade
How to use rockypod/svelte-coder-4b with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull rockypod/svelte-coder-4b:Q4_K_M
```
Run and chat with the model
```bash
lemonade run user.svelte-coder-4b-Q4_K_M
```
List all available models
```bash
lemonade list
```
```yaml
license: mit
language:
- en
- code
library_name: gguf
pipeline_tag: text-generation
tags:
- svelte
- sveltekit
- svelte-5
- runes
- code-generation
- gguf
- qwen3
- lora
base_model: Qwen/Qwen3-4B
base_model_relation: finetune
```
# Svelte Coder 4B (v0.9.0)

A Svelte 5 / SvelteKit 2 specialist coding model: the **4B variant**.
Free to use under the MIT License. Built by [rockypod](https://rockypod.com) on a
homelab RTX 3090 Ti using continuous retrieval-augmented fine-tuning
(RAFT) and a correction-stream methodology.

This is the **4B variant**, intended for edge hardware. For the best benchmark
results, use the [14B variant](https://huggingface.co/rockypod/svelte-coder)
when your hardware allows.

**[14B (recommended)](https://huggingface.co/rockypod/svelte-coder)** ·
**[8B (mid-tier)](https://huggingface.co/rockypod/svelte-coder-8b)** ·
**[GitHub: exam, integration guides, transparency](https://github.com/rockypod/svelte-coder)**
## Benchmark

| Instrument | Score |
|---|---|
| 30-question spot exam | **79.3%** (34.5 / 43.5 weighted) |
| 204-question in-scope exam (rescored) | 67.81% (131 / 190 raw) |

For comparison, the 14B variant scores 100% on the spot exam and 70.11% on
the in-scope exam. The 4B trades capability for accessibility on edge
hardware. See the
[main README](https://huggingface.co/rockypod/svelte-coder/blob/main/README.md)
for the full discussion of the two exams.
## Hardware requirements

- **VRAM:** ~3 GB (Q4_K_M GGUF); runs on entry-level GPUs and
  Apple Silicon devices with limited memory
- **Context length:** 8192 tokens
- **Recommended use case:** edge hardware where even the 8B variant
  is too large, iOS/Android via llama.cpp, and constrained Linux servers

## Files

- `svelte-coder-v0.9.0-4b-q4_k_m.gguf`: 4-bit quantized weights (~3 GB)
## Usage

### Ollama

```bash
ollama pull rockypod/svelte-coder:4b
# Single quotes stop the shell from expanding $state and $derived as variables
ollama run rockypod/svelte-coder:4b 'Write a Svelte 5 counter with $state and $derived'
```
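To apply the card's production parameters when calling Ollama from code, you can pass them as `options` to Ollama's `/api/chat` endpoint (default address `http://localhost:11434`). This is a minimal standard-library sketch; `build_chat_payload` and `ask` are hypothetical helper names, and sending the card's system prompt explicitly is an assumption, since the Ollama build may already define one.

```python
import json
import urllib.request

# Production parameters from this card, under Ollama's option names
PRODUCTION_OPTIONS = {
    "temperature": 0.2,
    "num_ctx": 8192,
    "num_predict": 1500,
    "repeat_penalty": 1.5,
}

# System prompt copied from the card's ChatML template (sending it is an assumption)
SYSTEM_PROMPT = (
    "You are SvelteCoder, an expert Svelte 5 / SvelteKit 2 coding assistant. "
    "Answer the question with complete, production-quality code."
)


def build_chat_payload(question: str, model: str = "rockypod/svelte-coder:4b") -> dict:
    """Build a non-streaming /api/chat request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": dict(PRODUCTION_OPTIONS),
    }


def ask(question: str, host: str = "http://localhost:11434") -> str:
    """Send one question to a local Ollama server and return the reply text."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

After `ollama pull` finishes, `ask("Write a Svelte 5 counter with $state and $derived")` returns the reply. With `"stream": False`, Ollama sends back one JSON object, and its `message.content` field holds the answer.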
### LM Studio / llama.cpp

Download `svelte-coder-v0.9.0-4b-q4_k_m.gguf` and load it with the
production parameters: temperature 0.2, num_ctx 8192, num_predict 1500,
repeat_penalty 1.5. These are Ollama-style names; the equivalent llama.cpp
flags are `--temp 0.2 --ctx-size 8192 --n-predict 1500 --repeat-penalty 1.5`.
Use the ChatML template:

```
<|im_start|>system
You are SvelteCoder, an expert Svelte 5 / SvelteKit 2 coding assistant. Answer the question with complete, production-quality code.<|im_end|>
<|im_start|>user
Your question<|im_end|>
<|im_start|>assistant
<think>
```
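Clients that send a raw completion prompt instead of applying the GGUF's chat template need the template above as a single string. The following is a minimal sketch. It assumes each template line ends with a newline, including the final `<think>` line. `build_chatml_prompt` is a hypothetical helper name.

```python
# System prompt copied verbatim from the card's ChatML template
DEFAULT_SYSTEM = (
    "You are SvelteCoder, an expert Svelte 5 / SvelteKit 2 coding assistant. "
    "Answer the question with complete, production-quality code."
)


def build_chatml_prompt(question: str, system: str = DEFAULT_SYSTEM) -> str:
    """Return a ChatML prompt that stops inside the assistant turn, right after <think>."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"  # the model continues from here
    )


print(build_chatml_prompt("Write a Svelte 5 counter with $state and $derived"))
```

If you send the result to a raw-completion endpoint, set `<|im_end|>` as a stop sequence so that generation ends with the assistant turn.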
## Limitations specific to the 4B

- **The Svelte 4 echo trap is most frequent on this variant.** The 4B has
  the least capacity to override Qwen3-4B's pretrained Svelte 4
  reflexes: T1 (Runes) and T4 (WCAG/ARIA) fix-this-snippet questions
  show a 1/3 pass rate on the 30-question spot exam. When modernizing
  Svelte 4 code, review the output for leftover `export let`, `on:click`,
  and `<slot>` patterns, and prefer the 8B or 14B if Svelte 4 conversion
  is a primary use case.
- **Hard reasoning is weaker than on the larger variants.** T6 multi-step
  refactors are weaker on the 4B than on the 8B or 14B. Use the larger
  variants for architectural decisions or complex refactors.
- All other limitations from the [main README](https://huggingface.co/rockypod/svelte-coder/blob/main/README.md)
  apply.
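The review step above can be partly automated. The sketch below is a hypothetical regex-based check, not a Svelte parser. It flags the Svelte 4 patterns named in this section, generalizing `on:click` to any `on:` event directive, and pairs each match with its usual Svelte 5 replacement: `$props()` for `export let`, event attributes such as `onclick` for `on:` directives, and snippets rendered with `{@render ...}` for `<slot>`. Expect false positives, for example on matching text inside comments or strings.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    line: int        # 1-based line number in the generated output
    pattern: str     # which Svelte 4 idiom was matched
    suggestion: str  # the usual Svelte 5 replacement


# Hypothetical rule table: (regex, label, Svelte 5 hint)
SVELTE4_RULES = [
    (re.compile(r"\bexport\s+let\b"), "export let", "declare props with $props()"),
    (re.compile(r"\bon:[A-Za-z]+"), "on: directive", "use an event attribute such as onclick"),
    (re.compile(r"<slot\b"), "<slot>", "accept a snippet prop and use {@render ...}"),
]


def find_svelte4_patterns(source: str) -> list[Finding]:
    """Return one Finding per rule that matches on each line of `source`."""
    findings = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for regex, label, hint in SVELTE4_RULES:
            if regex.search(text):
                findings.append(Finding(lineno, label, hint))
    return findings


# Example: model output that still uses Svelte 4 syntax
generated = """<script>
  export let count = 0;
</script>
<button on:click={() => count++}>{count}</button>
<slot />"""

for f in find_svelte4_patterns(generated):
    print(f"line {f.line}: {f.pattern} -> {f.suggestion}")
```

A non-empty result is a prompt for manual review, not proof that the output is wrong.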
## Apple Silicon note

v0.9.0 does not include MLX builds for Apple Silicon for the 8B and
4B variants. Apple Silicon users are encouraged to use the 14B variant,
which includes MLX 4-bit weights. The 4B GGUF still runs on Apple Silicon
through llama.cpp.
## License & Attribution

**Fine-tuning work is licensed under the MIT License.** See
[LICENSE](LICENSE) in the GitHub repo.

**The base model and teacher model are licensed under Apache 2.0.** See
[LICENSE-APACHE](LICENSE-APACHE) and [NOTICE](NOTICE):

- Base: [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), © Alibaba Cloud
- Teacher: [Qwen3-Coder-Next 80B](https://huggingface.co/Qwen/Qwen3-Coder-Next), © Alibaba Cloud

The 4B Svelte Coder weights are a derivative work of Qwen3-4B,
fine-tuned via LoRA adapters on the v1.5 Svelte 5 / SvelteKit 2
specialist dataset (1,508 entries).