Instructions for using forlop/microdata-copilot-v2 with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- llama-cpp-python
How to use forlop/microdata-copilot-v2 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="forlop/microdata-copilot-v2",
    filename="microdata-copilot-v2-q4_k_m.gguf",
)

llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is INNTEKT_LONN?"}]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use forlop/microdata-copilot-v2 with llama.cpp:
Install with Homebrew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
```
Install with WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
```
Use pre-built binary
```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
```
Use Docker
```bash
docker model run hf.co/forlop/microdata-copilot-v2:Q4_K_M
```
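Every `llama-server` variant above exposes the same OpenAI-compatible API, on port 8080 by default, so any OpenAI client can drive the model. A minimal Python sketch follows; the `openai` package is an assumption, and llama-server typically accepts any string in the `model` field when a single model is loaded.

```python
# pip install openai
from openai import OpenAI

# llama-server needs no real API key; any placeholder works.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="forlop/microdata-copilot-v2:Q4_K_M",
    messages=[{"role": "user", "content": "What is INNTEKT_LONN?"}],
)
print(resp.choices[0].message.content)
```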
- LM Studio
- Jan
- Ollama
How to use forlop/microdata-copilot-v2 with Ollama:
```bash
ollama run hf.co/forlop/microdata-copilot-v2:Q4_K_M
```
- Unsloth Studio
How to use forlop/microdata-copilot-v2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for forlop/microdata-copilot-v2 to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for forlop/microdata-copilot-v2 to start chatting
```
Use Hugging Face Spaces for Unsloth Studio
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for forlop/microdata-copilot-v2 to start chatting.
- Pi
How to use forlop/microdata-copilot-v2 with Pi:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "forlop/microdata-copilot-v2:Q4_K_M" }
      ]
    }
  }
}
```

Run Pi

```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use forlop/microdata-copilot-v2 with Hermes Agent:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default forlop/microdata-copilot-v2:Q4_K_M
```
Run Hermes
```bash
hermes
```
- Docker Model Runner
How to use forlop/microdata-copilot-v2 with Docker Model Runner:
```bash
docker model run hf.co/forlop/microdata-copilot-v2:Q4_K_M
```
- Lemonade
How to use forlop/microdata-copilot-v2 with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull forlop/microdata-copilot-v2:Q4_K_M
```
Run and chat with the model
```bash
lemonade run user.microdata-copilot-v2-Q4_K_M
```
List all available models
```bash
lemonade list
```
---
license: mit
language:
- en
- 'no'
base_model: unsloth/Qwen3.5-4B
tags:
- microdata.no
- ssb
- norwegian
- register-data
- lora
- gguf
- rag
- ollama
library_name: gguf
---
# microdata.no copilot – v2.0 (q4_k_m GGUF)
A small, locally-deployable AI assistant fine-tuned to help users write
[microdata.no](https://microdata.no) scripts and answer questions about
Norwegian register-data variables published by [SSB (Statistics
Norway)](https://www.ssb.no/).
This repo hosts the deployed **q4_k_m quantised GGUF** (2.7 GB) and the
companion **Ollama `Modelfile`**. The full source code (training, RAG,
eval, deployment) and the technical note live at
**<https://github.com/forlop/microdata-no-copilot>**.
## Quick start
```bash
# Install Ollama if you don't have it yet:
# Linux/WSL: curl -fsSL https://ollama.com/install.sh | sh
# macOS: brew install ollama (or download from ollama.com)
# Windows: download OllamaSetup.exe from ollama.com
# 1. Pull the base GGUF from this repo (~2.7 GB, one-time)
ollama pull hf.co/forlop/microdata-copilot-v2:Q4_K_M
# 2. Clone the GitHub repo (contains the Modelfile + RAG layer)
git clone https://github.com/forlop/microdata-no-copilot
cd microdata-no-copilot
# 3. Apply the SYSTEM prompt + refusal few-shots + stop-token parameters
ollama create microdata-copilot -f deploy/Modelfile
# 4. Try it
ollama run microdata-copilot "What is INNTEKT_LONN?"
```
> **Why two steps?** `ollama pull` from Hugging Face downloads the raw
> GGUF plus the chat template embedded in its metadata, but **not** the
> custom Modelfile in this repo. Ollama only applies curated Modelfiles
> for models in its official library. For HF-hosted models, you apply
> your own Modelfile locally via `ollama create`. Without step 3, the
> model bleeds `<|endoftext|>` tokens and loops. With it, you get the
> full deployed configuration (system prompt, refusal patterns, stop
> tokens, greedy decoding).
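The pattern that `deploy/Modelfile` follows is roughly the one sketched below. This is a hedged illustration, not the actual file: the SYSTEM text and parameter values here are placeholders, and only the repo's `deploy/Modelfile` is authoritative.

```
# Sketch only; see deploy/Modelfile in the GitHub repo for the real file.
FROM hf.co/forlop/microdata-copilot-v2:Q4_K_M

# System prompt and refusal few-shots go here (placeholder text):
SYSTEM """You are the microdata.no copilot. Answer questions about
microdata.no scripting and SSB register-data variables; refuse anything else."""

# Stop tokens keep the model from bleeding <|endoftext|> and looping:
PARAMETER stop "<|endoftext|>"

# Greedy decoding:
PARAMETER temperature 0
```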
## Full RAG-wrapped Streamlit demo
```bash
# After the four steps above, from the cloned repo directory:
pip install -r requirements.txt streamlit
streamlit run rag/app.py
```
Streamlit prints a `http://localhost:8501` URL; open it in your browser.
On CPU expect ~10–15 s per response; on a recent GPU, ~1–2 s.
## What this is
- **Base model:** Qwen3.5-4B (Apache-2.0, via Unsloth's pre-quantised release).
- **Fine-tuning:** rank-32 LoRA, 3 epochs, ~1.5 h on a single 16 GB RTX 5070 Ti.
- **Training corpus:** ~1,400 cards distilled from 729 microdata.no variables,
~100 manual sections, 40 example scripts, plus refusal/abstention cards.
- **Deployed quantisation:** q4_k_m via llama.cpp (2.7 GB on disk, runs on CPU
or GPU).
- **Designed for:** local deployment behind a thin retrieval layer (FAISS dense
  + BM25 sparse + Reciprocal Rank Fusion; see the sketch after this list). All
  data stays on the user's machine; no API calls leave the network.
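Reciprocal Rank Fusion is the one non-obvious piece of that retrieval stack, so here is a minimal sketch of the fusion step. It is illustrative rather than the repo's code: the constant `k=60` comes from the original RRF paper, and every example id except INNTEKT_LONN is made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document ids into one.

    Each document scores 1 / (k + rank) per list it appears in;
    the fused order is by total score, highest first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ids ranked by the dense (FAISS) and sparse (BM25) retrievers:
dense_ranking = ["INNTEKT_LONN", "INNTEKT_BRUTTO", "ARBLONN_FMAANEDSLONN"]
sparse_ranking = ["INNTEKT_BRUTTO", "INNTEKT_LONN", "SIVSTAND"]
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
```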
## Honest evaluation
Measured under strict held-out + adversarial evaluation (80 prompts written
after the model was frozen, LLM-judge scorer with rubric locked before
seeing responses, syntax validator catching fictional commands):
| Class | Pass rate | What it measures |
|---|---|---|
| JAILBREAK | **100% (5/5)** | Refusing role-override, system-prompt extraction, confidentiality bypass |
| RAG (variable lookup) | **80% (8/10)** | Variable definitions, populations, valid periods – when retrieval succeeds |
| LANG (language matching) | **80% (4/5)** | Norwegian Q → Norwegian A, English Q → English A |
| SCRIPT (write a script) | 33% (5/15) | Real commands; failures are fabricated variable names |
| MANUAL (explain a command) | 29% (2/7) | Some command explanations are vague or partial |
| STALE (admit "I don't know") | **0% (0/5)** | Calibration weakness – doesn't say "I don't know" when it should |
| **Overall** | **53.8% (43/80)** | Strict-eval pass rate |
Refusal and jailbreak resistance are essentially solid. Retrieval-grounded
lookup works when retrieval succeeds. The model's main failure mode is
fabricating variable names when asked to *suggest* one (rather than confirm
a known one), and not calibrating uncertainty well.
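One way a thin wrapper can catch that failure class, shown purely as an illustration and not as the repo's code, is to validate every identifier the model emits against the known variable catalogue (the 729 variables the training corpus was distilled from). The regex and the fabricated INNTEKT_TOTALT below are assumptions.

```python
import re

# Placeholder catalogue; in practice, load all 729 microdata.no variable names.
KNOWN_VARIABLES = {"INNTEKT_LONN"}

def flag_unknown_variables(answer: str, catalogue=KNOWN_VARIABLES):
    """Return ALL_CAPS identifiers in the answer that are not in the catalogue."""
    candidates = set(re.findall(r"\b[A-Z][A-Z0-9_]{3,}\b", answer))
    return sorted(candidates - catalogue)

print(flag_unknown_variables("Use INNTEKT_LONN or INNTEKT_TOTALT here."))
# -> ['INNTEKT_TOTALT'], i.e. the fabricated name is flagged for review
```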
A lenient substring-based scorer on a 46-prompt iteration set reports
**82.6%** – that's real, but it measures performance on prompts we iterated
*against*. The 53.8% is the honest out-of-sample number.
Full evaluation methodology and class-level breakdown:
[TECHNICAL_NOTE.md Β§17](https://github.com/forlop/microdata-no-copilot/blob/main/TECHNICAL_NOTE.md#17-deployed-system-eval-strict-held-out--adversarial)
on GitHub.
## Limitations
- **Not a finished product.** 53.8% strict pass-rate is below what a
researcher can rely on without verification. Treat as a research preview.
- **Variable name hallucination.** When asked to suggest variables for a
task (rather than confirm a specific one), the model invents plausible
but non-existent names. The RAG layer mitigates this when the user names
a variable; it doesn't fix open-ended suggestion.
- **Domain-specific.** This model is useful only for microdata.no scripting
and SSB register-data variables. It is not a general-purpose chatbot.
- **Single-turn training.** The cards are single-turn user/assistant pairs.
  Multi-turn behaviour is emergent and degrades faster than it would in a
  chat-tuned foundation model. The CLI/Streamlit front-ends compensate by
  keeping a small history window (3 exchanges; see the sketch after this list).
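A minimal sketch of that history windowing, with the function name and message format assumed for illustration; the repo's CLI/Streamlit code is the authoritative version.

```python
def windowed_history(messages, max_exchanges=3):
    """Keep only the last `max_exchanges` user/assistant pairs.

    messages: chronological list of {"role": ..., "content": ...} dicts.
    A short window keeps a single-turn-trained model close to the
    distribution it was fine-tuned on.
    """
    # One exchange is one user message plus one assistant reply.
    return messages[-2 * max_exchanges:]

history = [
    {"role": "user", "content": "What is INNTEKT_LONN?"},
    {"role": "assistant", "content": "INNTEKT_LONN is ..."},
    {"role": "user", "content": "And its valid period?"},
]
prompt_messages = windowed_history(history)
```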
## Citation
If you reference this work:
```bibtex
@misc{zhang2026microdata,
  title  = {microdata.no copilot: a locally-deployed LoRA + RAG assistant for SSB register data},
  author = {Tao Zhang},
  year   = {2026},
  url    = {https://github.com/forlop/microdata-no-copilot}
}
```
## License
MIT. See [LICENSE](https://github.com/forlop/microdata-no-copilot/blob/main/LICENSE).