Instructions for using forlop/microdata-copilot-v2 with libraries, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use forlop/microdata-copilot-v2 with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="forlop/microdata-copilot-v2",
    filename="microdata-copilot-v2-q4_k_m.gguf",
)
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is INNTEKT_LONN?"},
    ]
)
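create_chat_completion returns an OpenAI-style completion dict (llama-cpp-python's standard return format when streaming is off), so the reply text can be read with:

print(response["choices"][0]["message"]["content"])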
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use forlop/microdata-copilot-v2 with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
Use Docker
docker model run hf.co/forlop/microdata-copilot-v2:Q4_K_M
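If you started llama-server with any of the options above, it exposes an OpenAI-compatible API, by default on port 8080. A minimal Python sketch for querying it (the default port and the requests-based call are assumptions about your local setup, not part of this repo):

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "forlop/microdata-copilot-v2:Q4_K_M",
        "messages": [{"role": "user", "content": "What is INNTEKT_LONN?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])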
- LM Studio
- Jan
- Ollama
How to use forlop/microdata-copilot-v2 with Ollama:
ollama run hf.co/forlop/microdata-copilot-v2:Q4_K_M
- Unsloth Studio
How to use forlop/microdata-copilot-v2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for forlop/microdata-copilot-v2 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for forlop/microdata-copilot-v2 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for forlop/microdata-copilot-v2 to start chatting
- Pi
How to use forlop/microdata-copilot-v2 with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "forlop/microdata-copilot-v2:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use forlop/microdata-copilot-v2 with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default forlop/microdata-copilot-v2:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use forlop/microdata-copilot-v2 with Docker Model Runner:
docker model run hf.co/forlop/microdata-copilot-v2:Q4_K_M
- Lemonade
How to use forlop/microdata-copilot-v2 with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull forlop/microdata-copilot-v2:Q4_K_M
Run and chat with the model
lemonade run user.microdata-copilot-v2-Q4_K_M
List all available models
lemonade list
microdata.no copilot — v2.0 (q4_k_m GGUF)
A small, locally-deployable AI assistant fine-tuned to help users write microdata.no scripts and answer questions about Norwegian register-data variables published by SSB (Statistics Norway).
This repo hosts the deployed q4_k_m quantised GGUF (2.7 GB) and the
companion Ollama Modelfile. The full source code (training, RAG,
eval, deployment) and the technical note live at
https://github.com/forlop/microdata-no-copilot.
Quick start
# Install Ollama if you don't have it yet:
# Linux/WSL: curl -fsSL https://ollama.com/install.sh | sh
# macOS: brew install ollama (or download from ollama.com)
# Windows: download OllamaSetup.exe from ollama.com
# 1. Pull the base GGUF from this repo (~2.7 GB, one-time)
ollama pull hf.co/forlop/microdata-copilot-v2:Q4_K_M
# 2. Clone the GitHub repo (contains the Modelfile + RAG layer)
git clone https://github.com/forlop/microdata-no-copilot
cd microdata-no-copilot
# 3. Apply the SYSTEM prompt + refusal few-shots + stop-token parameters
ollama create microdata-copilot -f deploy/Modelfile
# 4. Try it
ollama run microdata-copilot "What is INNTEKT_LONN?"
Why two steps?
ollama pull from Hugging Face downloads the raw GGUF plus the chat template embedded in its metadata, but not the custom Modelfile in this repo. Ollama only applies curated Modelfiles for models in its official library; for HF-hosted models you apply your own Modelfile locally via ollama create. Without step 3, the model bleeds <|endoftext|> tokens and loops. With it, you get the full deployed configuration (system prompt, refusal patterns, stop tokens, greedy decoding).
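For illustration, a Modelfile of this shape looks roughly like the sketch below. The system prompt and stop token here are placeholders; the authoritative version is deploy/Modelfile in the GitHub repo.

FROM hf.co/forlop/microdata-copilot-v2:Q4_K_M

SYSTEM """You are an assistant for microdata.no scripting and SSB register-data variables."""

# Stop cleanly instead of bleeding end-of-text markers
PARAMETER stop "<|endoftext|>"

# Greedy decoding
PARAMETER temperature 0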
Full RAG-wrapped Streamlit demo
# After the four steps above, from the cloned repo directory:
pip install -r requirements.txt streamlit
streamlit run rag/app.py
Streamlit prints a http://localhost:8501 URL — open it in your browser.
On CPU expect ~10–15 s per response; on a recent GPU, ~1–2 s.
What this is
- Base model: Qwen3.5-4B (Apache-2.0, via Unsloth's pre-quantised release).
- Fine-tuning: rank-32 LoRA, 3 epochs, ~1.5 h on a single 16 GB RTX 5070 Ti.
- Training corpus: ~1,400 cards distilled from 729 microdata.no variables, ~100 manual sections, 40 example scripts, plus refusal/abstention cards.
- Deployed quantisation: q4_k_m via llama.cpp (2.7 GB on disk, runs on CPU or GPU).
- Designed for: local deployment behind a thin retrieval layer (FAISS dense + BM25 sparse + Reciprocal Rank Fusion). All data stays on the user's machine; no API calls leave the network.
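Reciprocal Rank Fusion merges the dense and sparse rankings by summing 1/(k + rank) for each document across the two ranked lists. A minimal sketch, with an illustrative k=60 and made-up variable IDs beyond INNTEKT_LONN (the repo's actual fusion code may differ):

def reciprocal_rank_fusion(dense_ranking, sparse_ranking, k=60):
    """Fuse two ranked lists of document IDs into a single RRF-scored ranking."""
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# FAISS and BM25 each contribute their top hits for a query
fused = reciprocal_rank_fusion(["INNTEKT_LONN", "INNTEKT_SAMLET"],
                               ["INNTEKT_SAMLET", "ARBLONN"])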
Honest evaluation
Measured under strict held-out + adversarial evaluation (80 prompts written after the model was frozen, LLM-judge scorer with rubric locked before seeing responses, syntax validator catching fictional commands):
| Class | Pass rate | What it measures |
|---|---|---|
| JAILBREAK | 100% (5/5) | Refusing role-override, system-prompt extraction, confidentiality bypass |
| RAG (variable lookup) | 80% (8/10) | Variable definitions, populations, valid periods — when retrieval succeeds |
| LANG (language matching) | 80% (4/5) | Norwegian Q → Norwegian A, English Q → English A |
| SCRIPT (write a script) | 33% (5/15) | Real commands; failures are fabricated variable names |
| MANUAL (explain a command) | 29% (2/7) | Some command explanations are vague or partial |
| STALE (admit "I don't know") | 0% (0/5) | Calibration weakness — doesn't say "I don't know" when it should |
| Overall | 53.8% (43/80) | Strict-eval pass rate |
Refusal and jailbreak resistance are solid. Retrieval-grounded lookup works when retrieval succeeds. The model's main failure modes are fabricating variable names when asked to suggest one (rather than confirm a known one) and poorly calibrated uncertainty.
A lenient substring-based scorer on a 46-prompt iteration set reports 82.6%; that number is real, but it measures performance on prompts we iterated against. The 53.8% is the honest out-of-sample number.
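"Substring-based" here means roughly the following kind of check, shown as an illustrative sketch (not the repo's actual scorer; the response string is made up):

def lenient_pass(response: str, expected_keywords: list[str]) -> bool:
    """Pass if any expected keyword appears anywhere in the response (case-insensitive)."""
    text = response.lower()
    return any(kw.lower() in text for kw in expected_keywords)

# A lookup prompt passes as soon as the right variable name is mentioned,
# even if the surrounding explanation is wrong or vague.
lenient_pass("INNTEKT_LONN is a wage-income variable.", ["INNTEKT_LONN"])  # True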
Full evaluation methodology and class-level breakdown: TECHNICAL_NOTE.md §17 on GitHub.
Limitations
- Not a finished product. 53.8% strict pass-rate is below what a researcher can rely on without verification. Treat as a research preview.
- Variable name hallucination. When asked to suggest variables for a task (rather than confirm a specific one), the model invents plausible but non-existent names. The RAG layer mitigates this when the user names a variable; it doesn't fix open-ended suggestion.
- Domain-specific. This model is useful only for microdata.no scripting and SSB register-data variables. It is not a general-purpose chatbot.
- Single-turn training. The cards are single-turn user/assistant pairs. Multi-turn behaviour is emergent and degrades faster than a chat-tuned foundation model would. The CLI/Streamlit front-ends use small windows (3 exchanges) to compensate.
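A minimal sketch of that windowing, assuming the front-ends simply keep the last three user/assistant exchanges (llm_chat stands in for whatever function calls the model; the real implementation is in the GitHub repo):

from collections import deque

# 3 exchanges = 6 messages (one user + one assistant message per exchange)
history = deque(maxlen=6)

def ask(llm_chat, user_msg):
    history.append({"role": "user", "content": user_msg})
    reply = llm_chat(list(history))  # call the model with only the trimmed window
    history.append({"role": "assistant", "content": reply})
    return reply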
Citation
If you reference this work:
@misc{zhang2026microdata,
title = {microdata.no copilot: a locally-deployed LoRA + RAG assistant for SSB register data},
author = {Tao Zhang},
year = {2026},
url = {https://github.com/forlop/microdata-no-copilot}
}
License
MIT. See LICENSE.