Instructions for using forlop/microdata-copilot-v2 with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use forlop/microdata-copilot-v2 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="forlop/microdata-copilot-v2",
    filename="microdata-copilot-v2-q4_k_m.gguf",
)

# Example prompt taken from the model card's quick start
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is INNTEKT_LONN?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use forlop/microdata-copilot-v2 with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
```
Install from WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
```
Use pre-built binary
```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf forlop/microdata-copilot-v2:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf forlop/microdata-copilot-v2:Q4_K_M
```
Use Docker
docker model run hf.co/forlop/microdata-copilot-v2:Q4_K_M
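Whichever install route you choose, `llama-server` exposes an OpenAI-compatible HTTP API. As a minimal sketch (assuming the default port 8080, the same address the Pi and Hermes configurations below point at), you can query the running server with curl:

```bash
# Assumes llama-server is already running with the model loaded (default port 8080).
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is INNTEKT_LONN?"}
    ]
  }'
```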
- LM Studio
- Jan
- Ollama
How to use forlop/microdata-copilot-v2 with Ollama:
ollama run hf.co/forlop/microdata-copilot-v2:Q4_K_M
- Unsloth Studio
How to use forlop/microdata-copilot-v2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for forlop/microdata-copilot-v2 to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for forlop/microdata-copilot-v2 to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for forlop/microdata-copilot-v2 to start chatting
```
- Pi
How to use forlop/microdata-copilot-v2 with Pi:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "forlop/microdata-copilot-v2:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use forlop/microdata-copilot-v2 with Hermes Agent:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf forlop/microdata-copilot-v2:Q4_K_M
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default forlop/microdata-copilot-v2:Q4_K_M
```
Run Hermes
hermes
- Docker Model Runner
How to use forlop/microdata-copilot-v2 with Docker Model Runner:
docker model run hf.co/forlop/microdata-copilot-v2:Q4_K_M
- Lemonade
How to use forlop/microdata-copilot-v2 with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull forlop/microdata-copilot-v2:Q4_K_M
```
Run and chat with the model
lemonade run user.microdata-copilot-v2-Q4_K_M
List all available models
lemonade list
---
license: mit
language:
- en
- 'no'
base_model: unsloth/Qwen3.5-4B
tags:
- microdata.no
- ssb
- norwegian
- register-data
- lora
- gguf
- rag
- ollama
library_name: gguf
---
# microdata.no copilot v2.0 (q4_k_m GGUF)

A small, locally-deployable AI assistant fine-tuned to help users write
[microdata.no](https://microdata.no) scripts and answer questions about
Norwegian register-data variables published by [SSB (Statistics Norway)](https://www.ssb.no/).

This repo hosts the deployed **q4_k_m quantised GGUF** (2.7 GB) and the
companion **Ollama `Modelfile`**. The full source code (training, RAG,
eval, deployment) and the technical note live at
**<https://github.com/forlop/microdata-no-copilot>**.
## Quick start

```bash
# Install Ollama if you don't have it yet:
#   Linux/WSL: curl -fsSL https://ollama.com/install.sh | sh
#   macOS:     brew install ollama   (or download from ollama.com)
#   Windows:   download OllamaSetup.exe from ollama.com

# 1. Pull the base GGUF from this repo (~2.7 GB, one-time)
ollama pull hf.co/forlop/microdata-copilot-v2:Q4_K_M

# 2. Clone the GitHub repo (contains the Modelfile + RAG layer)
git clone https://github.com/forlop/microdata-no-copilot
cd microdata-no-copilot

# 3. Apply the SYSTEM prompt + refusal few-shots + stop-token parameters
ollama create microdata-copilot -f deploy/Modelfile

# 4. Try it
ollama run microdata-copilot "What is INNTEKT_LONN?"
```
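To call the created model from code rather than the CLI, here is a minimal sketch against Ollama's local REST API (default port 11434); it assumes step 3 above has already created the `microdata-copilot` model:

```python
# Minimal sketch: query the locally created model via Ollama's REST API.
# Assumes Ollama is running on its default port (11434) and that
# `ollama create microdata-copilot -f deploy/Modelfile` has been run.
import json
import urllib.request

payload = {
    "model": "microdata-copilot",
    "messages": [{"role": "user", "content": "What is INNTEKT_LONN?"}],
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())
    print(reply["message"]["content"])
```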
> **Why two steps?** `ollama pull` from Hugging Face downloads the raw
> GGUF plus the chat template embedded in its metadata, but **not** the
> custom Modelfile in this repo. Ollama only applies curated Modelfiles
> for models in its official library. For HF-hosted models, you apply
> your own Modelfile locally via `ollama create`. Without step 3, the
> model bleeds `<|endoftext|>` tokens and loops. With it, you get the
> full deployed configuration (system prompt, refusal patterns, stop
> tokens, greedy decoding).
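For orientation, step 3 applies an Ollama Modelfile with roughly the shape below. This is an illustrative sketch only: the system prompt and parameter values here are placeholders, and the deployed configuration is `deploy/Modelfile` in the GitHub repo.

```
# Illustrative sketch only; the real configuration is deploy/Modelfile in the repo.
FROM hf.co/forlop/microdata-copilot-v2:Q4_K_M

# Placeholder system prompt; the deployed one defines the copilot's scope and refusal behaviour.
SYSTEM """You are the microdata.no copilot. Answer questions about SSB register-data
variables and microdata.no scripting; refuse requests outside that scope."""

# Stop token and greedy decoding, as described above.
PARAMETER stop "<|endoftext|>"
PARAMETER temperature 0
```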
## Full RAG-wrapped Streamlit demo

```bash
# After the four steps above, from the cloned repo directory:
pip install -r requirements.txt streamlit
streamlit run rag/app.py
```

Streamlit prints a `http://localhost:8501` URL; open it in your browser.
On CPU expect ~10–15 s per response; on a recent GPU, ~1–2 s.
## What this is

- **Base model:** Qwen3.5-4B (Apache-2.0, via Unsloth's pre-quantised release).
- **Fine-tuning:** rank-32 LoRA, 3 epochs, ~1.5 h on a single 16 GB RTX 5070 Ti.
- **Training corpus:** ~1,400 cards distilled from 729 microdata.no variables,
  ~100 manual sections, 40 example scripts, plus refusal/abstention cards.
- **Deployed quantisation:** q4_k_m via llama.cpp (2.7 GB on disk, runs on CPU or GPU).
- **Designed for:** local deployment behind a thin retrieval layer (FAISS dense
  + BM25 sparse + Reciprocal Rank Fusion; see the sketch below). All data stays
  on the user's machine; no API calls leave the network.
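For concreteness, the fusion step named above is standard Reciprocal Rank Fusion. A minimal sketch of combining a FAISS (dense) ranking with a BM25 (sparse) ranking, illustrative only and not the deployed retrieval code from the GitHub repo:

```python
# Minimal sketch of Reciprocal Rank Fusion (RRF) over the two retrievers
# described above; illustrative only, not the deployed implementation.
def reciprocal_rank_fusion(dense_ranking: list[str],
                           sparse_ranking: list[str],
                           k: int = 60) -> list[str]:
    """Fuse two ranked lists of card IDs; higher fused score ranks first."""
    scores: dict[str, float] = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A card ranked highly by either retriever surfaces near the top of the fused list.
fused = reciprocal_rank_fusion(["card_a", "card_b"], ["card_b", "card_c"])
```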
## Honest evaluation

Measured under strict held-out + adversarial evaluation (80 prompts written
after the model was frozen, LLM-judge scorer with rubric locked before
seeing responses, syntax validator catching fictional commands):

| Class | Pass rate | What it measures |
|---|---|---|
| JAILBREAK | **100% (5/5)** | Refusing role-override, system-prompt extraction, confidentiality bypass |
| RAG (variable lookup) | **80% (8/10)** | Variable definitions, populations, valid periods (when retrieval succeeds) |
| LANG (language matching) | **80% (4/5)** | Norwegian Q → Norwegian A, English Q → English A |
| SCRIPT (write a script) | 33% (5/15) | Real commands; failures are fabricated variable names |
| MANUAL (explain a command) | 29% (2/7) | Some command explanations are vague or partial |
| STALE (admit "I don't know") | **0% (0/5)** | Calibration weakness: doesn't say "I don't know" when it should |
| **Overall** | **53.8% (43/80)** | Strict-eval pass rate |

Refusal and jailbreak resistance are essentially solid. Retrieval-grounded
lookup works when retrieval succeeds. The model's main failure mode is
fabricating variable names when asked to *suggest* one (rather than confirm
a known one), and not calibrating uncertainty well.

A lenient substring-based scorer on a 46-prompt iteration set reports
**82.6%**; that's real, but it measures performance on prompts we iterated
*against*. The 53.8% is the honest out-of-sample number.
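To make the gap between the two numbers concrete, a lenient substring scorer is roughly the following (an illustrative sketch, not the repo's actual eval code); the strict evaluation instead combines an LLM judge with a pre-locked rubric and a syntax validator, so a response can contain the right keywords and still fail:

```python
# Illustrative sketch of a lenient substring-based scorer; not the repo's eval code.
def lenient_pass(response: str, expected_substrings: list[str]) -> bool:
    # Passes if every expected keyword appears anywhere in the response,
    # regardless of whether the surrounding script or explanation is correct.
    response_lower = response.lower()
    return all(s.lower() in response_lower for s in expected_substrings)
```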
Full evaluation methodology and class-level breakdown:
[TECHNICAL_NOTE.md §17](https://github.com/forlop/microdata-no-copilot/blob/main/TECHNICAL_NOTE.md#17-deployed-system-eval-strict-held-out--adversarial)
on GitHub.
## Limitations

- **Not a finished product.** A 53.8% strict pass rate is below what a
  researcher can rely on without verification. Treat this as a research preview.
- **Variable name hallucination.** When asked to suggest variables for a
  task (rather than confirm a specific one), the model invents plausible
  but non-existent names. The RAG layer mitigates this when the user names
  a variable; it doesn't fix open-ended suggestion.
- **Domain-specific.** This model is useful only for microdata.no scripting
  and SSB register-data variables. It is not a general-purpose chatbot.
- **Single-turn training.** The cards are single-turn user/assistant pairs.
  Multi-turn behaviour is emergent and degrades faster than a chat-tuned
  foundation model would. The CLI/Streamlit front-ends use small windows
  (3 exchanges) to compensate, as sketched below.
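The window mitigation amounts to prompt-history truncation. A minimal sketch of keeping only the last three exchanges (a hypothetical helper for illustration, not the repo's front-end code):

```python
# Illustrative sketch: keep only the last `max_exchanges` user/assistant pairs
# when building the next prompt (hypothetical helper, not the repo's code).
def trim_history(messages: list[dict], max_exchanges: int = 3) -> list[dict]:
    # One exchange = one user turn followed by one assistant turn.
    return messages[-2 * max_exchanges:]
```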
## Citation

If you reference this work:

```bibtex
@misc{zhang2026microdata,
  title  = {microdata.no copilot: a locally-deployed LoRA + RAG assistant for SSB register data},
  author = {Tao Zhang},
  year   = {2026},
  url    = {https://github.com/forlop/microdata-no-copilot}
}
```

## License

MIT. See [LICENSE](https://github.com/forlop/microdata-no-copilot/blob/main/LICENSE).