---
license: mit
language:
- en
- 'no'
base_model: unsloth/Qwen3.5-4B
tags:
- microdata.no
- ssb
- norwegian
- register-data
- lora
- gguf
- rag
- ollama
library_name: gguf
---

# microdata.no copilot v2.0 (q4_k_m GGUF)

A small, locally-deployable AI assistant fine-tuned to help users write
[microdata.no](https://microdata.no) scripts and answer questions about
Norwegian register-data variables published by [SSB (Statistics
Norway)](https://www.ssb.no/).

This repo hosts the deployed **q4_k_m quantised GGUF** (2.7 GB) and the
companion **Ollama `Modelfile`**. The full source code (training, RAG,
eval, deployment) and the technical note live at
**<https://github.com/forlop/microdata-no-copilot>**.

## Quick start

```bash
# Install Ollama if you don't have it yet:
#   Linux/WSL:  curl -fsSL https://ollama.com/install.sh | sh
#   macOS:      brew install ollama  (or download from ollama.com)
#   Windows:    download OllamaSetup.exe from ollama.com

# 1. Pull the base GGUF from this repo (~2.7 GB, one-time)
ollama pull hf.co/forlop/microdata-copilot-v2:Q4_K_M

# 2. Clone the GitHub repo (contains the Modelfile + RAG layer)
git clone https://github.com/forlop/microdata-no-copilot
cd microdata-no-copilot

# 3. Apply the SYSTEM prompt + refusal few-shots + stop-token parameters
ollama create microdata-copilot -f deploy/Modelfile

# 4. Try it
ollama run microdata-copilot "What is INNTEKT_LONN?"
```

> **Why both `pull` and `create`?** `ollama pull` from Hugging Face downloads
> the raw GGUF plus the chat template embedded in its metadata, but **not**
> the custom Modelfile in this repo. Ollama only applies curated Modelfiles
> for models in its official library; for HF-hosted models, you apply your
> own Modelfile locally via `ollama create`. Without step 3, the model bleeds
> `<|endoftext|>` tokens and loops. With it, you get the full deployed
> configuration (system prompt, refusal patterns, stop tokens, greedy
> decoding).
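For orientation, the kind of Modelfile that step 3 applies might look like the sketch below. `FROM`, `SYSTEM`, and `PARAMETER` are standard Ollama Modelfile directives, but the system-prompt text and values here are illustrative placeholders, not the contents of `deploy/Modelfile`:

```
FROM hf.co/forlop/microdata-copilot-v2:Q4_K_M

SYSTEM """You are a microdata.no scripting assistant. Answer only questions
about microdata.no commands and SSB register-data variables; refuse
anything else."""

# Stop on the token the bare GGUF otherwise bleeds
PARAMETER stop "<|endoftext|>"
# Temperature 0 gives greedy decoding
PARAMETER temperature 0
```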

## Full RAG-wrapped Streamlit demo

```bash
# After the four steps above, from the cloned repo directory:
pip install -r requirements.txt streamlit
streamlit run rag/app.py
```

Streamlit prints a local URL (typically `http://localhost:8501`); open it in
your browser. On CPU, expect ~10–15 s per response; on a recent GPU, ~1–2 s.

## What this is

- **Base model:** Qwen3.5-4B (Apache-2.0, via Unsloth's pre-quantised release).
- **Fine-tuning:** rank-32 LoRA, 3 epochs, ~1.5 h on a single 16 GB RTX 5070 Ti.
- **Training corpus:** ~1,400 cards distilled from 729 microdata.no variables,
  ~100 manual sections, 40 example scripts, plus refusal/abstention cards.
- **Deployed quantisation:** q4_k_m via llama.cpp (2.7 GB on disk, runs on CPU
  or GPU).
- **Designed for:** local deployment behind a thin retrieval layer (FAISS dense
  + BM25 sparse + Reciprocal Rank Fusion). All data stays on the user's machine;
  no API calls leave the network.
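The fusion step in that retrieval layer, Reciprocal Rank Fusion, is simple enough to sketch. Below is a minimal illustrative implementation, not the repo's actual code; `k=60` is the constant from the original RRF paper, and all variable names except `INNTEKT_LONN` are made up for the example:

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked lists of doc ids: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["INNTEKT_LONN", "INNTEKT_BRUTTO", "ARBLONN_PERS"]  # e.g. FAISS order
sparse = ["INNTEKT_LONN", "INNTEKT_BRUTTO", "SKATT_PERS"]    # e.g. BM25 order
fused = rrf_fuse([dense, sparse])
# fused[0] is "INNTEKT_LONN": rank 1 in both lists
```

A document ranked highly by either retriever surfaces near the top of the fused list, without having to calibrate dense similarity scores against BM25 scores.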

## Honest evaluation

Measured under strict held-out + adversarial evaluation (80 prompts written
after the model was frozen, LLM-judge scorer with rubric locked before
seeing responses, syntax validator catching fictional commands):

| Class | Pass rate | What it measures |
|---|---|---|
| JAILBREAK | **100% (5/5)** | Refusing role-override, system-prompt extraction, confidentiality bypass |
| RAG (variable lookup) | **80% (8/10)** | Variable definitions, populations, valid periods (when retrieval succeeds) |
| LANG (language matching) | **80% (4/5)** | Norwegian Q → Norwegian A, English Q → English A |
| SCRIPT (write a script) | 33% (5/15) | Real commands; failures are fabricated variable names |
| MANUAL (explain a command) | 29% (2/7) | Some command explanations are vague or partial |
| STALE (admit "I don't know") | **0% (0/5)** | Calibration weakness: doesn't say "I don't know" when it should |
| **Overall** | **53.8% (43/80)** | Strict-eval pass rate |

Refusal and jailbreak resistance are essentially solid. Retrieval-grounded
lookup works when retrieval succeeds. The model's main failure mode is
fabricating variable names when asked to *suggest* one (rather than confirm
a known one), and not calibrating uncertainty well.

A lenient substring-based scorer on a 46-prompt iteration set reports
**82.6%**. That number is real, but it measures performance on prompts we
iterated *against*; the 53.8% is the honest out-of-sample number.
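To make the strict-vs-lenient gap concrete, a substring scorer of the lenient kind passes a response as soon as the expected key phrases appear anywhere in it. This is a sketch of the general technique, not the project's actual scorer:

```python
def lenient_pass(response, expected_phrases):
    """Pass if every expected phrase occurs as a case-insensitive substring.

    This rewards prompts the phrases were tuned on and ignores whatever
    else surrounds the match (including fabricated content), which is why
    it reads systematically higher than a strict LLM-judge rubric.
    """
    r = response.lower()
    return all(p.lower() in r for p in expected_phrases)

ok = lenient_pass(
    "INNTEKT_LONN covers wage income from tax registers.",
    ["inntekt_lonn", "wage"],
)
```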

Full evaluation methodology and class-level breakdown:
[TECHNICAL_NOTE.md Β§17](https://github.com/forlop/microdata-no-copilot/blob/main/TECHNICAL_NOTE.md#17-deployed-system-eval-strict-held-out--adversarial)
on GitHub.

## Limitations

- **Not a finished product.** 53.8% strict pass-rate is below what a
  researcher can rely on without verification. Treat as a research preview.
- **Variable name hallucination.** When asked to suggest variables for a
  task (rather than confirm a specific one), the model invents plausible
  but non-existent names. The RAG layer mitigates this when the user names
  a variable; it doesn't fix open-ended suggestion.
- **Domain-specific.** This model is useful only for microdata.no scripting
  and SSB register-data variables. It is not a general-purpose chatbot.
- **Single-turn training.** The cards are single-turn user/assistant pairs.
  Multi-turn behaviour is emergent and degrades faster than a chat-tuned
  foundation model would. The CLI/Streamlit front-ends use small windows
  (3 exchanges) to compensate.
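The 3-exchange window can be implemented as a simple truncation of the chat history before each request. This is an illustrative sketch under the usual role/content message shape, not the repo's actual front-end code:

```python
def truncate_history(messages, max_exchanges=3):
    """Keep only the last `max_exchanges` user/assistant pairs.

    `messages` is a flat, chronological list of
    {"role": "user"|"assistant", "content": ...} dicts.
    """
    return messages[-2 * max_exchanges:]

# Ten alternating messages: q0, a1, q2, a3, ..., q8, a9
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"m{i}"}
    for i in range(10)
]
window = truncate_history(history)  # last 3 exchanges = 6 messages
```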

## Citation

If you reference this work:

```bibtex
@misc{zhang2026microdata,
  title  = {microdata.no copilot: a locally-deployed LoRA + RAG assistant for SSB register data},
  author = {Tao Zhang},
  year   = {2026},
  url    = {https://github.com/forlop/microdata-no-copilot}
}
```

## License

MIT. See [LICENSE](https://github.com/forlop/microdata-no-copilot/blob/main/LICENSE).