forlop committed (verified)
Commit d0897b5 · Parent(s): 5348691

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md (+29 −10)
README.md CHANGED
@@ -23,10 +23,10 @@ A small, locally-deployable AI assistant fine-tuned to help users write
  Norwegian register-data variables published by [SSB (Statistics
  Norway)](https://www.ssb.no/).
  
- This repo hosts the deployed **q4_k_m quantised GGUF** (2.7 GB) plus an
- Ollama `Modelfile` so the model can be pulled and run with one command.
- The full source code (training, RAG, eval, deployment) and the technical
- note live at **<https://github.com/forlop/microdata-no-copilot>**.
+ This repo hosts the deployed **q4_k_m quantised GGUF** (2.7 GB) and the
+ companion **Ollama `Modelfile`**. The full source code (training, RAG,
+ eval, deployment) and the technical note live at
+ **<https://github.com/forlop/microdata-no-copilot>**.
  
  ## Quick start
  
@@ -36,21 +36,40 @@ note live at **<https://github.com/forlop/microdata-no-copilot>**.
  # macOS: brew install ollama (or download from ollama.com)
  # Windows: download OllamaSetup.exe from ollama.com
  
- # Pull and run
+ # 1. Pull the base GGUF from this repo (~2.7 GB, one-time)
  ollama pull hf.co/forlop/microdata-copilot-v2:Q4_K_M
- ollama run hf.co/forlop/microdata-copilot-v2:Q4_K_M
+
+ # 2. Clone the GitHub repo (contains the Modelfile + RAG layer)
+ git clone https://github.com/forlop/microdata-no-copilot
+ cd microdata-no-copilot
+
+ # 3. Apply the SYSTEM prompt + refusal few-shots + stop-token parameters
+ ollama create microdata-copilot -f deploy/Modelfile
+
+ # 4. Try it
+ ollama run microdata-copilot "What is INNTEKT_LONN?"
  ```
  
- For the full RAG-wrapped experience (retrieval over the live microdata.no
- variable catalogue + a Streamlit web UI), clone the GitHub repo:
+ > **Why two steps?** `ollama pull` from Hugging Face downloads the raw
+ > GGUF plus the chat template embedded in its metadata — but **not** the
+ > custom Modelfile in this repo. Ollama only applies curated Modelfiles
+ > for models in its official library. For HF-hosted models, you apply
+ > your own Modelfile locally via `ollama create`. Without step 3, the
+ > model bleeds `<|endoftext|>` tokens and loops. With it, you get the
+ > full deployed configuration (system prompt, refusal patterns, stop
+ > tokens, greedy decoding).
+
+ ## Full RAG-wrapped Streamlit demo
  
  ```bash
- git clone https://github.com/forlop/microdata-no-copilot
- cd microdata-no-copilot
+ # After the four steps above, from the cloned repo directory:
  pip install -r requirements.txt streamlit
  streamlit run rag/app.py
  ```
  
+ Streamlit prints a `http://localhost:8501` URL — open it in your browser.
+ On CPU expect ~10–15 s per response; on a recent GPU, ~1–2 s.
+
  ## What this is
  
  - **Base model:** Qwen3.5-4B (Apache-2.0, via Unsloth's pre-quantised release).
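
Besides the CLI and the Streamlit UI described in the README changes above, a model created with `ollama create microdata-copilot` can also be queried programmatically through Ollama's local HTTP API. A minimal sketch, assuming a running `ollama serve` on its default port 11434 and the model name from the `ollama create` step:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = "microdata-copilot") -> bytes:
    """Encode a non-streaming body for Ollama's /api/generate endpoint."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")


def ask(prompt: str) -> str:
    """Send one prompt to the local Ollama server and return the response text."""
    req = request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # A non-streaming generate call returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires the model to have been created as in the quick start above.
    print(ask("What is INNTEKT_LONN?"))
```

With `"stream": False` the server answers with a single JSON object instead of newline-delimited chunks, which keeps the client to a few lines of standard library code.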