nadiva1243 committed on
Commit
0f331c9
·
verified ·
1 Parent(s): 22631cb

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +101 -3
README.md CHANGED
@@ -1,3 +1,101 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ language:
3
+ - en
4
+ - es
5
+ - ca
6
+ license: other
7
+ base_model: microsoft/phi-4
8
+ tags:
9
+ - rag
10
+ - retrieval-augmented-generation
11
+ - lora
12
+ - phi4
13
+ - multilingual
14
+ - ollama
15
+ - gguf
16
+ pipeline_tag: text-generation
17
+ ---
18
+
19
+ # Phi-4 RAG (LoRA fine-tuned) — Q4_K_M GGUF
20
+
21
+ Quantized **GGUF** build of **Microsoft Phi-4** with a **LoRA** adapter merged in, for **retrieval-augmented question answering**. The model answers **only from supplied document context** in **English, Spanish, or Catalan**, using the same RAG-oriented system prompt as the **MonkeyGrab** project (TFG, Universitat Politècnica de València).
22
+
23
+ ## Files in this repo
24
+
25
+ | File | Description |
26
+ |------|-------------|
27
+ | `Phi4-Q4_K_M.gguf` | Full weights after LoRA merge, **Q4_K_M** quantization (local inference, e.g. Ollama). |
28
+ | `Modelfile` | Ollama recipe: ChatML template, system prompt, sampling parameters. |
29
+ | `README.md` | This model card (mirrored in the source tree under `models/gguf-output/phi-4/`). |
30
+
31
+ ## Base model and method
32
+
33
+ - **Base:** [`microsoft/phi-4`](https://huggingface.co/microsoft/phi-4) (ChatML-style; end-of-turn `<|im_end|>`).
34
+ - **Adaptation:** PEFT **LoRA** → merge into dense weights → **GGUF** export and **Q4_K_M** quantization via the project toolchain (`scripts/conversion/`, llama.cpp binaries).
35
+
36
+ ### LoRA configuration
37
+
38
+ | Setting | Value |
39
+ |--------|--------|
40
+ | `r` | 32 |
41
+ | `lora_alpha` | 64 |
42
+ | `lora_dropout` | 0.05 |
43
+ | `target_modules` | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
44
+ | `bias` | `none` |
45
+
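As a rough illustration of what the table implies, the sketch below collects the settings in a plain dictionary and computes the standard LoRA scaling factor `lora_alpha / r`. The names `LORA_SETTINGS` and `lora_scaling` are hypothetical and do not come from the training script.

```python
# Values copied from the table above; the variable and function names are illustrative.
LORA_SETTINGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "bias": "none",
}


def lora_scaling(settings: dict) -> float:
    """Standard LoRA scaling applied to the low-rank update: lora_alpha / r."""
    return settings["lora_alpha"] / settings["r"]


print(lora_scaling(LORA_SETTINGS))  # 2.0

# With PEFT installed, the same settings could be passed on as follows
# (an assumption, not taken from the training script):
#   from peft import LoraConfig
#   config = LoraConfig(task_type="CAUSAL_LM", **LORA_SETTINGS)
```

With `r` 32 and `lora_alpha` 64, the low-rank update is scaled by 2.0 before it is added to the frozen weights.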
46
+ ### Training (`scripts/training/train-phi4.py`, v1)
47
+
48
+ - **Seed:** 42.
49
+ - **Task format:** System + user message with instruction and `<context>...</context>`; **loss only on the assistant completion** (prompt labels masked).
50
+ - **Data (balanced 5-way interleaving, 3,200 train samples per source):**
51
+   - Neural-Bridge RAG,
52
+   - Dolly (categories: `closed_qa`, `information_extraction`, `summarization`), 80/10/10 split after filtering,
53
+   - Aina RAG in three language subsets: **EN**, **ES**, **CA**.
54
+ - **Sequence limits:** `max_length` 4096, context truncated to **2048** tokens; eval generation up to **2048** new tokens.
55
+ - **Optimizer / schedule:** AdamW 8-bit, **lr** 5e-5, **cosine** with **warmup_ratio** 0.05, **weight_decay** 0.01, **max_grad_norm** 1.0.
56
+ - **Batching:** `per_device_train_batch_size` 1, **gradient_accumulation_steps** 16 → **effective batch 16**; **bf16** + **TF32**; gradient checkpointing on.
57
+ - **Epochs:** 3; checkpoints every **300** steps (keep 3); eval every **150** steps; **load_best_model_at_end** on `eval_loss`; **early stopping** patience **5**.
58
+
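The "loss only on the assistant completion" rule above is commonly implemented by setting the prompt positions of the label sequence to `-100`, the index that PyTorch cross-entropy ignores by default. The following is a minimal sketch of that idea with toy token ids; `build_labels` is a hypothetical helper, and the actual masking code in `train-phi4.py` may differ.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch cross-entropy loss


def build_labels(prompt_ids: list[int], completion_ids: list[int]) -> dict:
    """Concatenate prompt and completion; mask the prompt so only the completion is scored."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}


# Toy token ids stand in for a tokenized system + user prompt and an assistant answer.
example = build_labels(prompt_ids=[11, 12, 13], completion_ids=[21, 22])
print(example["labels"])  # [-100, -100, -100, 21, 22]
```

Only the last two positions contribute to the loss, so the model is never trained to reproduce the instruction or the context.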
59
+ ### Evaluation (frozen splits)
60
+
61
+ - Same **dev/test** partitions for base vs adapted models.
62
+ - **Dev:** up to **320** samples per dataset (Neural-Bridge, Dolly, Aina-EN / ES / CA) for alignment with the baseline benchmark.
63
+ - **Test:** full held-out splits (no size cap beyond validity filters).
64
+ - **Metrics:** Token F1, ROUGE-L F1, BERTScore F1 (`microsoft/deberta-xlarge-mnli`), plus an auxiliary context-faithfulness score; BERTScore is computed after the generative model has been unloaded.
65
+
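For readers unfamiliar with the first metric, here is a minimal sketch of a token-level F1 in the usual extractive-QA style. It assumes lowercasing and whitespace tokenization; the normalization used by the project's evaluation code is not described in this card and may differ.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 over lowercased whitespace tokens (multiset overlap)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Precision is 3/3 and recall is 3/4 for this pair.
print(token_f1("the cat sat", "the cat sat down"))
```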
66
+ Training artifacts (`training_stats.json`, `evaluation_comparison.json`) live under `training-output/phi-4/` in the code repository, not on this Hub model repo.
67
+
68
+ ## Ollama
69
+
70
+ Put `Phi4-Q4_K_M.gguf` next to `Modelfile` (or set `FROM` to your local path). Then:
71
+
72
+ ```bash
73
+ ollama create phi4-rag -f Modelfile
74
+ ollama run phi4-rag
75
+ ```
76
+
77
+ Bundled generation defaults include `num_ctx` 16384, `temperature` 0.15, `top_p` 0.9, `repeat_penalty` 1.15 (see `Modelfile`).
78
+
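Once the `phi4-rag` model has been created, it can also be queried programmatically through Ollama's local HTTP chat endpoint. The sketch below uses only the Python standard library. The server address is Ollama's usual default, and the user-message layout (question followed by the tagged context) is an assumption, since this card does not spell out the exact training layout.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(question: str, context: str, model: str = "phi4-rag") -> dict:
    """Build a non-streaming chat payload with the context wrapped in <context> tags."""
    # Assumption: question first, then the tagged context.
    user_message = f"{question}\n\n<context>\n{context}\n</context>"
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }


def ask(question: str, context: str) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(question, context)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


payload = build_chat_request("Who wrote the report?", "The report was written by Ana.")
print(json.dumps(payload, indent=2))
```

No `options` are set in the payload, so the system prompt and sampling parameters bundled in `Modelfile` apply. Calling `ask(...)` requires a running Ollama server with the model already created.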
79
+ ## Limitations
80
+
81
+ - Intended for **grounded** QA; do not rely on it as a source of general world knowledge when no retrieved context is supplied.
82
+ - **Q4_K_M** is a speed/size trade-off vs higher bit-width or FP16.
83
+ - Quality depends on retrieval and on wrapping context in `<context>...</context>` as in training.
84
+
85
+ ## License
86
+
87
+ This build (Phi-4 with the LoRA adapter merged in) inherits constraints from **Microsoft Phi-4** and from the datasets used. See the base model card and Microsoft's terms. The **MonkeyGrab** application code is separate; replace this sentence with your public repo URL when published.
88
+
89
+ ## Citation
90
+
91
+ ```bibtex
92
+ @misc{phi4_rag_gguf_monkeygrab,
93
+ title = {Phi-4 RAG LoRA Fine-tune (Q4_K_M GGUF)},
94
+ author = {Nadiv},
95
+ year = {2026},
96
+ howpublished = {Hugging Face: \url{https://huggingface.co/nadiva1243/phi4RAG}},
97
+ note = {Base: microsoft/phi-4; training: MonkeyGrab train-phi4.py v1}
98
+ }
99
+ ```
100
+
101
+ Adjust `author` and the Hub URL if your username or repo name differs.