docs: update README with training details, add assets folder
- Images/Banner.png +0 -0
- README.md +54 -61
Images/Banner.png ADDED

README.md CHANGED
@@ -8,10 +8,9 @@ pinned: false

<div align="center">

- <img src="
- <!-- PLACEHOLDER: Add a banner image.
-
- Tools: Figma, Canva, or even a screenshot of the UI works. -->

# Irminsul

@@ -32,7 +31,7 @@ Most LLM projects stop at inference. This one builds the full stack: a QLoRA fin

---

- ##

Irminsul is a domain-specific AI assistant for Genshin Impact, built not because Genshin needed an AI assistant but because it provided a concrete, evaluable knowledge domain to build an LLMOps pipeline around. Every component was chosen deliberately:

@@ -97,27 +96,40 @@ The domain is the test harness. The pipeline is the project.

### Fine-Tuned Model

- Llama 3.1 8B Instruct fine-tuned with QLoRA on

- **[View training notebook on Colab](https://colab.research.google.com/drive/
<!-- PLACEHOLDER: Replace YOUR_NOTEBOOK_LINK_HERE with your actual Colab share link
File → Share → Copy link (set to "Anyone with the link can view") -->

| Parameter | Value |
|---|---|
| Base model | `meta-llama/Llama-3.1-8B-Instruct` |
| Method | QLoRA via PEFT |
- | Rank / Alpha | r=16, α=32 |
-
-
-
| Experiment tracking | MLflow (3 runs) |

-

-
-
-

### RAG Pipeline

@@ -290,60 +302,41 @@ Serving the fine-tuned Llama 3.1 8B requires a GPU instance. The minimum viable

```
  Irminsul/
- ├── main.py
- ├── rag.py
- ├── embedder.py
- ├── ingest.py
- ├── guardrails.py
- ├── index.html
  │
- ├── Dockerfile
- ├── deploy_azure.sh
- ├── .env.example
  │
- ├── DEPLOYMENT.md
  ├── requirements.txt
- ├──
  │   ├── banner.png
  │   ├── ui_main.png
  │   ├── ui_response.png
  │   └── mlflow_runs.png
- └──
-
-
```

---

## Evaluation

-
- Consider running a small eval set (20-50 questions) with:
- - Faithfulness: Does the answer contradict the retrieved context?
- - Answer relevance: Does the answer address the question?
- - Context recall: Did retrieval find the right documents?
-
- Tools to consider: RAGAS (pip install ragas) against your Pinecone index.
-
- Example format:

| Metric | Score | Method |
|---|---|---|
- | Faithfulness | 0.826 |
- | ROUGE-L | 0.466 | vs reference answers |
- | Context recall | TBD | RAGAS |
- | Answer relevance | TBD | RAGAS |
-
- The fine-tuned model numbers (0.826 faithfulness, 0.466 ROUGE-L) came from
- your MLflow eval during training; pull those into this table.
- -->
-
- The fine-tuned model was evaluated during training with a held-out set:
-
- | Metric | Score |
- |---|---|
- | Faithfulness | 0.826 |
- | ROUGE-L | 0.466 |

Full RAG pipeline evaluation (context recall, answer relevance) is a planned addition; see [What's Next](#whats-next).

@@ -352,16 +345,16 @@ Full RAG pipeline evaluation (context recall, answer relevance) is a planned add
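
## Screenshots

<!-- PLACEHOLDER: Add screenshots once you have them.
- Save to

- ![Main UI](
-
-

Tips:
- ui_main.png: screenshot of http://localhost:8000 before any query
-
-
-->

*Screenshots coming soon. [Try the live demo](https://huggingface.co/spaces/MukulRay/Irminsul) to see it in action.*
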
@@ -399,6 +392,6 @@ Genshin Impact is owned by HoYoverse. This project is not affiliated with or end

<div align="center">

- Built to learn the full MLOps lifecycle: fine-tuning,

- </div>


<div align="center">

+ <img src="Images/Banner.png" alt="Irminsul Banner" width="100%">
+ <!-- PLACEHOLDER: Add a banner image. Recommended: 1280x320px, dark green/Dendro aesthetic.
+ Save as assets/banner.png. Tools: Figma, Canva, or a cropped screenshot of the UI. -->

# Irminsul


---

+ ## About Irminsul

Irminsul is a domain-specific AI assistant for Genshin Impact, built not because Genshin needed an AI assistant but because it provided a concrete, evaluable knowledge domain to build an LLMOps pipeline around. Every component was chosen deliberately:


### Fine-Tuned Model

+ Llama 3.1 8B Instruct fine-tuned with QLoRA on the Stanford Alpaca dataset (52K instruction-following examples), trained on Google Colab Pro (A100). Local inference runs in 4-bit NF4 quantization on an RTX 3060 6GB.

+ **[View training notebook on Colab](https://colab.research.google.com/drive/1wXz6V196IXEEU3FKwxDJ7BBxRh79QqEF?usp=sharing)**
<!-- PLACEHOLDER: Replace YOUR_NOTEBOOK_LINK_HERE with your actual Colab share link
File → Share → Copy link (set to "Anyone with the link can view") -->

| Parameter | Value |
|---|---|
| Base model | `meta-llama/Llama-3.1-8B-Instruct` |
+ | Dataset | Stanford Alpaca (`tatsu-lab/alpaca`, 52K examples) |
| Method | QLoRA via PEFT |
+ | Rank / Alpha | r=16, α=32, dropout=0.05 |
+ | Target modules | q_proj, v_proj, k_proj, o_proj |
+ | Learning rate | 2e-4 (cosine schedule, warmup 3%) |
+ | Batch size | 4 per device × 4 grad accumulation = effective 16 |
+ | Epochs | 2 |
+ | Optimizer | paged_adamw_32bit |
+ | Quantization (inference) | 4-bit NF4, bfloat16 compute dtype |
+ | Training infra | Google Colab Pro (A100 40GB) |
| Experiment tracking | MLflow (3 runs) |
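
The batch-size and warmup rows in the table imply a concrete step budget. The sketch below works through that arithmetic. It assumes all 52,000 examples are used for training on a single GPU, which the README does not state, and `TrainSchedule` is a hypothetical helper, not a class from the notebook.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSchedule:
    """Hypothetical helper that derives step counts from the table values."""

    num_examples: int = 52_000   # "52K examples"; the exact train split is an assumption
    per_device_batch: int = 4
    grad_accum_steps: int = 4
    num_devices: int = 1         # assumption: one Colab A100
    epochs: int = 2
    warmup_ratio: float = 0.03   # "warmup 3%"

    @property
    def effective_batch(self) -> int:
        # Examples consumed per optimizer step.
        return self.per_device_batch * self.grad_accum_steps * self.num_devices

    @property
    def steps_per_epoch(self) -> int:
        return math.ceil(self.num_examples / self.effective_batch)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

    @property
    def warmup_steps(self) -> int:
        # Rounded to the nearest step; a trainer may round differently.
        return round(self.total_steps * self.warmup_ratio)


schedule = TrainSchedule()
print(f"effective batch: {schedule.effective_batch}")
print(f"optimizer steps: {schedule.total_steps} ({schedule.steps_per_epoch} per epoch)")
print(f"warmup steps:    {schedule.warmup_steps}")
```

Under these assumptions the run takes 6,500 optimizer steps, of which roughly the first 195 are warmup before the cosine decay begins.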

+ **[Download exp2_lr2e-4_r16 model](https://drive.google.com/drive/folders/1vAVXDXzT5lThnvlgQwXRi0ParmyB3V0P?usp=sharing)**

+ Three experiments were run sequentially, each tracked in MLflow:
+
+ | Experiment | LR | Rank | Result |
+ |---|---|---|---|
+ | exp1_lr1e-4_r16 | 1e-4 | 16 | Conservative baseline |
+ | exp2_lr2e-4_r16 | 2e-4 | 16 | **Winner**: best loss/quality balance |
+ | exp3_lr2e-4_r8 | 2e-4 | 8 | Tests whether rank=16 is worth the extra parameters |
+
+ The winning checkpoint (`exp2_lr2e-4_r16`) was selected by faithfulness (0.826) and ROUGE-L (0.466), computed locally via cosine similarity and token overlap, respectively, against a held-out eval set.
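
To put a number on the "extra parameters" that exp3 tests, the sketch below counts trainable LoRA weights for the four target modules at rank 16 and rank 8. The layer dimensions are assumptions taken from the published Llama 3.1 8B configuration (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), not from this repository, and the function name is illustrative.

```python
# Assumed Llama 3.1 8B dimensions (from the public model config, not this repo).
HIDDEN = 4096
KV_DIM = 8 * 128   # grouped-query attention: 8 KV heads x 128 head dim
LAYERS = 32

# (input dim, output dim) of each targeted projection.
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """Trainable parameters added by LoRA adapters of the given rank.

    Each adapted projection gains two matrices, A (rank x d_in) and
    B (d_out x rank), so it contributes rank * (d_in + d_out) weights.
    """
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJ_SHAPES.values())
    return per_layer * LAYERS


for r in (16, 8):
    print(f"r={r}: {lora_param_count(r):,} trainable parameters")
```

With these dimensions, rank 16 trains about 13.6M adapter parameters and rank 8 about 6.8M, so halving the rank halves the adapter size while the frozen 4-bit base model is unchanged.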
+
+ <!-- PLACEHOLDER: Add MLflow experiment screenshot here: images/mlflow_runs.png -->

### RAG Pipeline


```
  Irminsul/
+ ├── main.py                # FastAPI app: endpoints, lifespan, CORS, response models
+ ├── rag.py                 # LangChain RAG chain, dual backend (Groq / local Llama)
+ ├── embedder.py            # sentence-transformers singleton (loads once, reused)
+ ├── ingest.py              # Doc loader → word chunker → Pinecone upsert
+ ├── guardrails.py          # Input validation: injection detection + domain cosine check
+ ├── index.html             # Browser UI: dark Dendro theme, query history, source display
+ │
+ ├── LLMOps_Pipeline.ipynb  # Full training notebook: QLoRA, MLflow, eval (Colab A100)
  │
+ ├── Dockerfile             # python:3.12-slim, model NOT baked in
+ ├── deploy_azure.sh        # One-shot ACR build + Container Apps deploy
+ ├── .env.example           # Environment variable reference
  │
+ ├── DEPLOYMENT.md          # Full deployment guide + cost analysis
  ├── requirements.txt
+ ├── assets/                # Screenshots and assets used in this README
  │   ├── banner.png
  │   ├── ui_main.png
  │   ├── ui_response.png
  │   └── mlflow_runs.png
+ └── models/                # gitignored; place merged model here locally
+     └── merged/
+         └── exp2_lr2e-4_r16/
```
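
The tree describes `ingest.py` as a doc loader feeding a word chunker ahead of the Pinecone upsert. As a rough illustration of the chunking step only, here is a minimal word-window chunker. The function name, `chunk_size`, and `overlap` are hypothetical; the repository's actual values and the loading and upsert steps are not shown in this diff.

```python
from typing import Iterator


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> Iterator[str]:
    """Yield overlapping chunks of at most `chunk_size` whitespace-separated words.

    Hypothetical sketch: consecutive chunks share `overlap` words so that a
    sentence cut at a boundary still appears intact in one of them.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text, so no
        # overlap-only tail chunk is emitted.
        if start + chunk_size >= len(words):
            break


for chunk in chunk_words("a b c d e f g", chunk_size=3, overlap=1):
    print(chunk)
```

Chunking by words rather than characters keeps chunk boundaries on token-like units, which tends to suit sentence-embedding models; the right window and overlap depend on the documents being indexed.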

---

## Evaluation

+ The winning checkpoint was evaluated against a held-out set using a custom local eval (cosine similarity for faithfulness, token overlap for ROUGE-L). RAGAS was attempted but hit async timeout issues on Colab, so the custom eval was used instead; results are fully reproducible from the notebook.

| Metric | Score | Method |
|---|---|---|
+ | Faithfulness | 0.826 | Cosine similarity between ground-truth and answer embeddings |
+ | ROUGE-L | 0.466 | Token overlap vs reference answers |
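
For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them. It is not the notebook's implementation: the embedding step is omitted (the vectors passed to `cosine_similarity` stand in for sentence embeddings), and ROUGE-L is written as the usual longest-common-subsequence F1 over lowercased whitespace tokens, which may differ from the notebook's tokenization.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Placeholder vectors; a real run would embed the answer and the ground truth.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(rouge_l_f1("a b c d", "a c"))
```

Note that embedding similarity between an answer and its ground truth measures semantic closeness rather than grounding in retrieved context, so the faithfulness score here is not directly comparable to a RAGAS faithfulness score.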

Full RAG pipeline evaluation (context recall, answer relevance) is a planned addition; see [What's Next](#whats-next).

## Screenshots

<!-- PLACEHOLDER: Add screenshots once you have them.
+ Save to assets/ and uncomment these lines:

+ ![Main UI](assets/ui_main.png)
+ ![Response with sources](assets/ui_response.png)
+ ![MLflow experiment runs](assets/mlflow_runs.png)

Tips:
- ui_main.png: screenshot of http://localhost:8000 before any query
+ - ui_response.png: run a query so the answer + sources section is visible
+ - mlflow_runs.png: Colab experiment comparison table showing 3 runs + metrics
-->

*Screenshots coming soon. [Try the live demo](https://huggingface.co/spaces/MukulRay/Irminsul) to see it in action.*


<div align="center">

+ Built to learn the full MLOps lifecycle: fine-tuning on Colab, quantized inference on consumer hardware, retrieval, serving, and cloud deployment. Every component was chosen deliberately, not for hype.

+ </div>