MukulRay committed on
Commit
d94448f
·
1 Parent(s): 41ceb84

docs: update README with training details, add assets folder

Files changed (2)
  1. Images/Banner.png +0 -0
  2. README.md +54 -61
Images/Banner.png ADDED
README.md CHANGED
@@ -8,10 +8,9 @@ pinned: false
8
 
9
  <div align="center">
10
 
11
- <img src="docs/assets/banner.png" alt="Irminsul Banner" width="100%">
12
- <!-- PLACEHOLDER: Add a banner image. Can be a dark-themed graphic with the Irminsul logo/name.
13
- Recommended: 1280x320px, dark green/forest aesthetic matching the UI.
14
- Tools: Figma, Canva, or even a screenshot of the UI works. -->
15
 
16
  # Irminsul
17
 
@@ -32,7 +31,7 @@ Most LLM projects stop at inference. This one builds the full stack: a QLoRA fin
32
 
33
  ---
34
 
35
- ## What This Is
36
 
37
  Irminsul is a domain-specific AI assistant for Genshin Impact - built not because Genshin needed an AI assistant, but because it provided a concrete, evaluable knowledge domain to build an LLMOps pipeline around. Every component was chosen deliberately:
38
 
@@ -97,27 +96,40 @@ The domain is the test harness. The pipeline is the project.
97
 
98
  ### Fine-Tuned Model
99
 
100
- Llama 3.1 8B Instruct fine-tuned with QLoRA on a custom instruction dataset, trained on Google Colab Pro (A100). Local inference runs in 4-bit NF4 quantization on an RTX 3060 6GB.
101
 
102
- **[→ View training notebook on Colab](https://colab.research.google.com/drive/YOUR_NOTEBOOK_LINK_HERE)**
103
  <!-- PLACEHOLDER: Replace YOUR_NOTEBOOK_LINK_HERE with your actual Colab share link
104
  File → Share → Copy link (set to "Anyone with the link can view") -->
105
 
106
  | Parameter | Value |
107
  |---|---|
108
  | Base model | `meta-llama/Llama-3.1-8B-Instruct` |
 
109
  | Method | QLoRA via PEFT |
110
- | Rank / Alpha | r=16, α=32 |
111
- | Learning rate | 2e-4 |
112
- | Quantization (inference) | 4-bit NF4, bfloat16 compute |
113
- | Training infra | Google Colab Pro (A100) |
 
 
 
 
114
  | Experiment tracking | MLflow (3 runs) |
115
 
116
- Three experiments were tracked in MLflow. Winning checkpoint selected by faithfulness score (0.826) and ROUGE-L (0.466) on a held-out eval set.
117
 
118
- <!-- PLACEHOLDER: Add MLflow experiment screenshot here
119
- docs/assets/mlflow_experiments.png
120
- A screenshot of your MLflow UI showing the 3 runs and metrics comparison -->
 
 
 
 
 
 
 
 
121
 
122
  ### RAG Pipeline
123
 
@@ -290,60 +302,41 @@ Serving the fine-tuned Llama 3.1 8B requires a GPU instance. The minimum viable
290
 
291
  ```
292
  Irminsul/
293
- ├── main.py # FastAPI app: endpoints, lifespan, CORS, response models
294
- ├── rag.py # LangChain RAG chain, dual backend (Groq / local Llama)
295
- ├── embedder.py # sentence-transformers singleton (loads once, reused)
296
- ├── ingest.py # Doc loader → word chunker → Pinecone upsert
297
- ├── guardrails.py # Input validation: injection detection + domain cosine check
298
- ├── index.html # Browser UI: dark Dendro theme, query history, source display
 
 
299
  │
300
- ├── Dockerfile # python:3.12-slim, model NOT baked in
301
- ├── deploy_azure.sh # One-shot ACR build + Container Apps deploy
302
- ├── .env.example # Environment variable reference
303
  │
304
- ├── DEPLOYMENT.md # Full deployment guide + cost analysis
305
  ├── requirements.txt
306
- ├── images/ # Screenshots and assets used in this README
307
  │   ├── banner.png
308
  │   ├── ui_main.png
309
  │   ├── ui_response.png
310
  │   └── mlflow_runs.png
311
- └── docs/
312
-     ├── corpus/ # Legacy manual corpus docs
313
-     └── demo.html # GitHub Pages demo page
314
  ```
315
 
316
  ---
317
 
318
  ## Evaluation
319
 
320
- <!-- PLACEHOLDER: Fill this section once you have eval numbers ready.
321
- Consider running a small eval set (20-50 questions) with:
322
- - Faithfulness: Does the answer contradict the retrieved context?
323
- - Answer relevance: Does the answer address the question?
324
- - Context recall: Did retrieval find the right documents?
325
-
326
- Tools to consider: RAGAS (pip install ragas) against your Pinecone index.
327
-
328
- Example format:
329
 
330
  | Metric | Score | Method |
331
  |---|---|---|
332
- | Faithfulness | 0.826 | Custom eval, n=50 |
333
- | ROUGE-L | 0.466 | vs reference answers |
334
- | Context recall | TBD | RAGAS |
335
- | Answer relevance | TBD | RAGAS |
336
-
337
- The fine-tuned model numbers (0.826 faithfulness, 0.466 ROUGE-L) came from
338
- your MLflow eval during training - pull those into this table.
339
- -->
340
-
341
- The fine-tuned model was evaluated during training with a held-out set:
342
-
343
- | Metric | Score |
344
- |---|---|
345
- | Faithfulness | 0.826 |
346
- | ROUGE-L | 0.466 |
347
 
348
  Full RAG pipeline evaluation (context recall, answer relevance) is a planned addition - see [What's Next](#whats-next).
349
 
@@ -352,16 +345,16 @@ Full RAG pipeline evaluation (context recall, answer relevance) is a planned add
352
  ## Screenshots
353
 
354
  <!-- PLACEHOLDER: Add screenshots once you have them.
355
- Save to images/ and uncomment these lines:
356
 
357
- ![Irminsul UI](images/ui_main.png)
358
- ![Response with sources](images/ui_response.png)
359
- ![MLflow experiment runs](images/mlflow_runs.png)
360
 
361
  Tips:
362
  - ui_main.png: screenshot of http://localhost:8000 before any query
363
- - ui_response.png: run a query (try "best build for Hu Tao") so the answer + sources section is visible
364
- - mlflow_runs.png: from your Colab - the experiment comparison table showing 3 runs
365
  -->
366
 
367
  *Screenshots coming soon - [try the live demo](https://huggingface.co/spaces/MukulRay/Irminsul) to see it in action.*
@@ -399,6 +392,6 @@ Genshin Impact is owned by HoYoverse. This project is not affiliated with or end
399
 
400
  <div align="center">
401
 
402
- Built to learn the full MLOps lifecycle - fine-tuning, quantization, retrieval, serving, and cloud deployment - on consumer hardware. Every component chosen deliberately, not for hype.
403
 
404
- </div>
 
8
 
9
  <div align="center">
10
 
11
+ <img src="Images\Banner.png" alt="Irminsul Banner" width="100%">
12
+ <!-- PLACEHOLDER: Add a banner image. Recommended: 1280x320px, dark green/Dendro aesthetic.
13
+ Save as assets/banner.png. Tools: Figma, Canva, or a cropped screenshot of the UI. -->
 
14
 
15
  # Irminsul
16
 
 
31
 
32
  ---
33
 
34
+ ## About Irminsul
35
 
36
  Irminsul is a domain-specific AI assistant for Genshin Impact - built not because Genshin needed an AI assistant, but because it provided a concrete, evaluable knowledge domain to build an LLMOps pipeline around. Every component was chosen deliberately:
37
 
 
96
 
97
  ### Fine-Tuned Model
98
 
99
+ Llama 3.1 8B Instruct fine-tuned with QLoRA on the Stanford Alpaca dataset (52K instruction-following examples), trained on Google Colab Pro (A100). Local inference runs in 4-bit NF4 quantization on an RTX 3060 6GB.
100
 
101
+ **[→ View training notebook on Colab](https://colab.research.google.com/drive/1wXz6V196IXEEU3FKwxDJ7BBxRh79QqEF?usp=sharing)**
102
  <!-- PLACEHOLDER: Replace YOUR_NOTEBOOK_LINK_HERE with your actual Colab share link
103
  File → Share → Copy link (set to "Anyone with the link can view") -->
104
 
105
  | Parameter | Value |
106
  |---|---|
107
  | Base model | `meta-llama/Llama-3.1-8B-Instruct` |
108
+ | Dataset | Stanford Alpaca (`tatsu-lab/alpaca`, 52K examples) |
109
  | Method | QLoRA via PEFT |
110
+ | Rank / Alpha | r=16, α=32, dropout=0.05 |
111
+ | Target modules | q_proj, v_proj, k_proj, o_proj |
112
+ | Learning rate | 2e-4 (cosine schedule, warmup 3%) |
113
+ | Batch size | 4 per device × 4 grad accumulation = effective 16 |
114
+ | Epochs | 2 |
115
+ | Optimizer | paged_adamw_32bit |
116
+ | Quantization (inference) | 4-bit NF4, bfloat16 compute dtype |
117
+ | Training infra | Google Colab Pro (A100 40GB) |
118
  | Experiment tracking | MLflow (3 runs) |
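
The batch-size, epoch, and warmup rows above imply a step count that is easy to check by hand. The sketch below does that arithmetic. It assumes all 52,000 examples are used for training on a single GPU and that warmup is rounded up to a whole step; the README states none of these, and `TrainSchedule` is a hypothetical helper, not something taken from the notebook.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSchedule:
    """Hypothetical helper: step arithmetic for the hyperparameters in the table."""

    num_examples: int          # assumption: every Alpaca row is a training example
    per_device_batch: int = 4
    grad_accum_steps: int = 4
    num_devices: int = 1       # assumption: one A100 on Colab
    epochs: int = 2
    warmup_percent: int = 3    # "warmup 3%" kept as an integer to avoid float rounding

    @property
    def effective_batch(self) -> int:
        # Examples consumed per optimizer step.
        return self.per_device_batch * self.grad_accum_steps * self.num_devices

    @property
    def steps_per_epoch(self) -> int:
        # Optimizer steps needed to see the dataset once (last partial batch counted).
        return math.ceil(self.num_examples / self.effective_batch)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

    @property
    def warmup_steps(self) -> int:
        # Integer ceiling of total_steps * warmup_percent / 100.
        return (self.total_steps * self.warmup_percent + 99) // 100


schedule = TrainSchedule(num_examples=52_000)
print(schedule.effective_batch, schedule.steps_per_epoch,
      schedule.total_steps, schedule.warmup_steps)
```

Under those assumptions this gives an effective batch of 16, 3,250 optimizer steps per epoch, 6,500 steps in total, and 195 warmup steps. A held-out split or a different rounding rule in the trainer would shift these numbers slightly.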
119
 
120
+ **[→ Download exp2_lr2e-4_r16 model](https://drive.google.com/drive/folders/1vAVXDXzT5lThnvlgQwXRi0ParmyB3V0P?usp=sharing)**
121
 
122
+ Three experiments were run sequentially, each tracked in MLflow:
123
+
124
+ | Experiment | LR | Rank | Result |
125
+ |---|---|---|---|
126
+ | exp1_lr1e-4_r16 | 1e-4 | 16 | Conservative baseline |
127
+ | exp2_lr2e-4_r16 | 2e-4 | 16 | **Winner** - best loss/quality balance |
128
+ | exp3_lr2e-4_r8 | 2e-4 | 8 | Tests if rank=16 is worth the extra params |
129
+
130
+ The winning checkpoint (`exp2_lr2e-4_r16`) was selected by faithfulness (0.826) and ROUGE-L (0.466), both computed locally via cosine similarity and token overlap against a held-out eval set.
131
+
132
+ <!-- PLACEHOLDER: Add MLflow experiment screenshot here - images/mlflow_runs.png -->
133
 
134
  ### RAG Pipeline
135
 
 
302
 
303
  ```
304
  Irminsul/
305
+ ├── main.py # FastAPI app: endpoints, lifespan, CORS, response models
306
+ ├── rag.py # LangChain RAG chain, dual backend (Groq / local Llama)
307
+ ├── embedder.py # sentence-transformers singleton (loads once, reused)
308
+ ├── ingest.py # Doc loader → word chunker → Pinecone upsert
309
+ ├── guardrails.py # Input validation: injection detection + domain cosine check
310
+ ├── index.html # Browser UI: dark Dendro theme, query history, source display
311
+ │
312
+ ├── LLMOps_Pipeline.ipynb # Full training notebook: QLoRA, MLflow, eval (Colab A100)
313
  │
314
+ ├── Dockerfile # python:3.12-slim, model NOT baked in
315
+ ├── deploy_azure.sh # One-shot ACR build + Container Apps deploy
316
+ ├── .env.example # Environment variable reference
317
  │
318
+ ├── DEPLOYMENT.md # Full deployment guide + cost analysis
319
  ├── requirements.txt
320
+ ├── assets/ # Screenshots and assets used in this README
321
  │   ├── banner.png
322
  │   ├── ui_main.png
323
  │   ├── ui_response.png
324
  │   └── mlflow_runs.png
325
+ └── models/ # gitignored - place merged model here locally
326
+     └── merged/
327
+         └── exp2_lr2e-4_r16/
328
  ```
329
 
330
  ---
331
 
332
  ## Evaluation
333
 
334
+ The winning checkpoint was evaluated against a held-out set using a custom local eval (cosine similarity for faithfulness, token overlap for ROUGE-L). RAGAS was attempted but hit async timeout issues on Colab, so the custom eval was used instead; results are reproducible from the notebook.
 
 
 
 
 
 
 
 
335
 
336
  | Metric | Score | Method |
337
  |---|---|---|
338
+ | Faithfulness | 0.826 | Cosine similarity: ground truth → answer embedding |
339
+ | ROUGE-L | 0.466 | Token overlap vs reference answers |
 
 
 
 
 
 
 
 
 
 
 
 
 
340
 
341
  Full RAG pipeline evaluation (context recall, answer relevance) is a planned addition - see [What's Next](#whats-next).
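
For readers who want to see what the two scores in the Evaluation table measure, here is a minimal sketch of both metrics in plain Python. It uses the standard LCS-based ROUGE-L F1 over lowercased whitespace tokens and cosine similarity over embedding vectors that are assumed to be precomputed. The function names, the tokenization, and the choice of F1 are assumptions for illustration; the notebook's own implementation may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS precision and LCS recall."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

In use, the faithfulness score for one example would be `cosine_similarity(embed(reference), embed(answer))`, where `embed` stands in for whichever sentence-embedding model the eval uses, and the reported numbers would be averages over the held-out set.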
342
 
 
345
  ## Screenshots
346
 
347
  <!-- PLACEHOLDER: Add screenshots once you have them.
348
+ Save to assets/ and uncomment these lines:
349
 
350
+ ![Irminsul UI](assets/ui_main.png)
351
+ ![Response with sources](assets/ui_response.png)
352
+ ![MLflow experiment runs](assets/mlflow_runs.png)
353
 
354
  Tips:
355
  - ui_main.png: screenshot of http://localhost:8000 before any query
356
+ - ui_response.png: run a query so the answer + sources section is visible
357
+ - mlflow_runs.png: Colab experiment comparison table showing 3 runs + metrics
358
  -->
359
 
360
  *Screenshots coming soon - [try the live demo](https://huggingface.co/spaces/MukulRay/Irminsul) to see it in action.*
 
392
 
393
  <div align="center">
394
 
395
+ Built to learn the full MLOps lifecycle - fine-tuning on Colab, quantized inference on consumer hardware, retrieval, serving, and cloud deployment. Every component chosen deliberately, not for hype.
396
 
397
+ </div>