CodeSage / README.md
Aditya
Add live HuggingFace Spaces demo link to README
b8a0f1b
|
Raw
History Blame Contribute Delete
21.3 kB
---
title: CodeSage
emoji: ๐Ÿง™
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.35.0"
app_file: demo.py
pinned: false
---
<div align="center">
<!-- Animated Banner -->
<img width="100%" src="https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=200&section=header&text=CodeSage%20๐Ÿง™&fontSize=60&fontColor=fff&animation=twinkling&fontAlignY=35&desc=LLM%20vs%20RAG%20vs%20Fine-Tuning%20โ€”%20Live%20Trade-off%20Platform&descAlignY=60&descSize=18" />
<!-- Typing SVG -->
<img src="https://readme-typing-svg.demolab.com?font=Fira+Code&weight=700&size=22&pause=1200&color=A78BFA&center=true&vCenter=true&multiline=true&repeat=true&width=700&height=60&lines=Same+Question.+Three+Architectures.+Real+Numbers." alt="Typing SVG" />
<br/>
<!-- Primary Badges -->
<p align="center">
<a href="https://github.com/Adityax-07/LLM-vs-RAG-vs-Fine-Tuning-/stargazers">
<img src="https://img.shields.io/github/stars/Adityax-07/LLM-vs-RAG-vs-Fine-Tuning-?style=for-the-badge&logo=starship&color=f59e0b&logoColor=white&labelColor=1a1a2e" />
</a>
<a href="https://github.com/Adityax-07/LLM-vs-RAG-vs-Fine-Tuning-/network/members">
<img src="https://img.shields.io/github/forks/Adityax-07/LLM-vs-RAG-vs-Fine-Tuning-?style=for-the-badge&logo=git&color=8b5cf6&logoColor=white&labelColor=1a1a2e" />
</a>
<img src="https://img.shields.io/badge/License-MIT-22c55e?style=for-the-badge&logo=opensourceinitiative&logoColor=white&labelColor=1a1a2e" />
<img src="https://img.shields.io/badge/Status-Production%20Ready-22c55e?style=for-the-badge&logo=checkmarx&logoColor=white&labelColor=1a1a2e" />
<img src="https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white&labelColor=1a1a2e" />
</p>
<!-- Live Demo Button -->
<p align="center">
<a href="https://huggingface.co/spaces/Adityax-07/CodeSage">
<img src="https://img.shields.io/badge/๐Ÿค—%20Live%20Demo-Try%20CodeSage%20on%20HF%20Spaces-FFD21E?style=for-the-badge&labelColor=1a1a2e" />
</a>
</p>
<!-- Tech Stack Badges -->
<p align="center">
<img src="https://img.shields.io/badge/Streamlit-FF4B4B?style=for-the-badge&logo=streamlit&logoColor=white" />
<img src="https://img.shields.io/badge/LangChain-1C3C3C?style=for-the-badge&logo=langchain&logoColor=white" />
<img src="https://img.shields.io/badge/Groq_API-F55036?style=for-the-badge&logo=groq&logoColor=white" />
<img src="https://img.shields.io/badge/HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=black" />
<img src="https://img.shields.io/badge/FAISS-0467DF?style=for-the-badge&logo=meta&logoColor=white" />
<img src="https://img.shields.io/badge/LoRA%2FPEFT-EF4444?style=for-the-badge&logo=pytorch&logoColor=white" />
<img src="https://img.shields.io/badge/Plotly-3F4F75?style=for-the-badge&logo=plotly&logoColor=white" />
<img src="https://img.shields.io/badge/Google_Colab-F9AB00?style=for-the-badge&logo=googlecolab&logoColor=black" />
</p>
<!-- Skill Icons -->
<p align="center">
<img src="https://skillicons.dev/icons?i=python,pytorch,tensorflow,git,github,vscode&theme=dark" />
</p>
<br/>
<blockquote>
๐Ÿงช <strong>CodeSage</strong> is a live, side-by-side AI research platform that fires the same programming question at three fundamentally different architectures โ€” <strong>Baseline LLM</strong>, <strong>RAG</strong>, and <strong>Fine-Tuning</strong> โ€” then auto-scores every answer on accuracy, hallucination, groundedness, relevance, and cost.<br/><br/>
No cherry-picking. No manual grading. <strong>Real numbers, real trade-offs.</strong>
</blockquote>
<br/>
<!-- Quick stats strip -->
<p align="center">
<img src="https://img.shields.io/badge/50-Benchmark%20Questions-8b5cf6?style=flat-square" />
<img src="https://img.shields.io/badge/3-AI%20Systems%20Compared-06b6d4?style=flat-square" />
<img src="https://img.shields.io/badge/8-Auto%20Eval%20Metrics-f59e0b?style=flat-square" />
<img src="https://img.shields.io/badge/85.3%25-Fine--Tune%20Accuracy-22c55e?style=flat-square" />
<img src="https://img.shields.io/badge/0%25-Hallucination%20Rate-ef4444?style=flat-square" />
</p>
</div>
## ๐Ÿ“Œ Table of Contents
| | Section |
|:---:|:---|
| โšก | [Benchmark Results](#-benchmark-results) |
| ๐Ÿง  | [What is CodeSage?](#-what-is-codesage) |
| โœจ | [Features](#-features) |
| ๐Ÿ—๏ธ | [Architecture](#๏ธ-architecture) |
| ๐Ÿ“Š | [Evaluation Pipeline](#-evaluation-pipeline) |
| ๐Ÿš€ | [Quick Start](#-quick-start) |
| ๐Ÿ“š | [Knowledge Base](#-knowledge-base) |
| ๐Ÿ’ก | [Decision Guide](#-decision-guide) |
| ๐Ÿ› ๏ธ | [Tech Stack](#๏ธ-tech-stack) |
| ๐Ÿ—‚๏ธ | [Project Structure](#๏ธ-project-structure) |
| ๐Ÿ”ฎ | [Roadmap](#-roadmap) |
## โšก Benchmark Results
> **Full evaluation:** `3 systems` ร— `50 Q&A pairs` ร— `8 metrics` โ€” fully automated, zero manual grading
| ๐Ÿ“ Metric | ๐Ÿ”ต Baseline LLM | ๐ŸŸข RAG Chatbot | ๐ŸŸฃ Fine-Tuned (Qwen2.5 + LoRA) |
|:---|:---:|:---:|:---:|
| ๐ŸŽฏ **Answer Accuracy** | 61.4% | 81.6% | **85.3% โœจ** |
| ๐Ÿšซ **Hallucination Rate** | 43.2% โŒ | 9.8% | **0.0% โœจ** |
| ๐Ÿ” **Answer Relevance** | 0.714 | 0.768 | **0.891 โœจ** |
| ๐Ÿ“Œ **Groundedness** | โ€” | **0.87 โœจ** | โ€” |
| โšก **Avg Latency** | ~1.2s | ~2.1s | **~0.4s โœจ** |
| ๐Ÿ’ฐ **Cost / Query** | ~$0.0020 | ~$0.0030 | **$0.0002 โœจ** |
### ๐Ÿ”‘ Key Findings
| Insight | Detail |
|:---|:---|
| ๐Ÿšซ **Hallucination gap** | Baseline hallucinates on `43.2%` of questions โ€” Fine-Tuning eliminates this entirely โ†’ `0%` |
| ๐Ÿ“‰ **RAG cuts hallucination 4.4ร—** | From `43.2%` โ†’ `9.8%` purely through grounded retrieval, no retraining needed |
| ๐Ÿ’ฐ **Fine-Tuning is 10ร— cheaper** | `$0.0002` vs `~$0.002` per query โ€” smaller model, fully local inference |
| โšก **Fine-Tuning is 3ร— faster** | `0.4s` vs `1.2s` โ€” no retrieval pipeline, no large-model API round-trip |
| ๐ŸŽฏ **No universal winner** | RAG wins on updatability ยท Fine-Tuning wins on cost/speed/precision ยท Baseline wins on zero-setup |
## ๐Ÿง  What is CodeSage?
CodeSage is a **decision-making tool** for AI engineers. When building a domain-specific assistant, you always hit the same three-way fork:
```
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Domain-Specific AI Assistant โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ–ผ โ–ผ โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ BASELINE LLM โ”‚ โ”‚ RAG PIPELINE โ”‚ โ”‚ FINE-TUNING โ”‚
โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ โ”‚
โ”‚ + Zero setup โ”‚ โ”‚ + Always fresh โ”‚ โ”‚ + 10x cheaper โ”‚
โ”‚ + Broad topics โ”‚ โ”‚ + Grounded โ”‚ โ”‚ + 0% hallucin. โ”‚
โ”‚ - Hallucinates โ”‚ โ”‚ - Retrieval lag โ”‚ โ”‚ - Hard to update โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
```
> CodeSage makes this trade-off **visible and measurable** โ€” same question, same moment, real output from all three.
---
## โœจ Features
<div align="center">
<table>
<tr>
<td align="center" width="220">
<strong>๐Ÿ”€ Side-by-Side Compare</strong><br/><br/>
Three answers to one question,<br/>simultaneously, in one view
</td>
<td align="center" width="220">
<strong>๐Ÿ“Š Auto Evaluation</strong><br/><br/>
8-metric LLM-as-Judge scores<br/>every response automatically
</td>
<td align="center" width="220">
<strong>๐Ÿ† Winner Badge</strong><br/><br/>
Best answer highlighted;<br/>hallucination flag raised on low-confidence
</td>
</tr>
<tr>
<td align="center">
<strong>๐Ÿ“ˆ Analytics Dashboard</strong><br/><br/>
Plotly charts + paper-style TABLE II<br/>aggregated over 50 benchmarks
</td>
<td align="center">
<strong>๐Ÿ’พ Persistent Cache</strong><br/><br/>
Results stored in <code>benchmark_cache.json</code><br/>โ€” instant reload, no re-running
</td>
<td align="center">
<strong>๐Ÿ“„ PDF Ingestion</strong><br/><br/>
Drop any PDF into <code>data/pdfs/</code><br/>โ€” RAG ingests it automatically
</td>
</tr>
</table>
</div>
---
## ๐Ÿ—๏ธ Architecture
```
โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘ ๐Ÿ–ฅ๏ธ Streamlit UI โ•‘
โ•‘ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ•‘
โ•‘ โ”‚ โšก System 1 โ”‚ โ”‚ ๐Ÿ” System 2 โ”‚ โ”‚ ๐Ÿง  System 3 โ”‚ โ•‘
โ•‘ โ”‚ Baseline LLM โ”‚ โ”‚ RAG Pipeline โ”‚ โ”‚ Fine-Tuned โ”‚ โ•‘
โ•‘ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ”‚ โ”‚ โ”‚
โ–ผ โ–ผ โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Groq API โ”‚ โ”‚ FAISS Index โ”‚ โ”‚ Qwen2.5-1.5B โ”‚
โ”‚ Llama-3.1-8Bโ”‚ โ”‚ all-MiniLM-L6-v2 โ”‚ โ”‚ + LoRA Adapters โ”‚
โ”‚ (zero-shot) โ”‚ โ”‚ (top-3 chunks) โ”‚ โ”‚ (PEFT, local) โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
Groq API (with context)
โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ ๐Ÿ›๏ธ LLM-as-Judge โ”‚
โ”‚ 8 metrics, auto โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
```
### โšก System 1 โ€” Baseline LLM
Sends the question directly to **Llama-3.1-8B** via Groq with a minimal system prompt. No extra knowledge. Represents what an off-the-shelf LLM can do โ€” the floor every other system must beat.
### ๐Ÿ” System 2 โ€” RAG Pipeline
1. Question โ†’ `all-MiniLM-L6-v2` embedding
2. Top-3 chunks retrieved from **FAISS** vector store (17 documents)
3. Chunks injected as context into **Llama-3.1-8B** via Groq
4. Groundedness scored โ€” answers must be traceable to retrieved text
### ๐Ÿง  System 3 โ€” Fine-Tuned Model
**Qwen2.5-1.5B** fine-tuned with **LoRA** (`r=8, ฮฑ=32`) on curated CS Q&A pairs via Google Colab T4 GPU. Adapters loaded locally via `peft` โ€” zero cloud inference cost, sub-second latency.
---
## ๐Ÿ“Š Evaluation Pipeline
Each answer is auto-scored by an LLM judge across **8 dimensions**:
| Icon | Metric | Description | Unit |
|:---:|:---|:---|:---:|
| ๐ŸŽฏ | **Answer Accuracy** | Cosine similarity of answer vs reference embedding | % |
| ๐Ÿ“Œ | **Groundedness** | Cosine similarity of answer vs retrieved context | 0โ€“1 |
| ๐Ÿšซ | **Hallucination Rate** | % of answers with accuracy < 0.5 | % |
| ๐Ÿ” | **Answer Relevance** | Cosine similarity of answer vs question | 0โ€“1 |
| ๐Ÿ“œ | **Faithfulness (ROUGE-L)** | Token overlap with source context or reference | 0โ€“1 |
| โฑ๏ธ | **Avg Response Time** | Mean latency per query | sec |
| ๐Ÿ’ฐ | **Cost per Query** | Token-count-based cost estimate | USD |
| โญ | **Overall Score** | 30% Acc + 20% Ground + 20% (1โˆ’HR) + 15% Rel + 15% Faith | 1โ€“5 |
---
## ๐Ÿš€ Quick Start
### `Step 1` โ€” Clone & Install
```bash
git clone https://github.com/Adityax-07/LLM-vs-RAG-vs-Fine-Tuning-.git
cd LLM-vs-RAG-vs-Fine-Tuning-
pip install -r requirements.txt
```
### `Step 2` โ€” Configure API Key
```bash
echo "GROQ_API_KEY=your_key_here" > .env
```
> ๐Ÿ†“ Free key at [console.groq.com](https://console.groq.com)
### `Step 3` โ€” Launch
```bash
streamlit run demo.py
```
> FAISS vector store builds automatically on first launch. **Systems 1 & 2 are ready instantly.**
### `Step 4` โ€” (Optional) Activate Fine-Tuned Model
```bash
python -c "
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-1.5B-Instruct')
model = PeftModel.from_pretrained(base, 'checkpoint-25')
model.merge_and_unload().save_pretrained('finetuned_model')
AutoTokenizer.from_pretrained('checkpoint-25').save_pretrained('finetuned_model')
"
```
> Or open `system3_finetune_colab.ipynb` in **Google Colab** to train from scratch on a free T4 GPU (~10 min).
### `Step 5` โ€” Regenerate Benchmark *(optional)*
```bash
# Pre-computed results already included in data/benchmark_cache.json
python run_benchmark.py
```
---
## ๐Ÿ“š Knowledge Base
The RAG system retrieves from **17 hand-crafted topic documents** in `data/docs/`:
<div align="center">
<table>
<tr>
<td valign="top" width="33%">
<strong>๐Ÿงฎ Algorithms &amp; DSA</strong><br/><br/>
<code>binary_search</code><br/>
<code>sorting_algorithms</code><br/>
<code>dynamic_programming</code><br/>
<code>graph_algorithms</code><br/>
<code>trees</code><br/>
<code>linked_list</code><br/>
<code>stack_queue</code><br/>
<code>recursion</code><br/>
<code>backtracking</code>
</td>
<td valign="top" width="33%">
<strong>๐Ÿ“ More DSA</strong><br/><br/>
<code>greedy_algorithms</code><br/>
<code>hashing</code><br/>
<code>string_algorithms</code><br/>
<code>two_pointers</code><br/>
<code>big_o_notation</code><br/>
<code>heaps</code>
</td>
<td valign="top" width="33%">
<strong>๐ŸŒ Web &amp; Tooling</strong><br/><br/>
<code>react_hooks</code><br/>
<code>rest_api</code><br/>
<code>javascript_promises</code><br/>
<code>css_flexbox</code><br/>
<code>typescript_basics</code><br/>
<code>sql_basics</code><br/>
<code>git_basics</code>
</td>
</tr>
</table>
</div>
---
## ๐Ÿ’ก Decision Guide
| ๐Ÿค” Situation | โœ… Best Choice | ๐Ÿ“ Why |
|:---|:---:|:---|
| Prototyping or general queries | **Baseline LLM** | Zero setup, covers broad topics well |
| Knowledge changes frequently | **RAG** | Update docs without retraining |
| Fixed domain, cost/latency matters | **Fine-Tuning** | 10ร— cheaper, 3ร— faster, 0% hallucination |
| Need citations & traceability | **RAG** | Groundedness score + visible source chunks |
| Production with tight latency SLA | **Fine-Tuning** | Local inference, no API round-trip |
---
## ๐Ÿ› ๏ธ Tech Stack
<div align="center">
<table>
<tr>
<th>Layer</th>
<th>Technology</th>
<th>Purpose</th>
</tr>
<tr>
<td>๐Ÿ“Š <strong>UI</strong></td>
<td>
<img src="https://img.shields.io/badge/Streamlit-FF4B4B?style=flat-square&logo=streamlit&logoColor=white" />
<img src="https://img.shields.io/badge/Plotly-3F4F75?style=flat-square&logo=plotly&logoColor=white" />
</td>
<td>3-way comparison dashboard + analytics charts</td>
</tr>
<tr>
<td>โšก <strong>LLM</strong></td>
<td><img src="https://img.shields.io/badge/Groq_API-F55036?style=flat-square&logo=groq&logoColor=white" /></td>
<td>Llama-3.1-8B โ€” Baseline + RAG generation</td>
</tr>
<tr>
<td>๐Ÿค– <strong>Embeddings</strong></td>
<td><img src="https://img.shields.io/badge/sentence--transformers-FFD21E?style=flat-square&logo=huggingface&logoColor=black" /></td>
<td><code>all-MiniLM-L6-v2</code> โ€” RAG semantic retrieval</td>
</tr>
<tr>
<td>๐Ÿ” <strong>Vector DB</strong></td>
<td><img src="https://img.shields.io/badge/FAISS-0467DF?style=flat-square&logo=meta&logoColor=white" /></td>
<td>CPU-based semantic search over knowledge base</td>
</tr>
<tr>
<td>๐Ÿง  <strong>Fine-Tuning</strong></td>
<td>
<img src="https://img.shields.io/badge/PEFT%2FLoRA-EF4444?style=flat-square&logo=pytorch&logoColor=white" />
<img src="https://img.shields.io/badge/Transformers-FFD21E?style=flat-square&logo=huggingface&logoColor=black" />
</td>
<td>LoRA adapter (r=8, ฮฑ=32) on Qwen2.5-1.5B</td>
</tr>
<tr>
<td>๐Ÿ‹๏ธ <strong>Base Model</strong></td>
<td><img src="https://img.shields.io/badge/Qwen2.5--1.5B-FFD21E?style=flat-square&logo=huggingface&logoColor=black" /></td>
<td>Alibaba's compact LLM โ€” LoRA fine-tuned locally</td>
</tr>
<tr>
<td>โ˜๏ธ <strong>Training</strong></td>
<td><img src="https://img.shields.io/badge/Google_Colab-F9AB00?style=flat-square&logo=googlecolab&logoColor=black" /></td>
<td>Free T4 GPU โ€” LoRA training in ~10 minutes</td>
</tr>
<tr>
<td>๐Ÿ”— <strong>Orchestration</strong></td>
<td><img src="https://img.shields.io/badge/LangChain-1C3C3C?style=flat-square&logo=langchain&logoColor=white" /></td>
<td>RAG pipeline, FAISS integration, PDF ingestion</td>
</tr>
<tr>
<td>๐Ÿ“ <strong>Metrics</strong></td>
<td><img src="https://img.shields.io/badge/rouge--score-EF4444?style=flat-square&logo=python&logoColor=white" /></td>
<td>ROUGE-L + cosine similarity for auto-evaluation</td>
</tr>
</table>
</div>
---
## ๐Ÿ—‚๏ธ Project Structure
```
๐Ÿ“ฆ LLM-vs-RAG-vs-Fine-Tuning/
โ”‚
โ”œโ”€โ”€ ๐Ÿ“„ demo.py โ† Streamlit app (main entry point)
โ”œโ”€โ”€ ๐Ÿ“„ system1_baseline.py โ† Baseline LLM via Groq API
โ”œโ”€โ”€ ๐Ÿ“„ system2_rag.py โ† RAG pipeline: FAISS + LangChain + Groq
โ”œโ”€โ”€ ๐Ÿ“„ system3_inference.py โ† Fine-tuned model inference (PEFT)
โ”œโ”€โ”€ ๐Ÿ““ system3_finetune_colab.ipynb โ† LoRA training notebook (Colab T4)
โ”œโ”€โ”€ ๐Ÿ“„ evaluate.py โ† Standalone evaluation script
โ”œโ”€โ”€ ๐Ÿ“„ run_benchmark.py โ† Regenerates benchmark_cache.json
โ”‚
โ”œโ”€โ”€ ๐Ÿ“ checkpoint-25/ โ† Trained LoRA weights (included)
โ”‚ โ”œโ”€โ”€ adapter_model.safetensors
โ”‚ โ”œโ”€โ”€ adapter_config.json โ† r=8, alpha=32
โ”‚ โ””โ”€โ”€ tokenizer.json
โ”‚
โ”œโ”€โ”€ ๐Ÿ“ finetuned_model/ โ† Merged model (after merge step)
โ”‚
โ”œโ”€โ”€ ๐Ÿ“ data/
โ”‚ โ”œโ”€โ”€ ๐Ÿ“ docs/ โ† 17 knowledge base .txt files
โ”‚ โ”œโ”€โ”€ ๐Ÿ“ faiss_index/ โ† FAISS vector store (auto-built)
โ”‚ โ”œโ”€โ”€ ๐Ÿ“ pdfs/ โ† Drop PDFs here for RAG ingestion
โ”‚ โ”œโ”€โ”€ benchmark_cache.json โ† Pre-computed 50Q benchmark results
โ”‚ โ”œโ”€โ”€ reference_answers.json โ† Ground-truth Q&A pairs
โ”‚ โ””โ”€โ”€ finetune_data.jsonl โ† LoRA training data (ChatML format)
โ”‚
โ””โ”€โ”€ ๐Ÿ“„ requirements.txt
```
---
## ๐Ÿ”ฎ Roadmap
| Status | Feature |
|:---:|:---|
| โœ… | 50-question auto-benchmark with persistent cache |
| โœ… | LoRA fine-tune checkpoint (`checkpoint-25`) included |
| โœ… | Analytics dashboard with Plotly + TABLE II |
| โœ… | PDF ingestion into RAG knowledge base |
| ๐Ÿ”œ | Push Qwen2.5 LoRA adapter to HuggingFace Hub |
| ๐Ÿ”œ | Full 3-system live demo on HuggingFace Spaces |
| ๐Ÿ”œ | Expand knowledge base: 17 โ†’ 50+ documents |
| ๐Ÿ”œ | RAGAS-style faithfulness + context precision metrics |
| ๐Ÿ”œ | Custom knowledge base upload via Streamlit UI |
---
<!-- Wave footer -->
<img width="100%" src="https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=120&section=footer" />
<div align="center">
<strong>Built with ๐Ÿง  by <a href="https://github.com/Adityax-07">Adityax-07</a></strong>
<br/>
<em>Powered by Groq ยท HuggingFace ยท FAISS ยท LangChain ยท Streamlit</em>
<br/><br/>
<a href="https://github.com/Adityax-07">
<img src="https://img.shields.io/badge/GitHub-Adityax--07-181717?style=for-the-badge&logo=github&logoColor=white" />
</a>
<br/><br/>
โญ <strong>If CodeSage helped you understand the LLM trade-off space, drop a star!</strong>
</div>