---
title: CodeSage
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.35.0"
app_file: demo.py
pinned: false
---

🧪 CodeSage is a live, side-by-side AI research platform that fires the same programming question at three fundamentally different architectures (Baseline LLM, RAG, and Fine-Tuning), then auto-scores every answer on accuracy, hallucination, groundedness, relevance, and cost.

No cherry-picking. No manual grading. Real numbers, real trade-offs.
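
The fan-out described above can be sketched roughly as follows. This is only an illustration: the three answer functions are hypothetical stubs standing in for the real Baseline, RAG, and Fine-Tuning pipelines, and the thread pool is an assumed way to run them side by side, not necessarily how `demo.py` does it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for the three pipelines; real ones would call a model.
def baseline_answer(question: str) -> str:
    return f"[baseline] {question}"


def rag_answer(question: str) -> str:
    return f"[rag] {question}"


def finetuned_answer(question: str) -> str:
    return f"[fine-tuned] {question}"


# Assumed registry mapping a display name to each pipeline.
PIPELINES: Dict[str, Callable[[str], str]] = {
    "baseline": baseline_answer,
    "rag": rag_answer,
    "fine_tuned": finetuned_answer,
}


def compare(question: str) -> Dict[str, str]:
    """Send one question to every pipeline concurrently; return answers by name."""
    with ThreadPoolExecutor(max_workers=len(PIPELINES)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in PIPELINES.items()}
        return {name: future.result() for name, future in futures.items()}


answers = compare("What is binary search?")
```

Keeping the pipelines behind one shared signature means the scoring step can treat all three answers identically.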

| Feature | Description |
|---|---|
| Side-by-Side Compare | Three answers to one question, simultaneously, in one view |
| Auto Evaluation | 8-metric LLM-as-Judge scores every response automatically |
| Winner Badge | Best answer highlighted; hallucination flag raised on low confidence |
| Analytics Dashboard | Plotly charts + paper-style TABLE II aggregated over 50 benchmarks |
| Persistent Cache | Results stored in `benchmark_cache.json` for instant reload, no re-running |
| PDF Ingestion | Drop any PDF into `data/pdfs/` and RAG ingests it automatically |
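
A persistent cache of this kind might look like the minimal sketch below. Only the file name `benchmark_cache.json` comes from the feature list; the question-keyed JSON schema and the helper names `load_cache` and `get_or_compute` are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable

# File name taken from the feature list; the schema (question -> result) is assumed.
CACHE_PATH = Path("benchmark_cache.json")


def load_cache(path: Path = CACHE_PATH) -> dict:
    """Return cached results, or an empty dict if no cache file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def get_or_compute(
    question: str,
    compute: Callable[[str], dict],
    path: Path = CACHE_PATH,
) -> dict:
    """Reuse a stored result when present; otherwise compute it and persist it."""
    cache = load_cache(path)
    if question not in cache:
        cache[question] = compute(question)
        path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache[question]
```

On a second call with the same question, `compute` is never invoked, which is what makes a reload instant.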

| Category | Topics |
|---|---|
| Algorithms & DSA | `binary_search`, `sorting_algorithms`, `dynamic_programming`, `graph_algorithms`, `trees`, `linked_list`, `stack_queue`, `recursion`, `backtracking` |
| More DSA | `greedy_algorithms`, `hashing`, `string_algorithms`, `two_pointers`, `big_o_notation`, `heaps` |
| Web & Tooling | `react_hooks`, `rest_api`, `javascript_promises`, `css_flexbox`, `typescript_basics`, `sql_basics`, `git_basics` |

| Layer | Technology | Purpose |
|---|---|---|
| UI | Streamlit | 3-way comparison dashboard + analytics charts |
| LLM | Llama-3.1-8B | Baseline + RAG generation |
| Embeddings | all-MiniLM-L6-v2 | RAG semantic retrieval |
| Vector DB | FAISS | CPU-based semantic search over the knowledge base |
| Fine-Tuning | LoRA adapter (r=8, α=32) | Applied to Qwen2.5-1.5B |
| Base Model | Qwen2.5-1.5B | Alibaba's compact LLM, LoRA fine-tuned locally |
| Training | Free T4 GPU | LoRA training in ~10 minutes |
| Orchestration | | RAG pipeline, FAISS integration, PDF ingestion |
| Metrics | ROUGE-L + cosine similarity | Auto-evaluation |
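
To make the Metrics row concrete, here is a dependency-free sketch of the two scores. It assumes whitespace tokenization and the F1 form of ROUGE-L, and `cosine_similarity` takes plain float vectors (in the app these would presumably be sentence embeddings). The function names are illustrative, not taken from the project.

```python
import math
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

ROUGE-L rewards word overlap in the same order as the reference, while cosine similarity over embeddings can credit a correct paraphrase that shares few exact words, so the two scores complement each other.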