# RAG Capstone Project - Code Organization Guide

## Directory Structure After Code Review

```
RAG Capstone Project/
│
├── 🎯 CORE APPLICATION (Active Production Code)
│   ├── run.py ............................ Quick start launcher
│   ├── streamlit_app.py .................. Main web interface
│   ├── config.py ......................... Settings & configuration
│   ├── vector_store.py ................... ChromaDB vector database
│   ├── llm_client.py ..................... Groq LLM integration
│   ├── embedding_models.py ............... Embedding factory pattern
│   ├── chunking_strategies.py ............ Document chunking strategies
│   ├── dataset_loader.py ................. RAGBench dataset loader
│   ├── trace_evaluator.py ................ TRACE metric evaluation
│   ├── advanced_rag_evaluator.py ......... Advanced metrics (RMSE, AUC)
│   └── evaluation_pipeline.py ............ Evaluation orchestration
│
├── 🛠️ UTILITIES & RECOVERY (Maintenance Tools)
│   ├── rebuild_chroma_index.py ........... Rebuild ChromaDB indices
│   ├── rebuild_sqlite_direct.py .......... Direct SQLite rebuild
│   ├── recover_chroma_advanced.py ........ Advanced recovery utility
│   ├── recover_collections.py ............ Collection recovery
│   ├── rename_collections.py ............. Collection renaming
│   └── reset_sqlite_index.py ............. Index reset utility
│
├── 🧪 TEST SCRIPTS
│   ├── test_llm_audit_trail.py ........... Audit trail testing
│   └── test_rmse_aggregation.py .......... RMSE metric testing
│
├── 📦 ARCHIVED (Moved from Main Directory)
│   └── archived_scripts/
│       ├── api.py ......................... [UNUSED] FastAPI implementation
│       ├── audit_collection_names.py ...... [UNUSED] SQLite audit tool
│       ├── cleanup_chroma.py .............. [UNUSED] Cleanup utility
│       ├── create_architecture_diagram.py . [UNUSED] Diagram generator
│       ├── create_ppt_presentation.py ..... [UNUSED] PPT generator
│       ├── create_trace_flow_diagrams.py .. [UNUSED] Flow diagram generator
│       ├── example.py ..................... [UNUSED] Example usage
│       └── README.md ...................... Archive documentation
│
├── ⚙️ CONFIGURATION & DEPLOYMENT
│   ├── .env ............................... Environment variables (local)
│   ├── .env.example ....................... Example environment
│   ├── requirements.txt ................... Python dependencies
│   ├── docker-compose.yml ................. Docker orchestration
│   ├── Dockerfile ......................... Container definition
│   └── Procfile ........................... Heroku/deployment manifest
│
├── 💾 DATA & STORAGE
│   ├── chroma_db/ ......................... ChromaDB vector storage
│   ├── data_cache/ ........................ Cached datasets
│   ├── venv/ ............................. Python virtual environment
│   └── __pycache__/ ....................... Python bytecode cache
│
├── DOCUMENTATION
│   ├── CODE_REVIEW_REPORT.md ............. [NEW] Comprehensive code review
│   ├── README.md .......................... Project documentation
│   └── docs/ ............................. Additional documentation
│
└── GENERATED OUTPUT
    ├── RAG_Architecture_Diagram.png ....... System architecture
    ├── RAG_Data_Flow_Diagram.png ......... Data flow visualization
    ├── RAG_Capstone_Project_Presentation.pptx ... Presentation slides
    └── Sentence_Mapping_Example.png ....... Example output
```

---

## 🎯 Quick Reference by Task

### Running the Application

```bash
python run.py                     # Quick start launcher
streamlit run streamlit_app.py    # Direct web interface
```

### Core System Modules

- **Data Pipeline**: `dataset_loader.py` → `vector_store.py` → `embedding_models.py`
- **Query Pipeline**: `llm_client.py` → `trace_evaluator.py`
- **Orchestration**: `evaluation_pipeline.py` (coordinates everything)

### Database Maintenance

- **Corruption detected?** Run the recovery scripts:
  - `recover_chroma_advanced.py` (recommended first)
  - `rebuild_chroma_index.py` (full rebuild)
  - `recover_collections.py` (collection-specific)

### Development & Testing

- **Test evaluation**: `test_llm_audit_trail.py`
- **Test metrics**: `test_rmse_aggregation.py`
- **Example usage**: See `archived_scripts/example.py`

---

## File Statistics

| Category | Count | Status |
|----------|-------|--------|
| Core Production | 11 | ✅ Active |
| Recovery/Utilities | 6 | ✅ In Use |
| Test Scripts | 2 | ✅ In Use |
| Archived | 7 | 📦 Not Used |
| Configuration | 6 | ✅ In Use |
| **Total** | **32** | **Clean** |

---

## Why Files Were Moved

### Archived Files (7 total)

No module in the active codebase imports these files:

- **api.py** - Alternative FastAPI backend (not used; the main app is Streamlit)
- **example.py** - Demo script; not part of the production pipeline
- **Diagram/PPT generators** (`create_architecture_diagram.py`, `create_ppt_presentation.py`, `create_trace_flow_diagrams.py`) - Documentation tools; run standalone only
- **Audit script** (`audit_collection_names.py`) - Development debugging tool; not in the main flow
- **Cleanup script** (`cleanup_chroma.py`) - Maintenance utility; not in the main flow
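
The claim that nothing in the active codebase imports the archived scripts can be checked mechanically. The following is a minimal sketch, not the procedure used in the code review: the helper names `imported_modules` and `unreferenced_scripts` are hypothetical, and it assumes the active modules sit as `.py` files directly in the project root and use plain absolute imports.

```python
from __future__ import annotations

import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            names.add(node.module.split(".")[0])
    return names


def unreferenced_scripts(project_dir: Path, candidates: list[str]) -> list[str]:
    """List candidate module names that no other top-level .py file imports."""
    used: set[str] = set()
    for path in project_dir.glob("*.py"):  # non-recursive: archived_scripts/ is ignored
        if path.stem in candidates:
            continue  # imports made by the candidates themselves do not count
        used |= imported_modules(path.read_text(encoding="utf-8"))
    return [name for name in candidates if name not in used]
```

Calling `unreferenced_scripts(Path("."), ["api", "example", "cleanup_chroma"])` from the project root would list whichever of those names no active module imports. Dynamic imports (for example via `importlib`) are not detected, so an empty result is a hint, not proof.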

### Preserved Files (19 total)

These files are in active use:

- **Core modules** (11) - Required by `streamlit_app.py` and `run.py`
- **Recovery tools** (6) - Critical for database maintenance
- **Test scripts** (2) - Part of the quality assurance process

---

## ✅ Code Review Highlights

### Strengths Found

- ✅ Well-structured modular architecture
- ✅ Excellent factory pattern implementation
- ✅ Intelligent rate limiting for API
- ✅ Type-safe configuration with Pydantic
- ✅ Clear separation of concerns

### Improvement Recommendations

- ⚠️ Add structured logging (replace print statements)
- ⚠️ Improve error handling (too many broad exceptions)
- ⚠️ Add comprehensive type hints
- ⚠️ Add input validation
- ⚠️ Add performance monitoring

**See CODE_REVIEW_REPORT.md for detailed analysis and recommendations**
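
Several of the recommendations above (logging instead of `print`, narrower exception handling, type hints, input validation) can be illustrated together in one small function. This is a sketch only: the logger name `rag_capstone`, the function `load_chunk_size`, and the default of 512 are hypothetical and are not taken from the project's code.

```python
import logging

logger = logging.getLogger("rag_capstone")  # hypothetical logger name


def load_chunk_size(raw_value: str, default: int = 512) -> int:
    """Parse a chunk-size setting, logging problems instead of printing them."""
    try:
        size = int(raw_value)
    except ValueError:
        # Catch only the error int() is expected to raise, not a bare Exception.
        logger.warning("Invalid chunk size %r, falling back to %d", raw_value, default)
        return default
    if size <= 0:
        # Input validation: reject values that parse but make no sense.
        raise ValueError(f"chunk size must be positive, got {size}")
    logger.info("Using chunk size %d", size)
    return size
```

Calling `logging.basicConfig(level=logging.INFO)` once at startup (for example in `run.py`) would be enough to make these messages visible. Catching only `ValueError` lets unexpected failures surface instead of being silently swallowed.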

---

## Notes

- **Recovery Scripts**: These are NOT "unused"; they are critical maintenance tools kept in the main directory
- **Test Scripts**: These are NOT "unused"; they are part of the development workflow
- **Archive**: `archived_scripts/` is safe to delete if its files are never needed again
- **Git**: All files remain in git history; no data is lost

---

**Last Updated**: January 1, 2026

**Status**: ✅ Code cleanup complete and documented