
VGEC RAG Chatbot — Codebase Documentation

Generated: 2026-03-25
Version: 1.0.0
Scope: Full system — ingestion, retrieval, classification, API, evaluation


Table of Contents

  1. Project Overview
  2. System Architecture
  3. Schema & Data Model
  4. Retrieval Pipeline
  5. Key Classes & Modules
  6. Evaluation & Metrics
  7. Known Limitations
  8. File Structure

1. Project Overview

Purpose

VGEC RAG Chatbot is a Retrieval-Augmented Generation (RAG) chatbot for Vishwakarma Government Engineering College (VGEC), Chandkheda, Gujarat. It allows students, faculty, and visitors to query structured information about the institution — departments, faculty, syllabus, labs, intake capacity, and more — through natural language.

Domain

  • Institution: VGEC (Government Engineering College, Gujarat)
  • Data Coverage: Department-level information for multiple disciplines (Computer Engineering, Civil, Electrical, IT, ECE, etc.)
  • Topics: Faculty lists, lab facilities, syllabus details, HOD info, research activities, intake capacity, achievements

Tech Stack

| Layer | Technology |
| --- | --- |
| API Framework | FastAPI |
| Vector Database | ChromaDB (persistent, local) |
| Embeddings | Google gemini-embedding-001 (via langchain-google-genai) |
| LLM (Cloud) | Google Gemini gemini-2.5-flash-lite |
| LLM (Local) | EXAONE-3.5-2.4B-Instruct-Q4_K_M.gguf via llama-cpp-python |
| NLP / Preprocessing | spaCy (en_core_web_sm), NLTK (PorterStemmer) |
| Classifier | Scikit-learn LogisticRegression + SentenceTransformer (MongoDB/mdbr-leaf-mt) |
| BM25 | langchain-community BM25Retriever |
| Chunking | LangChain RecursiveCharacterTextSplitter |
| Config | Pydantic BaseSettings (.env-backed) |

Key Features Implemented

  • βœ… Structured JSON ingestion with intent-aware chunking
  • βœ… Hybrid retrieval: BM25 + vector search fused via Reciprocal Rank Fusion (RRF)
  • βœ… Intent/metadata classification with confidence-gated ChromaDB filters
  • βœ… Abbreviation expansion (CE β†’ Computer Engineering, etc.)
  • βœ… Multi-turn conversation history support
  • βœ… Dual LLM backend with automatic fallback (Gemini ↔ Local)
  • βœ… Full CRUD REST API for vector store management
  • βœ… Offline evaluation endpoint (MRR, hit rate, noise rate)
  • βœ… Classifier accuracy evaluation endpoint

2. System Architecture

Component Diagram

```
                     ┌────────────────────────────┐
                     │        FastAPI App         │
                     │   /api/v1/rag   /vector    │
                     └─────────────┬──────────────┘
                                   │ DI (lru_cache)
                     ┌─────────────▼──────────────┐
                     │         RAGService         │
                     │    (core orchestrator)     │
                     └─────┬───────────────┬──────┘
                           │               │
            ┌──────────────▼──┐   ┌────────▼───────────────┐
            │ IngestionService│   │ HybridRetrievalService │
            │  (write path)   │   │      (read path)       │
            └───────┬─────────┘   └────┬───────────┬───────┘
                    │                  │           │
         ┌──────────▼───┐   ┌──────────▼───┐  ┌────▼──────────┐
         │  FileService │   │ ClassifierSvc│  │  VectorStore  │
         │ (file + meta)│   │(clf predict) │  │  (ChromaDB)   │
         └──────────────┘   └──────────────┘  └───────────────┘
```

Data Flow

Ingestion Path

```
File Upload (PDF/MD/TXT/JSON)
   │
   ▼
FileService.read_file()          ← type-aware loading (PyMuPDF for PDF)
   │  returns: Document + metadata
   ▼
FileService.write_file()         ← persist copy to data/documents/
   │
   ▼
IngestionService.handle_*_docs() ← route by file extension
   │
   ├─ JSON → handle_json_docs()  ← intent-aware chunks (list / detail / count)
   └─ text → handle_text_docs()  ← RecursiveCharacterTextSplitter + normalize()
   │
   ▼
VectorStore.add_documents()      ← embed + upsert into ChromaDB
   │
   ▼
FileService.patch_metadata()     ← update ingestion record JSON (chunk count, timing, size)
```

Query Path

```
User Question
   │
   ▼
preprocess_query()               ← normalize(): tokenize + strip blanks (see Known Limitations)
   │
   ▼
HybridRetrievalService.retrieve()
   │
   ├─ clf.expand_abbreviations() ← CE → Computer Engineering
   ├─ clf.predict_with_filter()  ← LogReg predict → Chroma $and/$or filter
   ├─ _vector_rank()             ← ChromaDB similarity_search_with_score (k=15)
   ├─ _bm25_rank()               ← BM25 over the vector candidate pool
   ├─ _reciprocal_rank_fusion()  ← weighted RRF merge
   ├─ metadata score boosting    ← multiply fused scores for confident matches
   └─ _apply_title_boost()       ← per-query-word title match bonus
   │
   ▼
get_references_v2()              ← filter by threshold, build context string
   │
   ▼
LLM.invoke(prompt)               ← Gemini or local LlamaCpp
   │
   ▼
Return: { answer, references, context, threshold_used, k_used }
```

External Dependencies

| Dependency | Role | Provider |
| --- | --- | --- |
| ChromaDB | Persistent vector store | Local disk |
| Google Gemini API | Embeddings + LLM generation | Google Cloud |
| LlamaCpp (GGUF model) | Local LLM fallback | Local CPU |
| Sentence Transformers | Classifier feature extraction | HuggingFace Hub |
| spaCy en_core_web_sm | POS tagging / lemmatization | Local |

3. Schema & Data Model

Source JSON Format

Source data files (e.g. computer_eng.json) follow this schema:

```json
{
  "id": "computer-engineering-department",
  "name": "Computer Engineering Department",
  "source": "https://www.vgecg.ac.in/department.php?dept=3",
  "category": "computer_eng",
  "type": "department",
  "created_date": "2026-02-19",
  "content": {
    "<topic_key>": {
      "list": ["item 1", "item 2", "..."],
      "details": "Paragraph describing the topic."
    }
  }
}
```

Top-level fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique document identifier |
| name | string | Human-readable institution/department name |
| source | string | Authoritative URL |
| category | string | Department slug (e.g. computer_eng) |
| type | string | Document type (e.g. department) |
| created_date | string (ISO) | Data creation date |
| content | object | Topic map; each key = a topic |

Chunk Metadata Schema (stored in ChromaDB)

Every vector chunk stored in Chroma carries the following metadata:

| Field | Type | Source |
| --- | --- | --- |
| id | string (UUID) | Auto-generated |
| title | string | Document name / topic key |
| source | string | Source URL |
| source_file | string | Filename (e.g. computer_eng.json) |
| type | string | Taxonomy level 1 (e.g. department) |
| category | string | Taxonomy level 2 (e.g. computer_eng) |
| topic | string | Taxonomy level 3 (e.g. faculty) |
| intent | string | Chunk intent: list, detail, or count |
| chunk_index | int | Sequential index within file |
| created_date | string (ISO) | Ingestion timestamp |
| updated_at | string (ISO) | Last modification timestamp |
| ext | string | Source file extension (json, pdf, md, txt) |
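
For illustration, a single faculty-list chunk might carry metadata shaped like this; every value below is made up to match the schema, not copied from the store:

```python
# Hypothetical metadata for one ingested chunk; all values are illustrative.
example_chunk_metadata = {
    "id": "0f8b3c1e-0000-4000-8000-000000000000",  # auto-generated UUID
    "title": "Computer Engineering Department / faculty",
    "source": "https://www.vgecg.ac.in/department.php?dept=3",
    "source_file": "computer_eng.json",
    "type": "department",
    "category": "computer_eng",
    "topic": "faculty",
    "intent": "list",
    "chunk_index": 0,
    "created_date": "2026-02-19T00:00:00",
    "updated_at": "2026-02-19T00:00:00",
    "ext": "json",
}
```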

Hierarchical Taxonomy

The classifier predicts and ChromaDB filters operate on a 3-level hierarchy:

```
type
 └── category
      └── topic
           └── intent  (list | detail | count)
```

Example mapping (Computer Engineering):

```
type: "department"
  └── category: "computer_eng"
         ├── topic: "faculty"      → intent: list | detail
         ├── topic: "lab"          → intent: list | detail
         ├── topic: "syllabus"     → intent: list | detail
         ├── topic: "hod"          → intent: list | detail
         ├── topic: "intake"       → intent: list | detail
         ├── topic: "research"     → intent: list | detail
         └── topic: "achievements"
```

Document Chunking Strategy

JSON documents use a hand-crafted, intent-aware strategy in IngestionService.handle_json_docs():

| Intent | Chunk Content | Metadata |
| --- | --- | --- |
| list | Numbered list: `1. item\n2. item\n...` | intent=list |
| count | `Total <topic>: N` (auto-generated) | intent=count |
| detail | Raw paragraph text | intent=detail |

Text/PDF/Markdown documents use RecursiveCharacterTextSplitter (a minimal sketch follows the list below):

  • Default: chunk_size=500, chunk_overlap=100
  • Separator priority: `\n\n` → `\n` → `" "` (space) → `""` (character-level)
  • Markdown variant respects --- section delimiters
  • Content is passed through normalize() (tokenize + strip blanks) before storage

4. Retrieval Pipeline

Query Processing Flow

```python
# Step 1: Normalize input
question = preprocess_query(question)
# → currently applies normalize() only (tokenize + strip blanks); the spaCy
#   POS-filter/lemmatize rewrite is commented out (see Known Limitations)

# Step 2: Expand abbreviations
processed_query = clf.expand_abbreviations(query)
# → "CE dept" → "computer engineering department"

# Step 3: Classify intent/metadata
filters = clf.predict_with_filter([processed_query])
# → {"$and": [{"type": "department"}, {"intent": "list"}, {"$or": [...]}]}

# Step 4: Vector search with optional filter
raw_results = chroma.similarity_search_with_score(query, k=15, filter=filters)
# Fallback: if the filtered results are empty, retry without the filter

# Step 5: BM25 re-rank over vector candidates
bm25_retriever = BM25Retriever.from_documents(candidate_docs)

# Step 6: RRF fusion (pseudocode)
# fused_score(d) = bm25_weight   * 1/(rrf_k + rank_bm25)
#                + vector_weight * 1/(rrf_k + rank_vec)

# Step 7: Metadata confidence boosting
if doc.metadata[field] == predicted_val and conf > 0.90:
    result.fused_score *= boost_factor  # 1.10–1.20

# Step 8: Title word boost
for word in query_words:
    if word in doc.title:
        result.fused_score += title_boost_per_word  # 0.004

# Step 9: Threshold filter + sort + top-k
results = [r for r in results if r.fused_score >= threshold]
```
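
Step 6 in isolation: a standalone sketch of the weighted RRF merge (function name and document IDs are illustrative; the real logic lives in _reciprocal_rank_fusion, with the hybrid_query weights from the config table below):

```python
# Weighted RRF over two ranked lists of document IDs.
def weighted_rrf(bm25_ranked, vector_ranked,
                 bm25_weight=0.45, vector_weight=0.55, rrf_k=20):
    scores = {}
    for rank, doc_id in enumerate(bm25_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + bm25_weight / (rrf_k + rank)
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + vector_weight / (rrf_k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

fused = weighted_rrf(["d1", "d2", "d3"], ["d2", "d1", "d4"])
# d2 edges out d1: 0.45/22 + 0.55/21 ≈ 0.0466 vs 0.45/21 + 0.55/22 ≈ 0.0464
```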

Classifier Thresholds

The Classifier uses two separate threshold tables:

Prediction threshold — below this, the field is set to None (not used at all):

| Field | Threshold |
| --- | --- |
| type | 0.40 |
| category | 0.40 |
| topic | 0.50 |
| intent | 0.60 |

Filter threshold — above this, the field becomes a hard ChromaDB $and filter:

| Field | Threshold |
| --- | --- |
| type | 0.65 |
| category | 0.65 |
| topic | 0.70 |

Filter Construction Logic (_build_filter)

```python
# Gate: if type confidence < 0.65 → return None (full scan)
# Hard anchors (always included if type passes):
#   - type == predicted_type
#   - intent == predicted_intent  (special: "count" expands to count OR detail)
# Soft hints (combined as $or):
#   - category == predicted_category  (if conf >= 0.65, else "general")
#   - topic == predicted_topic        (if conf >= 0.70, else "general")
```
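
Putting the gates together, a query like "list of CE faculty" might yield a filter shaped as follows, assuming every confidence gate passes (values are illustrative, mirroring the Step 3 example above):

```python
# Illustrative predict_with_filter() output; not captured from a live run.
chroma_filter = {
    "$and": [
        {"type": "department"},        # hard anchor
        {"intent": "list"},            # hard anchor; "count" would expand to count OR detail
        {"$or": [                      # soft hints
            {"category": "computer_eng"},
            {"topic": "faculty"},
        ]},
    ]
}
```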

Hybrid Retrieval Config (Defaults)

| Parameter | hybrid_query | search_docs |
| --- | --- | --- |
| candidate_k | 15 | 15 |
| top_k (final) | settings.similarity_top_k (8) | k (param) |
| bm25_weight | 0.45 | 0.70 |
| vector_weight | 0.55 | 0.30 |
| rrf_k | 20 | 20 |
| bm25_k1 | 1.2 | 1.5 |
| bm25_b | 0.9 | 0.75 |
| title_boost_per_word | 0.004 | 0.004 |
| score_threshold | 0.4 | 0.4 |

Note: search_docs is BM25-heavy (0.70) since it is used for keyword-oriented document browsing, while hybrid_query is vector-heavy for semantic QA.
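
For reference, re-ranking a candidate pool with langchain-community's BM25Retriever and the hybrid_query parameters above might look like this sketch (the candidate documents stand in for real vector-search results):

```python
from langchain_community.retrievers import BM25Retriever
from langchain_core.documents import Document

# Stand-ins for the vector-search candidate pool (Step 5).
candidate_docs = [
    Document(page_content="faculty list of the computer engineering department"),
    Document(page_content="lab facilities of the civil engineering department"),
]

bm25 = BM25Retriever.from_documents(
    candidate_docs,
    bm25_params={"k1": 1.2, "b": 0.9},  # hybrid_query defaults
)
bm25.k = len(candidate_docs)  # rank the whole pool, not just the default top 4
bm25_ranked = bm25.invoke("computer engineering faculty")
```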


5. Key Classes & Modules

Services (app/services/)

RAGService

Main orchestrator. Singleton via lru_cache in dependencies.py.

| Method | Description |
| --- | --- |
| query() | Semantic-only QA (vector search → LLM) |
| hybrid_query() | Hybrid QA (BM25 + vector → RRF → LLM) |
| search_docs() | BM25-heavy document search, no LLM |
| ingest_documents() | Ingest a file path into the vector store |
| get_filenames() | Return all tracked file metadata records |
| test_queries() | Batch retrieval evaluation (MRR, precision, noise) |
| test_classifier() | Batch classifier accuracy evaluation |
| delete_database() | Drop the entire ChromaDB collection |

HybridRetrievalService

Stateless per-request service created inline by RAGService.

| Method | Description |
| --- | --- |
| retrieve(query) | Full hybrid retrieval pipeline; returns List[RetrievalResult] |
| _vector_rank() | Chroma similarity search + classifier filter |
| _bm25_rank() | BM25 over candidate pool |
| _reciprocal_rank_fusion() | Merge both ranked lists via RRF |
| _apply_title_boost() | Word-level title match score bonus |

RetrievalResult dataclass:

```python
from dataclasses import dataclass
from typing import Optional

from langchain_core.documents import Document  # import path may vary by LangChain version


@dataclass
class RetrievalResult:
    document: Document
    fused_score: float
    bm25_rank: Optional[int]
    vector_rank: Optional[int]
    title_boost: float
```

Classifier

Loaded at startup from a pickled pipeline (chatbot_classifier.pkl).

| Method | Description |
| --- | --- |
| predict(queries) | Returns list of {type, category, topic, intent, *_conf} dicts |
| predict_with_filter(queries) | Returns a ChromaDB-compatible filter dict or None |
| expand_abbreviations(text) | Regex-based abbreviation expansion |
| get_features(queries) | Build the feature matrix (SentenceTransformer embeddings + TF-IDF) |
| train_models(df) | Train 4 LogisticRegression classifiers (offline use) |
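
A hedged sketch of what the regex-based expansion amounts to; the real mapping lives in constants.short_words_mappings, and the entries below are examples only:

```python
import re

# Example entries only; the project's full table is short_words_mappings.
SHORT_WORDS = {"ce": "computer engineering", "dept": "department"}

def expand_abbreviations(text: str) -> str:
    for abbr, full in SHORT_WORDS.items():
        # Whole-word, case-insensitive replacement
        text = re.sub(rf"\b{re.escape(abbr)}\b", full, text, flags=re.IGNORECASE)
    return text

print(expand_abbreviations("CE dept"))  # -> computer engineering department
```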

IngestionService

| Method | Description |
| --- | --- |
| ingest(file_path) | Load + chunk a file; returns List[Document] |
| handle_json_docs() | Intent-aware chunking for structured JSON data |
| handle_text_docs() | Recursive character splitting for unstructured text |
| get_records() | Delegate to FileService.get_records() |
| delete_record(filename) | Remove a file's metadata record |
| path_record(path, metadata) | Patch ingestion stats after indexing |

FileService

| Method | Description |
| --- | --- |
| read_file(path) | Load file content; dispatches by extension |
| write_file(path, content, metadata) | Persist file to data/documents/ |
| patch_metadata(path, metadata) | Merge new fields into existing record |
| get_records() | Return all ingestion records dict |
| delete_record(filename) | Remove a record from <collection>.json |

VectorStore

Thin wrapper around langchain_chroma.Chroma.

| Method | Description |
| --- | --- |
| get() | Retrieve all documents |
| get_by_id(ids) | Retrieve specific documents by ID |
| add_documents(docs) | Embed + insert, skipping empty chunks |
| update_document(id, doc) | Delete then re-insert with same ID |
| delete(ids) | Remove documents by ID list |
| similarity_search_with_score() | Wrapped Chroma search |

Utilities (app/utils/)

preprocessing.py

| Function | Description |
| --- | --- |
| preprocess(text) | spaCy POS filter + lemmatize + stopword removal → joined string |
| normalize(text) | Tokenize + strip blanks (lightweight, no POS) |
| preprocess_query(query) | Applies normalize() to user queries |
| preprocess_documents(docs) | Applies preprocess() to a document list in-place |
| preprocess_filename(path) | Sanitize filename (remove special chars, lowercase) |

document_helpers.py

| Function | Description |
| --- | --- |
| get_references_v2(docs, threshold) | Convert RetrievalResult list → references dict + context string |
| get_references(docs, threshold) | Same for raw (Document, distance) tuples (used by query()) |
| build_metadata(path) | Parse YAML frontmatter from .md/.txt files |
| create_documents(chunks, ...) | Attach standard metadata (UUID, timestamps, indices) to chunks |
| create_documents_from_text(text) | Full pipeline: frontmatter parse → split → metadata attach |
| clean_metadata(metadata) | Serialize datetime, coerce non-allowed types to string |

model_factory.py

| Function | Description |
| --- | --- |
| get_embedding_model() | Returns GoogleGenerativeAIEmbeddings |
| get_gemini_model() | Returns ChatGoogleGenerativeAI |
| get_local_model() | Returns ChatLlamaCpp (GGUF, CPU inference) |
| get_llm_model(provider) | Dispatches to Gemini or Local with fallback logic |
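
The fallback behaviour could be summarised by a sketch like this; the function names match the table above, but the error handling shown is an assumption rather than the project's exact logic:

```python
def get_llm_model(provider: str = "local", enable_fallback: bool = True):
    # Pick the primary backend, keep the other as the fallback.
    primary, secondary = (
        (get_gemini_model, get_local_model) if provider == "gemini"
        else (get_local_model, get_gemini_model)
    )
    try:
        return primary()
    except Exception:
        if not enable_fallback:
            raise
        return secondary()  # Gemini <-> Local automatic fallback
```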

API Routes (app/api/routes/)

rag.py — prefix /api/v1/rag

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | Health check |
| POST | / | Semantic query |
| POST | /hybrid_query | Hybrid RAG query (primary endpoint) |
| POST | /similarity_search | Hybrid retrieval, no LLM response |
| POST | /search | BM25-heavy document search |
| POST | /test | Batch retrieval evaluation |
| POST | /test_classifier | Classifier accuracy evaluation |
| GET | /test_classifier_dataset | Run built-in test dataset, cache result |
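
A hypothetical call to the primary endpoint; the exact request fields are defined by RAGRequest in app/api/schemas/requests.py, so the "question" key below is an assumption:

```python
import httpx

# Assumes a local dev server on port 8000.
resp = httpx.post(
    "http://localhost:8000/api/v1/rag/hybrid_query",
    json={"question": "Who is the HOD of the Computer Engineering department?"},
    timeout=60.0,  # local LLM responses can take tens of seconds
)
data = resp.json()
print(data["answer"], data["references"])  # fields per the query-path diagram
```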

vector_store.py — prefix /api/v1/vector

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | List all documents (paginated, filterable) |
| GET | /filenames | List ingested file records |
| GET | /{id} | Get single document by ChromaDB ID |
| POST | / | Upload + ingest file |
| PUT | /{id} | Update document content/metadata |
| DELETE | /ids | Bulk delete by ID list |
| DELETE | /{id} | Delete single document |
| DELETE | / | Filter-based delete (filename/source/contains) |

Configuration (app/core/config.py)

All settings are read from .env via Pydantic BaseSettings:

```python
from typing import Literal

from pydantic_settings import BaseSettings  # Pydantic v1: from pydantic import BaseSettings


class Settings(BaseSettings):
    # Paths
    collection_name: str = "classifier_test_1"
    persist_directory: str = "./data/vector_stores/classifier_test_1"

    # Chunking
    chunk_size: int = 500
    chunk_overlap: int = 100

    # Retrieval
    similarity_top_k: int = 8
    similarity_threshold: float = 0.4

    # LLM Provider
    llm_provider: Literal["gemini", "local"] = "local"
    enable_fallback: bool = True

    # Models
    embedding_model_name: str = "models/gemini-embedding-001"
    gemini_model_name: str = "gemini-2.5-flash-lite"
    local_model_name: str = "EXAONE-3.5-2.4B-Instruct-Q4_K_M.gguf"

    # Generation
    max_output_tokens: int = 2048
    local_max_tokens: int = 512

    # Auth
    google_api_key: str  # required — must be in .env
```
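
Settings, like RAGService, are wired through the lru_cache singleton pattern referenced in the architecture diagram; a minimal sketch, with the accessor name assumed rather than taken from dependencies.py:

```python
from functools import lru_cache

@lru_cache
def get_settings() -> Settings:  # hypothetical accessor name
    return Settings()  # reads .env once; cached for the process lifetime
```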

6. Evaluation & Metrics

Retrieval Evaluation (test_queries / POST /api/v1/rag/test)

Tests each (question, expected_document, expected_chunk_index) triple against hybrid_query:

| Metric | Formula | Interpretation |
| --- | --- | --- |
| Hit Rate | hits / total | % of questions where the exact chunk was retrieved |
| Top-1 Hit Rate | rank==1 hits / total | % of questions where the exact chunk was the top result |
| MRR | mean(1/rank) | Mean Reciprocal Rank; higher = correct result ranked earlier |
| Doc Precision | correct_source_chunks / all_chunks | How many retrieved chunks came from the right document |
| Doc Recall | 1 if any correct_source_chunk else 0 | Did we retrieve at least one chunk from the right document? |
| Doc Noise | wrong_source_chunks / all_chunks | Proportion of off-topic chunks in the result set |
| Error Rate | 1 - hit_rate | Miss rate for exact chunk retrieval |
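
The table reduces to a few lines of arithmetic. A sketch, given the 1-based rank at which each expected chunk was retrieved (None when missed; values illustrative):

```python
ranks = [1, 3, None, 2]  # outcome of four hypothetical test questions

hits = [r for r in ranks if r is not None]
hit_rate = len(hits) / len(ranks)                            # 0.75
top1_hit_rate = sum(1 for r in hits if r == 1) / len(ranks)  # 0.25
mrr = sum(1.0 / r for r in hits) / len(ranks)                # (1 + 1/3 + 1/2) / 4 ≈ 0.458
error_rate = 1 - hit_rate                                    # 0.25
```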

Test Input Schema:

```python
from typing import List

from pydantic import BaseModel

# `Test` is the per-case schema (question + document + chunk_index),
# defined alongside in app/api/schemas/tests.py.

class TestRequestSchema(BaseModel):
    tests: List[Test]   # question + document + chunk_index
    k: int = 5
    threshold: float = 0.4
```

Classifier Evaluation (test_classifier / POST /api/v1/rag/test_classifier)

Evaluates predictions for all 4 classification fields (type, category, topic, intent):

| Metric | Notes |
| --- | --- |
| Accuracy | sklearn.accuracy_score |
| Precision (macro) | zero_division=0 |
| Recall (macro) | zero_division=0 |
| F1 Macro | Unweighted average across classes |
| F1 Weighted | Class-frequency weighted |
| Classification Report | Full per-class breakdown (output_dict=True) |

A bundled test dataset is stored in app/utils/tests.py as classifier_test_dataset and can be executed via GET /api/v1/rag/test_classifier_dataset. Results are memoized on the RAGService.evaluation dict for the lifetime of the server process.
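
Per field, the scoring boils down to standard scikit-learn calls; a minimal sketch with made-up labels for a single field:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             f1_score, precision_score, recall_score)

# Illustrative labels for one field (e.g. topic); real labels come from the request.
y_true = ["faculty", "lab", "faculty", "syllabus"]
y_pred = ["faculty", "faculty", "faculty", "syllabus"]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision_macro": precision_score(y_true, y_pred, average="macro", zero_division=0),
    "recall_macro": recall_score(y_true, y_pred, average="macro", zero_division=0),
    "f1_macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
    "f1_weighted": f1_score(y_true, y_pred, average="weighted", zero_division=0),
    "report": classification_report(y_true, y_pred, output_dict=True, zero_division=0),
}
```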


7. Known Limitations

Technical Debt

  • preprocess_query is incomplete. The function signature has an LLM-powered query rewriting block that is commented out. Currently it just calls normalize() (tokenize only), which means no stopword removal or lemmatization is applied to user queries (only to stored documents).
  • search_docs does not honour filename as a metadata filter in Chroma. The filter is applied in Python post-retrieval, which is inefficient for large collections.
  • Count intent is synthetic. The "Total <topic>: N" chunk is an auto-generated chunk during ingestion, not from the source document. If source data changes, stale count chunks can remain indexed.
  • VectorStore.get_dict() has a print(type(rows)) debug statement left in production code.
  • FileService.__init__ docstring has an extra backtick: "` class docstring.

Planned but Unimplemented

  • Query rewriting via local LLM β€” skeleton is commented out in preprocess_query().
  • Semantic caching β€” no query result memoization at the API layer.
  • Re-ranker β€” no cross-encoder re-ranking step; relies only on RRF + boosting.
  • topic field is not included in the ChromaDB hard filter β€” only type + intent are hard-anchored; category and topic are soft $or hints.

Performance Bottlenecks

  • Local LLM (LlamaCpp) is CPU-only with n_ctx=8096 and n_threads=4. Response latency is high (~10–30s) on low-RAM systems.
  • Classifier uses SentenceTransformer + TF-IDF features β€” inference runs on every request with no caching of query embeddings.
  • BM25 corpus is rebuilt from scratch per request β€” BM25Retriever.from_documents() is called inside _bm25_rank() each time.
  • classify_test_dataset in app/utils/tests.py is a very large file (1.8MB) loaded at import time.
  • The memoized evaluation in rag_service.evaluation is not thread-safe if the server runs with multiple workers.

8. File Structure

```
VGEC-RAG-Chatbot/
│
├── app/                            # Application package
│   ├── main.py                     # FastAPI app, router mounting, CORS middleware
│   ├── core/
│   │   ├── config.py               # Pydantic Settings (all tuneable params)
│   │   └── paths.py                # Path constants helper
│   │
│   ├── api/
│   │   ├── dependencies.py         # lru_cache singleton for RAGService
│   │   ├── routes/
│   │   │   ├── rag.py              # /rag endpoints (query, test, classifier)
│   │   │   ├── vector_store.py     # /vector endpoints (CRUD for ChromaDB)
│   │   │   └── settings.py         # /settings endpoints
│   │   └── schemas/
│   │       ├── requests.py         # RAGRequest, PaginationParams, etc.
│   │       └── tests.py            # TestRequestSchema, TestClassifierReqSchema
│   │
│   ├── services/
│   │   ├── rag_service.py          # RAGService (main orchestrator)
│   │   ├── hybrid_retrieval.py     # HybridRetrievalService + RRF logic
│   │   ├── classifier_service.py   # Classifier class + singleton clf
│   │   ├── ingestion_service.py    # IngestionService (chunking pipeline)
│   │   ├── file_service.py         # FileService (file I/O + metadata JSON)
│   │   ├── vector_store.py         # VectorStore (thin ChromaDB wrapper)
│   │   ├── text_splitter.py        # TextSplitter (RecursiveCharacter + variants)
│   │   └── document_loader.py      # (legacy loader, not in primary path)
│   │
│   ├── utils/
│   │   ├── preprocessing.py        # preprocess(), normalize(), preprocess_query()
│   │   ├── document_helpers.py     # get_references_v2(), build_metadata(), create_documents()
│   │   ├── model_factory.py        # get_llm_model(), get_embedding_model()
│   │   ├── constants.py            # stopwords list, short_words_mappings
│   │   ├── embeddings.py           # (thin embedding util)
│   │   ├── llm_models.py           # (thin LLM util)
│   │   └── tests.py                # classifier_test_dataset (large, 1.8MB)
│   │
│   └── prompts/
│       └── __init__.py             # SYSTEM_PROMPT, wrap_exaone()
│
├── ml_models/
│   ├── classifier/
│   │   └── chatbot_classifier.pkl  # Pickled pipeline (models, tfidf, label encoders, etc.)
│   ├── embeddings/                 # (Local embedding model weights, if any)
│   └── llm/
│       └── EXAONE-3.5-2.4B-*.gguf  # Local LLM weights
│
├── data/
│   ├── department_data/            # Source JSON files per department
│   │   ├── computer_eng.json
│   │   ├── civil.json
│   │   └── ...
│   ├── documents/                  # Persistent copies of ingested files
│   ├── vector_stores/
│   │   └── classifier_test_1/      # ChromaDB persist directory
│   ├── classifier_test_1.json      # Ingestion metadata registry (FileService records)
│   └── other_data/                 # Misc data files
│
├── temp/                           # Staging area for uploaded files (auto-cleared)
├── scripts/                        # Offline scripts (training, testing)
├── tests/                          # Test files
│
├── requirements.txt                # Pinned production dependencies
├── .env                            # Runtime secrets (google_api_key, etc.)
├── .env.example                    # Template for .env
└── CODEBASE_DOCUMENTATION.md       # This file
```

End of documentation.