# 🤖 RAG Chatbot — Application Report

**Course:** Makers Lab (Term 3) | **Institute:** SPJIMR | **Date:** February 10, 2026

---

## 1. What Is This Application?

This is an **AI-powered chatbot** that can answer questions by reading through your own documents. Instead of searching the internet, it looks through a personal "knowledge base" — a folder of text files you provide — and gives you accurate, sourced answers.

Think of it like having a **personal assistant who has read all your documents** and can instantly recall information from them when you ask a question.

---

## 2. The Core Idea: RAG (Retrieval-Augmented Generation)

**RAG** stands for **Retrieval-Augmented Generation**. In simple terms, it combines two steps:

| Step | What Happens | Analogy |
|------|--------------|---------|
| **1. Retrieval** | The system searches your documents and finds the paragraphs most relevant to your question | Like flipping through a textbook to find the right page |
| **2. Generation** | An AI model reads those paragraphs and writes a clear, human-like answer | Like a student summarizing what they found in their own words |

> [!IMPORTANT]
> The AI **only answers from your documents** — it does not make things up or pull information from the internet. If the answer isn't in your files, it will tell you.

---

## 3. How It Works (Step by Step)

```mermaid
flowchart LR
    A["📄 Your Documents"] --> B["✂️ Split into Chunks"]
    B --> C["🔢 Convert to Numbers\n(Embeddings)"]
    C --> D["🗄️ Store in FAISS\n(Vector Database)"]
    E["❓ Your Question"] --> F["🔢 Convert to Numbers"]
    F --> G["🔍 Find Similar Chunks"]
    D --> G
    G --> H["🤖 AI Generates Answer"]
    H --> I["💬 Response Shown"]
```

### Breaking it down:

1. **You add documents** — Place `.txt` files in the `knowledge_base` folder (e.g., company policies, notes, research papers)
2. **Documents are split** — Large files are broken into smaller, manageable pieces called "chunks" (like cutting a book into individual pages)
3. **Chunks become numbers** — Each chunk is converted into a list of numbers (called an "embedding") that captures its meaning. This is done by an **embedding model** running on HuggingFace's servers
4. **Numbers are stored** — These numerical representations are saved in a **FAISS index** (a fast search engine for numbers)
5. **You ask a question** — Your question is converted into numbers the same way
6. **Similar chunks are found** — The system compares your question's numbers with all the chunk numbers to find the closest matches (like finding the most relevant pages)
7. **AI writes the answer** — The matching chunks are sent to a **Large Language Model (LLM)**, which reads them and generates a clear, natural-language answer (see the code sketches after this list)
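To see why comparing "numbers" works (steps 3, 5, and 6), here is a toy demo. The two chunks, the question, and their 4-dimensional vectors are made up for illustration; real embeddings such as all-MiniLM-L6-v2 have 384 dimensions, but the idea is the same: texts with similar meanings get vectors that point in similar directions.

```python
# Toy illustration of steps 3, 5 and 6: "similar meaning" becomes
# "nearby vectors". All vectors below are invented for the demo.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity score in [-1, 1]; higher means more similar in meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

chunk_vectors = {
    "Riya has 4 years of analytics experience.": np.array([0.9, 0.1, 0.3, 0.0]),
    "The cafeteria serves lunch at noon.":       np.array([0.1, 0.8, 0.0, 0.5]),
}
question_vector = np.array([0.8, 0.2, 0.4, 0.1])  # "What is Riya's experience?"

for text, vec in chunk_vectors.items():
    print(f"{cosine(question_vector, vec):.2f}  {text}")
# Output: the experience chunk scores ~0.98, the cafeteria chunk ~0.33,
# so the experience chunk is the one retrieved and sent to the LLM.
```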
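And here is a minimal end-to-end sketch of all seven steps, written against the stack this report names (LangChain, FAISS, HuggingFace). Treat it as an illustration rather than the app's actual code: the `HF_TOKEN` variable name, the sample question, the prompt wording, and the exact import paths (which vary across LangChain versions) are all assumptions.

```python
# Minimal sketch of the ingestion + query pipeline described above.
# Requires: pip install langchain langchain-community faiss-cpu
#                       huggingface_hub python-dotenv
import os
from pathlib import Path

from dotenv import load_dotenv
from huggingface_hub import InferenceClient
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import FAISS

load_dotenv()                          # reads the token from .env
HF_TOKEN = os.environ["HF_TOKEN"]      # assumed variable name
KB_DIR = Path("knowledge_base")

# Steps 1-2: read every .txt file and split it into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = []
for path in KB_DIR.glob("*.txt"):
    chunks.extend(splitter.split_text(path.read_text(encoding="utf-8")))

# Steps 3-4: embed each chunk via the HF API and index the vectors in FAISS
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HF_TOKEN,
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)
index = FAISS.from_texts(chunks, embeddings)

# Steps 5-6: embed the question and retrieve the k closest chunks
question = "What projects are listed in the knowledge base?"
hits = index.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in hits)

# Step 7: send the question plus retrieved context to an LLM
client = InferenceClient(token=HF_TOKEN)
prompt = (
    "Answer using ONLY the context below. "
    "If the answer is not in the context, say so.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
print(client.text_generation(prompt,
                             model="mistralai/Mistral-7B-Instruct-v0.2",
                             max_new_tokens=300))
```

The Streamlit app wraps essentially this flow in a chat interface: the index is built once when the knowledge base loads, and only the retrieval and generation steps run for each new question.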
---

## 4. Key Features

### 📚 Custom Knowledge Base

- Add any `.txt` files to the `knowledge_base` folder
- Reload anytime using the sidebar button
- Currently loaded with 6 documents (profile, experience, skills, projects, achievements, goals)

### 🤖 Multiple AI Models

The app lets you choose from different AI models:

| Model | Best For |
|-------|----------|
| Mistral 7B Instruct | General-purpose, reliable |
| Zephyr 7B | Conversational, friendly |
| Phi-3 Mini | Fast, efficient |
| Llama 3.2 3B | Meta's latest compact model |
| Gemma 2 2B | Google's lightweight model |

### 🔍 Configurable Retrieval

- **Chunk Size** (500–2000 characters): Controls how big each document piece is
- **Number of Results** (1–5): How many relevant pieces to retrieve per question

### 📄 Source Citations

Every answer includes an expandable section showing exactly which document fragments were used — so you can verify the answer.

### ⚡ 100% Free

All processing happens via HuggingFace's free Inference API — no paid subscriptions or expensive GPU hardware needed.

### 💬 Chat History

The app remembers your conversation, so you can ask follow-up questions naturally.

---

## 5. Technology Stack

| Component | Technology | Role |
|-----------|------------|------|
| User Interface | **Streamlit** | Creates the web-based chat interface |
| Document Loading | **LangChain** | Reads and processes text files |
| Text Splitting | **RecursiveCharacterTextSplitter** | Breaks documents into chunks intelligently |
| Embeddings | **HuggingFace API** (e.g., all-MiniLM-L6-v2) | Converts text into numerical representations |
| Vector Database | **FAISS** (Facebook AI Similarity Search) | Stores and searches embeddings efficiently |
| Answer Generation | **HuggingFace Inference API** | Runs the LLM to generate answers |
| Environment Management | **python-dotenv** | Manages configuration securely |

---

## 6. How to Use the Application

### First-Time Setup

1. **Get a free HuggingFace account** at [huggingface.co](https://huggingface.co/join)
2. **Create a token** at [Settings → Tokens](https://huggingface.co/settings/tokens)
   - Choose the **"Fine-grained"** type
   - Enable **"Make calls to Inference Providers"**
3. **Install dependencies**: `pip install -r requirements.txt`
4. **Add documents** to the `knowledge_base/` folder

### Running the App

```
streamlit run app.py
```

Then open `http://localhost:8501` in your browser.

### Asking Questions

1. Paste your HuggingFace token in the sidebar
2. Wait for the knowledge base to load (green ✅ confirmation)
3. Type your question in the chat box
4. View the AI-generated answer and optionally expand the source documents

---

## 7. Project File Structure

```
ApplicationTest1/
├── app.py               ← Main application (320 lines)
├── requirements.txt     ← Python package dependencies
├── .env                 ← Stores your HuggingFace token
├── README.md            ← Quick-start guide
├── knowledge_base/      ← Your documents go here
│   ├── profile.txt
│   ├── experience.txt
│   ├── skills.txt
│   ├── projects.txt
│   ├── achievements.txt
│   └── goals.txt
└── venv_rag/            ← Python virtual environment
```

---

## 8. Error Handling

The application includes user-friendly error handling:

| Error | What It Means | Solution |
|-------|---------------|----------|
| **403 Forbidden** | The token doesn't have the correct permissions | Recreate the token with "Inference Providers" enabled |
| **Model loading** | The AI model is starting up on the server | Wait 20–30 seconds and retry (a retry sketch follows this table) |
| **No documents found** | The knowledge base folder is empty | Add `.txt` files and reload |
| **Embedding error** | Issue converting text to numbers | Check your token and the selected model |
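The first two rows are API-level errors that code can handle directly. Below is a hedged sketch of the wait-and-retry pattern for the "Model loading" case, using `huggingface_hub`'s `InferenceClient`; the model ID, wait time, retry count, and `HF_TOKEN` variable name are illustrative assumptions, not `app.py`'s actual logic.

```python
# Sketch of wait-and-retry handling for the transient "model loading"
# error above. Values here (model ID, 25 s wait, 3 retries) are examples.
import os
import time

from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError

client = InferenceClient(token=os.environ["HF_TOKEN"])  # assumed env var

def generate_with_retry(prompt: str, max_retries: int = 3) -> str:
    """Call the LLM, waiting out transient 'model starting up' errors."""
    for attempt in range(max_retries):
        try:
            return client.text_generation(
                prompt,
                model="mistralai/Mistral-7B-Instruct-v0.2",
                max_new_tokens=300,
            )
        except HfHubHTTPError as err:
            status = err.response.status_code
            if status == 403:
                # Permanent: the token lacks the required permission
                raise PermissionError(
                    "403 Forbidden: recreate your token with "
                    "'Make calls to Inference Providers' enabled"
                ) from err
            if status in (502, 503) and attempt < max_retries - 1:
                time.sleep(25)  # the model is loading; wait, then retry
                continue
            raise  # other errors (or final attempt): surface to the user
    raise RuntimeError("unreachable")
```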
---

## 9. Key Takeaways

> [!TIP]
> **Why RAG matters:** Traditional AI models can "hallucinate" — make up information. RAG addresses this by grounding AI answers in your actual documents, making it far more reliable for business and academic use.

- **RAG = Search + AI** — Combines document retrieval with AI generation
- **Your data stays private** — Documents are processed in your session only
- **Completely free** — No paid APIs, no GPU required
- **Customizable** — Swap models, tune chunk sizes, change the knowledge base anytime
- **Transparent** — Always shows which sources were used for each answer

---

*Report prepared for Makers Lab, SPJIMR — Term 3*