## VisaBuddy – Canada Immigration Assistant
### A retrieval-augmented, QLoRA-fine-tuned LLM assistant for Canadian immigration guidance
---
### Badges (text)
- **License**: MIT
- **Model (Hugging Face)**: `<your-hf-username>/<your-model-repo>`
- **Tech**: `LLM • QLoRA • RAG • FastAPI • React • TypeScript • TailwindCSS • ChromaDB`
---
## Overview
VisaBuddy – Canada Immigration Assistant is an end-to-end system that combines a custom fine-tuned LLM with a modern web UI to help users explore Canadian immigration programs in a conversational way.
The assistant:
- retrieves information from cleaned IRCC (Immigration, Refugees and Citizenship Canada) documents,
- runs a **RAG (Retrieval-Augmented Generation)** pipeline over a **ChromaDB** vector store,
- and answers questions through a **FastAPI** backend and **React** chat frontend.
> **Important:** VisaBuddy is **not** an official source of immigration advice and is **not** a substitute for a licensed immigration consultant or lawyer. Always verify final decisions and requirements directly with official IRCC resources.
---
## Architecture
High-level flow from user to model:
```text
+-----------+      +--------------+      +-----------+      +--------------+      +---------------+
|  Browser  | ---> |   Frontend   | ---> |    API    | ---> |   Model +    | ---> |   Vector DB   |
| (User UI) |      |  (React/TS)  |      | (FastAPI) |      |   RAG Core   |      |  (ChromaDB)   |
+-----------+      +--------------+      +-----------+      +--------------+      +---------------+
      ^                                                            |
      +----------------- streamed / batched responses -------------+
```
Expanded data path:
```text
User Question
      |
      v
React Chat UI
      |  (POST /v1/chat)
      v
FastAPI Backend (Colab, behind Cloudflare Tunnel)
      |
      +--> RAG Retriever
      |      |- Chunked IRCC docs
      |      |- Embeddings (Sentence Transformers)
      |      '- ChromaDB vector search
      |
      +--> Prompt Builder (question + retrieved context)
      |
      +--> QLoRA-fine-tuned LLM (4-bit quantized)
      |
      v
Formatted Answer -> sent back to UI
```
---
## Features
### AI / Modeling
- **Custom LLM fine-tuned with QLoRA**
  - Built on a modern base model (e.g. LLaMA 3 / Mistral / Qwen; configurable in code).
  - 4-bit quantization for efficient inference.
  - Fine-tuned on an **instruction JSONL dataset** derived from IRCC content.
- **RAG (Retrieval-Augmented Generation)**
  - Cleans IRCC HTML + PDF documents.
  - Chunks long texts into overlapping segments.
  - Uses a **Sentence Transformers** embedding model.
  - Stores embeddings and metadata in **ChromaDB**.
  - Retrieves top-k relevant chunks per query and injects them into prompts.
- **Evaluation**
- Training **loss curve** and **perplexity**.
- Retrieval metrics: **precision@k**, **recall@k**, **MRR**.
- Answerβquality metrics: **factuality**, **hallucination rate**, **groundedness**, **context utilization**.
- Notebook dashboards for quick visual inspection.
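The retrieval metrics can be sketched in a few lines (a hypothetical helper, not the repo's actual evaluation code; `retrieved` is a ranked list of chunk IDs, `relevant` the gold set):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunk IDs that are relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant chunk IDs found in the top-k results."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)

def mrr(queries):
    """Mean reciprocal rank over (retrieved, relevant) pairs."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

# Example: one query where the 2nd result is the first relevant hit.
retrieved = ["c3", "c1", "c7"]
relevant = {"c1", "c9"}
print(round(precision_at_k(retrieved, relevant, 3), 3))  # 1 of 3 hits -> 0.333
print(recall_at_k(retrieved, relevant, 3))               # 1 of 2 relevant -> 0.5
print(mrr([(retrieved, relevant)]))                      # first hit at rank 2 -> 0.5
```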
### Backend (FastAPI, Colab)
- **FastAPI server** running in **Google Colab**.
- Loads a **model ZIP** from **Google Drive** (or other storage) and extracts it.
- Uses **PEFT** LoRA adapters on top of the base model.
- Exposes an inference endpoint (e.g. `POST /v1/chat`) for the frontend.
- Tunnelled to the public internet via **Cloudflare Tunnel**.
### Frontend (React + TypeScript)
- Modern **chat interface** tailored to visa/immigration workflows.
- Built with **React**, **TypeScript**, and **TailwindCSS**.
- Configurable `VITE_API_URL` pointing to the backend tunnel URL.
- Multiple conversation threads, persistent history and smooth animations.
---
## Tech Stack
### Frontend
- **React**, **TypeScript**, **Vite**
- **TailwindCSS** for styling
- Optional animation and icon libraries
### Backend
- **Python 3.x**
- **FastAPI** (or compatible ASGI framework)
- **Uvicorn** for serving
- Runs inside **Google Colab**
- **Cloudflare Tunnel** for secure public access
### AI / ML
- **Transformers** + **PEFT**
- **QLoRA** (LoRA on quantized weights, 4-bit)
- **Sentence Transformers** for embeddings
- **ChromaDB** as vector database
### Infrastructure / Storage
- **Google Drive** for model ZIP storage
- **Google Colab** GPU runtime
- **Cloudflare Tunnel** for public HTTPS endpoint
---
## How the Model Works
### Data & Preprocessing
- Public IRCC documents (HTML and PDF) are:
- crawled or downloaded,
- parsed into raw text,
- cleaned (removing navigation, styling, boilerplate),
- normalized (whitespace, encoding artifacts).
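A minimal sketch of the normalization pass (hypothetical helper; the repo's actual cleaning code may differ):

```python
import re
import unicodedata

def clean_text(raw):
    """Normalize encoding artifacts and whitespace in extracted IRCC text."""
    text = unicodedata.normalize("NFKC", raw)   # fold compatibility characters
    text = text.replace("\u00a0", " ")          # non-breaking spaces -> spaces
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)      # collapse runs of blank lines
    return text.strip()

print(clean_text("Express\u00a0Entry\n\n\n\napplications"))
```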
### Chunking
- Long documents are split into **overlapping chunks**:
  - Sentence-aware segmentation.
  - A target chunk size (e.g. 800–1,000 tokens or words).
  - Overlap between chunks (e.g. 150–200 tokens/words) to preserve context across boundaries.
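The chunking scheme above can be sketched as follows (a word-based approximation; the real pipeline may count tokens and split sentence-aware):

```python
def chunk_text(text, chunk_size=800, overlap=150):
    """Split text into overlapping word-based chunks."""
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the document
    return chunks

doc = " ".join(f"w{i}" for i in range(2000))
chunks = chunk_text(doc, chunk_size=800, overlap=150)
print(len(chunks))  # chunks start at words 0, 650, 1300 -> 3 chunks
```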
### Embeddings & Vector Store
- Each chunk is embedded using a **Sentence Transformers** model.
- Embeddings + metadata (source URL, section, document ID) are stored in **ChromaDB**.
- At query time:
- the user question is embedded,
- **k** nearest chunks are retrieved,
- these chunks become the **context** for RAG.
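Conceptually, the query-time step is nearest-neighbour search over chunk embeddings. A minimal pure-Python stand-in for the ChromaDB similarity query (toy 3-dimensional vectors and made-up chunk IDs; real embeddings come from Sentence Transformers):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    """index: list of (chunk_id, embedding); returns the k best chunk IDs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

index = [
    ("express_entry", [0.9, 0.1, 0.0]),
    ("study_permit",  [0.1, 0.9, 0.0]),
    ("citizenship",   [0.0, 0.2, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], index, k=2))  # ['express_entry', 'study_permit']
```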
### QLoRA Fine-Tuning
- A base model such as **LLaMA 3**, **Mistral**, or **Qwen** (configurable) is used.
- **QLoRA** is applied:
  - The base model is loaded in **4-bit** quantized mode.
  - Low-rank adapters (LoRA) are trained on top of frozen base weights.
- Training uses an instruction dataset in JSONL with fields like:
  - `instruction`, `input`, `output`, plus metadata.
- Benefits:
  - Strong performance with limited GPU memory.
  - Safety: it's easy to discard or retrain adapters without touching base weights.
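A record in such an instruction JSONL might look like this (fields and values are illustrative, not taken from the actual dataset; the metadata field names are assumptions):

```jsonl
{"instruction": "Summarize the eligibility requirements for Express Entry.", "input": "", "output": "Express Entry manages applications for three federal programs...", "source": "ircc", "doc_id": "express-entry-001"}
```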
### Inference Flow
1. User asks a question via the web UI.
2. The frontend sends a request to the FastAPI endpoint.
3. Backend:
- embeds the question,
- retrieves topβk context chunks from ChromaDB,
- builds a prompt that includes:
- system instructions,
- retrieved context,
- the user question.
4. The QLoRA-fine-tuned model generates an answer.
5. The answer is returned to the frontend and displayed in the chat.
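Step 3's prompt assembly can be sketched as follows (template wording and section markers are illustrative, not the repo's exact prompt):

```python
def build_prompt(question, context_chunks):
    """Combine system instructions, retrieved context, and the user question."""
    system = (
        "You are VisaBuddy, an assistant for Canadian immigration questions. "
        "Answer using only the provided IRCC context and say so when unsure."
    )
    # Number the retrieved chunks so the model can cite them.
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks))
    return f"{system}\n\n### Context\n{context}\n\n### Question\n{question}\n\n### Answer\n"

prompt = build_prompt(
    "Do I need a study permit for a 6-month course?",
    ["Most courses of six months or less do not require a study permit."],
)
print(prompt.splitlines()[0])  # the system instruction line
```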
---
## Installation & Setup
### 1. Clone the Repository
```bash
git clone https://github.com/<your-username>/<your-repo-name>.git
cd <your-repo-name>
```
---
### 2. Frontend Setup (React + Vite)
#### 2.1 Install Dependencies
```bash
cd frontend # or project root if frontend is at top-level
npm install
```
#### 2.2 Environment Variables
Create a `.env` (or `.env.local`) file in the frontend root:
```bash
VITE_API_URL=<your-cloudflare-tunnel-url>
```
Examples:
- `VITE_API_URL=https://your-tunnel-subdomain.trycloudflare.com`
This URL should point to the public endpoint that fronts your FastAPI server in Colab.
#### 2.3 Run the Dev Server
```bash
npm run dev
```
Open the printed URL in your browser (typically `http://localhost:5173`) to use VisaBuddy.
---
### 3. Backend Setup (Google Colab + FastAPI)
The backend is intended to run inside a **Google Colab** notebook.
#### 3.1 Upload Model ZIP to Google Drive
1. Train or download the QLoRA adapters and tokenizer.
2. Zip the model folder into something like:
- `visa-buddy-model.zip`
3. Upload the ZIP to Google Drive, for example:
- `MyDrive/visa_buddy/visa-buddy-model.zip`
#### 3.2 Mount Google Drive in Colab
In your Colab notebook:
```python
from google.colab import drive
drive.mount("/content/drive")
MODEL_ZIP_PATH = "/content/drive/MyDrive/visa_buddy/visa-buddy-model.zip"
LOCAL_MODEL_DIR = "/content/visa_buddy_model"
```
#### 3.3 Unzip and Load the Model
```python
import os
import zipfile

os.makedirs(LOCAL_MODEL_DIR, exist_ok=True)
with zipfile.ZipFile(MODEL_ZIP_PATH, "r") as zf:
    zf.extractall(LOCAL_MODEL_DIR)
```
Then load the base model and LoRA adapters with **Transformers** + **PEFT**:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE_MODEL_NAME = "<base-model-name>"  # e.g. meta-llama/Meta-Llama-3-8B-Instruct
ADAPTER_PATH = LOCAL_MODEL_DIR         # folder with adapter_config / adapter_model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_NAME,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER_PATH)
model.eval()
```
#### 3.4 FastAPI Server
Create a simple FastAPI app in the notebook:
```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse
from pydantic import BaseModel
import uvicorn

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/v1/chat")
def chat(req: ChatRequest):
    # TODO: run RAG retrieval and LLM generation here
    # response_text = generate_answer(req.message)
    response_text = "This is a placeholder response."
    return JSONResponse({"answer": response_text})

def start_server():
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Start the server (in a background cell or with `nest_asyncio` / separate process).
#### 3.5 Expose Backend via Cloudflare Tunnel
1. Install cloudflared in Colab:
```bash
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!sudo dpkg -i cloudflared-linux-amd64.deb
```
2. Start a tunnel targeting the FastAPI port (8000). For a named tunnel, follow Cloudflare's docs to authenticate first; for an ad-hoc quick tunnel, run:
```bash
!cloudflared tunnel --url http://localhost:8000
```
Copy the printed `trycloudflare.com` URL into the frontend's `VITE_API_URL`.
---
### 4. Hugging Face Model Card Metadata
The model repository's `README.md` on Hugging Face opens with this YAML front matter:
```yaml
---
license: apache-2.0
language:
  - en
metrics:
  - precision
  - accuracy
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
  - visa
  - canada
  - rag
  - q-lora
  - lora
  - finetuned
  - google-colab
  - fastapi
  - fullstack
  - react
  - tailwind
  - figma
  - ux
  - ui
  - typescript
  - embeddings
  - chromadb
  - ai-assistant
  - instruct
---
```