Refer to the Hugging Face Space: ashok75-gakr.hf.space
# GAKR AI – Local File‑Aware Chat Assistant
GAKR AI is a **local, privacy‑friendly chat assistant** that runs entirely on your machine.
It combines a **FastAPI backend**, a modern **web chat UI**, and a **file‑intelligence pipeline** that can read and summarize many file types before generating natural‑language responses.
The assistant itself is **text‑only**. It never directly sees raw PDFs, images, audio, or videos.
Instead, specialized tools convert files into **structured text summaries**, and the language model reasons over that text.
---
## ✨ Features
### 🌐 Web Chat Interface
- Clean dark UI with message bubbles and typing indicator
- Auto‑growing input box
- Attach files from camera, gallery, or filesystem
- Works in any modern browser at **http://localhost:8080**
### 🧠 Text + File Understanding
- **Prompt only** → general assistant (explanations, coding help, reasoning)
- **Prompt + files** → full analysis pipeline:
  - Detects file type
  - Stores uploads in `dataupload/`
  - Extracts structured facts
  - Feeds extracted context + question to the model
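The last step turns the extracted facts into plain text that a text-only model can read. The following is a minimal sketch. It assumes each file has already been summarized into a dict shaped like the `files` entries in the `/api/analyze` response. `build_prompt` is a hypothetical helper, and `generate.py` may format its prompt differently.

```python
from typing import Any


def build_prompt(question: str, summaries: list[dict[str, Any]]) -> str:
    """Combine per-file summaries and the user's question into one text prompt."""
    if not summaries:
        # No files: general-assistant mode, the question is passed through unchanged.
        return question
    sections = []
    for i, item in enumerate(summaries, start=1):
        # Each file contributes a labelled block of already-extracted text.
        lines = [f"[File {i}] {item['original_name']} ({item['kind']})"]
        for key, value in item["summary"].items():
            lines.append(f"- {key}: {value}")
        sections.append("\n".join(lines))
    context = "\n\n".join(sections)
    return f"Context extracted from uploaded files:\n\n{context}\n\nQuestion: {question}"
```

Because the model only ever receives this string, swapping in a better OCR or transcription tool changes the context text but not the model-facing interface.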
### 📂 Multi‑File, Multi‑Type Uploads
Upload multiple files at once:
- Documents: PDF, DOCX, TXT
- Tabular data: CSV, Excel, JSON
- Images: OCR via Tesseract
- Audio: Speech‑to‑text via Whisper
- Video: Audio extraction via ffmpeg → Whisper
### 💾 Persistent Uploads
- Files saved under `dataupload/` by type
- Timestamped, safe filenames
- Automatic directory creation
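The following is a minimal sketch of timestamped, sanitized storage. The `%Y%m%d_%H%M%S` timestamp format, the character whitelist, and the `safe_store_path` name are assumptions; the README's example path uses a date-only prefix (`20241214_report.pdf`).

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

UPLOAD_ROOT = Path("dataupload")  # top-level upload folder used by this project


def safe_store_path(
    original_name: str,
    kind_dir: str,
    now: Optional[datetime] = None,
    root: Path = UPLOAD_ROOT,
) -> Path:
    """Return root/kind_dir/<timestamp>_<sanitized name>, creating the folder if needed."""
    now = now or datetime.now()
    # Keep only the final path component so names like "../../x" cannot escape the folder.
    base = Path(original_name).name
    # Replace anything outside a conservative character set with "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base) or "upload"
    target_dir = root / kind_dir
    target_dir.mkdir(parents=True, exist_ok=True)  # automatic directory creation
    return target_dir / f"{now:%Y%m%d_%H%M%S}_{cleaned}"
```

For example, `safe_store_path("report.pdf", "documents")` returns a path like `dataupload/documents/20241214_093000_report.pdf`.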
### 🔐 Simple Login Reminder UX
- After **5 guest messages**, a popup encourages login
- Logged‑in users are not interrupted
- Login state stored in `localStorage`
---
## 🗂 Project Structure
```
project_root/
├── run.py # FastAPI backend + template serving
├── load_model.py # Loads the language model once
├── generate.py # generate_response() wrapper
├── file_pipeline.py # File detection, storage, and summarization
├── templates/
│ ├── chat.html # Main chat interface
│ └── auth.html # Login / signup UI
├── dataupload/ # Created at runtime for uploads
│ ├── images/
│ ├── videos/
│ ├── audio/
│ ├── documents/
│ ├── tabular/
│ └── other/
└── requirements.txt
```
---
## ⚙️ Installation
### 1️⃣ Create & Activate Virtual Environment (Recommended)
```bash
python -m venv .venv
source .venv/bin/activate # Linux / macOS
# or
.\.venv\Scripts\activate # Windows
```
### 2️⃣ Install Python Dependencies
```bash
pip install -r requirements.txt
```
**requirements.txt**
```
fastapi
uvicorn[standard]
python-multipart
torch
transformers
accelerate
safetensors
pandas
numpy
pdfplumber
pymupdf
python-docx
Pillow
pytesseract
openai-whisper
ffmpeg-python
```
### 3️⃣ Install System Tools
- **Tesseract OCR** (for image text extraction)
- **ffmpeg** (for audio extraction and Whisper)
Install via OS package manager (`apt`, `brew`, `choco`) or official installers.
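Both tools are external binaries, not Python packages, so `pip install` alone does not provide them. The following small check (a hypothetical helper, not part of the project files) confirms they are on `PATH` before you start the backend.

```python
import shutil

REQUIRED_TOOLS = {
    "tesseract": "image OCR (pytesseract calls this binary)",
    "ffmpeg": "audio extraction / decoding for Whisper",
}


def missing_tools() -> list[str]:
    """Return the names of required command-line tools that are not found on PATH."""
    return [name for name in REQUIRED_TOOLS if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        for name in missing:
            print(f"missing: {name} ({REQUIRED_TOOLS[name]})")
    else:
        print("tesseract and ffmpeg found on PATH")
```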
---
## ▶️ Running GAKR AI
### Start the Backend
```bash
python run.py
```
Expected output:
```
🚀 Starting GAKR AI Backend...
✅ Model initialized successfully
🌐 SERVER & CHAT LOCATION
🚀 CHAT INTERFACE: http://localhost:8080
🔧 API DOCUMENTATION: http://localhost:8080/docs
✅ CHAT.HTML SERVED: templates/chat.html
```
### Open the Chat UI
Navigate to:
```
http://localhost:8080
```
---
## 🔌 API Overview
### POST `/api/analyze`
**Request** (`multipart/form-data`)
- `api_key` (string, required)
- `prompt` (string, required)
- `files` (optional, multiple)
**Behavior**
- No files → General assistant mode
- With files → File‑analysis mode using structured summaries
**Response**
```json
{
"response": "natural-language answer here",
"context": {
"files": [
{
"original_name": "report.pdf",
"stored_path": "dataupload/documents/20241214_report.pdf",
"kind": "document",
"summary": {
"type": "document",
"char_count": 12345,
"preview": "First 4000 characters..."
}
}
]
},
"status": "success"
}
```
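The following is a minimal client sketch that uses only the standard library. It assumes the backend is running locally on port 8080. `encode_multipart` and `analyze` are hypothetical helpers; the form field names `api_key`, `prompt`, and `files` come from the request description.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from typing import Sequence
from urllib import request

API_URL = "http://localhost:8080/api/analyze"


def encode_multipart(fields: dict[str, str], files: Sequence[Path]) -> tuple[bytes, str]:
    """Encode text fields and files as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for path in files:
        ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        # Repeating the "files" field name is how several uploads are sent in one request.
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="files"; '
            f'filename="{path.name}"\r\nContent-Type: {ctype}\r\n\r\n'
        )
        parts.append(header.encode() + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def analyze(api_key: str, prompt: str, files: Sequence[Path] = ()) -> dict:
    """POST to /api/analyze and return the decoded JSON response."""
    body, content_type = encode_multipart({"api_key": api_key, "prompt": prompt}, files)
    req = request.Request(API_URL, data=body, headers={"Content-Type": content_type}, method="POST")
    with request.urlopen(req, timeout=300) as resp:  # generous timeout for slow model output
        return json.loads(resp.read().decode("utf-8"))
```

Typical usage: `result = analyze("YOUR_API_KEY", "Summarize this report", [Path("report.pdf")])`, then read `result["response"]` and, in file-analysis mode, `result["context"]["files"]`.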
---
## 🧪 File Intelligence Pipeline
Handled by `file_pipeline.py`
### Type Detection
- Tabular → CSV, XLSX, JSON
- Documents → PDF, DOCX, TXT
- Images → PNG, JPG
- Audio → MP3, WAV
- Video → MP4, MKV
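Extension-based lookup is the simplest way to implement this mapping. The following sketch assumes detection uses only the file suffix; the real `file_pipeline.py` may also inspect file contents. The `EXTENSION_MAP` and `detect_kind` names are hypothetical, as are the `kind` labels other than `"document"`. The folder names follow the `dataupload/` layout shown earlier.

```python
from pathlib import Path

# Extension -> (kind, upload subfolder), covering only the types listed in this README.
EXTENSION_MAP = {
    ".csv": ("tabular", "tabular"),
    ".xlsx": ("tabular", "tabular"),
    ".json": ("tabular", "tabular"),
    ".pdf": ("document", "documents"),
    ".docx": ("document", "documents"),
    ".txt": ("document", "documents"),
    ".png": ("image", "images"),
    ".jpg": ("image", "images"),
    ".mp3": ("audio", "audio"),
    ".wav": ("audio", "audio"),
    ".mp4": ("video", "videos"),
    ".mkv": ("video", "videos"),
}


def detect_kind(filename: str) -> tuple[str, str]:
    """Classify a file by its (case-insensitive) extension; unknown types go to "other"."""
    return EXTENSION_MAP.get(Path(filename).suffix.lower(), ("other", "other"))
```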
### Summaries
- **Tabular**: rows, columns, missing values, stats
- **Documents**: character count + preview
- **Images**: dimensions + OCR text
- **Audio**: duration + transcript preview
- **Video**: audio track extracted with ffmpeg, then transcribed and summarized like audio
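The following sketch shows two of these summaries. The document summary mirrors the `type` / `char_count` / `preview` fields from the API example. The project lists pandas, which would be the natural choice for tabular data (for example, `DataFrame.isna().sum()` and `DataFrame.describe()`). This sketch uses the standard-library `csv` module only to stay self-contained, and it omits numeric stats. Function names and the `missing_values` key are hypothetical.

```python
import csv
from pathlib import Path

PREVIEW_CHARS = 4000  # matches the "First 4000 characters..." preview in the API example


def summarize_text_document(path: Path) -> dict:
    """Summarize a plain-text document as character count plus a leading preview."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return {"type": "document", "char_count": len(text), "preview": text[:PREVIEW_CHARS]}


def summarize_csv(path: Path) -> dict:
    """Count rows, list columns, and count empty cells per column (no numeric stats)."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        missing = {col: 0 for col in columns}
        rows = 0
        for row in reader:
            rows += 1
            for col in columns:
                # Short rows yield None for absent cells; treat None and blanks as missing.
                if not (row.get(col) or "").strip():
                    missing[col] += 1
    return {"type": "tabular", "rows": rows, "columns": columns, "missing_values": missing}
```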
Errors are stored per‑file and never crash the whole request.
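Per-file isolation comes down to catching exceptions inside the loop rather than around it. The following is a minimal sketch. The `summarize_all` name and the `"error"` key are assumptions, since the response example only shows a successful file.

```python
from typing import Any, Callable


def summarize_all(paths: list[str], summarize: Callable[[str], dict[str, Any]]) -> list[dict[str, Any]]:
    """Summarize each file independently; a failure is recorded on that file only."""
    results = []
    for path in paths:
        try:
            results.append({"original_name": path, "summary": summarize(path)})
        except Exception as exc:  # deliberately broad: one bad file must not abort the batch
            results.append({"original_name": path, "error": f"{type(exc).__name__}: {exc}"})
    return results
```

The model can still answer using the files that succeeded, and the error string tells the user which upload could not be read.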
---
## 🎨 Frontend UX Highlights
- Auto‑growing textarea
- Attachment chips with remove buttons
- Typing indicator
- URL prefill: `?q=your+question`
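When you link into the chat from elsewhere, the question must be URL-encoded so that spaces and symbols survive. The following small sketch uses the standard library. `chat_link` is a hypothetical helper, and the base URL assumes the default local server.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://localhost:8080/"


def chat_link(question: str) -> str:
    """Build a chat URL whose ?q= parameter pre-fills the input box."""
    # urlencode() uses quote_plus, so spaces become "+" and "?" becomes "%3F".
    return f"{BASE_URL}?{urlencode({'q': question})}"


link = chat_link("What is in report.pdf?")
print(link)  # http://localhost:8080/?q=What+is+in+report.pdf%3F
# Decoding recovers the original text, as the frontend does when it reads ?q=
print(parse_qs(urlparse(link).query)["q"][0])  # What is in report.pdf?
```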
- Generic error message for all backend failures
---
## 🔐 Security Notes
- The API key is currently a hard-coded string, which is only acceptable for local use
- For production:
  - Use environment variables
  - Add real authentication (JWT / sessions)
  - Restrict CORS
  - Apply upload size limits and cleanup policies
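You can implement the environment-variable key and the size limit with a few lines of code. The following sketch assumes a hypothetical `GAKR_API_KEY` environment variable and an arbitrary 20 MiB per-file limit. Neither the name nor the limit comes from the current code.

```python
import hmac
import os

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # hypothetical 20 MiB per-file limit


def expected_api_key() -> str:
    """Read the key from the environment instead of hard-coding it in run.py."""
    key = os.environ.get("GAKR_API_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("Set GAKR_API_KEY before starting the server")
    return key


def api_key_is_valid(candidate: str) -> bool:
    """Constant-time comparison, so response timing does not leak the key."""
    return hmac.compare_digest(candidate.encode(), expected_api_key().encode())


def upload_size_ok(size_in_bytes: int) -> bool:
    """Reject negative sizes and anything above the configured limit."""
    return 0 <= size_in_bytes <= MAX_UPLOAD_BYTES
```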
---
## 🚀 Extending GAKR AI
Ideas:
- Per‑user chat & file history (database)
- Search across uploaded documents
- External API integrations
- HTTPS + reverse proxy deployment
---
## 🧠 Philosophy
**GAKR AI is an intelligence layer.**
Tools translate reality (files, media, data) into structured language.
The language model turns that language into insight, reasoning, and action.