	See the Space: ashok75-gakr.hf.space

	# GAKR AI – Local File‑Aware Chat Assistant

	GAKR AI is a local, privacy‑friendly chat assistant that runs entirely on your machine.
	It combines a FastAPI backend, a modern web chat UI, and a file‑intelligence pipeline that can read and summarize many file types before generating natural‑language responses.

	The assistant itself is text‑only. It never directly sees raw PDFs, images, audio, or videos.
	Instead, specialized tools convert files into structured text summaries, and the language model reasons over that text.

	---

	## ✨ Features

	### 🌐 Web Chat Interface
	- Clean dark UI with message bubbles and typing indicator
	- Auto‑growing input box
	- Attach files from camera, gallery, or filesystem
	- Works in any modern browser at http://localhost:8080

	### 🧠 Text + File Understanding
	- Prompt only → general assistant (explanations, coding help, reasoning)
	- Prompt + files → full analysis pipeline:
	  - Detects file type
	  - Stores uploads in `dataupload/`
	  - Extracts structured facts
	  - Feeds extracted context + question to the model
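
The last step of the pipeline above, feeding extracted context plus the question to the model, can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `build_model_prompt` is a hypothetical helper, and the prompt wording is an assumption. Only the summary fields (`original_name`, `kind`, `summary`) come from the API response documented in this README.

```python
import json


def build_model_prompt(question: str, file_summaries: list) -> str:
    """Combine structured file summaries with the user's question.

    Hypothetical helper: the real prompt format used by GAKR AI may differ.
    """
    if not file_summaries:
        # Prompt only: general assistant mode, no file context to add.
        return question

    parts = ["You are given structured summaries of uploaded files."]
    for item in file_summaries:
        parts.append(f"File: {item['original_name']} ({item['kind']})")
        # The model only ever sees this text, never the raw file.
        parts.append(json.dumps(item["summary"], indent=2))
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

Keeping the summaries as JSON text means the same structure can be returned to the client and shown to the model.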

	### 📂 Multi‑File, Multi‑Type Uploads
	Upload multiple files at once:
	- Documents: PDF, DOCX, TXT
	- Tabular data: CSV, Excel, JSON
	- Images: OCR via Tesseract
	- Audio: Speech‑to‑text via Whisper
	- Video: Audio extraction via ffmpeg → Whisper
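
For the video path (ffmpeg, then Whisper), a sketch of the audio-extraction step might look like this. The helper name and the mono / 16 kHz WAV settings are assumptions chosen for illustration; the project may invoke ffmpeg differently, for example through `ffmpeg-python`.

```python
import subprocess


def ffmpeg_extract_audio_cmd(video_path: str, wav_path: str) -> list:
    """Build an ffmpeg command that writes the video's audio track to a WAV file.

    Hypothetical helper. Flags used:
      -y   overwrite the output file if it exists
      -i   input file
      -vn  drop the video stream
      -ac  number of audio channels (1 = mono, an assumed choice)
      -ar  audio sample rate in Hz (16000 is an assumed choice)
    """
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(ffmpeg_extract_audio_cmd(video_path, wav_path), check=True)


# The resulting WAV file could then be transcribed with openai-whisper, e.g.:
#   import whisper
#   text = whisper.load_model("base").transcribe(wav_path)["text"]
# (the "base" model size is an assumption, not something this README specifies)
```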

	### 💾 Persistent Uploads
	- Files saved under `dataupload/` by type
	- Timestamped, safe filenames
	- Automatic directory creation
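
As a rough sketch of how timestamped, safe filenames and automatic directory creation could work: `safe_stored_path` is a hypothetical name, and the sanitizing rules are assumptions. The `YYYYMMDD_` prefix mirrors the `stored_path` shown in the API response example; the real timestamp format may be more precise.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

UPLOAD_ROOT = Path("dataupload")


def safe_stored_path(original_name: str, kind: str,
                     now: Optional[datetime] = None,
                     root: Path = UPLOAD_ROOT) -> Path:
    """Return a sanitized, date-prefixed path under root/kind (hypothetical helper)."""
    now = now or datetime.now()
    # Keep only the final path component so "../" cannot escape the upload root.
    base = Path(original_name.replace("\\", "/")).name
    # Replace anything outside a conservative character whitelist (assumed rule).
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".") or "upload"
    target_dir = root / kind
    # Automatic directory creation: no error if it already exists.
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / f"{now:%Y%m%d}_{cleaned}"
```

Note that a date-only prefix does not prevent collisions between two uploads with the same name on the same day; a production version would need a finer timestamp or a random suffix.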

	### 🔐 Simple Login Reminder UX
	- After 5 guest messages, a popup encourages login
	- Logged‑in users are not interrupted
	- Login state stored in `localStorage`

	---

	## 🗂 Project Structure

	```
	project_root/
	├── run.py              # FastAPI backend + template serving
	├── load_model.py       # Loads the language model once
	├── generate.py         # generate_response() wrapper
	├── file_pipeline.py    # File detection, storage, and summarization
	├── templates/
	│   ├── chat.html       # Main chat interface
	│   └── auth.html       # Login / signup UI
	├── dataupload/         # Created at runtime for uploads
	│   ├── images/
	│   ├── videos/
	│   ├── audio/
	│   ├── documents/
	│   ├── tabular/
	│   └── other/
	└── requirements.txt
	```

	---

	## ⚙️ Installation

	### 1️⃣ Create & Activate Virtual Environment (Recommended)

	```bash
	python -m venv .venv
	source .venv/bin/activate # Linux / macOS
	# or
	.\.venv\Scripts\activate # Windows
	```

	### 2️⃣ Install Python Dependencies

	```bash
	pip install -r requirements.txt
	```

	Contents of `requirements.txt`:
	```
	fastapi
	uvicorn[standard]
	python-multipart

	torch
	transformers
	accelerate
	safetensors

	pandas
	numpy

	pdfplumber
	pymupdf
	python-docx

	Pillow
	pytesseract

	openai-whisper
	ffmpeg-python
	```

	### 3️⃣ Install System Tools

	- Tesseract OCR (for image text extraction)
	- ffmpeg (for extracting audio from video; Whisper also needs it to decode audio files)

	Install via OS package manager (`apt`, `brew`, `choco`) or official installers.
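
To confirm that both tools are reachable before starting the server, a small check like the one below can help. It is a convenience sketch, not part of the project; `find_system_tools` is a hypothetical name, and it assumes the executables are called `tesseract` and `ffmpeg` on your `PATH`.

```python
import shutil

REQUIRED_TOOLS = ("tesseract", "ffmpeg")


def find_system_tools(names=REQUIRED_TOOLS) -> dict:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_system_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```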

	---

	## ▶️ Running GAKR AI

	### Start the Backend

	```bash
	python run.py
	```

	Expected output:
	```
	🚀 Starting GAKR AI Backend...
	✅ Model initialized successfully

	🌐 SERVER & CHAT LOCATION
	🚀 CHAT INTERFACE: http://localhost:8080
	🔧 API DOCUMENTATION: http://localhost:8080/docs
	✅ CHAT.HTML SERVED: templates/chat.html
	```

	### Open the Chat UI
	Navigate to:
	```
	http://localhost:8080
	```

	---

	## 🔌 API Overview

	### POST `/api/analyze`

	Request (`multipart/form-data`)
	- `api_key` (string, required)
	- `prompt` (string, required)
	- `files` (optional, multiple)

	Behavior
	- No files → General assistant mode
	- With files → File‑analysis mode using structured summaries
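
A client for this endpoint could be sketched with only the standard library, as below. The field names (`api_key`, `prompt`, `files`) and the URL come from this README; the helper names and the generic `application/octet-stream` content type are assumptions.

```python
import urllib.request
import uuid

API_URL = "http://localhost:8080/api/analyze"


def encode_multipart(fields: dict, files: list) -> tuple:
    """Encode text fields and (filename, bytes) pairs as multipart/form-data.

    Returns (body, content_type_header). Hypothetical helper.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    for filename, content in files:
        lines += [
            f"--{boundary}".encode(),
            # Every upload uses the same field name, "files".
            f'Content-Disposition: form-data; name="files"; filename="{filename}"'.encode(),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_analyze_request(prompt: str, api_key: str, files=()) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/analyze."""
    body, content_type = encode_multipart(
        {"api_key": api_key, "prompt": prompt}, list(files)
    )
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

With the server running, the request can be sent with `urllib.request.urlopen(build_analyze_request(...))` and the body decoded with `json.load`.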

	Response
	```json
	{
	  "response": "natural-language answer here",
	  "context": {
	    "files": [
	      {
	        "original_name": "report.pdf",
	        "stored_path": "dataupload/documents/20241214_report.pdf",
	        "kind": "document",
	        "summary": {
	          "type": "document",
	          "char_count": 12345,
	          "preview": "First 4000 characters..."
	        }
	      }
	    ]
	  },
	  "status": "success"
	}
	```
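
A client might walk this response as in the sketch below. `describe_response` is a hypothetical helper; it relies only on the keys shown in the example above and tolerates a missing `context` or `summary`, since responses without files presumably carry no file entries.

```python
def describe_response(payload: dict) -> list:
    """Turn an /api/analyze response into short human-readable lines."""
    lines = [f"status: {payload.get('status')}"]
    for item in payload.get("context", {}).get("files", []):
        summary = item.get("summary", {})
        lines.append(
            f"{item['original_name']} -> {item['stored_path']} "
            f"[{item['kind']}], {summary.get('char_count', 'n/a')} chars"
        )
    lines.append(payload.get("response", ""))
    return lines
```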

	---

	## 🧪 File Intelligence Pipeline

	Handled by `file_pipeline.py`

	### Type Detection
	- Tabular → CSV, XLSX, JSON
	- Documents → PDF, DOCX, TXT
	- Images → PNG, JPG
	- Audio → MP3, WAV
	- Video → MP4, MKV
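
Extension-based detection for the types listed above can be sketched as a lookup table. This is an illustration under assumptions: `file_pipeline.py` may also inspect MIME types or file contents, and apart from `"document"` (which appears in the API response example) the kind labels here are guesses. The `"other"` fallback mirrors the `dataupload/other/` folder.

```python
from pathlib import Path

# Only the extensions named in this README; labels other than "document" are assumed.
KIND_BY_EXTENSION = {
    ".csv": "tabular", ".xlsx": "tabular", ".json": "tabular",
    ".pdf": "document", ".docx": "document", ".txt": "document",
    ".png": "image", ".jpg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mkv": "video",
}


def detect_kind(filename: str) -> str:
    """Classify a file by its (case-insensitive) extension; hypothetical helper."""
    return KIND_BY_EXTENSION.get(Path(filename).suffix.lower(), "other")
```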

	### Summaries
	- Tabular: rows, columns, missing values, stats
	- Documents: character count + preview
	- Images: dimensions + OCR text
	- Audio: duration + transcript preview
	- Video: extracted audio analysis
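
As one concrete case, a document summary matching the shape in the API response example (`type`, `char_count`, `preview`) could be produced like this. The function name is hypothetical, and the 4000-character preview length is inferred from the example's "First 4000 characters..." placeholder.

```python
PREVIEW_CHARS = 4000  # inferred from the response example; treat as an assumption


def summarize_document_text(text: str) -> dict:
    """Summarize already-extracted document text as character count + preview."""
    return {
        "type": "document",
        "char_count": len(text),
        "preview": text[:PREVIEW_CHARS],
    }
```

Text extraction itself (for example from PDF or DOCX) is a separate step and is not shown here.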

	Errors are stored per‑file and never crash the whole request.
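
One way to get this per-file isolation is to wrap each file's summarizer call in its own `try`/`except`. The sketch below is an assumption about the approach, not the project's code; `summarize_all` and the `error` key are hypothetical names.

```python
from typing import Callable


def summarize_all(files: list, summarize: Callable[[str, bytes], dict]) -> list:
    """Summarize (name, data) pairs, recording failures per file instead of raising."""
    results = []
    for name, data in files:
        entry = {"original_name": name}
        try:
            entry["summary"] = summarize(name, data)
        except Exception as exc:
            # The failure stays attached to this file; later files still run.
            entry["error"] = f"{type(exc).__name__}: {exc}"
        results.append(entry)
    return results
```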

	---

	## 🎨 Frontend UX Highlights

	- Auto‑growing textarea
	- Attachment chips with remove buttons
	- Typing indicator
	- URL prefill: `?q=your+question`
	- Generic error message for all backend failures

	---

	## 🔐 Security Notes

	- API key is currently a fixed string (for local use)
	- For production:
	  - Use environment variables
	  - Add real authentication (JWT / sessions)
	  - Restrict CORS
	  - Apply upload size limits and cleanup policies
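
Two of these hardening steps, reading the key from the environment and capping upload size, can be sketched as below. The variable name `GAKR_API_KEY`, the 25 MiB limit, and the helper names are all assumptions made for illustration.

```python
import hmac
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed limit; tune for your deployment


def expected_api_key() -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get("GAKR_API_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("Set GAKR_API_KEY before starting the server")
    return key


def api_key_is_valid(supplied: str, expected: str) -> bool:
    """Constant-time comparison, so timing does not leak how much of the key matched."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


def within_size_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Reject negative sizes and anything above the configured limit."""
    return 0 <= size_bytes <= limit
```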

	---

	## 🚀 Extending GAKR AI

	Ideas:
	- Per‑user chat & file history (database)
	- Search across uploaded documents
	- External API integrations
	- HTTPS + reverse proxy deployment

	---

	## 🧠 Philosophy

	GAKR AI is an intelligence layer.
	Tools translate reality (files, media, data) into structured language.
	The language model turns that language into insight, reasoning, and action.