# GAKR AI – Local File‑Aware Chat Assistant

See also the Hugging Face Space: `ashok75-gakr.hf.space`

GAKR AI is a **local, privacy‑friendly chat assistant** that runs entirely on your machine. It combines a **FastAPI backend**, a modern **web chat UI**, and a **file‑intelligence pipeline** that reads and summarizes many file types before the model generates a natural‑language response.

The assistant itself is **text‑only**: it never sees raw PDFs, images, audio, or video. Instead, specialized tools convert each file into a **structured text summary**, and the language model reasons over that text.

---
## ✨ Features

### 🌐 Web Chat Interface

- Clean dark UI with message bubbles and typing indicator
- Auto‑growing input box
- Attach files from camera, gallery, or filesystem
- Works in any modern browser at **http://localhost:8080**

### 🧠 Text + File Understanding

- **Prompt only** → general assistant (explanations, coding help, reasoning)
- **Prompt + files** → full analysis pipeline:
  - Detects file type
  - Stores uploads in `dataupload/`
  - Extracts structured facts
  - Feeds extracted context + question to the model
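
The last step can be pictured as plain string assembly: each file's summary is serialized to text and placed ahead of the user's question. This is a minimal sketch, assuming file contexts shaped like the `files` entries in the `/api/analyze` response; `build_prompt` and the prompt layout are hypothetical, not the actual template used by `generate.py`.

```python
import json
from typing import Any, Dict, List


def build_prompt(question: str, file_contexts: List[Dict[str, Any]]) -> str:
    """Combine per-file summaries and the user's question into one text prompt."""
    if not file_contexts:
        # Prompt-only mode: the model sees just the question.
        return question

    sections = []
    for ctx in file_contexts:
        # Serialize each summary as indented JSON so field names stay readable.
        summary = json.dumps(ctx.get("summary", {}), ensure_ascii=False, indent=2)
        sections.append(f"File: {ctx['original_name']} ({ctx['kind']})\n{summary}")

    context_block = "\n\n".join(sections)
    return f"Context from uploaded files:\n\n{context_block}\n\nQuestion: {question}"
```

Keeping the summaries as JSON exposes field names such as `char_count` to the model; a real template would likely also trim long previews to fit the model's context window.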
### 📂 Multi‑File, Multi‑Type Uploads

Upload multiple files at once:

- Documents: PDF, DOCX, TXT
- Tabular data: CSV, Excel, JSON
- Images: OCR via Tesseract
- Audio: Speech‑to‑text via Whisper
- Video: Audio extraction via ffmpeg → Whisper
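
For video, the audio track has to be pulled out before Whisper can transcribe it. A minimal sketch, assuming the `ffmpeg` binary is on `PATH` and is called through `subprocess`; the helper names are hypothetical, and the project may do this through `ffmpeg-python` instead. The 16 kHz mono output matches the format Whisper works with internally.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_extract_audio_cmd(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video
        "-vn",             # ignore the video stream, keep audio only
        "-ac", "1",        # downmix to a single channel
        "-ar", "16000",    # resample to 16 kHz
        audio_path,        # output format is inferred from the extension, e.g. .wav
    ]


def extract_audio(video_path: str, audio_path: str) -> Path:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with a non-zero status."""
    subprocess.run(
        ffmpeg_extract_audio_cmd(video_path, audio_path),
        check=True,
        capture_output=True,  # keep ffmpeg's verbose log out of the server console
    )
    return Path(audio_path)
```

The resulting file can then be passed to Whisper, for example `whisper.load_model("base").transcribe(str(path))["text"]` with the `openai-whisper` package.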
### 💾 Persistent Uploads

- Files saved under `dataupload/` by type
- Timestamped, safe filenames
- Automatic directory creation
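
A minimal sketch of timestamped, safe filenames and on-demand folder creation using only the standard library. The `YYYYMMDD_HHMMSS_` prefix, the allowed character set, and the `save_upload` helper are assumptions: the API example below only shows a `20241214_` date prefix, and `file_pipeline.py` may sanitize names differently.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

UPLOAD_ROOT = Path("dataupload")


def safe_filename(original_name: str, now: Optional[datetime] = None) -> str:
    """Return a timestamp-prefixed name containing only filesystem-safe characters."""
    now = now or datetime.now()
    # Keep only the last path component so names like "../../x" cannot escape the folder.
    base = Path(original_name).name
    # Replace everything except letters, digits, dot, dash, and underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).strip("._") or "upload"
    return f"{now:%Y%m%d_%H%M%S}_{cleaned}"


def save_upload(data: bytes, original_name: str, folder: str) -> Path:
    """Write bytes to dataupload/<folder>/, creating the directory if needed (hypothetical)."""
    target_dir = UPLOAD_ROOT / folder
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / safe_filename(original_name)
    path.write_bytes(data)
    return path
```

For example, `save_upload(data, "my report.pdf", "documents")` would write a file named like `dataupload/documents/20241214_093005_my_report.pdf`.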
### 🔐 Simple Login Reminder UX

- After **5 guest messages**, a popup encourages login
- Logged‑in users are not interrupted
- Login state stored in `localStorage`

---
## 🗂 Project Structure

```
project_root/
├── run.py            # FastAPI backend + template serving
├── load_model.py     # Loads the language model once
├── generate.py       # generate_response() wrapper
├── file_pipeline.py  # File detection, storage, and summarization
├── templates/
│   ├── chat.html     # Main chat interface
│   └── auth.html     # Login / signup UI
├── dataupload/       # Created at runtime for uploads
│   ├── images/
│   ├── videos/
│   ├── audio/
│   ├── documents/
│   ├── tabular/
│   └── other/
└── requirements.txt
```

---
## ⚙️ Installation

### 1️⃣ Create & Activate Virtual Environment (Recommended)

```bash
python -m venv .venv
source .venv/bin/activate  # Linux / macOS
# or
.\.venv\Scripts\activate   # Windows
```

### 2️⃣ Install Python Dependencies

```bash
pip install -r requirements.txt
```

**requirements.txt**

```
fastapi
uvicorn[standard]
python-multipart
torch
transformers
accelerate
safetensors
pandas
numpy
pdfplumber
pymupdf
python-docx
Pillow
pytesseract
openai-whisper
ffmpeg-python
```

If Excel files are read with pandas, `openpyxl` is also required for `.xlsx`; it is not in the list above.

### 3️⃣ Install System Tools

Install these via your OS package manager (`apt`, `brew`, `choco`) or the official installers:

- **Tesseract OCR** (for image text extraction)
- **ffmpeg** (for extracting audio from video; Whisper also uses it to decode audio files)
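
Because Tesseract and ffmpeg are standalone binaries rather than pip packages (`pytesseract` and `ffmpeg-python` only wrap them), a quick `PATH` check before starting the server can replace a confusing runtime error with a clear message. A minimal sketch using `shutil.which`; the binary names are the usual defaults and may differ on your system, and calling `check_system_tools()` at startup is a hypothetical placement. On Windows, if Tesseract is installed but not on `PATH`, `pytesseract` can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full executable path.

```python
import shutil
from typing import Iterable, List

# Binary names assumed to be the defaults installed by apt, brew, or choco.
REQUIRED_TOOLS = ("tesseract", "ffmpeg")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the required command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_system_tools() -> None:
    """Raise with a readable message if any required tool is missing."""
    missing = missing_tools()
    if missing:
        raise RuntimeError(
            "Missing system tools: " + ", ".join(missing)
            + ". Install them and make sure they are on PATH."
        )
```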
---

## ▶️ Running GAKR AI

### Start the Backend

```bash
python run.py
```

Expected output:

```
🚀 Starting GAKR AI Backend...
✅ Model initialized successfully
🌐 SERVER & CHAT LOCATION
🚀 CHAT INTERFACE: http://localhost:8080
🔧 API DOCUMENTATION: http://localhost:8080/docs
✅ CHAT.HTML SERVED: templates/chat.html
```

### Open the Chat UI

Navigate to:

```
http://localhost:8080
```

---
## 🔌 API Overview

### POST `/api/analyze`

**Request** (`multipart/form-data`)

- `api_key` (string, required)
- `prompt` (string, required)
- `files` (optional, multiple)

**Behavior**

- No files → General assistant mode
- With files → File‑analysis mode using structured summaries

**Response**

```json
{
  "response": "natural-language answer here",
  "context": {
    "files": [
      {
        "original_name": "report.pdf",
        "stored_path": "dataupload/documents/20241214_report.pdf",
        "kind": "document",
        "summary": {
          "type": "document",
          "char_count": 12345,
          "preview": "First 4000 characters..."
        }
      }
    ]
  },
  "status": "success"
}
```
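
A standard-library client sketch for this endpoint, handy for scripting without the browser. It hand-builds the `multipart/form-data` body using the field names from the request list above; `API_KEY` is a placeholder, and `build_multipart` and `analyze` are hypothetical helpers. Calling `analyze("Summarize this report", [Path("report.pdf")])` would return the decoded JSON, whose `response` and `context["files"]` keys follow the example above. With the third-party `requests` package, `requests.post(API_URL, data={...}, files=[("files", fh)])` performs the same encoding for you.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, List, Sequence

API_URL = "http://localhost:8080/api/analyze"
API_KEY = "your-api-key"  # placeholder: use the key your run.py expects


def build_multipart(fields: Dict[str, str], files: Sequence[Path], boundary: str) -> bytes:
    """Encode text fields and files as a multipart/form-data request body."""
    parts: List[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for path in files:
        # Every file is sent under the same field name, "files".
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="files"; filename="{path.name}"'.encode(),
            b"Content-Type: application/octet-stream",
            b"",
            path.read_bytes(),
        ]
    parts += [f"--{boundary}--".encode(), b""]  # closing boundary
    return b"\r\n".join(parts)


def analyze(prompt: str, files: Sequence[Path] = ()) -> dict:
    """POST a prompt and optional files to /api/analyze and return the parsed JSON."""
    boundary = uuid.uuid4().hex
    body = build_multipart({"api_key": API_KEY, "prompt": prompt}, files, boundary)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```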
---

## 🧪 File Intelligence Pipeline

Handled by `file_pipeline.py`.

### Type Detection

- Tabular → CSV, XLSX, JSON
- Documents → PDF, DOCX, TXT
- Images → PNG, JPG
- Audio → MP3, WAV
- Video → MP4, MKV
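
Classifying by extension is the simplest way to implement the mapping above. A minimal sketch covering exactly the listed extensions, plus kind-to-folder names taken from the project structure; the dictionary names and the singular kind labels other than `document` are assumptions, and whether `file_pipeline.py` also inspects file contents is not stated.

```python
from pathlib import Path

# Extension → kind, covering only the extensions listed in this README.
EXTENSION_KINDS = {
    ".csv": "tabular", ".xlsx": "tabular", ".json": "tabular",
    ".pdf": "document", ".docx": "document", ".txt": "document",
    ".png": "image", ".jpg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mkv": "video",
}

# Kind → subfolder of dataupload/, following the project structure.
KIND_FOLDERS = {
    "tabular": "tabular",
    "document": "documents",
    "image": "images",
    "audio": "audio",
    "video": "videos",
    "other": "other",
}


def detect_kind(filename: str) -> str:
    """Classify a file by its case-insensitive extension; unknown types become "other"."""
    return EXTENSION_KINDS.get(Path(filename).suffix.lower(), "other")
```

For example, `KIND_FOLDERS[detect_kind("Report.PDF")]` gives `"documents"`, matching the `stored_path` in the API example.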
### Summaries

Each file gets its own summary. Errors are stored per file and never crash the whole request.

- **Tabular**: rows, columns, missing values, stats
- **Documents**: character count + preview
- **Images**: dimensions + OCR text
- **Audio**: duration + transcript preview
- **Video**: analysis of the extracted audio track
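
A standard-library sketch of two summarizers and of the per-file error handling. The document summary reuses the `type`, `char_count`, and `preview` fields and the 4000-character preview from the API example; the tabular summary uses the `csv` module as a stand-in for pandas, so it reports only rows, columns, and blank cells per column, without statistics. Function names and the `error` key are hypothetical, and other formats (PDF, images, audio) would plug into `SUMMARIZERS` the same way.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

PREVIEW_CHARS = 4000  # matches the "First 4000 characters..." preview in the API example


def summarize_text(path: Path) -> dict:
    """Character count plus preview, shaped like the `summary` object in the API response."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return {"type": "document", "char_count": len(text), "preview": text[:PREVIEW_CHARS]}


def summarize_csv(path: Path) -> dict:
    """Row count, column names, and blank cells per column (stdlib stand-in for pandas)."""
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        columns = list(reader.fieldnames or [])
        missing = {col: 0 for col in columns}
        rows = 0
        for row in reader:
            rows += 1
            for col in columns:
                value = row.get(col)
                # DictReader fills short rows with None; count None and blank strings.
                if value is None or not value.strip():
                    missing[col] += 1
    return {"type": "tabular", "rows": rows, "columns": columns, "missing_values": missing}


SUMMARIZERS: Dict[str, Callable[[Path], dict]] = {
    ".txt": summarize_text,
    ".csv": summarize_csv,
}


def summarize_files(paths: List[Path]) -> List[dict]:
    """Summarize each file; a failure is recorded on that file instead of aborting the batch."""
    results = []
    for path in paths:
        entry: dict = {"original_name": path.name}
        try:
            summarizer = SUMMARIZERS.get(path.suffix.lower())
            if summarizer is None:
                raise ValueError(f"unsupported file type: {path.suffix or '(none)'}")
            entry["summary"] = summarizer(path)
        except Exception as exc:  # deliberately broad: one bad file must not fail the request
            entry["error"] = f"{type(exc).__name__}: {exc}"
        results.append(entry)
    return results
```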
---

## 🎨 Frontend UX Highlights

- Auto‑growing textarea
- Attachment chips with remove buttons
- Typing indicator
- URL prefill: `?q=your+question`
- Generic error message for all backend failures
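
The `?q=` prefill uses ordinary query-string encoding, so spaces become `+` and characters such as `+` or `?` are percent-encoded. A small sketch for generating such links from a script; `prefill_url` and `read_prefill` are hypothetical helpers, and the base URL is the local default from this README.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:8080"  # local default from this README


def prefill_url(question: str, base_url: str = BASE_URL) -> str:
    """Build a chat link whose input box starts with `question`."""
    # urlencode() uses quote_plus, so spaces become "+" as in ?q=your+question.
    return f"{base_url}/?{urlencode({'q': question})}"


def read_prefill(url: str) -> str:
    """Recover the prefilled question from a chat URL ("" if there is no q parameter)."""
    return parse_qs(urlsplit(url).query).get("q", [""])[0]
```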
---

## 🔐 Security Notes

- The API key is currently a fixed string (acceptable for local use only)
- For production:
  - Load the API key and other secrets from environment variables
  - Add real authentication (JWT / sessions)
  - Restrict CORS
  - Apply upload size limits and cleanup policies
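
A minimal sketch of the environment-variable and upload-limit points, assuming a variable named `GAKR_API_KEY` and a 25 MB per-file cap (both hypothetical; the current build uses a fixed string and documents no limit). The key comparison uses `hmac.compare_digest`, which is designed to resist timing attacks. In FastAPI, the CORS point is usually handled with `fastapi.middleware.cors.CORSMiddleware` and an explicit `allow_origins` list.

```python
import hmac
import os

# Hypothetical names and limit; adjust to your deployment.
API_KEY_ENV_VAR = "GAKR_API_KEY"
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB per file


def load_api_key() -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(API_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    return key


def is_authorized(supplied: str, expected: str) -> bool:
    """Compare keys with hmac.compare_digest, which is designed to resist timing attacks."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


def check_upload_size(size_in_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError if an upload exceeds the per-file limit."""
    if size_in_bytes > limit:
        raise ValueError(f"upload is {size_in_bytes} bytes; the limit is {limit} bytes")
```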
---

## 🚀 Extending GAKR AI

Ideas:

- Per‑user chat & file history (database)
- Search across uploaded documents
- External API integrations
- HTTPS + reverse proxy deployment

---

## 🧠 Philosophy

**GAKR AI is an intelligence layer.**

Tools translate reality (files, media, data) into structured language. The language model turns that language into insight, reasoning, and action.