Bhaskar Ram
title: Kerdos AI Custom LLM RAG API
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
tags:
  - rag
  - document-qa
  - fastapi
  - llama
  - faiss
  - nlp
  - question-answering
  - kerdos
  - private-llm
  - api

🤖 Kerdos AI — Custom LLM RAG API

A REST API by Kerdos Infrasoft Private Limited. Upload documents, ask questions, and get answers strictly grounded in your data.


✨ Features

📄 Multi-format: PDF, DOCX, TXT, MD, CSV
🧠 LLM: meta-llama/Llama-3.1-8B-Instruct via HF Inference Router
🔒 Grounded: Answers only from your uploaded documents
💬 Multi-turn: Conversation history per session
Fast: all-MiniLM-L6-v2 + FAISS in-memory
🔑 Session-based: Each client gets an isolated FAISS index
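
As a rough illustration of the session isolation described above, the sketch below keeps one record per UUID session ID, each with its own lock, chat history, and expiry time. The `SessionStore` name appears in the architecture notes, but everything inside this sketch (the `Session` dataclass, its fields, the `create` and `get` methods, the injectable `clock`) is an assumption made for illustration and not the code in api.py. A plain list stands in for the per-session FAISS index.

```python
import threading
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-session state; a list stands in for the FAISS index."""
    expires_at: float
    chunks: list = field(default_factory=list)
    history: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)


class SessionStore:
    """Illustrative in-memory store: UUID keys, TTL expiry, per-session lock."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._sessions = {}
        self._guard = threading.Lock()  # protects the dictionary itself

    def create(self):
        """Register a new session and return its ID."""
        session_id = str(uuid.uuid4())
        with self._guard:
            self._sessions[session_id] = Session(expires_at=self._clock() + self._ttl)
        return session_id

    def get(self, session_id):
        """Return the live session, or None if it is unknown or expired."""
        with self._guard:
            session = self._sessions.get(session_id)
            if session is None:
                return None
            if self._clock() >= session.expires_at:
                del self._sessions[session_id]  # lazily evict on access
                return None
            return session
```

Because each session owns its data and its lock, work on one session (for example, indexing a large upload) would not block or leak into another.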

📡 API Reference

Interactive docs → /docs (Swagger UI)

| Method | Path | Description |
|--------|------|-------------|
| POST | /sessions | Create a session → get session_id |
| GET | /sessions/{id} | Session status |
| DELETE | /sessions/{id} | Delete session |
| POST | /sessions/{id}/documents | Upload & index files |
| POST | /sessions/{id}/chat | Ask a question |
| DELETE | /sessions/{id}/history | Clear chat history |
| GET | /health | Health check |

🔁 Typical Workflow

BASE=https://kerdosdotio-kerdos-llm-rag-api.hf.space

# 1. Create a session and note the session_id in the response
curl -X POST "$BASE/sessions"

# Set this to the session_id returned by step 1
SESSION_ID="<session_id>"

# 2. Upload a document
curl -X POST "$BASE/sessions/$SESSION_ID/documents" \
  -F "files=@your_doc.pdf"

# 3. Ask a question
curl -X POST "$BASE/sessions/$SESSION_ID/chat" \
  -H "Content-Type: application/json" \
  -d '{"question": "Summarise this document", "hf_token": "hf_..."}'
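
For callers who prefer Python, steps 1 and 3 of the workflow above can be sketched with the standard library alone. The function names (`create_session_request`, `chat_request`, `send`) are hypothetical, and the sketch assumes the `POST /sessions` response is a JSON object with a `session_id` key, which this README implies but does not spell out. The multipart upload from step 2 is left out for brevity.

```python
import json
import urllib.request

BASE = "https://kerdosdotio-kerdos-llm-rag-api.hf.space"


def create_session_request(base=BASE):
    """Build the POST /sessions request (step 1)."""
    return urllib.request.Request(f"{base}/sessions", data=b"", method="POST")


def chat_request(session_id, question, hf_token, base=BASE):
    """Build the POST /sessions/{id}/chat request (step 3)."""
    body = json.dumps({"question": question, "hf_token": hf_token}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/sessions/{session_id}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call sequence might then be `session_id = send(create_session_request())["session_id"]`, followed by `send(chat_request(session_id, "Summarise this document", hf_token))` once a document has been uploaded to that session.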

⚙️ Environment / Secrets

Set these in Settings → Variables and secrets of this Space:

| Secret | Description |
|--------|-------------|
| HF_TOKEN | Your Hugging Face token (Write access + Llama 3.1 licence accepted) |
| SESSION_TTL_MINUTES | Session expiry in minutes (default: 60) |
| MAX_UPLOAD_MB | Max upload size in MB (default: 50) |
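
One way a service could read these values at startup is sketched below. The `Settings` dataclass and `load_settings` helper are hypothetical names introduced here, not taken from api.py; only the variable names and the defaults of 60 and 50 come from this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    hf_token: str
    session_ttl_minutes: int
    max_upload_mb: int

    @property
    def max_upload_bytes(self):
        # Convenience conversion for comparing against an upload's byte size.
        return self.max_upload_mb * 1024 * 1024


def load_settings(env=os.environ):
    """Read the variables, falling back to the documented defaults."""
    return Settings(
        hf_token=env.get("HF_TOKEN", ""),
        session_ttl_minutes=int(env.get("SESSION_TTL_MINUTES", "60")),
        max_upload_mb=int(env.get("MAX_UPLOAD_MB", "50")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests.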

🏗️ Architecture

FastAPI (api.py)
  ├── SessionStore — UUID sessions, TTL, per-session lock
  └── RAGSession
        ├── parse_file()       — PDF/DOCX/TXT/CSV
        ├── chunk_text()       — 512-char chunks, 64 overlap
        ├── all-MiniLM-L6-v2   — embeddings
        ├── FAISS              — in-memory vector search
        └── call_llm()         — HF Router → Llama 3.1 8B
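
The chunking step in the tree above (512-character chunks with 64 characters of overlap) could look roughly like the following. This is a minimal sketch under the assumption that chunks are fixed-width character windows; the real `chunk_text()` in api.py may split differently, for example on sentence or word boundaries, and the parameter names here are illustrative.

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into chunk_size-character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # 448 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, a 1,000-character document yields three chunks starting at offsets 0, 448, and 896. The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.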

💼 Enterprise Edition

Interested in private, on-premise deployment?

  • 🔒 Private LLM Hosting
  • 🎛️ Custom Model Fine-tuning
  • 🛡️ Data Privacy Guarantees
  • 🏷️ White-label Deployments

📧 partnership@kerdos.in | 🌐 kerdos.in/contact


© 2024–2025 Kerdos Infrasoft Private Limited | Bengaluru, Karnataka, India