Update README.md
README.md
CHANGED
|
@@ -1,92 +1,94 @@
|
|
| 1 |
-
|
| 2 |
-
title: RAGtim Bot - Raktim's AI Assistant
|
| 3 |
-
emoji: 🤖
|
| 4 |
-
colorFrom: green
|
| 5 |
-
colorTo: blue
|
| 6 |
-
sdk: gradio
|
| 7 |
-
sdk_version: "4.44.0"
|
| 8 |
-
app_file: app.py
|
| 9 |
-
pinned: false
|
| 10 |
-
license: mit
|
| 11 |
-
---
|
| 12 |
-
|
| 13 |
-
# 🤖 RAGtim Bot - Raktim's AI Assistant
|
| 14 |
-
|
| 15 |
-
An intelligent AI assistant powered by Hugging Face Transformers that answers questions about Raktim Mondol's research, expertise, and professional background.
|
| 16 |
-
|
| 17 |
-
## Features
|
| 18 |
-
|
| 19 |
-
- **Complete Markdown Knowledge Base**: Loads all portfolio content from markdown files
|
| 20 |
-
- **GPU-Accelerated Search**: Uses `sentence-transformers/all-MiniLM-L6-v2` for semantic similarity
|
| 21 |
-
- **Comprehensive Coverage**: Research, publications, skills, experience, education, statistics
|
| 22 |
-
- **API Endpoints**: Direct access to search and statistics
|
| 23 |
-
- **Real-time Chat**: Interactive conversational interface
|
| 24 |
-
|
| 25 |
-
## Knowledge Base
|
| 26 |
-
|
| 27 |
-
This Space loads comprehensive information from:
|
| 28 |
-
|
| 29 |
-
- **about.md** - Personal information, contact details, professional summary
|
| 30 |
-
- **research_details.md** - Detailed research projects, methodologies, current work
|
| 31 |
-
- **publications_detailed.md** - Complete publication details, technical contributions
|
| 32 |
-
- **skills_expertise.md** - Comprehensive technical skills, tools, frameworks
|
| 33 |
-
- **experience_detailed.md** - Professional experience, teaching, research roles
|
| 34 |
-
- **statistics.md** - Statistical methods, biostatistics expertise, methodologies
|
| 35 |
-
|
| 36 |
-
## What You Can Ask
|
| 37 |
-
|
| 38 |
-
- Research projects and methodologies
|
| 39 |
-
- Publications with technical details
|
| 40 |
-
- Technical skills and programming expertise
|
| 41 |
-
- Educational background and achievements
|
| 42 |
-
- Professional experience and teaching roles
|
| 43 |
-
- Statistical methods and biostatistics applications
|
| 44 |
-
- Awards, recognition, and professional development
|
| 45 |
-
- Contact information and collaboration opportunities
|
| 46 |
-
|
| 47 |
-
## API Usage
|
| 48 |
|
| 49 |
-
|
| 50 |
-
```python
|
| 51 |
-
import requests
|
| 52 |
|
| 53 |
-
response = requests.post(
|
| 54 |
-
"https://raktimhugging-ragtim-bot.hf.space/api/search",
|
| 55 |
-
json={"query": "What is Raktim's research about?", "top_k": 5}
|
| 56 |
-
)
|
| 57 |
-
results = response.json()
|
| 58 |
-
```
|
| 59 |
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
|
| 66 |
-
|
| 67 |
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
-
|
| 71 |
-
-
|
| 72 |
-
-
|
| 73 |
|
| 74 |
-
|
| 75 |
|
| 76 |
-
|
| 77 |
-
- Portfolio websites for intelligent chat assistance
|
| 78 |
-
- Research collaboration platforms
|
| 79 |
-
- Academic networking tools
|
| 80 |
-
- Professional inquiry systems
|
| 81 |
|
| 82 |
-
|
| 83 |
|
| 84 |
-
|
| 85 |
-
- **Email**: r.mondol@unsw.edu.au
|
| 86 |
-
- **Portfolio**: [mondol.me](https://mondol.me)
|
| 87 |
-
- **Institution**: UNSW Sydney, School of Computer Science & Engineering
|
| 88 |
|
| 89 |
-
|
| 90 |
|
| 91 |
-
|
| 92 |
-
|
| 1 |
+
# 🔥 Hybrid Search RAGtim Bot
|
| 2 |
|
| 3 |
+
A sophisticated hybrid search system combining semantic vector search with BM25 keyword matching for optimal information retrieval.
|
| 4 |
|
| 5 |
+
## Features
|
| 6 |
|
| 7 |
+
- **Hybrid Search**: Combines transformer-based semantic similarity with BM25 keyword ranking
|
| 8 |
+
- **Multi-Modal Search**: Vector search, BM25 search, and intelligent fusion
|
| 9 |
+
- **Real-time API**: RESTful endpoints for integration
|
| 10 |
+
- **Interactive UI**: Three interfaces - Chat, Advanced Search, and Statistics
|
| 11 |
+
- **Knowledge Base**: Comprehensive markdown-based knowledge system
|
| 12 |
+
|
| 13 |
+
## 🔧 Technology Stack
|
| 14 |
+
|
| 15 |
+
- **Embeddings**: sentence-transformers/all-MiniLM-L6-v2 (384-dim)
|
| 16 |
+
- **Search**: Custom BM25 implementation + Vector similarity
|
| 17 |
+
- **Framework**: Gradio 4.44.0
|
| 18 |
+
- **ML**: Transformers, PyTorch, NumPy
|
| 19 |
+
- **Deployment**: Hugging Face Spaces
|
| 20 |
+
|
| 21 |
+
## Knowledge Base Structure
|
| 22 |
+
|
| 23 |
+
The system processes markdown files from the `knowledge_base/` directory:
|
| 24 |
+
- `about.md` - Personal information and professional summary
|
| 25 |
+
- `research_details.md` - Research projects and methodologies
|
| 26 |
+
- `publications_detailed.md` - Publications with technical details
|
| 27 |
+
- `skills_expertise.md` - Technical skills and expertise
|
| 28 |
+
- `experience_detailed.md` - Professional experience
|
| 29 |
+
- `statistics.md` - Statistical methods and biostatistics
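As a rough illustration of how markdown files in a directory such as `knowledge_base/` might be read and split before indexing, here is a minimal sketch. The function names `load_markdown_files` and `split_into_sections` are hypothetical, and splitting on heading lines is an assumed chunking rule; the README does not say how the Space actually chunks its files.

```python
from pathlib import Path


def load_markdown_files(directory):
    """Return {filename: text} for every .md file in `directory`, sorted by name."""
    base = Path(directory)
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(base.glob("*.md"))
    }


def split_into_sections(text):
    """Split markdown text at heading lines (an assumed, simplistic chunking rule)."""
    sections, current = [], []
    for line in text.splitlines():
        # A new heading closes the section collected so far.
        if line.startswith("#") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop sections that were only whitespace.
    return [s for s in sections if s]
```

Note that this simple rule also splits on `#` lines inside fenced code blocks, so a real loader would likely need to track fences.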
|
| 30 |
+
|
| 31 |
+
## Search Methods
|
| 32 |
|
| 33 |
+
### Hybrid Search (Recommended)
|
| 34 |
+
Combines semantic and keyword search with configurable weights:
|
| 35 |
+
- Default: 60% vector + 40% BM25
|
| 36 |
+
- Optimal for most queries
|
| 37 |
+
- Balances meaning and exact term matching
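The 60/40 weighting above can be sketched as a weighted sum of the two score lists. The README does not state how the scores are made comparable, so this sketch assumes min-max normalization; `min_max_normalize` and `hybrid_scores` are hypothetical names, not functions from the Space.

```python
import numpy as np


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def hybrid_scores(vector_scores, bm25_scores, vector_weight=0.6, bm25_weight=0.4):
    """Weighted sum of normalized vector and BM25 scores (assumed fusion rule)."""
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(bm25_scores)
    return vector_weight * v + bm25_weight * k


# Toy scores for three documents, in the same document order for both methods.
vector_scores = [0.82, 0.40, 0.10]
bm25_scores = [1.0, 7.0, 4.0]
fused = hybrid_scores(vector_scores, bm25_scores)
ranking = np.argsort(-fused)  # document indices, best first
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities lie in a fixed range; without it the raw BM25 values would dominate the sum.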
|
| 38 |
|
| 39 |
+
### Vector Search
|
| 40 |
+
Pure semantic similarity using transformer embeddings:
|
| 41 |
+
- Best for conceptual questions
|
| 42 |
+
- Finds semantically related content
|
| 43 |
+
- Language-agnostic similarity
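A minimal sketch of the ranking step in vector search, assuming the embeddings have already been computed and that cosine similarity is the metric (the README says "semantic similarity" without naming one). The 3-dimensional toy vectors stand in for the 384-dimensional model outputs, and `cosine_top_k` is a hypothetical helper.

```python
import numpy as np


def cosine_top_k(query_vec, doc_matrix, top_k=5):
    """Return (doc_index, cosine_similarity) pairs for the top_k closest rows."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_matrix, dtype=float)
    # Normalize to unit length so the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = docs @ query
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]


# Toy embeddings: one row per document.
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
hits = cosine_top_k([1.0, 0.1, 0.0], docs, top_k=2)
```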
|
| 44 |
|
| 45 |
+
### BM25 Search
|
| 46 |
+
Traditional keyword-based ranking:
|
| 47 |
+
- Excellent for specific terms
|
| 48 |
+
- TF-IDF with document length normalization
|
| 49 |
+
- Fast and interpretable
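To make the term-frequency saturation and length normalization concrete, here is a sketch of one common BM25 variant. It uses the `k1 = 1.5` and `b = 0.75` defaults listed under Configuration; the IDF formula and the whitespace tokenization are assumptions, and the Space's custom implementation may differ.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with a common BM25 variant."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rarer terms get a higher weight; the +1 keeps the IDF positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # k1 caps the gain from repeated terms; b scales the length penalty.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "hybrid search combines vector and keyword search".split(),
    "bm25 is a keyword ranking function".split(),
    "transformers produce sentence embeddings".split(),
]
scores = bm25_scores(["keyword", "bm25"], docs)
```

In this toy corpus the second document matches both query terms, the first matches only one, and the third matches none, so the scores fall in that order.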
|
| 50 |
|
| 51 |
+
## 🛠️ API Endpoints
|
| 52 |
|
| 53 |
+
### Search API
|
| 54 |
+
GET /api/stats
|
| 55 |
+
|
| 56 |
+
## Configuration
|
| 57 |
+
|
| 58 |
+
Key parameters in `config.py`:
|
| 59 |
+
- `BM25_K1 = 1.5` - Term frequency saturation
|
| 60 |
+
- `BM25_B = 0.75` - Document length normalization
|
| 61 |
+
- `DEFAULT_VECTOR_WEIGHT = 0.6` - Hybrid search weighting
|
| 62 |
+
- `DEFAULT_BM25_WEIGHT = 0.4` - Hybrid search weighting
|
| 63 |
+
|
| 64 |
+
## Deployment
|
| 65 |
+
|
| 66 |
+
1. Clone to Hugging Face Spaces
|
| 67 |
+
2. Ensure all markdown files are in `knowledge_base/`
|
| 68 |
+
3. The system auto-initializes on startup
|
| 69 |
+
4. Access via the provided Space URL
|
| 70 |
|
| 71 |
+
## 💡 Usage Examples
|
| 72 |
|
| 73 |
+
**Chat Interface:**
|
| 74 |
+
- "What is Raktim's LLM research?"
|
| 75 |
+
- "Tell me about statistical methods"
|
| 76 |
+
- "Describe multimodal AI capabilities"
|
| 77 |
+
|
| 78 |
+
**Advanced Search:**
|
| 79 |
+
- Adjust vector/BM25 weights
|
| 80 |
+
- Compare search methods
|
| 81 |
+
- Fine-tune result count
|
| 82 |
+
|
| 83 |
+
**API Integration:**
|
| 84 |
+
```python
|
| 85 |
+
import requests
|
| 86 |
|
| 87 |
+
response = requests.get(
|
| 88 |
+
"https://your-space.hf.space/api/search",
|
| 89 |
+
params={
|
| 90 |
+
"query": "machine learning research",
|
| 91 |
+
"top_k": 5,
|
| 92 |
+
"search_type": "hybrid"
|
| 93 |
+
}
|
| 94 |
+
)
|