# MenoChat - Menstrual & Menopausal Health Chatbot
A Bangla conversational AI assistant ("আপু") designed to provide health education for Bangladeshi women on menstrual and menopausal health topics.
## 🏗️ Architecture
MenoChat uses:
- LLM: Locally fine-tuned Llama 3.2 3B model (TituLM) with LoRA adapter
- RAG: FAISS vector database with BGE-M3 embeddings and reranking
- TTS: VoxCPM-based Bangla text-to-speech model
- UI: Chainlit for conversational interface
## 📁 Project Structure

```
MenoChat/
├── chainlit_app_clean.py    # Main Chainlit application
├── load_llm.py              # LLM model loader
├── load_tts.py              # TTS model loader
├── load_vector_db.py        # Vector DB and RAG functions
├── requirements.txt         # Python dependencies
│
├── llm/                     # LLM models (local)
│   ├── titulm/              # Base model directory
│   └── adapter/             # LoRA adapter weights
│
├── embedder/                # Embedding models (local)
│   ├── bge_embed/           # BGE-M3 embeddings
│   └── bge_rerank/          # BGE reranker
│
├── tts/                     # TTS models (local)
│   └── meno_tts/            # VoxCPM TTS with LoRA
│       ├── config.json
│       ├── audiovae.pth
│       ├── lora/
│       │   ├── lora_config.json
│       │   └── lora_weights.safetensors
│       └── ...
│
└── vector_db/               # RAG knowledge base
    ├── faiss.index          # FAISS index
    ├── CHUNKS.jsonl         # Document chunks
    ├── meta.jsonl           # Metadata
    └── uid_list.txt         # UID mapping
```
## 🚀 Setup Instructions
### 1. Prerequisites
- Python: 3.10 or higher
- CUDA: 11.8+ (for GPU acceleration)
- GPU: NVIDIA GPU with 8GB+ VRAM recommended
- Storage: ~15GB for all models
### 2. Install Dependencies

```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Install FAISS (CPU or GPU version)
pip install faiss-cpu  # For CPU
# OR
pip install faiss-gpu  # For GPU
```
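After installing, a quick sanity check can confirm that the core libraries import. The module names below are assumptions based on this README's stack; adjust them to match the actual `requirements.txt`:

```python
# Report which of the expected dependencies are importable in the
# current environment, without actually importing heavy modules.
import importlib.util

def check_deps(names):
    """Map each module name to True if it can be imported."""
    return {n: importlib.util.find_spec(n) is not None for n in names}

for name, ok in check_deps(["faiss", "torch", "transformers", "chainlit"]).items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Any `MISSING` entry points at a dependency that failed to install.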
### 3. Model Setup
All models are expected to be stored locally in their respective directories:
#### LLM (Llama 3.2 3B + LoRA)
- Base model: `llm/titulm/`
- Adapter: `llm/adapter/`

If the models are not present, download them from Hugging Face:

```bash
# Base model (if needed)
huggingface-cli download hishab/titulm-llama-3.2-3b-v2.0 --local-dir llm/titulm

# Adapter should already be in llm/adapter/
```
#### Embedder & Reranker
- BGE-M3 embeddings: `embedder/bge_embed/`
- BGE reranker: `embedder/bge_rerank/`

```bash
# Download if needed
huggingface-cli download BAAI/bge-m3 --local-dir embedder/bge_embed
huggingface-cli download BAAI/bge-reranker-v2-m3 --local-dir embedder/bge_rerank
```
#### TTS (VoxCPM)
- TTS model: `tts/meno_tts/`
- Should contain `audiovae.pth`, `config.json`, and the `lora/` directory
#### Vector Database
- FAISS index and chunks should be in `vector_db/`
- Files: `faiss.index`, `CHUNKS.jsonl`, `meta.jsonl`
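The `*.jsonl` files hold one JSON object per line. A minimal loader for that layout (the field names inside each record are whatever the ingestion pipeline wrote; nothing here assumes a specific schema):

```python
import json

def load_jsonl(path):
    """Load a JSON-Lines file (e.g. CHUNKS.jsonl or meta.jsonl):
    one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Presumably the i-th chunk corresponds to the i-th vector in `faiss.index`, with `uid_list.txt` mapping positions to UIDs — verify this against `load_vector_db.py` before relying on it.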
### 4. Configuration

Model paths are defined in `chainlit_app_clean.py`:

```python
BASE_MODEL_DIR = "llm/titulm"
ADAPTER_DIR = "llm/adapter"
DB_DIR = "vector_db"
EMB_MODEL_DIR = "embedder/bge_embed"
RERANK_MODEL_DIR = "embedder/bge_rerank"
TTS_MODEL_DIR = "tts/meno_tts"
```
Adjust these if your models are in different locations.
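If you would rather not edit the file per deployment, one optional pattern — not currently implemented in the app, and the `MENOCHAT_*` variable names are hypothetical — is to fall back to environment variables:

```python
import os

# Hypothetical environment-variable overrides (illustrative names only);
# each falls back to the default path from chainlit_app_clean.py.
BASE_MODEL_DIR = os.environ.get("MENOCHAT_BASE_MODEL_DIR", "llm/titulm")
ADAPTER_DIR = os.environ.get("MENOCHAT_ADAPTER_DIR", "llm/adapter")
DB_DIR = os.environ.get("MENOCHAT_DB_DIR", "vector_db")
```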
## 🎯 Running the Application
### Start the Chainlit App

```bash
chainlit run chainlit_app_clean.py -w
```

The `-w` flag enables auto-reload on file changes (useful for development).
### Access the UI
Open your browser and navigate to:
http://localhost:8000
## 💬 Usage
- Text Input: Type your question in Bangla in the chat interface
- Starters: Click one of the pre-defined starter questions
- Response: The AI will:
  - Retrieve relevant context from the knowledge base (RAG)
  - Generate an answer using the LLM
  - Convert the answer to speech using TTS
  - Display the text and play the audio automatically
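The response flow above can be sketched as a simple composition. Here `retrieve`, `generate`, and `synthesize` are placeholders standing in for the project's actual functions in `load_vector_db.py`, `load_llm.py`, and `load_tts.py`:

```python
def answer_pipeline(question, retrieve, generate, synthesize):
    """RAG -> LLM -> TTS, mirroring the usage flow above.
    The three callables are placeholders, not the project's API."""
    context = retrieve(question)        # RAG: fetch relevant chunks
    text = generate(question, context)  # LLM: answer grounded in context
    audio = synthesize(text)            # TTS: Bangla speech for the answer
    return text, audio
```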
### Example Questions
- পিরিয়ডের সময় আমার অনেক পেটব্যথা হয়। এটা কীভাবে কমানো যায়? (I have a lot of abdominal pain during my period. How can it be reduced?)
- আমার পিরিয়ড অনিয়মিত হয়েছে। এটা কি স্বাভাবিক? (My periods have become irregular. Is that normal?)
- প্যাড, মেনস্ট্রুয়াল কাপ আর ট্যাম্পনের মধ্যে কোনটা সবচেয়ে সুরক্ষিত? (Which is safest among pads, menstrual cups, and tampons?)
## 🔧 Customization
### Adjust RAG Parameters

In `chainlit_app_clean.py`, modify the `generate_answer()` function:

```python
# Retrieve more context
top, ctx = retrieve_then_rerank(embedder, reranker, index, chunks, meta,
                                question_bn, top_k=5, faiss_top_n=50)
```
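For reference, the retrieve-then-rerank pattern behind these parameters looks roughly like the sketch below. The callables are placeholders, not the project's API:

```python
def retrieve_then_rerank_sketch(query, ann_search, rerank_score,
                                chunks, top_k=5, faiss_top_n=50):
    """1) ANN search pulls faiss_top_n coarse candidates from the index,
    2) the slower but more accurate reranker rescores them,
    3) only the best top_k survive as context for the LLM."""
    candidate_ids = ann_search(query, faiss_top_n)
    candidates = [chunks[i] for i in candidate_ids]
    candidates.sort(key=lambda c: rerank_score(query, c), reverse=True)
    return candidates[:top_k]
```

Raising `faiss_top_n` gives the reranker more candidates to choose from (slower, usually better); raising `top_k` puts more context into the prompt.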
### Modify LLM Generation

Adjust generation parameters in `generate_answer()`:

```python
out = ft_model.generate(
    **inputs,
    max_new_tokens=320,       # Increase for longer responses
    temperature=0.7,          # Sampling temperature
    do_sample=True,           # Enable sampling
    repetition_penalty=1.18,
    # ... other params
)
```
### TTS Settings

Modify TTS generation in `text_to_speech()`:

```python
audio = tts_model.generate(
    text=prompt,
    cfg_value=2.0,            # Classifier-free guidance strength
    inference_timesteps=10,   # Quality vs. speed tradeoff
)
```
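If you need to persist the generated audio yourself, here is a stdlib-only sketch for writing mono float samples as 16-bit PCM WAV. The app's actual save path, sample rate, and audio format may differ:

```python
import struct
import wave

def save_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 16-bit
        w.setframerate(sample_rate)
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        w.writeframes(pcm)
```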
## 🔍 Features

### Current Features
- ✅ Text-based conversation in Bangla
- ✅ RAG-based context retrieval with reranking
- ✅ Local LLM inference with LoRA adapter
- ✅ Bangla TTS for responses
- ✅ Pre-defined starter questions
### Planned Features
- ⏳ ASR (Automatic Speech Recognition) for voice input
- ⏳ Multi-turn conversation history
- ⏳ User authentication and data persistence
- ⏳ Admin dashboard for knowledge base management
## 🐛 Troubleshooting

### Out of Memory (GPU)
If you encounter CUDA OOM errors:
```python
# In load_llm.py, use lower-precision loading
ft_model, ft_tok = FastLanguageModel.from_pretrained(
    model_name=base_model_dir,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,   # Already enabled
    load_in_8bit=False,  # If 4-bit causes problems, set this True and load_in_4bit False
)
```
### FAISS Index Errors

Ensure the FAISS index matches the embedding dimension:

```bash
# Check index properties (dimension and vector count)
python -c "import faiss; idx = faiss.read_index('vector_db/faiss.index'); print(idx.d, idx.ntotal)"
```
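A mismatch between the index dimension and the embedder's output dimension is the usual culprit (BGE-M3 dense embeddings are 1024-dimensional). A small guard like the one below — a sketch, not part of the current app — makes that failure explicit at startup:

```python
def assert_dims_match(index_dim, embed_dim):
    """Fail fast when the FAISS index was built with a different
    embedding model than the one currently loaded."""
    if index_dim != embed_dim:
        raise ValueError(
            f"FAISS index dimension {index_dim} != embedding dimension "
            f"{embed_dim}; rebuild the index with the current embedder"
        )
```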
### TTS Not Working

Ensure VoxCPM is installed:

```bash
cd tts/meno_tts/VoxCPM
pip install -e .
```

Check that the LoRA weights exist:

```bash
ls tts/meno_tts/lora/  # Should show: lora_config.json, lora_weights.safetensors
```
### Model Loading Takes Too Long

The first load is slow because the model weights must be read from disk; subsequent runs are faster once the files are in the OS cache.
## 📊 System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 16 GB | 32 GB |
| GPU VRAM | 6 GB | 12 GB+ |
| Storage | 15 GB | 30 GB |
| CUDA | 11.8 | 12.1+ |
## 📝 License
[Your License Here]
## 👥 Contributors
[Your name/team]
## 📧 Contact
For questions or support, contact: [your-email]
**Note:** This application is for educational purposes only and should not replace professional medical advice.