MenoChat - Menstrual & Menopausal Health Chatbot

A Bangla conversational AI assistant ("আপু", "Apu", i.e. big sister) designed to provide health education for Bangladeshi women on menstrual and menopausal health topics.

🏗️ Architecture

MenoChat uses:

  • LLM: Locally fine-tuned Llama 3.2 3B model (TituLM) with LoRA adapter
  • RAG: FAISS vector database with BGE-M3 embeddings and reranking
  • TTS: VoxCPM-based Bangla text-to-speech model
  • UI: Chainlit for conversational interface

📁 Project Structure

MenoChat/
├── chainlit_app_clean.py       # Main Chainlit application
├── load_llm.py                 # LLM model loader
├── load_tts.py                 # TTS model loader
├── load_vector_db.py           # Vector DB and RAG functions
├── requirements.txt            # Python dependencies
│
├── llm/                        # LLM models (local)
│   ├── titulm/                 # Base model directory
│   └── adapter/                # LoRA adapter weights
│
├── embedder/                   # Embedding models (local)
│   ├── bge_embed/              # BGE-M3 embeddings
│   └── bge_rerank/             # BGE reranker
│
├── tts/                        # TTS models (local)
│   └── meno_tts/               # VoxCPM TTS with LoRA
│       ├── config.json
│       ├── audiovae.pth
│       ├── lora/
│       │   ├── lora_config.json
│       │   └── lora_weights.safetensors
│       └── ...
│
└── vector_db/                  # RAG knowledge base
    ├── faiss.index             # FAISS index
    ├── CHUNKS.jsonl            # Document chunks
    ├── meta.jsonl              # Metadata
    └── uid_list.txt            # UID mapping

🚀 Setup Instructions

1. Prerequisites

  • Python: 3.10 or higher
  • CUDA: 11.8+ (for GPU acceleration)
  • GPU: NVIDIA GPU with 6 GB+ VRAM (12 GB+ recommended)
  • Storage: ~15GB for all models

2. Install Dependencies

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Install FAISS (CPU or GPU version)
pip install faiss-cpu  # For CPU
# OR
pip install faiss-gpu  # For GPU

3. Model Setup

All models are expected to be stored locally in their respective directories:

LLM (Llama 3.2 3B + LoRA)

  • Base model: llm/titulm/
  • Adapter: llm/adapter/

If models are not present, download from Hugging Face:

# Base model (if needed)
huggingface-cli download hishab/titulm-llama-3.2-3b-v2.0 --local-dir llm/titulm

# Adapter should already be in llm/adapter/

Embedder & Reranker

  • BGE-M3 embeddings: embedder/bge_embed/
  • BGE reranker: embedder/bge_rerank/
# Download if needed
huggingface-cli download BAAI/bge-m3 --local-dir embedder/bge_embed
huggingface-cli download BAAI/bge-reranker-v2-m3 --local-dir embedder/bge_rerank

TTS (VoxCPM)

  • TTS model: tts/meno_tts/
  • Should contain audiovae.pth, config.json, and lora/ directory

Vector Database

  • FAISS index and chunks should be in vector_db/
  • Files: faiss.index, CHUNKS.jsonl, meta.jsonl
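The sidecar files are position-aligned with the FAISS index: row i of the index corresponds to line i of each file. A minimal loading sketch in pure Python (the file layout follows the list above; treat the per-line JSON field names as assumptions, since the actual schema lives in load_vector_db.py):

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def load_knowledge_base(db_dir="vector_db"):
    """Load the RAG sidecar files and verify they are aligned."""
    db = Path(db_dir)
    chunks = load_jsonl(db / "CHUNKS.jsonl")   # document chunks
    meta = load_jsonl(db / "meta.jsonl")       # per-chunk metadata
    uids = (db / "uid_list.txt").read_text(encoding="utf-8").split()
    # Row i of faiss.index corresponds to chunks[i] / meta[i] / uids[i]
    assert len(chunks) == len(meta) == len(uids), "vector_db files out of sync"
    return chunks, meta, uids
```

The FAISS index itself would then be opened alongside these with faiss.read_index("vector_db/faiss.index").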

4. Configuration

Model paths are defined in chainlit_app_clean.py:

BASE_MODEL_DIR = "llm/titulm"
ADAPTER_DIR = "llm/adapter"      
DB_DIR = "vector_db"   
EMB_MODEL_DIR = "embedder/bge_embed"
RERANK_MODEL_DIR = "embedder/bge_rerank"
TTS_MODEL_DIR = "tts/meno_tts"

Adjust these if your models are in different locations.
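Because every path is local, a quick existence check before launching can catch a misplaced model directory early. A small sketch (the path list is taken from the config above; this helper is not part of the project):

```python
from pathlib import Path

# Paths as configured in chainlit_app_clean.py
MODEL_PATHS = {
    "base LLM": "llm/titulm",
    "LoRA adapter": "llm/adapter",
    "vector DB": "vector_db",
    "embedder": "embedder/bge_embed",
    "reranker": "embedder/bge_rerank",
    "TTS": "tts/meno_tts",
}

def check_model_paths(paths=MODEL_PATHS, root="."):
    """Return the names of components whose directory is missing."""
    return [name for name, rel in paths.items()
            if not (Path(root) / rel).is_dir()]
```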

🎯 Running the Application

Start the Chainlit App

chainlit run chainlit_app_clean.py -w

The -w flag enables auto-reload on file changes (useful for development).

Access the UI

Open your browser and navigate to:

http://localhost:8000

💬 Usage

  1. Text Input: Type your question in Bangla in the chat interface
  2. Starters: Click on one of the pre-defined starter questions
  3. Response: The AI will:
    • Retrieve relevant context from the knowledge base (RAG)
    • Generate an answer using the LLM
    • Convert the answer to speech using TTS
    • Display text + play audio automatically
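The steps above can be sketched as a single per-turn pipeline. This is an illustrative skeleton, not the actual code in chainlit_app_clean.py; the retrieval, generation, and TTS callables are injected so the flow can be followed (and tested) without loading any models:

```python
def answer_pipeline(question_bn, retrieve, generate, tts):
    """One chat turn: RAG retrieval -> LLM answer -> TTS audio.

    `retrieve`, `generate`, and `tts` stand in for the project's
    real model-backed functions.
    """
    context = retrieve(question_bn)            # 1. fetch relevant chunks (RAG)
    answer = generate(question_bn, context)    # 2. answer grounded in context (LLM)
    audio = tts(answer)                        # 3. synthesize Bangla speech (TTS)
    return answer, audio                       # 4. UI shows text and plays audio
```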

Example Questions

  • পিরিয়ডের সময় আমার অনেক পেটব্যথা হয়। এটা কীভাবে কমানো যায়? ("I have a lot of abdominal pain during my period. How can I reduce it?")
  • আমার পিরিয়ড অনিয়মিত হয়েছে। এটা কি স্বাভাবিক? ("My period has become irregular. Is this normal?")
  • প্যাড, মেনস্ট্রুয়াল কাপ আর ট্যাম্পনের মধ্যে কোনটা সবচেয়ে সুরক্ষিত? ("Which is safest: pads, menstrual cups, or tampons?")

🔧 Customization

Adjust RAG Parameters

In chainlit_app_clean.py, modify the generate_answer() function:

# Retrieve more context
top, ctx = retrieve_then_rerank(embedder, reranker, index, chunks, meta, 
                                question_bn, top_k=5, faiss_top_n=50)
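For intuition, retrieve-then-rerank first takes the faiss_top_n nearest neighbours from the vector index, then lets the stronger (but slower) reranker pick the best top_k of those. A toy sketch of that two-stage idea in pure Python (dot-product scoring stands in for FAISS and the BGE reranker here):

```python
def retrieve_then_rerank_sketch(query_vec, doc_vecs, rerank_score,
                                top_k=5, faiss_top_n=50):
    """Two-stage retrieval: cheap vector search, then costly reranking."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # Stage 1: rank all docs by embedding similarity, keep faiss_top_n
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: dot(query_vec, doc_vecs[i]),
                        reverse=True)[:faiss_top_n]
    # Stage 2: rescore only the candidates with the stronger reranker
    return sorted(candidates, key=rerank_score, reverse=True)[:top_k]
```

Raising faiss_top_n widens the candidate pool (better recall, slower reranking); raising top_k puts more context into the LLM prompt.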

Modify LLM Generation

Adjust generation parameters in generate_answer():

out = ft_model.generate(
    **inputs,
    max_new_tokens=320,        # Increase for longer responses
    temperature=0.7,           # Add for sampling
    do_sample=True,            # Enable sampling
    repetition_penalty=1.18,
    # ... other params
)

TTS Settings

Modify TTS generation in text_to_speech():

audio = tts_model.generate(
    text=prompt,
    cfg_value=2.0,            # Classifier-free guidance
    inference_timesteps=10,   # Quality vs speed tradeoff
)

🔍 Features

Current Features

  • ✅ Text-based conversation in Bangla
  • ✅ RAG-based context retrieval with reranking
  • ✅ Local LLM inference with LoRA adapter
  • ✅ Bangla TTS for responses
  • ✅ Pre-defined starter questions

Planned Features

  • ⏳ ASR (Automatic Speech Recognition) for voice input
  • ⏳ Multi-turn conversation history
  • ⏳ User authentication and data persistence
  • ⏳ Admin dashboard for knowledge base management

🐛 Troubleshooting

Out of Memory (GPU)

If you encounter CUDA OOM errors:

# In load_llm.py, load the model in reduced precision to save VRAM
ft_model, ft_tok = FastLanguageModel.from_pretrained(
    model_name=base_model_dir,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,  # Already enabled
    load_in_8bit=False,  # Try 8-bit if 4-bit doesn't work
)

FAISS Index Errors

Ensure FAISS index matches the embedding dimension:

# Check index properties
python -c "import faiss; idx = faiss.read_index('vector_db/faiss.index'); print(idx.d, idx.ntotal)"

TTS Not Working

  1. Ensure VoxCPM is installed:

    cd tts/meno_tts/VoxCPM
    pip install -e .
    
  2. Check if LoRA weights exist:

    ls tts/meno_tts/lora/
    # Should show: lora_config.json, lora_weights.safetensors
    

Model Loading Takes Too Long

Loading the models for the first time is slow due to disk I/O. Subsequent runs should be faster thanks to caching.

📊 System Requirements

Component   Minimum   Recommended
RAM         16 GB     32 GB
GPU VRAM    6 GB      12 GB+
Storage     15 GB     30 GB
CUDA        11.8      12.1+

📝 License

[Your License Here]

👥 Contributors

[Your name/team]

📧 Contact

For questions or support, contact: [your-email]


Note: This application is for educational purposes only and should not replace professional medical advice.
