MenoChat - Menstrual & Menopausal Health Chatbot

A Bangla conversational AI assistant ("আপু", "Apu", i.e. big sister) designed to provide health education for Bangladeshi women on menstrual and menopausal health topics.

🏗️ Architecture

MenoChat uses:

  • LLM: Locally fine-tuned Llama 3.2 3B model (TituLM) with LoRA adapter
  • RAG: FAISS vector database with BGE-M3 embeddings and reranking
  • TTS: VoxCPM-based Bangla text-to-speech model
  • UI: Chainlit for conversational interface

📁 Project Structure

MenoChat/
├── chainlit_app_clean.py       # Main Chainlit application
├── load_llm.py                 # LLM model loader
├── load_tts.py                 # TTS model loader
├── load_vector_db.py           # Vector DB and RAG functions
├── requirements.txt            # Python dependencies
│
├── llm/                        # LLM models (local)
│   ├── titulm/                 # Base model directory
│   └── adapter/                # LoRA adapter weights
│
├── embedder/                   # Embedding models (local)
│   ├── bge_embed/              # BGE-M3 embeddings
│   └── bge_rerank/             # BGE reranker
│
├── tts/                        # TTS models (local)
│   └── meno_tts/               # VoxCPM TTS with LoRA
│       ├── config.json
│       ├── audiovae.pth
│       ├── lora/
│       │   ├── lora_config.json
│       │   └── lora_weights.safetensors
│       └── ...
│
└── vector_db/                  # RAG knowledge base
    ├── faiss.index             # FAISS index
    ├── CHUNKS.jsonl            # Document chunks
    ├── meta.jsonl              # Metadata
    └── uid_list.txt            # UID mapping

🚀 Setup Instructions

1. Prerequisites

  • Python: 3.10 or higher
  • CUDA: 11.8+ (for GPU acceleration)
  • GPU: NVIDIA GPU with 6 GB+ VRAM (12 GB+ recommended)
  • Storage: ~15GB for all models

2. Install Dependencies

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Install FAISS (CPU or GPU version)
pip install faiss-cpu  # For CPU
# OR
pip install faiss-gpu  # For GPU

3. Model Setup

All models are expected to be stored locally in their respective directories:

LLM (Llama 3.2 3B + LoRA)

  • Base model: llm/titulm/
  • Adapter: llm/adapter/

If models are not present, download from Hugging Face:

# Base model (if needed)
huggingface-cli download hishab/titulm-llama-3.2-3b-v2.0 --local-dir llm/titulm

# Adapter should already be in llm/adapter/

Embedder & Reranker

  • BGE-M3 embeddings: embedder/bge_embed/
  • BGE reranker: embedder/bge_rerank/
# Download if needed
huggingface-cli download BAAI/bge-m3 --local-dir embedder/bge_embed
huggingface-cli download BAAI/bge-reranker-v2-m3 --local-dir embedder/bge_rerank

TTS (VoxCPM)

  • TTS model: tts/meno_tts/
  • Should contain audiovae.pth, config.json, and lora/ directory

Vector Database

  • FAISS index and chunks should be in vector_db/
  • Files: faiss.index, CHUNKS.jsonl, meta.jsonl
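The sidecar files are position-aligned with the FAISS index: row i of the index corresponds to line i of each file. A minimal loading sketch in pure Python (the file layout follows the list above; treat the per-line JSON field names as assumptions, since the actual schema lives in load_vector_db.py):

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def load_knowledge_base(db_dir="vector_db"):
    """Load the RAG sidecar files and verify they are aligned."""
    db = Path(db_dir)
    chunks = load_jsonl(db / "CHUNKS.jsonl")   # document chunks
    meta = load_jsonl(db / "meta.jsonl")       # per-chunk metadata
    uids = (db / "uid_list.txt").read_text(encoding="utf-8").split()
    # Row i of faiss.index corresponds to chunks[i] / meta[i] / uids[i]
    assert len(chunks) == len(meta) == len(uids), "vector_db files out of sync"
    return chunks, meta, uids
```

The FAISS index itself would then be opened alongside these with faiss.read_index("vector_db/faiss.index").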

4. Configuration

Model paths are defined in chainlit_app_clean.py:

BASE_MODEL_DIR = "llm/titulm"
ADAPTER_DIR = "llm/adapter"      
DB_DIR = "vector_db"   
EMB_MODEL_DIR = "embedder/bge_embed"
RERANK_MODEL_DIR = "embedder/bge_rerank"
TTS_MODEL_DIR = "tts/meno_tts"

Adjust these if your models are in different locations.
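Because every path is local, a quick existence check before launching can catch a misplaced model directory early. A small sketch (the path list is taken from the config above; this helper is not part of the project):

```python
from pathlib import Path

# Paths as configured in chainlit_app_clean.py
MODEL_PATHS = {
    "base LLM": "llm/titulm",
    "LoRA adapter": "llm/adapter",
    "vector DB": "vector_db",
    "embedder": "embedder/bge_embed",
    "reranker": "embedder/bge_rerank",
    "TTS": "tts/meno_tts",
}

def check_model_paths(paths=MODEL_PATHS, root="."):
    """Return the names of components whose directory is missing."""
    return [name for name, rel in paths.items()
            if not (Path(root) / rel).is_dir()]
```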

🎯 Running the Application

Start the Chainlit App

chainlit run chainlit_app_clean.py -w

The -w flag enables auto-reload on file changes (useful for development).

Access the UI

Open your browser and navigate to:

http://localhost:8000

💬 Usage

  1. Text Input: Type your question in Bangla in the chat interface
  2. Starters: Click on one of the pre-defined starter questions
  3. Response: The AI will:
    • Retrieve relevant context from the knowledge base (RAG)
    • Generate an answer using the LLM
    • Convert the answer to speech using TTS
    • Display text + play audio automatically
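The steps above can be sketched as a single per-turn pipeline. This is an illustrative skeleton, not the actual code in chainlit_app_clean.py; the retrieval, generation, and TTS callables are injected so the flow can be followed (and tested) without loading any models:

```python
def answer_pipeline(question_bn, retrieve, generate, tts):
    """One chat turn: RAG retrieval -> LLM answer -> TTS audio.

    `retrieve`, `generate`, and `tts` stand in for the project's
    real model-backed functions.
    """
    context = retrieve(question_bn)            # 1. fetch relevant chunks (RAG)
    answer = generate(question_bn, context)    # 2. answer grounded in context (LLM)
    audio = tts(answer)                        # 3. synthesize Bangla speech (TTS)
    return answer, audio                       # 4. UI shows text and plays audio
```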

Example Questions

  • পিরিয়ডের সময় আমার অনেক পেটব্যথা হয়। এটা কীভাবে কমানো যায়? ("I have a lot of abdominal pain during my period. How can I reduce it?")
  • আমার পিরিয়ড অনিয়মিত হয়েছে। এটা কি স্বাভাবিক? ("My period has become irregular. Is this normal?")
  • প্যাড, মেনস্ট্রুয়াল কাপ আর ট্যাম্পনের মধ্যে কোনটা সবচেয়ে সুরক্ষিত? ("Which is safest: pads, menstrual cups, or tampons?")

🔧 Customization

Adjust RAG Parameters

In chainlit_app_clean.py, modify the generate_answer() function:

# Retrieve more context
top, ctx = retrieve_then_rerank(embedder, reranker, index, chunks, meta, 
                                question_bn, top_k=5, faiss_top_n=50)
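For intuition, retrieve-then-rerank first takes the faiss_top_n nearest neighbours from the vector index, then lets the stronger (but slower) reranker pick the best top_k of those. A toy sketch of that two-stage idea in pure Python (dot-product scoring stands in for FAISS and the BGE reranker here):

```python
def retrieve_then_rerank_sketch(query_vec, doc_vecs, rerank_score,
                                top_k=5, faiss_top_n=50):
    """Two-stage retrieval: cheap vector search, then costly reranking."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # Stage 1: rank all docs by embedding similarity, keep faiss_top_n
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: dot(query_vec, doc_vecs[i]),
                        reverse=True)[:faiss_top_n]
    # Stage 2: rescore only the candidates with the stronger reranker
    return sorted(candidates, key=rerank_score, reverse=True)[:top_k]
```

Raising faiss_top_n widens the candidate pool (better recall, slower reranking); raising top_k puts more context into the LLM prompt.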

Modify LLM Generation

Adjust generation parameters in generate_answer():

out = ft_model.generate(
    **inputs,
    max_new_tokens=320,        # Increase for longer responses
    temperature=0.7,           # Add for sampling
    do_sample=True,            # Enable sampling
    repetition_penalty=1.18,
    # ... other params
)

TTS Settings

Modify TTS generation in text_to_speech():

audio = tts_model.generate(
    text=prompt,
    cfg_value=2.0,            # Classifier-free guidance
    inference_timesteps=10,   # Quality vs speed tradeoff
)

🔍 Features

Current Features

  • ✅ Text-based conversation in Bangla
  • ✅ RAG-based context retrieval with reranking
  • ✅ Local LLM inference with LoRA adapter
  • ✅ Bangla TTS for responses
  • ✅ Pre-defined starter questions

Planned Features

  • ⏳ ASR (Automatic Speech Recognition) for voice input
  • ⏳ Multi-turn conversation history
  • ⏳ User authentication and data persistence
  • ⏳ Admin dashboard for knowledge base management

🐛 Troubleshooting

Out of Memory (GPU)

If you encounter CUDA OOM errors:

# In load_llm.py, load the model in reduced precision to save VRAM
ft_model, ft_tok = FastLanguageModel.from_pretrained(
    model_name=base_model_dir,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,  # Already enabled
    load_in_8bit=False,  # Try 8-bit if 4-bit doesn't work
)

FAISS Index Errors

Ensure FAISS index matches the embedding dimension:

# Check index properties
python -c "import faiss; idx = faiss.read_index('vector_db/faiss.index'); print(idx.d, idx.ntotal)"

TTS Not Working

  1. Ensure VoxCPM is installed:

    cd tts/meno_tts/VoxCPM
    pip install -e .
    
  2. Check if LoRA weights exist:

    ls tts/meno_tts/lora/
    # Should show: lora_config.json, lora_weights.safetensors
    

Model Loading Takes Too Long

Loading the models for the first time is slow due to disk I/O. Subsequent runs should be faster thanks to caching.

📊 System Requirements

Component   Minimum   Recommended
RAM         16 GB     32 GB
GPU VRAM    6 GB      12 GB+
Storage     15 GB     30 GB
CUDA        11.8      12.1+

📝 License

[Your License Here]

👥 Contributors

[Your name/team]

📧 Contact

For questions or support, contact: [your-email]


Note: This application is for educational purposes only and should not replace professional medical advice.
