title: Multilingual Twi Health Information System
emoji: 🌍
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 4.3.0
python_version: '3.10'
app_file: app.py

🌍 Multilingual Twi Health Information System

A semantic search system that allows users to ask health-related questions in any language (Twi, Ga, Ewe, Hausa, English, etc.) and receive relevant answers in Twi.

🎯 Key Features

  • 🌐 Multilingual Input: Ask questions in any language
  • 🇬🇭 Twi Answers: All responses are provided in Twi
  • 🔍 Semantic Search: Uses E5-Multilingual-Large embeddings for accurate matching
  • ⚡ Fast Retrieval: FAISS-powered search across millions of Q&A pairs
  • 📝 Three-Paragraph Format: Presents top 3 most relevant answers

💡 How It Works

  1. User Input: Enter a question in any supported language
  2. Embedding: Question is encoded using E5-Multilingual-Large model
  3. Search: FAISS finds the top 3 most similar English questions
  4. Response: Corresponding Twi answers are presented as 3 paragraphs
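Steps 3 and 4 can be sketched in plain Python as below. This is a toy stand-in, not the Space's code: the real system uses E5 embeddings and a FAISS index, whereas here the vectors are tiny hand-written unit vectors, and `top_k_answers` and the sample data are hypothetical names chosen for illustration.

```python
import heapq

def top_k_answers(query_vec, question_vecs, answers, k=3):
    """Return the k (answer, score) pairs whose question vectors best match the query.

    All vectors are assumed to be L2-normalized already, so the inner
    product used here is the cosine similarity.
    """
    scores = [sum(q * v for q, v in zip(query_vec, vec)) for vec in question_vecs]
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(answers[i], scores[i]) for i in best]

# Toy 2-D unit vectors standing in for 1024-dimensional E5 embeddings.
question_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [0.8, 0.6]]
answers = ["answer A", "answer B", "answer C", "answer D"]

results = top_k_answers([1.0, 0.0], question_vecs, answers, k=3)
# Present the three best answers as three paragraphs.
response = "\n\n".join(answer for answer, _ in results)
```

The exhaustive scan over `question_vecs` mirrors what a flat index does; FAISS performs the same comparison with optimized routines.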

🚀 Quick Start

For Users

Simply visit the Space and:

  1. Type your question in any language
  2. Click "Hwehwɛ | Search"
  3. Receive 3 relevant answers in Twi
  4. Optional: Enable "Show matched questions" to see source questions

For Developers

Step 1: Create Embeddings (Google Colab)

# Upload your CSV with columns: question, answer
# Run the embedding creation script
# Download: faiss_index.bin, metadata.json, config.json
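The embedding creation script itself is not shown in this README. One possible shape for it is sketched below, assuming `sentence-transformers`, `faiss-cpu`, and `pandas` are installed. The function names `with_prefix` and `build_index`, and the JSON layouts written to `metadata.json` and `config.json`, are assumptions for illustration; match them to whatever your `app.py` actually reads.

```python
import json

def with_prefix(texts, prefix):
    """Prepend an E5 role prefix ("passage: " or "query: ") to each text."""
    return [f"{prefix}{t}" for t in texts]

def build_index(csv_path, out_dir=".", model_name="intfloat/multilingual-e5-large"):
    """Encode the English questions and write the three files the Space needs."""
    # Heavy dependencies are imported lazily so with_prefix stays usable without them.
    import faiss
    import pandas as pd
    from sentence_transformers import SentenceTransformer

    df = pd.read_csv(csv_path, encoding="utf-8")
    model = SentenceTransformer(model_name)
    embeddings = model.encode(
        with_prefix(df["question"].tolist(), "passage: "),
        normalize_embeddings=True,  # unit vectors, so inner product equals cosine
        convert_to_numpy=True,
        show_progress_bar=True,
    ).astype("float32")

    # Flat inner-product index, as listed under Search Configuration.
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    faiss.write_index(index, f"{out_dir}/faiss_index.bin")

    # Assumed layouts: a list of {question, answer} records and a small config dict.
    with open(f"{out_dir}/metadata.json", "w", encoding="utf-8") as f:
        json.dump(df[["question", "answer"]].to_dict("records"), f, ensure_ascii=False)
    with open(f"{out_dir}/config.json", "w", encoding="utf-8") as f:
        json.dump({"model_name": model_name, "dimension": int(embeddings.shape[1])}, f)
```

In Colab this would be called as `build_index("your_file.csv")`, after which the three output files can be downloaded.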

Step 2: Deploy to Hugging Face Spaces

  1. Create a new Gradio Space
  2. Upload required files:
    • app.py
    • requirements.txt
    • faiss_index.bin
    • metadata.json
    • config.json
  3. Space will automatically build and deploy

📂 File Structure

.
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── faiss_index.bin        # FAISS index (English questions)
├── metadata.json          # Question-answer pairs
├── config.json            # Model configuration
└── README.md              # This file

🔧 Technical Details

Model Architecture

  • Embedding Model: intfloat/multilingual-e5-large
  • Embedding Dimension: 1024
  • Supported Languages: 100+ languages including:
    • Twi, Ga, Ewe, Hausa
    • English, French, Arabic
    • And many more

Search Configuration

  • Similarity Metric: Cosine similarity (inner product, which equals cosine when embeddings are L2-normalized)
  • Index Type: FAISS Flat Index
  • Top-K Results: 3 (configurable)
  • Encoding Strategy:
    • Indexed questions: "passage: " prefix
    • User queries: "query: " prefix
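The pairing of cosine similarity with an inner-product index only holds for unit-length vectors. A small worked check, using made-up 2-D vectors rather than real embeddings, shows the difference; all function names here are illustrative.

```python
import math

def inner(a, b):
    """Plain inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(inner(v, v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity computed from the definition."""
    return inner(a, b) / (math.sqrt(inner(a, a)) * math.sqrt(inner(b, b)))

a, b = [3.0, 4.0], [8.0, 6.0]
raw_ip = inner(a, b)                                # grows with vector length
unit_ip = inner(l2_normalize(a), l2_normalize(b))   # length-independent
cos = cosine(a, b)
```

Here `raw_ip` is 48.0 while `unit_ip` and `cos` agree at about 0.96, which is why embeddings should be normalized before they are added to an inner-product index.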

Performance

  • Search Speed: < 100ms for millions of records
  • Accuracy: State-of-the-art multilingual semantic matching
  • Scalability: Handles millions of Q&A pairs
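As a rough capacity check, the raw vector storage of a flat float32 index is rows × dimension × 4 bytes. The sketch below applies that to the 1024-dimensional embeddings and the 3M-row figure quoted under Data Format. `flat_index_bytes` is a hypothetical helper, and the result excludes metadata and any index overhead.

```python
def flat_index_bytes(num_vectors, dim=1024, bytes_per_float=4):
    """Raw storage for a flat index: one dim-sized float32 vector per row."""
    return num_vectors * dim * bytes_per_float

size_bytes = flat_index_bytes(3_000_000)
size_gb = size_bytes / 1e9  # decimal gigabytes
```

For 3 million rows this comes to roughly 12.3 GB. A flat index also compares the query against every stored vector, so actual latency depends on row count and hardware.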

📊 Data Format

Input CSV

question,answer
"Alcohol while breastfeeding?","Nufu yɛ papa ma wo ba..."
"Ambulance service number?","Ɛfa awoɔ mu ahohiahia..."

Requirements

  • Columns: question (English), answer (Twi)
  • Format: UTF-8 encoded CSV
  • Size: Tested with 3M+ rows
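Before embedding a large file, it can help to confirm it meets these requirements. The following is a minimal sketch using only the standard library; `validate_rows` is a hypothetical helper, and treating an empty question or answer as an error is an assumption about what you want.

```python
import csv
import io

REQUIRED_COLUMNS = ("question", "answer")

def validate_rows(text_stream):
    """Yield (question, answer) pairs, raising ValueError on a malformed file."""
    reader = csv.DictReader(text_stream)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    for row_no, row in enumerate(reader, start=1):  # counts data rows, not the header
        question = (row["question"] or "").strip()
        answer = (row["answer"] or "").strip()
        if not question or not answer:
            raise ValueError(f"empty question or answer in data row {row_no}")
        yield question, answer

# In-memory sample; the answer text is a placeholder, not real data.
sample = io.StringIO('question,answer\n"Alcohol while breastfeeding?","<Twi answer>"\n')
pairs = list(validate_rows(sample))
```

For a real file, pass `open(path, encoding="utf-8", newline="")` so that a non-UTF-8 file fails with a `UnicodeDecodeError` while it is being read.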

🌟 Example Queries

Language Question Result
English "What should I do about alcohol while breastfeeding?" βœ… Finds relevant answer
Twi "DΙ›n na menyΙ› fa alcohol ho wΙ” nufunom bere mu?" βœ… Finds relevant answer
Ga "MΙ›nya ambulance service frΙ› nΙ”ma no?" βœ… Finds relevant answer

🔐 Privacy & Security

  • ❌ No user data is stored
  • ✅ All processing happens in real-time
  • ✅ No tracking or analytics
  • ✅ Open-source and transparent

🛠️ Customization

Adjust Number of Results

# In app.py, modify:
results = search_answers(query, top_k=5)  # Change from 3 to 5

Change Response Format

# Modify format_response() function in app.py
# Customize how answers are presented
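The body of `format_response()` is not reproduced in this README, so the following is only a guess at what a customized version could look like. It assumes each result is a dict with `question`, `answer`, and `score` keys, which may differ from the structure your `app.py` uses.

```python
def format_response(results, show_questions=False):
    """Join answers into numbered paragraphs separated by blank lines.

    `results` is assumed to be a list of dicts with 'question', 'answer',
    and 'score' keys; adapt the field names to the real data structure.
    """
    paragraphs = []
    for rank, item in enumerate(results, start=1):
        text = f"{rank}. {item['answer']}"
        if show_questions:
            # Mirrors the optional "Show matched questions" toggle.
            text += f"\n(matched: {item['question']}, score {item['score']:.2f})"
        paragraphs.append(text)
    return "\n\n".join(paragraphs)

sample_results = [
    {"question": "Q1", "answer": "A1", "score": 0.91},
    {"question": "Q2", "answer": "A2", "score": 0.85},
]
plain = format_response(sample_results)
detailed = format_response(sample_results, show_questions=True)
```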

Add New Languages

The system automatically supports 100+ languages through E5-Multilingual-Large. No configuration needed!

📈 Performance Optimization

For large datasets:

  • Use Git LFS for faiss_index.bin (>100MB)
  • Consider quantized FAISS indices for faster search
  • Enable caching for frequently asked questions
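Caching frequently asked questions could be done with the standard library's `functools.lru_cache`, as in this sketch. The `search_answers` function here is a hypothetical stand-in for the real encode-and-search call, and normalizing case and whitespace before the lookup is a design assumption, not something the app is known to do.

```python
from functools import lru_cache

calls = {"count": 0}

def search_answers(query, top_k=3):
    """Hypothetical stand-in for the expensive encode + FAISS search."""
    calls["count"] += 1
    return tuple(f"answer {i} for {query}" for i in range(1, top_k + 1))

@lru_cache(maxsize=1024)
def _cached(normalized_query, top_k):
    return search_answers(normalized_query, top_k)

def cached_search(query, top_k=3):
    # Collapse whitespace and case so trivially different spellings share an entry.
    return _cached(" ".join(query.lower().split()), top_k)

first = cached_search("Ambulance service number?")
second = cached_search("  ambulance   SERVICE number? ")
```

The second call is served from the cache, so the underlying search runs only once. An in-memory cache does hold recent query strings for the life of the process, which is worth weighing against the privacy statement above.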

🀝 Contributing

Contributions are welcome! Areas for improvement:

  • Add more Q&A pairs
  • Improve answer formatting
  • Add audio input/output
  • Implement feedback mechanism

📄 License

[Add your license here]

🙏 Acknowledgments

  • Model: E5-Multilingual-Large by Microsoft
  • Framework: Sentence Transformers, FAISS, Gradio
  • Data: [Add your data source]

📞 Support

For issues or questions:

  • Open an issue in the Space discussions
  • Contact: [Your contact information]

Medaase! | Thank you! 🇬🇭