Jack-ki1 committed on
Commit
00bd2b1
·
verified ·
1 Parent(s): 25bf556

Upload 16 files

.streamlit/config.toml ADDED
@@ -0,0 +1,9 @@
+ [theme]
+ primaryColor = "#FF4B4B"
+ backgroundColor = "#FFFFFF"
+ secondaryBackgroundColor = "#F0F2F6"
+ textColor = "#262730"
+ font = "sans serif"
+
+ [server]
+ runOnSave = true
README.md CHANGED
@@ -1,10 +1,134 @@
- ---
- title: Chatbot1
- emoji: 👁
- colorFrom: red
- colorTo: pink
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🎓 FINESE SCHOOL: AI Assistant for Data Professionals
+
+ Ask questions on Python, SQL, Power BI, ML, and more — get **accurate, topic-specific answers** with **code examples**.
+
+ ✅ Powered by **Gemini**
+ ✅ Download chat as PDF
+ ✅ Strict topic enforcement
+ ✅ Built with **Streamlit** & **LangChain**
+
+ ## 🚀 Features
+
+ - **Expert-level responses** from "Dr. Data", our AI mentor with PhDs in CS and Statistics
+ - **Topic-specific knowledge** with strict enforcement to keep answers relevant
+ - **Code-rich explanations** with runnable examples
+ - **Best practices** and common pitfalls highlighted
+ - **PDF export** of entire sessions for offline reference
+ - **Beautiful UI** with dark/light mode support
+
+ ## 🔧 Quick Setup
+
+ 1. Copy `.env.example` to `.env`:
+ ```bash
+ cp .env.example .env
+ ```
+ Or on Windows:
+ ```powershell
+ copy .env.example .env
+ ```
+
+ 2. Edit `.env` and add your API key:
+
+ For Google Gemini:
+ - Get your API key from [Google AI Studio](https://aistudio.google.com/)
+ - Set `GOOGLE_API_KEY=your_google_api_key_here`
+ - Set `API_TYPE=google`
+ - Optionally set `MODEL_NAME` to a specific model (defaults to "gemini-1.5-flash")
+
+ For Hugging Face (the default; recommended for Hugging Face deployment):
+ - Get your API key from [Hugging Face](https://huggingface.co/settings/tokens)
+ - Set `HUGGINGFACE_API_KEY=your_huggingface_api_key_here`
+ - Set `API_TYPE=huggingface`
+ - Optionally set `MODEL_NAME` to a specific model (defaults to "mistralai/Mistral-7B-Instruct-v0.2")
+
+ For OpenAI (alternative):
+ - Get your API key from [OpenAI](https://platform.openai.com/api-keys)
+ - Set `OPENAI_API_KEY=your_openai_api_key_here`
+ - Set `API_TYPE=openai`
+ - Optionally set `MODEL_NAME` to a specific model (defaults to "gpt-3.5-turbo")
+
+ 3. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 4. Run the app:
+ ```bash
+ streamlit run src/app.py
+ ```
+
+ ## 📌 Environment Variables
+
+ - `API_TYPE` (optional): The API provider to use. Options are "google", "huggingface", or "openai" (defaults to "huggingface")
+ - `GOOGLE_API_KEY` (required for Google): Your Google AI API key
+ - `HUGGINGFACE_API_KEY` (required for Hugging Face): Your Hugging Face API key
+ - `OPENAI_API_KEY` (required for OpenAI): Your OpenAI API key
+ - `MODEL_NAME` (optional): The model to use (defaults to provider-specific models)
+ - `TEMPERATURE` (optional): Model temperature (defaults to 0.3)
+ - `MAX_TOKENS` (optional): Maximum tokens in response (defaults to 2048)
+ - `IS_DOCKER` (optional): Set to "true" when running in Docker
+
+ ## 🎯 Available Topics
+
+ 1. **Python** - Core Python concepts, data structures, functions, decorators
+ 2. **Data Analysis with Pandas & NumPy** - Data wrangling, vectorization, time series
+ 3. **SQL** - ANSI SQL with focus on PostgreSQL/SQLite
+ 4. **Power BI** - DAX formulas, data modeling, performance tuning
+ 5. **Machine Learning** - Scikit-learn, model evaluation, feature engineering
+ 6. **Deep Learning** - Neural networks with TensorFlow/PyTorch
+ 7. **Data Visualization** - Effective static & interactive plots
+
+ ## 🐳 Docker Support
+
+ To run in Docker:
+
+ ```bash
+ docker build -t finesse-school .
+ docker run -p 8501:8501 -e HUGGINGFACE_API_KEY=your_key_here -e API_TYPE=huggingface finesse-school
+ ```
+
+ ## 🛠️ Troubleshooting
+
+ ### Common issue: "model not found" (404)
+
+ If you see an error like:
+
+ ```
+ Tutor error: 404 models/gemini-1.5-flash is not found for API version v1beta, or is not supported for generateContent.
+ ```
+
+ Steps to resolve:
+
+ 1. First, list available models for your API key:
+ ```powershell
+ python .\scripts\list_models.py
+ ```
+
+ 2. Set `MODEL_NAME` to one of the available models from the list:
+ ```powershell
+ $env:MODEL_NAME="gemini-1.0-pro"  # Example - use an available model from the list
+ ```
+
+ ### Hugging Face API Key Issues
+
+ If you're getting an error like "You must provide an api_key", make sure:
+
+ 1. You have set the `HUGGINGFACE_API_KEY` environment variable in your `.env` file
+ 2. Your API key is valid and has "Read" permissions
+ 3. You have set `API_TYPE=huggingface` in your `.env` file
+
+ Example of a correct `.env` file for Hugging Face:
+ ```
+ API_TYPE=huggingface
+ HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
+ ```
+
+ ### Other Issues
+
+ Run the troubleshooting script to diagnose common problems:
+
+ ```bash
+ python .\scripts\troubleshoot.py
+ ```
+
+ If you're still stuck, open an issue and include the output of `list_available_models()` and your `MODEL_NAME` value (do not include your API key).
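+
+ For reference, a complete `.env` for the Google provider looks like this (every value below is a placeholder; the variables match the ones documented above):
+
+ ```
+ API_TYPE=google
+ GOOGLE_API_KEY=your_google_api_key_here
+ MODEL_NAME=gemini-1.5-flash
+ TEMPERATURE=0.3
+ MAX_TOKENS=2048
+ ```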
apt.txt ADDED
@@ -0,0 +1,3 @@
+ wkhtmltopdf
+ xvfb
+ libfontconfig
docker/Dockerfile ADDED
@@ -0,0 +1,60 @@
+ # Multi-stage build: install Python deps in a throwaway stage, keep the final image slim
+ FROM python:3.11-slim AS builder
+
+ WORKDIR /app
+
+ # Install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Final stage
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies (wkhtmltopdf, xvfb, fontconfig) directly in the
+ # final stage. COPY has no shell fallback (a trailing `2>/dev/null || echo ...`
+ # would be parsed as extra source paths), so binaries cannot be copied
+ # "best effort" from the builder. curl is added for the HEALTHCHECK below.
+ COPY apt.txt .
+ RUN apt-get update && \
+     xargs -a apt.txt apt-get install -y --no-install-recommends && \
+     apt-get install -y --no-install-recommends curl && \
+     apt-get clean && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Create non-root user
+ RUN groupadd -r appuser && useradd -r -g appuser appuser
+
+ # Copy Python dependencies from builder stage (console scripts such as
+ # `streamlit` live in /usr/local/bin)
+ COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
+ COPY --from=builder /usr/local/bin /usr/local/bin
+
+ # Copy application code
+ COPY . .
+
+ # Create .env file if it doesn't exist
+ RUN if [ ! -f .env ]; then cp .env.example .env; fi
+
+ # Change ownership to non-root user
+ RUN chown -R appuser:appuser /app
+
+ # Switch to non-root user
+ USER appuser
+
+ # Expose port
+ EXPOSE 8501
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
+     CMD curl --fail http://localhost:8501/_stcore/health || exit 1
+
+ # Run app
+ ENTRYPOINT ["streamlit", "run", "src/app.py", "--server.port=8501", "--server.address=0.0.0.0"]
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ streamlit
+ langchain-google-genai
+ langchain-huggingface
+ langchain-openai
+ pdfkit
+ python-dotenv
scripts/list_models.py ADDED
@@ -0,0 +1,76 @@
+ #!/usr/bin/env python3
+ """List available models for the configured GOOGLE_API_KEY.
+
+ Run:
+     $env:GOOGLE_API_KEY="your_key_here"
+     python .\scripts\list_models.py
+
+ This prints a sample of models you can set as MODEL_NAME.
+ """
+ import os
+ import sys
+
+ try:
+     # langchain-google-genai pulls in google-generativeai, which exposes list_models()
+     import google.generativeai as genai
+ except Exception as e:
+     print("Missing dependency: google-generativeai (or import failed):", e)
+     sys.exit(2)
+
+
+ def main():
+     key = os.getenv("GOOGLE_API_KEY")
+     if not key:
+         print("ERROR: Set the GOOGLE_API_KEY environment variable before running this script.")
+         print('In PowerShell: $env:GOOGLE_API_KEY="your_key_here"')
+         print('In Bash/Mac/Linux: export GOOGLE_API_KEY="your_key_here"')
+         return
+
+     print("🔍 Connecting to Google Generative AI...")
+     genai.configure(api_key=key)
+
+     try:
+         print("🔄 Fetching available models...")
+         # Only models that support generateContent are usable as MODEL_NAME
+         models = [m for m in genai.list_models()
+                   if "generateContent" in m.supported_generation_methods]
+     except Exception as e:
+         print("❌ Failed to list models:", e)
+         print("\n💡 Troubleshooting tips:")
+         print("   1. Check that your API key is valid and properly set")
+         print("   2. Ensure you have internet connectivity")
+         print("   3. Check if there are any firewall restrictions")
+         return
+
+     print("\n✅ Available models:")
+     print("=" * 60)
+
+     model_list = []
+     for i, m in enumerate(models):
+         # Model names come back as "models/<name>"; strip the prefix for MODEL_NAME
+         model_name = m.name.removeprefix("models/")
+         model_list.append(model_name)
+         print(f"{i+1:2d}. {model_name}")
+
+     print(f"\n📊 Total models found: {len(model_list)}")
+
+     print("\n📝 To use a specific model, set the MODEL_NAME environment variable:")
+     print('PowerShell: $env:MODEL_NAME="model_name_from_list"')
+     print('Bash/Linux: export MODEL_NAME="model_name_from_list"')
+
+     print("\n⭐ Recommended models:")
+     recommended = [m for m in model_list if any(r in m.lower() for r in ('gemini-1.5', 'gemini-pro', 'gemma'))]
+     for model in recommended[:5]:
+         print(f"  • {model}")
+
+
+ if __name__ == "__main__":
+     main()
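The `models/`-prefix handling in the script can be exercised in isolation. A minimal sketch (the sample names below are illustrative, not fetched from the API, and `normalize_model_name` is a hypothetical helper, not part of the script's public interface):

```python
def normalize_model_name(name: str) -> str:
    # The Google API reports names as "models/<id>"; MODEL_NAME expects the bare id
    return name.removeprefix("models/")

print(normalize_model_name("models/gemini-1.5-flash"))  # -> gemini-1.5-flash
print(normalize_model_name("gemini-1.0-pro"))           # -> gemini-1.0-pro (already bare)
```

`str.removeprefix` (Python 3.9+) is safer than `str.replace`, which would also strip the substring from the middle of a name.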
scripts/test_huggingface.py ADDED
@@ -0,0 +1,116 @@
+ #!/usr/bin/env python3
+ """
+ Test script to verify Hugging Face API setup.
+
+ Run this script to check if your Hugging Face API key is working correctly:
+     python .\scripts\test_huggingface.py
+ """
+
+ import os
+ import sys
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+
+ def test_huggingface_api():
+     """Test Hugging Face API connectivity."""
+     print("🔍 Testing Hugging Face API connectivity...")
+
+     # Check if API key is set
+     api_key = os.getenv("HUGGINGFACE_API_KEY")
+     if not api_key:
+         print("❌ HUGGINGFACE_API_KEY environment variable is not set")
+         print("   Please set your Hugging Face API key in the .env file")
+         return False
+
+     print(f"✅ HUGGINGFACE_API_KEY is set (length: {len(api_key)} characters)")
+
+     # Try to import the required library
+     try:
+         from langchain_huggingface import HuggingFaceEndpoint
+         print("✅ langchain_huggingface library is available")
+     except ImportError as e:
+         print(f"❌ Failed to import langchain_huggingface: {e}")
+         print("   Install it with: pip install langchain-huggingface")
+         return False
+
+     # Models to try, each paired with its expected task type
+     models_to_try = [
+         (os.getenv("MODEL_NAME", "mistralai/Mistral-7B-Instruct-v0.2"), "text-generation"),
+         ("microsoft/DialoGPT-large", "conversational"),
+         ("google/flan-t5-xxl", "text2text-generation"),
+         ("HuggingFaceH4/zephyr-7b-beta", "conversational"),
+     ]
+
+     for model_name, task_type in models_to_try:
+         print(f"🔍 Testing model initialization with: {model_name} (task: {task_type})")
+
+         try:
+             llm = HuggingFaceEndpoint(
+                 repo_id=model_name,
+                 huggingfacehub_api_token=api_key,
+                 task=task_type,  # Specify the correct task type
+                 temperature=0.1,
+                 max_new_tokens=100,
+             )
+             print("✅ Model initialized successfully")
+
+             # Test a simple prompt
+             print("🔍 Sending test request...")
+             # Use a prompt format appropriate for the task type
+             if task_type == "conversational":
+                 # Conversational models expect the input formatted as a conversation
+                 response = llm.invoke("Hello, how are you?")
+             else:
+                 response = llm.invoke("Say 'Hello, FINESE SCHOOL!' in one word.")
+             print("✅ Test request successful")
+             print(f"   Response: {response.strip()}")
+             return True
+
+         except Exception as e:
+             print(f"❌ Failed with model {model_name}: {e}")
+             print("   Trying next model...\n")
+             continue
+
+     print("❌ All models failed. Please check your API key and network connection.")
+     print("\n💡 Troubleshooting tips:")
+     print("   1. Check that your API key is valid")
+     print("   2. Verify you have internet connectivity")
+     print("   3. Check if there are any firewall restrictions")
+     print("   4. Make sure you haven't exceeded your rate limits")
+     return False
+
+ def main():
+     """Main test function."""
+     print("🧪 FINESE SCHOOL Hugging Face API Test Script")
+     print("=" * 50)
+
+     success = test_huggingface_api()
+
+     print("\n📋 Summary")
+     print("=" * 50)
+
+     if success:
+         print("✅ Hugging Face API setup is working correctly!")
+         print("\n🚀 You can now run the main application:")
+         print("   streamlit run src/app.py")
+     else:
+         print("❌ Hugging Face API setup has issues.")
+         print("   Please check the error messages above and fix the issues.")
+         sys.exit(1)
+
+ if __name__ == "__main__":
+     main()
scripts/troubleshoot.py ADDED
@@ -0,0 +1,263 @@
+ #!/usr/bin/env python3
+ """Troubleshoot common issues with the FINESE SCHOOL application.
+
+ This script helps diagnose common configuration issues and provides
+ suggestions for fixing them.
+ """
+
+ import os
+ import sys
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+
+ def check_environment_variables():
+     """Check if required environment variables are set."""
+     print("🔍 Checking environment variables...")
+
+     # Check API type
+     api_type = os.getenv("API_TYPE", "huggingface").lower()
+     print(f"✅ API_TYPE is set to: {api_type}")
+
+     # Check API key based on API type
+     if api_type == "google":
+         api_key = os.getenv("GOOGLE_API_KEY")
+         if not api_key:
+             print("❌ GOOGLE_API_KEY is not set")
+             print("   Please set your Google API key in the .env file")
+             return False
+         print(f"✅ GOOGLE_API_KEY is set (length: {len(api_key)} characters)")
+     elif api_type == "openai":
+         api_key = os.getenv("OPENAI_API_KEY")
+         if not api_key:
+             print("❌ OPENAI_API_KEY is not set")
+             print("   Please set your OpenAI API key in the .env file")
+             return False
+         print(f"✅ OPENAI_API_KEY is set (length: {len(api_key)} characters)")
+     else:  # huggingface
+         api_key = os.getenv("HUGGINGFACE_API_KEY")
+         if not api_key:
+             print("❌ HUGGINGFACE_API_KEY is not set")
+             print("   Please set your Hugging Face API key in the .env file")
+             return False
+         print(f"✅ HUGGINGFACE_API_KEY is set (length: {len(api_key)} characters)")
+
+     model_name = os.getenv("MODEL_NAME")
+     if model_name:
+         print(f"✅ MODEL_NAME is set to: {model_name}")
+     else:
+         print("⚠️ MODEL_NAME is not set, using default model")
+
+     temperature = os.getenv("TEMPERATURE", "0.3")
+     print(f"✅ TEMPERATURE is set to: {temperature}")
+
+     max_tokens = os.getenv("MAX_TOKENS", "2048")
+     print(f"✅ MAX_TOKENS is set to: {max_tokens}")
+
+     is_docker = os.getenv("IS_DOCKER", "false")
+     print(f"✅ IS_DOCKER is set to: {is_docker}")
+
+     return True
+
+ def check_dependencies():
+     """Check if required dependencies are installed."""
+     print("\n🔍 Checking dependencies...")
+
+     # Map pip package names to importable module names; __import__ needs the
+     # module name, which differs from the pip name for hyphenated packages
+     dependencies = {
+         "streamlit": "streamlit",
+         "pdfkit": "pdfkit",
+         "python-dotenv": "dotenv",
+         "langchain": "langchain",
+         "pydantic": "pydantic",
+         "pygments": "pygments",
+     }
+
+     # API-specific dependencies
+     api_type = os.getenv("API_TYPE", "huggingface").lower()
+     if api_type == "google":
+         dependencies["langchain-google-genai"] = "langchain_google_genai"
+     elif api_type == "openai":
+         dependencies["langchain-openai"] = "langchain_openai"
+     else:  # huggingface
+         dependencies["langchain-huggingface"] = "langchain_huggingface"
+
+     missing_deps = []
+     for pip_name, module_name in dependencies.items():
+         try:
+             __import__(module_name)
+             print(f"✅ {pip_name} is installed")
+         except ImportError as e:
+             print(f"❌ {pip_name} is missing: {e}")
+             missing_deps.append(pip_name)
+
+     if missing_deps:
+         print("\n🔧 To install missing dependencies, run:")
+         print(f"   pip install {' '.join(missing_deps)}")
+         return False
+
+     return True
+
+ def check_model_access():
+     """Check if we can access the configured model."""
+     print("\n🔍 Checking model access...")
+
+     api_type = os.getenv("API_TYPE", "huggingface").lower()
+
+     if api_type == "google":
+         try:
+             from langchain_google_genai import GoogleGenerativeAI
+             api_key = os.getenv("GOOGLE_API_KEY")
+             if not api_key:
+                 print("❌ Cannot check model access without GOOGLE_API_KEY")
+                 return False
+
+             model_name = os.getenv("MODEL_NAME", "gemini-1.5-flash")
+             print(f"   Testing access to Google model: {model_name}")
+
+             llm = GoogleGenerativeAI(
+                 model=model_name,
+                 google_api_key=api_key
+             )
+             # Test a simple prompt
+             print("   Sending test request...")
+             response = llm.invoke("Say 'Hello, FINESE SCHOOL!' in one word.")
+             print("✅ Successfully connected to Google Generative AI")
+             # GoogleGenerativeAI is an LLM wrapper, so invoke() returns a plain string
+             print(f"   Test response: {response.strip()}")
+             return True
+         except Exception as e:
+             print(f"❌ Failed to access Google model: {e}")
+             print("\n💡 Troubleshooting tips:")
+             print("   1. Check that your API key is valid")
+             print("   2. Verify the model name is correct by running scripts/list_models.py")
+             print("   3. Check your internet connection")
+             return False
+
+     elif api_type == "openai":
+         try:
+             from langchain_openai import ChatOpenAI
+             api_key = os.getenv("OPENAI_API_KEY")
+             if not api_key:
+                 print("❌ Cannot check model access without OPENAI_API_KEY")
+                 return False
+
+             model_name = os.getenv("MODEL_NAME", "gpt-3.5-turbo")
+             print(f"   Testing access to OpenAI model: {model_name}")
+
+             llm = ChatOpenAI(
+                 model_name=model_name,
+                 openai_api_key=api_key
+             )
+             # Test a simple prompt
+             print("   Sending test request...")
+             response = llm.invoke("Say 'Hello, FINESE SCHOOL!' in one word.")
+             print("✅ Successfully connected to OpenAI")
+             # ChatOpenAI returns a message object, so read its .content
+             print(f"   Test response: {response.content.strip()}")
+             return True
+         except Exception as e:
+             print(f"❌ Failed to access OpenAI model: {e}")
+             print("\n💡 Troubleshooting tips:")
+             print("   1. Check that your API key is valid")
+             print("   2. Verify the model name is correct")
+             print("   3. Check your internet connection")
+             return False
+
+     else:  # huggingface
+         try:
+             from langchain_huggingface import HuggingFaceEndpoint
+             api_key = os.getenv("HUGGINGFACE_API_KEY")
+             if not api_key:
+                 print("❌ Cannot check model access without HUGGINGFACE_API_KEY")
+                 return False
+
+             model_name = os.getenv("MODEL_NAME", "mistralai/Mistral-7B-Instruct-v0.2")
+             print(f"   Testing access to Hugging Face model: {model_name}")
+
+             llm = HuggingFaceEndpoint(
+                 repo_id=model_name,
+                 huggingfacehub_api_token=api_key,
+                 temperature=0.1,
+                 max_new_tokens=100
+             )
+             # Test a simple prompt
+             print("   Sending test request...")
+             response = llm.invoke("Say 'Hello, FINESE SCHOOL!' in one word.")
+             print("✅ Successfully connected to Hugging Face Inference API")
+             print(f"   Test response: {response.strip()}")
+             return True
+         except Exception as e:
+             print(f"❌ Failed to access Hugging Face model: {e}")
+             print("\n💡 Troubleshooting tips:")
+             print("   1. Check that your API key is valid")
+             print("   2. Verify the model name is correct")
+             print("   3. Check your internet connection")
+             print("   4. Make sure you haven't exceeded rate limits")
+             return False
+
+ def check_wkhtmltopdf():
+     """Check if wkhtmltopdf is installed for PDF generation."""
+     print("\n🔍 Checking PDF generation support...")
+
+     try:
+         import pdfkit
+         print("✅ pdfkit is installed")
+     except ImportError:
+         print("❌ pdfkit is not installed")
+         return False
+
+     try:
+         # Try to configure wkhtmltopdf
+         config = pdfkit.configuration()
+         print("✅ wkhtmltopdf is configured")
+         return True
+     except OSError:
+         print("⚠️ wkhtmltopdf is not installed or not in PATH")
+         print("   PDF export functionality will be limited")
+         print("\n🔧 To install wkhtmltopdf:")
+         print("   Windows: Download from https://wkhtmltopdf.org/downloads.html")
+         print("   macOS: brew install --cask wkhtmltopdf")
+         print("   Linux: sudo apt-get install wkhtmltopdf")
+         return True  # Not critical for basic functionality
+
+ def main():
+     """Main troubleshooting function."""
+     print("🛠️ FINESE SCHOOL Troubleshooting Script")
+     print("=" * 50)
+
+     checks = [
+         check_environment_variables,
+         check_dependencies,
+         check_model_access,
+         check_wkhtmltopdf
+     ]
+
+     results = []
+     for check in checks:
+         try:
+             results.append(check())
+         except Exception as e:
+             print(f"❌ Check failed with exception: {e}")
+             results.append(False)
+
+     print("\n📋 Summary")
+     print("=" * 50)
+
+     if all(results):
+         print("✅ All checks passed! You should be able to run FINESE SCHOOL.")
+         print("\n🚀 To start the application, run:")
+         print("   streamlit run src/app.py")
+     else:
+         passed = sum(results)
+         total = len(results)
+         print(f"⚠️ {passed}/{total} checks passed.")
+         if passed > 0:
+             print("✅ Some functionality may work, but fix the issues above for full functionality.")
+         else:
+             print("❌ Critical issues found. Please address them before running the application.")
+         print("\n📝 For more help, check the README.md file or open an issue on GitHub.")
+
+ if __name__ == "__main__":
+     main()
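The dependency check above hinges on importable module names, which differ from pip distribution names for hyphenated packages (for example, `python-dotenv` imports as `dotenv`). A minimal, standalone sketch of that mapping logic; the `IMPORT_NAMES` table is an assumption covering this project's hyphenated dependencies, and `is_installed` is an illustrative helper, not part of the script:

```python
import importlib

# pip distribution name -> importable module name (assumed mapping for this
# project's hyphenated dependencies; unlisted packages import under their own name)
IMPORT_NAMES = {
    "python-dotenv": "dotenv",
    "langchain-google-genai": "langchain_google_genai",
    "langchain-huggingface": "langchain_huggingface",
    "langchain-openai": "langchain_openai",
}

def is_installed(pip_name: str) -> bool:
    """Return True if the package behind `pip_name` can be imported."""
    module = IMPORT_NAMES.get(pip_name, pip_name)
    try:
        importlib.import_module(module)
        return True
    except ImportError:
        return False

print(is_installed("os"))                     # stdlib module, importable under its own name
print(is_installed("surely-not-a-real-pkg"))  # unknown package name, not importable
```

Passing the raw pip name straight to `__import__` would always fail for hyphenated packages, since a hyphen is not valid in a Python module name.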
src/__init__.py ADDED
@@ -0,0 +1,4 @@
+ """
+ ChatBox Pro: Data Science Mentor
+ An AI-powered assistant for data science professionals.
+ """
src/app.py ADDED
@@ -0,0 +1,439 @@
+ import sys
+ import os
+
+ # Add the project root directory to sys.path
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+ import streamlit as st
+ from dotenv import load_dotenv
+ import re
+ from src.config import TOPIC_REGISTRY
+ from src.chat_engine import generate_structured_response
+ from src.pdf_export import export_chat_to_pdf
+ from src.utils import detect_language_from_context, sanitize_input
+
+ # Load environment variables
+ if os.getenv("IS_DOCKER") != "true":
+     load_dotenv()
+
+ def highlight_text(text):
+     """Highlight important keywords in the text."""
+     keywords = ["important", "note", "remember", "key", "tip", "⚠️", "only", "strictly", "best practice", "crucial", "essential"]
+     sentences = text.split(". ")
+     highlighted_sentences = []
+     for sent in sentences:
+         if any(kw.lower() in sent.lower() for kw in keywords):
+             sent = f'<span style="background-color:#fff3cd; color:#856404; font-weight:bold;">{sent.strip()}.</span>'
+         else:
+             sent = sent.strip() + "." if sent.strip() else ""
+         highlighted_sentences.append(sent)
+     return ". ".join(filter(None, highlighted_sentences))
+
+ # Configure page
+ st.set_page_config(page_title="FINESE SCHOOL: Data Science Mentor", page_icon="🎓", layout="wide")
+
+ # Define provider key mapping
+ PROVIDER_KEY_MAPPING = {
+     "Google Gemini": "google",
+     "OpenAI": "openai",
+     "Hugging Face": "huggingface",
+     "Anthropic": "anthropic"
+ }
+
+ # Initialize session state
+ if "chat_history" not in st.session_state:
+     st.session_state.chat_history = []
+
+ if "llm_provider" not in st.session_state:
+     st.session_state.llm_provider = "Google Gemini"
+ if "llm_api_key" not in st.session_state:
+     st.session_state.llm_api_key = ""
+ if "llm_model" not in st.session_state:
+     st.session_state.llm_model = ""
+
+ if "current_topic" not in st.session_state:
+     st.session_state.current_topic = list(TOPIC_REGISTRY.keys())[0] if TOPIC_REGISTRY else None
+
+ # Apply custom CSS
+ st.markdown("""
+ <style>
+ .diagnosis {
+     background-color: #fff8e1;
+     padding: 15px;
+     border-radius: 10px;
+     margin: 15px 0;
+     border-left: 5px solid #ffc107;
+     box-shadow: 0 2px 5px rgba(0,0,0,0.05);
+ }
+ .tip {
+     background-color: #e8f5e9;
+     border-left: 5px solid #4caf50;
+     padding: 15px;
+     border-radius: 10px;
+     margin: 15px 0;
+     box-shadow: 0 2px 5px rgba(0,0,0,0.05);
+ }
+ .refs {
+     background-color: #f3e5f5;
+     border-left: 5px solid #9c27b0;
+     padding: 15px;
+     border-radius: 10px;
+     margin: 15px 0;
+     box-shadow: 0 2px 5px rgba(0,0,0,0.05);
+ }
+ .stButton>button {
+     border-radius: 10px;
+ }
+ .chat-message {
+     padding: 20px;
+     border-radius: 10px;
+     margin-bottom: 15px;
+     box-shadow: 0 2px 5px rgba(0,0,0,0.1);
+ }
+ .user-message {
+     background-color: #e3f2fd;
+     border-left: 5px solid #2196f3;
+ }
+ .assistant-message {
+     background-color: #f5f5f5;
+     border-left: 5px solid #757575;
+ }
+ .highlight-keyword {
+     background-color: #fff3cd;
+     color: #856404;
+     font-weight: bold;
+ }
+ .topic-card {
+     border: 1px solid #e0e0e0;
+     border-radius: 10px;
+     padding: 15px;
+     margin-bottom: 15px;
+     background-color: #fafafa;
+     transition: transform 0.2s;
+ }
+ .topic-card:hover {
+     transform: translateY(-3px);
+     box-shadow: 0 4px 8px rgba(0,0,0,0.1);
+ }
+ .topic-title {
+     font-weight: bold;
+     font-size: 1.1em;
+     margin-bottom: 5px;
+ }
+ .topic-description {
+     color: #666;
+     font-size: 0.9em;
+ }
+ .welcome-banner {
+     background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+     color: white;
+     padding: 25px;
+     border-radius: 15px;
+     margin-bottom: 25px;
+     text-align: center;
+ }
+ .stats-card {
+     background-color: #e3f2fd;
+     border-radius: 10px;
+     padding: 15px;
+     text-align: center;
+     margin-bottom: 15px;
+ }
+ .code-block {
+     background-color: #f8f9fa;
+     border-radius: 8px;
+     padding: 15px;
+     overflow-x: auto;
+     font-family: monospace;
+     font-size: 0.9em;
+     margin: 15px 0;
+     border: 1px solid #eee;
+ }
+ .on-topic-warning {
+     background-color: #ffebee;
+     border-left: 5px solid #f44336;
+     padding: 15px;
+     border-radius: 10px;
+     margin: 15px 0;
+ }
+ </style>
+ """, unsafe_allow_html=True)
+
+ # Header
+ st.markdown('<div class="welcome-banner"><h1>🎓 FINESE SCHOOL: Your 24/7 Data Mentor</h1><p>Get expert-level, topic-locked, code-rich answers with best practices</p></div>', unsafe_allow_html=True)
+
+ # Sidebar
+ with st.sidebar:
+     st.header("⚙️ Settings & Controls")
+
+     # Theme selector
+     theme = st.selectbox("🎨 Theme", ["Light", "Dark"])
+     if theme == "Dark":
+         st.markdown("""
+         <style>
+         .stApp {
+             background-color: #0e1117;
+             color: white;
+         }
+         .stMarkdown, .stText {
+             color: white;
+         }
+         .topic-card {
+             background-color: #262730;
+             color: white;
+         }
+         .topic-description {
+             color: #ccc;
+         }
+         </style>
+         """, unsafe_allow_html=True)
+
+     st.divider()
+     st.subheader("🤖 LLM Provider")
+     llm_provider = st.selectbox(
+         "Select LLM Provider",
+         ["Google Gemini", "OpenAI", "Hugging Face", "Anthropic", "None"],
+         index=0,
+         key="llm_provider"
+     )
+
+     provider_key = PROVIDER_KEY_MAPPING.get(llm_provider, "")
+     if llm_provider != "None" and provider_key:
+         api_key = st.text_input(
+             f"{llm_provider} API Key",
+             type="password",
+             key=f"{provider_key}_api_key",
+             help="Enter your API key for the selected provider"
+         )
+
+         # Define provider-specific model options
+         PROVIDER_MODELS = {
+             "Google Gemini": [
+                 "gemini-1.5-flash", "gemini-1.5-pro", "gemini-1.5-advanced",
+                 "gemini-1.0-pro", "gemini-1.5-ultra"
+             ],
+             "OpenAI": [
+                 "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo",
+                 "gpt-4", "gpt-4-32k"
+             ],
+             "Hugging Face": [
+                 "mistralai/Mistral-7B-Instruct-v0.2", "meta-llama/Llama-3-8b-chat-hf",
+                 "google/flan-t5-xxl", "HuggingFaceH4/zephyr-7b-beta"
+             ],
+             "Anthropic": [
+                 "claude-3-5-sonnet-20240620", "claude-3-opus-20240229",
+                 "claude-3-haiku-20240307", "claude-2.1"
+             ]
+         }
+
+         # Get models for selected provider
+         model_options = PROVIDER_MODELS.get(llm_provider, [])
+         model_options.append("Custom Model")
+
+         # Use the extracted model options in the selectbox
+         model_name = st.selectbox(
+             "Model Name",
+             options=model_options,
+             key=f"{provider_key}_model",
+             help="Select a model name or choose 'Custom Model' to enter your own"
+         )
+
+         # Simplify the custom model input logic
+         if model_name == "Custom Model":
+             custom_model_name = st.text_input(
+                 "Enter a custom model name",
+                 placeholder="Type your model name here...",
+                 key=f"{provider_key}_custom_model"
+             )
+             if not custom_model_name.strip():
+                 st.error("Custom model name cannot be empty.")
+         else:
+             custom_model_name = None
+
+     # Stats
+     st.divider()
+     st.subheader("📊 Session Stats")
+     st.markdown(f'<div class="stats-card"><h3>{len(st.session_state.chat_history)//2}</h3><p>Questions Asked</p></div>', unsafe_allow_html=True)
258
+
259
+ # Topic information
260
+ st.divider()
261
+ st.subheader("📘 Topics")
262
+ for topic_key, topic_spec in TOPIC_REGISTRY.items():
263
+ with st.expander(topic_key):
264
+ st.markdown(f"""
265
+ <div class="topic-card">
266
+ <div class="topic-title">{topic_spec.name}</div>
267
+ <div class="topic-description">{topic_spec.description}</div>
268
+ <div style="margin-top: 10px;">
269
+ <strong>Domain:</strong> {topic_spec.domain}<br>
270
+ <strong>Allowed Libraries:</strong> {', '.join(topic_spec.allowed_libraries) or 'None'}<br>
271
+ <strong>Banned Topics:</strong> {', '.join(topic_spec.banned_topics) or 'None'}
272
+ </div>
273
+ </div>
274
+ """, unsafe_allow_html=True)
275
+
276
+ # Conversation history controls
277
+ st.divider()
278
+ st.subheader("🗂️ Conversation")
279
+ col1, col2 = st.columns(2)
280
+ with col1:
281
+ if st.button("🗑️ Clear History", use_container_width=True):
282
+ st.session_state.chat_history = []
283
+ st.success("History cleared!")
284
+ st.rerun()
285
+
286
+ with col2:
287
+ if st.button("📥 Export to PDF", use_container_width=True):
288
+ if st.session_state.chat_history:
289
+ try:
290
+ with st.spinner("Generating PDF..."):
291
+ pdf_bytes = export_chat_to_pdf(st.session_state.chat_history)
292
+ st.download_button(
293
+ "✅ Download PDF",
294
+ pdf_bytes,
295
+ "data_mentor_session.pdf",
296
+ "application/pdf",
297
+ use_container_width=True
298
+ )
299
+ except Exception as e:
300
+ st.error(f"PDF generation failed: {str(e)}")
301
+ st.info("Please try again or contact support if the issue persists.")
302
+ else:
303
+ st.warning("No conversation to export")
304
+
305
+ # Info
306
+ st.divider()
307
+ st.subheader("ℹ️ About")
308
+ st.info("FINESE SCHOOL provides expert-level answers on data science topics with code examples and best practices.")
309
+
310
+ # API Key validation - MOVED AFTER SIDEBAR
311
+ current_provider = st.session_state.llm_provider
312
+ if current_provider != "None":
313
+ provider_key = PROVIDER_KEY_MAPPING.get(current_provider, "")
314
+ if provider_key:
315
+ api_key = st.session_state.get(f"{provider_key}_api_key", "")
316
+ if not api_key:
317
+ st.error(f"⚠️ {current_provider} API key not found. Please enter your API key in the sidebar.")
318
+ st.stop()
319
+
320
+ # Main interface
321
+ col1, col2 = st.columns([1, 2])
322
+
323
+ with col1:
324
+ st.header("🎯 Select Topic")
325
+ topic_keys = list(TOPIC_REGISTRY.keys())
326
+ selected_topic = st.selectbox("Choose your domain", topic_keys, index=topic_keys.index(st.session_state.current_topic) if st.session_state.current_topic in topic_keys else 0)
327
+ st.session_state.current_topic = selected_topic
328
+
329
+ topic_spec = TOPIC_REGISTRY[selected_topic]
330
+ st.markdown(f"""
331
+ <div class="topic-card">
332
+ <div class="topic-title">Current Topic: {topic_spec.name}</div>
333
+ <div class="topic-description">{topic_spec.description}</div>
334
+ <div style="margin-top: 10px;">
335
+ <strong>Style Guide:</strong> {topic_spec.style_guide}
336
+ </div>
337
+ </div>
338
+ """, unsafe_allow_html=True)
339
+
340
+ with col2:
341
+ st.header("❓ Ask a Question")
342
+ user_q = st.text_area("Enter your precise question", height=120, placeholder=f"Ask anything about {selected_topic}...")
343
+
344
+ col_btn1, col_btn2 = st.columns(2)
345
+ with col_btn1:
346
+ submit = st.button("🧠 Get Expert Answer", type="primary", use_container_width=True)
347
+ with col_btn2:
348
+ clear = st.button("🗑️ Clear Chat", use_container_width=True)
349
+
350
+ # Process user query
351
+ if submit and user_q.strip():
352
+ # Sanitize input
353
+ sanitized_question = sanitize_input(user_q.strip())
354
+
355
+ if len(sanitized_question) < 10:
356
+ st.warning("Please enter a more detailed question (at least 10 characters).")
357
+ else:
358
+ try:
359
+ with st.spinner("Dr. Data is analyzing your question..."):
360
+ # Add user question to chat
361
+ st.session_state.chat_history.append(("🧑‍🎓 You", sanitized_question))
362
+
363
+ # Generate response
364
+ response = generate_structured_response(selected_topic, sanitized_question)
365
+
366
+ if not response.is_on_topic:
367
+ msg = f'<div class="on-topic-warning"><strong>⚠️ Off-topic Question</strong><br>{response.answer}</div>'
368
+ st.session_state.chat_history.append(("🤖 Dr. Data", msg))
369
+ else:
370
+ # Build rich response
371
+ parts = []
372
+ if response.diagnosis:
373
+ parts.append(f'<div class="diagnosis"><strong>🔍 Diagnosis:</strong> {response.diagnosis}</div>')
374
+ parts.append(f'<div class="answer">{response.answer}</div>')
375
+ if response.code_example:
376
+ lang = detect_language_from_context(sanitized_question, selected_topic)
377
+ parts.append(f'<div class="code-block">{response.code_example}</div>')
378
+ if response.best_practice_tip:
379
+ parts.append(f'<div class="tip"><strong>💡 Best Practice:</strong> {response.best_practice_tip}</div>')
380
+ if response.references:
381
+ refs = "<br>".join(f"• <a href='{r}' target='_blank'>{r}</a>" for r in response.references)
382
+ parts.append(f'<div class="refs"><strong>📚 References:</strong><br>{refs}</div>')
383
+
384
+ full_response = "".join(parts)
385
+ # Apply highlighting to the response
386
+ highlighted_response = highlight_text(full_response)
387
+ st.session_state.chat_history.append(("🤖 Dr. Data", highlighted_response))
388
+
389
+ st.rerun()
390
+ except Exception as e:
391
+ st.error(f"❌ Tutor error: {str(e)}")
392
+ # Add error to chat for context
393
+ st.session_state.chat_history.append(("🤖 Dr. Data", f"❌ Sorry, I encountered an error: {str(e)}"))
394
+
395
+ # Clear chat
396
+ if clear:
397
+ st.session_state.chat_history = []
398
+ st.success("Chat cleared!")
399
+ st.rerun()
400
+
401
+ # Render chat with markdown + HTML
402
+ st.divider()
403
+ st.header("💬 Conversation")
404
+
405
+ # Limit conversation history for performance
406
+ MAX_HISTORY = 50
407
+ if len(st.session_state.chat_history) > MAX_HISTORY * 2:
408
+ st.session_state.chat_history = st.session_state.chat_history[-MAX_HISTORY * 2:]
409
+
410
+ # Display messages
411
+ if st.session_state.chat_history:
412
+ for sender, content in st.session_state.chat_history:
413
+ is_user = "You" in sender
414
+ message_class = "user-message" if is_user else "assistant-message"
415
+
416
+ with st.container():
417
+ if is_user:
418
+ st.markdown(
419
+ f"""
420
+ <div class="chat-message {message_class}">
421
+ <strong>{sender}</strong>
422
+ <div style="margin-top: 10px;">{content}</div>
423
+ </div>
424
+ """,
425
+ unsafe_allow_html=True
426
+ )
427
+ else:
428
+ # Assistant message with enhanced styling
429
+ st.markdown(
430
+ f"""
431
+ <div class="chat-message {message_class}">
432
+ <strong>{sender}</strong>
433
+ <div style="margin-top: 10px;">{content}</div>
434
+ </div>
435
+ """,
436
+ unsafe_allow_html=True
437
+ )
438
+ else:
439
+ st.info("👋 Welcome! Select a topic and ask your first question to get started.")
src/chat_engine.py ADDED
@@ -0,0 +1,228 @@
+ import os
+ import logging
+ from langchain_core.output_parsers import PydanticOutputParser
+ from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate
+ from src.config import TOPIC_REGISTRY, MODEL_NAME, TEMPERATURE, MAX_TOKENS
+ from src.models import TutorResponse
+ 
+ # Conditional imports based on available API
+ try:
+     from langchain_google_genai import ChatGoogleGenerativeAI
+     GOOGLE_API_AVAILABLE = True
+ except ImportError:
+     GOOGLE_API_AVAILABLE = False
+     logging.warning("Google Generative AI library not available")
+ 
+ try:
+     from langchain_huggingface import HuggingFaceEndpoint
+     HUGGINGFACE_API_AVAILABLE = True
+ except ImportError:
+     HUGGINGFACE_API_AVAILABLE = False
+     logging.warning("HuggingFace library not available")
+ 
+ try:
+     from langchain_openai import ChatOpenAI
+     OPENAI_API_AVAILABLE = True
+ except ImportError:
+     OPENAI_API_AVAILABLE = False
+     logging.warning("OpenAI library not available")
+ 
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+ 
+ def get_llm():
+     # Determine which API to use based on environment variables
+     api_type = os.getenv("API_TYPE", "huggingface").lower()
+ 
+     if api_type == "google" and GOOGLE_API_AVAILABLE:
+         return get_google_llm()
+     elif api_type == "openai" and OPENAI_API_AVAILABLE:
+         return get_openai_llm()
+     elif api_type == "huggingface" and HUGGINGFACE_API_AVAILABLE:
+         return get_huggingface_llm()
+     else:
+         # Fallback to HuggingFace if preferred option is not available
+         if HUGGINGFACE_API_AVAILABLE:
+             return get_huggingface_llm()
+         elif GOOGLE_API_AVAILABLE:
+             return get_google_llm()
+         elif OPENAI_API_AVAILABLE:
+             return get_openai_llm()
+         else:
+             raise RuntimeError("No suitable LLM API available. Please install one of: langchain-google-genai, langchain-huggingface, langchain-openai")
+ 
+ def get_google_llm():
+     key = os.getenv("GOOGLE_API_KEY")
+     if not key:
+         raise RuntimeError("GOOGLE_API_KEY is required for Google API")
+ 
+     # Ensure model name is set with fallback to a more current default
+     model_name = MODEL_NAME if MODEL_NAME else "gemini-1.5-flash"
+ 
+     logger.info(f"Initializing Google LLM with model: {model_name}")
+ 
+     return ChatGoogleGenerativeAI(
+         model=model_name,
+         temperature=TEMPERATURE,
+         max_tokens=MAX_TOKENS,
+         google_api_key=key,
+         convert_system_message_to_human=True  # Required for Gemini in LangChain
+     )
+ 
+ def get_openai_llm():
+     key = os.getenv("OPENAI_API_KEY")
+     if not key:
+         raise RuntimeError("OPENAI_API_KEY is required for OpenAI API")
+ 
+     # Ensure model name is set
+     model_name = MODEL_NAME if MODEL_NAME else "gpt-3.5-turbo"
+ 
+     logger.info(f"Initializing OpenAI LLM with model: {model_name}")
+ 
+     return ChatOpenAI(
+         model_name=model_name,
+         temperature=TEMPERATURE,
+         max_tokens=MAX_TOKENS,
+         openai_api_key=key
+     )
+ 
+ def get_huggingface_llm():
+     key = os.getenv("HUGGINGFACE_API_KEY")
+ 
+     # Check if API key is provided
+     if not key:
+         raise RuntimeError("HUGGINGFACE_API_KEY is required for Hugging Face API. Please set your API key.")
+ 
+     # Default to a good open-source model if none specified
+     model_name = MODEL_NAME if MODEL_NAME else "mistralai/Mistral-7B-Instruct-v0.2"
+ 
+     logger.info(f"Initializing HuggingFace LLM with model: {model_name}")
+ 
+     # Determine appropriate task based on model
+     task = "text-generation"
+     if "zephyr" in model_name.lower() or "dialo" in model_name.lower() or "mistral" in model_name.lower():
+         task = "conversational"
+     elif "flan" in model_name.lower() or "t5" in model_name.lower():
+         task = "text2text-generation"
+ 
+     # Try to initialize the HuggingFace endpoint
+     try:
+         return HuggingFaceEndpoint(
+             repo_id=model_name,
+             huggingfacehub_api_token=key,
+             task=task,
+             temperature=TEMPERATURE,
+             max_new_tokens=MAX_TOKENS,
+         )
+     except Exception as e:
+         raise RuntimeError(f"Failed to initialize Hugging Face model {model_name}: {str(e)}")
+ 
+ def validate_model_availability(model_name: str, api_key: str):
+     """
+     Validate if the specified model is available for the given API key.
+ 
+     Args:
+         model_name: Name of the model to check
+         api_key: API key
+ 
+     Raises:
+         RuntimeError: If the model is not available
+     """
+     # Simplified validation approach
+     logger.warning("Model validation is not implemented for all providers. Proceeding with initialization.")
+ 
+ def build_expert_prompt(topic_spec, user_question: str) -> ChatPromptTemplate:
+     system_message = f"""
+ You are Dr. Data, a world-class data science educator with PhDs in CS and Statistics.
+ You are tutoring a professional on: **{topic_spec.name}**
+ 
+ Context:
+ - Allowed libraries: {', '.join(topic_spec.allowed_libraries) or 'None'}
+ - Avoid: {', '.join(topic_spec.banned_topics) or 'Nothing'}
+ - Style: {topic_spec.style_guide}
+ 
+ Rules:
+ 1. If the question is off-topic (e.g., about web dev in a Pandas session), set is_on_topic=False and give a polite redirect.
+ 2. Always attempt diagnosis: what might the user be confused about?
+ 3. Code must be minimal, correct, and include necessary imports.
+ 4. Cite official documentation when possible.
+ 5. NEVER hallucinate package functions.
+ 6. Output ONLY in the requested JSON format.
+ 
+ {{format_instructions}}
+ """
+ 
+     return ChatPromptTemplate.from_messages([
+         SystemMessagePromptTemplate.from_template(system_message),
+         ("human", "Question: {question}")
+     ])
+ 
+ def generate_structured_response(topic_key: str, user_question: str) -> TutorResponse:
+     try:
+         llm = get_llm()
+     except Exception as e:
+         raise RuntimeError(f"Failed to initialize LLM: {str(e)}")
+ 
+     topic_spec = TOPIC_REGISTRY[topic_key]
+ 
+     # Create parser
+     parser = PydanticOutputParser(pydantic_object=TutorResponse)
+ 
+     # Build prompt with proper variable names
+     prompt = build_expert_prompt(topic_spec, user_question)
+ 
+     # Create the chain with proper variable binding
+     chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm
+ 
+     # Invoke with the question
+     try:
+         raw_output = chain.invoke({"question": user_question})
+         logger.info(f"Raw LLM output: {raw_output.content[:200]}...")
+     except Exception as e:
+         error_msg = str(e).lower()
+         if "401" in error_msg or "unauthorized" in error_msg:
+             detailed_msg = "API key is invalid or expired. Please check your API key in the sidebar settings."
+         elif "429" in error_msg or "rate limit" in error_msg:
+             detailed_msg = "Rate limit exceeded. Please wait a few minutes or check your API plan limits."
+         elif "connection" in error_msg or "timeout" in error_msg:
+             detailed_msg = "Network connection issue. Please check your internet connection and try again."
+         elif "model" in error_msg and "not found" in error_msg:
+             detailed_msg = f"Model '{MODEL_NAME}' not available. Please select a valid model from the dropdown or check spelling."
+         else:
+             detailed_msg = f"Unexpected error: {str(e)}. Please check your model configuration."
+         raise RuntimeError(f"Failed to get response from LLM: {detailed_msg}")
+ 
+     # Parse and validate
+     try:
+         response = parser.parse(raw_output.content)
+     except Exception as e:
+         # Try to extract JSON from the response if parsing fails
+         import re
+         import json
+ 
+         # Look for JSON in the response
+         json_match = re.search(r'\{.*\}', raw_output.content, re.DOTALL)
+         if json_match:
+             try:
+                 json_str = json_match.group(0)
+                 # Fix common JSON issues
+                 json_str = json_str.replace('\n', '').replace('\t', '')
+                 # Parse and reconstruct response
+                 json_data = json.loads(json_str)
+                 response = TutorResponse(**json_data)
+             except Exception as json_e:
+                 raise ValueError(f"Failed to parse LLM output as JSON: {json_e}\nOriginal error: {e}\nRaw: {raw_output.content[:500]}...")
+         else:
+             # Fallback: retry with stricter prompt or return error
+             raise ValueError(f"Failed to parse LLM output: {e}\nRaw: {raw_output.content[:500]}...")
+ 
+     return response
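The JSON-recovery fallback in `generate_structured_response` above can be exercised on its own. A minimal standard-library sketch of the same idea (the `extract_json_payload` name is illustrative, not part of the file):

```python
import json
import re

def extract_json_payload(raw: str) -> dict:
    """Pull the first {...} block out of an LLM reply and parse it.

    Mirrors the fallback above: models sometimes wrap JSON in prose,
    so we grab the outermost braces before calling json.loads.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# A reply where the JSON is surrounded by conversational filler.
reply = 'Model reply: {"is_on_topic": true, "answer": "Use a CTE."} Hope that helps!'
payload = extract_json_payload(reply)
```

Note the greedy `.*` spans from the first `{` to the last `}`, so trailing prose after the object is fine, but two separate objects in one reply would break the parse.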
src/config.py ADDED
@@ -0,0 +1,106 @@
+ import os
+ from typing import Dict, List, Literal
+ from pydantic import BaseModel
+ 
+ class TopicSpec(BaseModel):
+     name: str
+     description: str
+     domain: Literal["programming", "analysis", "visualization", "bi", "ml", "dl"]
+     allowed_libraries: List[str]
+     banned_topics: List[str]  # e.g., web dev, mobile
+     style_guide: str
+ 
+ 
+ TOPIC_REGISTRY = {
+     "Python": TopicSpec(
+         name="Python",
+         description="Core Python: data structures, functions, decorators, context managers, type hints, performance.",
+         domain="programming",
+         allowed_libraries=["builtins", "collections", "itertools", "functools", "pathlib", "json"],
+         banned_topics=["Django", "Flask", "GUI", "web scraping", "APIs"],
+         style_guide="Be concise. Prefer standard library. Use type hints. Show 1-2 line examples unless complex."
+     ),
+     "Data Analysis with Pandas & NumPy": TopicSpec(
+         name="Data Analysis with Pandas & NumPy",
+         description="Data wrangling, vectorization, time series, memory optimization.",
+         domain="analysis",
+         allowed_libraries=["pandas", "numpy", "polars"],
+         banned_topics=["web", "streaming", "big data frameworks"],
+         style_guide="Always show DataFrame/Series input and output. Use .head() in examples. Avoid chained indexing."
+     ),
+     "SQL": TopicSpec(
+         name="SQL",
+         description="ANSI SQL with focus on PostgreSQL/SQLite. Window functions, CTEs, optimization.",
+         domain="analysis",
+         allowed_libraries=[],
+         banned_topics=["ORM", "NoSQL", "MongoDB"],
+         style_guide="Use explicit JOINs. Prefer CTEs over subqueries. Comment on performance implications."
+     ),
+     "Power BI": TopicSpec(
+         name="Power BI",
+         description="DAX formulas, data modeling, relationships, performance tuning.",
+         domain="bi",
+         allowed_libraries=[],
+         banned_topics=["Tableau", "Looker", "Python scripts in PBI"],
+         style_guide="Explain DAX logic step-by-step. Use VAR for readability. Warn about context transition gotchas."
+     ),
+     "Machine Learning": TopicSpec(
+         name="Machine Learning",
+         description="Scikit-learn, model evaluation, feature engineering, interpretability.",
+         domain="ml",
+         allowed_libraries=["sklearn", "xgboost", "lightgbm", "shap", "eli5"],
+         banned_topics=["LLMs", "neural nets", "PyTorch/TensorFlow"],
+         style_guide="Use pipelines. Show cross-validation. Emphasize data leakage prevention."
+     ),
+     "Deep Learning": TopicSpec(
+         name="Deep Learning",
+         description="Neural networks with TensorFlow/PyTorch: CNNs, RNNs, transformers basics.",
+         domain="dl",
+         allowed_libraries=["torch", "tensorflow", "keras", "transformers"],
+         banned_topics=["web deployment", "mobile"],
+         style_guide="Use high-level APIs (e.g., tf.keras). Show model.summary(). Include input shape."
+     ),
+     "Data Visualization": TopicSpec(
+         name="Data Visualization",
+         description="Effective static & interactive plots for insight communication.",
+         domain="visualization",
+         allowed_libraries=["matplotlib", "seaborn", "plotly", "altair"],
+         banned_topics=["D3.js", "web dashboards beyond Plotly"],
+         style_guide="Explain design choices (color, scale). Prefer Plotly for interactivity. Avoid pie charts."
+     ),
+ }
+ 
+ # Add validation for model configuration
+ # Default to a more current and widely available model based on API type
+ API_TYPE = os.getenv("API_TYPE", "huggingface").lower()
+ 
+ if API_TYPE == "google":
+     DEFAULT_MODEL = "gemini-1.5-flash"
+ elif API_TYPE == "openai":
+     DEFAULT_MODEL = "gpt-3.5-turbo"
+ else:  # huggingface
+     DEFAULT_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
+ 
+ MODEL_NAME = os.getenv("MODEL_NAME", DEFAULT_MODEL)
+ 
+ # Ensure that the model name is valid
+ if not MODEL_NAME:
+     MODEL_NAME = DEFAULT_MODEL
+ 
+ try:
+     TEMPERATURE = float(os.getenv("TEMPERATURE", "0.3"))
+ except ValueError:
+     TEMPERATURE = 0.3
+ 
+ try:
+     MAX_TOKENS = int(os.getenv("MAX_TOKENS", "2048"))
+ except ValueError:
+     MAX_TOKENS = 2048
+ 
+ # Validate temperature range
+ if TEMPERATURE < 0 or TEMPERATURE > 1:
+     TEMPERATURE = 0.3
+ 
+ # Validate max tokens range
+ if MAX_TOKENS < 1 or MAX_TOKENS > 8192:
+     MAX_TOKENS = 2048
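The env-var handling above follows one policy twice: parse, fall back to a default on bad input, then clamp out-of-range values back to the default. That policy can be factored into a single helper; a sketch using only the standard library (`read_bounded_float` is an illustrative name, not part of `src/config.py`):

```python
import os

def read_bounded_float(var: str, default: float, lo: float, hi: float) -> float:
    """Read a float env var; return `default` when the variable is missing,
    unparseable, or outside [lo, hi] -- the same policy src/config.py
    applies to TEMPERATURE and MAX_TOKENS."""
    try:
        value = float(os.getenv(var, str(default)))
    except ValueError:
        return default
    return value if lo <= value <= hi else default

os.environ["TEMPERATURE"] = "1.7"  # outside [0, 1] -> falls back to default
temp = read_bounded_float("TEMPERATURE", 0.3, 0.0, 1.0)
```

With a wider bound of `[0.0, 2.0]` the same value `1.7` would pass through unchanged.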
src/models.py ADDED
@@ -0,0 +1,10 @@
+ from pydantic import BaseModel, Field
+ from typing import List, Optional
+ 
+ class TutorResponse(BaseModel):
+     is_on_topic: bool = Field(..., description="True only if question matches selected topic")
+     diagnosis: Optional[str] = Field(None, description="What the user might be misunderstanding")
+     answer: str = Field(..., description="Clear, step-by-step explanation")
+     code_example: Optional[str] = Field(None, description="Minimal, runnable code if applicable")
+     best_practice_tip: Optional[str] = Field(None, description="One key tip or warning")
+     references: List[str] = Field(default_factory=list, description="Official docs or authoritative sources")
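For quick tests in an environment without pydantic installed, the same shape can be mirrored with a stdlib dataclass. This is an illustrative analogue only, not a drop-in replacement (no validation, coercion, or `Field` descriptions):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TutorResponseLite:
    # Same fields as the pydantic model above, minus validation.
    is_on_topic: bool
    answer: str
    diagnosis: Optional[str] = None
    code_example: Optional[str] = None
    best_practice_tip: Optional[str] = None
    references: List[str] = field(default_factory=list)

resp = TutorResponseLite(
    is_on_topic=True,
    answer="Prefer an explicit JOIN over a correlated subquery here.",
)
```

`default_factory=list` matters in both versions: a plain `references: List[str] = []` would share one mutable list across instances.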
src/pdf_export.py ADDED
@@ -0,0 +1,192 @@
+ import pdfkit
+ import tempfile
+ import os
+ import html
+ import logging
+ from datetime import datetime
+ from pygments import highlight
+ from pygments.lexers import get_lexer_by_name, guess_lexer
+ from pygments.formatters import HtmlFormatter
+ from src.utils import strip_html
+ 
+ logger = logging.getLogger(__name__)
+ 
+ def syntax_highlight_code(code: str, language: str = "python") -> str:
+     try:
+         lexer = get_lexer_by_name(language)
+     except Exception:
+         try:
+             lexer = guess_lexer(code)
+         except Exception:
+             lexer = get_lexer_by_name("text")
+     formatter = HtmlFormatter(style="friendly", cssclass="codehilite")
+     return highlight(code, lexer, formatter)
+ 
+ def render_chat_to_html(chat_history) -> str:
+     css = HtmlFormatter(style="friendly").get_style_defs('.codehilite')
+     html_lines = [f"""
+     <!DOCTYPE html>
+     <html>
+     <head>
+     <meta charset='utf-8'/>
+     <title>FINESE SCHOOL: Data Science Mentor Session</title>
+     <style>
+     body {{
+         font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+         line-height: 1.6;
+         padding: 30px;
+         background: #fff;
+         color: #333;
+     }}
+     h1 {{
+         color: #2c3e50;
+         text-align: center;
+         border-bottom: 2px solid #3498db;
+         padding-bottom: 10px;
+     }}
+     h2 {{
+         color: #3498db;
+         border-left: 4px solid #3498db;
+         padding-left: 10px;
+     }}
+     .message {{
+         margin-bottom: 25px;
+         padding: 20px;
+         border-radius: 12px;
+         box-shadow: 0 2px 5px rgba(0,0,0,0.1);
+     }}
+     .user {{
+         background: #e3f2fd;
+         border-left: 5px solid #2196f3;
+     }}
+     .assistant {{
+         background: #f5f5f5;
+         border-left: 5px solid #757575;
+     }}
+     .diagnosis {{
+         background: #fff8e1;
+         padding: 15px;
+         border-radius: 10px;
+         margin: 15px 0;
+         border-left: 5px solid #ffc107;
+     }}
+     .tip {{
+         background: #e8f5e9;
+         border-left: 5px solid #4caf50;
+         padding: 15px;
+         border-radius: 10px;
+         margin: 15px 0;
+     }}
+     .refs {{
+         background: #f3e5f5;
+         border-left: 5px solid #9c27b0;
+         padding: 15px;
+         border-radius: 10px;
+         margin: 15px 0;
+     }}
+     .on-topic-warning {{
+         background: #ffebee;
+         border-left: 5px solid #f44336;
+         padding: 15px;
+         border-radius: 10px;
+         margin: 15px 0;
+     }}
+     .code-block {{
+         background-color: #f8f9fa;
+         border-radius: 8px;
+         padding: 15px;
+         overflow-x: auto;
+         font-family: 'Courier New', monospace;
+         font-size: 0.9em;
+         margin: 15px 0;
+         border: 1px solid #eee;
+     }}
+     {css}
+     a {{
+         color: #3498db;
+         text-decoration: none;
+     }}
+     a:hover {{
+         text-decoration: underline;
+     }}
+     </style>
+     </head>
+     <body>
+     <h1>FINESE SCHOOL: Expert Data Science Session</h1>
+     <p><em>Session exported on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</em></p>
+     <hr>
+     """]
+ 
+     for role, content in chat_history:
+         cls = "user" if "You" in role else "assistant"
+         clean_content = strip_html(content)
+ 
+         # Handle special content blocks
+         import re
+ 
+         # Process diagnosis blocks
+         clean_content = re.sub(r'<div class="diagnosis">(.*?)</div>', r'<div class="diagnosis">\1</div>', clean_content, flags=re.DOTALL)
+ 
+         # Process tip blocks
+         clean_content = re.sub(r'<div class="tip">(.*?)</div>', r'<div class="tip">\1</div>', clean_content, flags=re.DOTALL)
+ 
+         # Process reference blocks
+         clean_content = re.sub(r'<div class="refs">(.*?)</div>', r'<div class="refs">\1</div>', clean_content, flags=re.DOTALL)
+ 
+         # Process code blocks
+         def replace_code_block(match):
+             code = match.group(1)
+             return f'<div class="code-block"><pre>{html.escape(code)}</pre></div>'
+ 
+         clean_content = re.sub(r'<div class="codehilite">(.*?)</div>', replace_code_block, clean_content, flags=re.DOTALL)
+ 
+         # Process on-topic warnings
+         clean_content = re.sub(r'<div class="on-topic-warning">(.*?)</div>', r'<div class="on-topic-warning">\1</div>', clean_content, flags=re.DOTALL)
+ 
+         html_lines.append(f'<div class="message {cls}"><h2>{role}</h2><div>{clean_content}</div></div>')
+ 
+     html_lines.append("</body></html>")
+     return "".join(html_lines)
+ 
+ def export_chat_to_pdf(chat_history) -> bytes:
+     try:
+         # Try to configure wkhtmltopdf - fallback to default if not found
+         try:
+             config = pdfkit.configuration(wkhtmltopdf="/usr/bin/wkhtmltopdf")
+         except Exception:
+             config = None
+ 
+         html_content = render_chat_to_html(chat_history)
+ 
+         with tempfile.NamedTemporaryFile(mode="w", suffix=".html", delete=False, encoding="utf-8") as f:
+             f.write(html_content)
+             temp_html = f.name
+ 
+         pdf_path = temp_html.replace(".html", ".pdf")
+ 
+         options = {
+             'page-size': 'A4',
+             'margin-top': '0.75in',
+             'margin-right': '0.75in',
+             'margin-bottom': '0.75in',
+             'margin-left': '0.75in',
+             'encoding': "UTF-8",
+             'no-outline': None,
+             'enable-local-file-access': None,
+             'quiet': ''
+         }
+ 
+         try:
+             if config:
+                 pdfkit.from_file(temp_html, pdf_path, configuration=config, options=options)
+             else:
+                 pdfkit.from_file(temp_html, pdf_path, options=options)
+ 
+             with open(pdf_path, "rb") as f:
+                 return f.read()
+         finally:
+             for path in [temp_html, pdf_path]:
+                 if os.path.exists(path):
+                     os.remove(path)
+     except Exception as e:
+         logger.error(f"PDF export failed: {str(e)}")
+         raise RuntimeError(f"Failed to export PDF: {str(e)}")
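The `replace_code_block` helper above relies on `html.escape` so code containing `<` or `&` cannot break the exported HTML. A standalone sketch of that transformation, using only the standard library (`wrap_code_blocks` is an illustrative name, not part of the file):

```python
import html
import re

def wrap_code_blocks(text: str) -> str:
    """Re-wrap <div class="codehilite">...</div> spans as escaped <pre>
    blocks, mirroring what render_chat_to_html does before handing the
    page to wkhtmltopdf."""
    def repl(match: re.Match) -> str:
        return f'<div class="code-block"><pre>{html.escape(match.group(1))}</pre></div>'
    return re.sub(r'<div class="codehilite">(.*?)</div>', repl, text, flags=re.DOTALL)

out = wrap_code_blocks('<div class="codehilite">if a < b & c: pass</div>')
```

Without the escaping step, the `<` inside the snippet would start a bogus tag and wkhtmltopdf would render the rest of the line as markup.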
src/utils.py ADDED
@@ -0,0 +1,209 @@
+ import re
2
+ import html
3
+ import uuid
4
+ import logging
5
+ from typing import List, Tuple, Optional
6
+
7
+ # Configure logging
8
+ logger = logging.getLogger(__name__)
9
+
10
+ def sanitize_input(text: str) -> str:
11
+ """Sanitize user input to prevent potential injection attacks.
12
+
13
+ Args:
14
+ text: User input text
15
+
16
+ Returns:
17
+ Sanitized text with safe characters only
18
+ """
19
+ try:
20
+ # Remove any potentially harmful characters while preserving basic formatting
21
+ sanitized = re.sub(r'[<>]', '', text)
22
+ # Remove any JavaScript event handlers
23
+ sanitized = re.sub(r'on\w+="[^"]*"', '', sanitized, flags=re.IGNORECASE)
24
+ # Limit length with increased capacity
25
+ return sanitized[:2000]
26
+ except Exception as e:
27
+ logger.error(f"Error sanitizing input: {e}")
28
+ return ""
29
+
30
+ def strip_html(text: str) -> str:
31
+ """Remove HTML tags from text while preserving content structure.
32
+
33
+ Args:
34
+ text: HTML content to be stripped
35
+
36
+ Returns:
37
+ Plain text with HTML tags removed but content structure preserved
38
+ """
39
+ if not text:
40
+ return ""
41
+
42
+ # Replace line break tags with actual line breaks
43
+ text = text.replace('<br>', '\n')
44
+ text = text.replace('<br/>', '\n')
45
+ text = text.replace('</p>', '\n\n')
46
+ text = text.replace('</div>', '\n\n')
47
+
48
+ # Replace list tags with appropriate formatting
49
+ text = re.sub(r'</?ul>', '\n', text)
50
+ text = re.sub(r'</?ol>', '\n', text)
51
+ text = re.sub(r'<li>', '\n- ', text)
52
+
53
+ # Remove remaining HTML tags
54
+ clean_text = re.sub(r"<[^>]+>", "", text)
55
+
56
+ # Clean up extra whitespace
57
+ clean_text = re.sub(r'\n\s*\n', '\n\n', clean_text)
58
+ return clean_text.strip()
+
+ def inject_interactive_elements(html_str: str) -> str:
+     """Add interactive elements to HTML content:
+
+     - Copy buttons for code blocks
+     - Syntax highlighting via highlight.js
+
+     Args:
+         html_str: HTML content with potential fenced code blocks
+
+     Returns:
+         HTML content with interactive elements added
+     """
+     if not html_str or '```' not in html_str:
+         return html_str
+
+     # Add copy buttons to code blocks
+     def add_copy_button(match):
+         code_content = match.group(2)
+         code_lang = match.group(1) if match.group(1) else "text"
+         button_id = str(uuid.uuid4())[:8]
+
+         return f'''
+ <div style="position: relative; margin: 10px 0;">
+     <button id="copy-btn-{button_id}" onclick="copyCode('{button_id}')"
+             style="position: absolute; top: 5px; right: 5px; z-index: 10;
+                    background: #f0f0f0; border: 1px solid #ccc; border-radius: 4px;
+                    padding: 4px 8px; cursor: pointer; font-size: 12px;">
+         Copy
+     </button>
+     <pre style="padding: 20px 10px 10px 10px; border-radius: 8px;
+                 background: #f8f8f8; overflow-x: auto; position: relative;"><code class="language-{code_lang}">{html.escape(code_content)}</code></pre>
+ </div>
+ '''
+
+     # Process code blocks with language specification
+     try:
+         result = re.sub(r'```(\w*)\n(.*?)```', add_copy_button, html_str, flags=re.DOTALL)
+
+         # Add JavaScript for copy functionality
+         js_script = """
+ <script>
+ function copyCode(elementId) {
+     const button = document.getElementById('copy-btn-' + elementId);
+     const codeBlock = button.nextElementSibling.querySelector('code');
+     const text = codeBlock.textContent;
+
+     navigator.clipboard.writeText(text).then(() => {
+         const originalText = button.textContent;
+         button.textContent = 'Copied!';
+         setTimeout(() => { button.textContent = originalText; }, 2000);
+     }).catch(err => {
+         console.error('Failed to copy: ', err);
+         button.textContent = 'Failed';
+         setTimeout(() => { button.textContent = 'Copy'; }, 2000);
+     });
+ }
+
+ // Initialize syntax highlighting
+ document.addEventListener('DOMContentLoaded', () => {
+     document.querySelectorAll('pre code').forEach((el) => {
+         hljs.highlightElement(el);
+     });
+ });
+ </script>
+ """
+
+         # highlight.js assets for syntax highlighting
+         css_link = '<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/github.min.css">\n'
+         hljs_script = '<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>\n'
+
+         # Prepend the assets and append the copy script
+         result = css_link + hljs_script + result + js_script
+
+         return result
+     except Exception as e:
+         logger.error(f"Error adding interactive elements: {e}")
+         return html_str
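The heart of `inject_interactive_elements` is the fenced-block substitution; a minimal sketch of that regex alone, with the wrapper markup simplified to bare `pre`/`code` tags:

```python
import re
import html

md = "Here:\n```python\nprint('hi')\n```\nDone."

def wrap(match):
    # Simplified stand-in for add_copy_button: no button, just escaped code
    lang = match.group(1) or "text"
    body = html.escape(match.group(2))
    return f'<pre><code class="language-{lang}">{body}</code></pre>'

out = re.sub(r'```(\w*)\n(.*?)```', wrap, md, flags=re.DOTALL)
print(out)
```

`re.DOTALL` lets `.*?` span multiple lines, and `html.escape` keeps the code content from being parsed as markup.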
+
+ def detect_language_from_context(question: str, topic: str) -> str:
+     """Detect the programming language based on question and topic context.
+
+     Args:
+         question: User's question text
+         topic: Main topic of the query
+
+     Returns:
+         Lowercase language identifier, or "text" if nothing matches
+     """
+     # Language mapping with common indicators
+     mapping = {
+         "Python": ["python", "pandas", "numpy", "matplotlib", "dataframe"],
+         "SQL": ["sql", "query", "database", "select", "join"],
+         "JavaScript": ["javascript", "js", "react", "dom", "node"],
+         "Java": ["java", "spring", "hibernate"],
+         "C#": ["c#", "csharp", "dotnet", ".net"],
+         "Power BI": ["dax", "powerbi", "power bi", "pbix"],
+         "Data Visualization": ["visualization", "chart", "plot", "graph"],
+         "HTML": ["html", "markup", "webpage"],
+         "CSS": ["css", "stylesheet"],
+         "Shell": ["bash", "shell", "command", "script"]
+     }
+
+     # Check the topic first, then fall back to the question for clues
+     for source in (topic.lower(), question.lower()):
+         for lang, keywords in mapping.items():
+             if any(keyword in source for keyword in keywords):
+                 return lang.lower()
+
+     return "text"
+
+ def truncate_text(text: str, max_length: int = 500, min_length: int = 200) -> str:
+     """Truncate text to a maximum length while trying to preserve meaningful content.
+
+     Args:
+         text: Text to truncate
+         max_length: Maximum length of the truncated text
+         min_length: Lower bound of the window searched for a word boundary
+
+     Returns:
+         Truncated text with an ellipsis appended if truncation occurred
+     """
+     if not text:
+         return ""
+
+     if len(text) <= max_length:
+         return text
+
+     # Try to break at the last space within [min_length, max_length)
+     space_index = text.rfind(' ', min_length, max_length)
+     if space_index > 0:
+         return text[:space_index] + "..."
+
+     # Fallback to simple truncation
+     return text[:max_length] + "..."
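`truncate_text`'s word-boundary behaviour, shown with small limits (the function is repeated here so the snippet runs standalone; the parameters are chosen for illustration):

```python
def truncate_text(text: str, max_length: int = 500, min_length: int = 200) -> str:
    # Copy of the utility above, unchanged
    if not text:
        return ""
    if len(text) <= max_length:
        return text
    space_index = text.rfind(' ', min_length, max_length)
    if space_index > 0:
        return text[:space_index] + "..."
    return text[:max_length] + "..."

sample = "word " * 60  # 300 characters
short = truncate_text(sample, max_length=40, min_length=10)
print(short)  # cut at the last space before index 40, plus "..."
```

The cut lands on a space, so no word is split in half.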
tests/test_chat_engine.py ADDED
@@ -0,0 +1,10 @@
+ # tests/test_chat_engine.py
+ import unittest
+ from src.chat_engine import build_expert_prompt
+ from src.config import TOPIC_REGISTRY
+
+ class TestChatEngine(unittest.TestCase):
+     def test_build_expert_prompt(self):
+         topic_spec = TOPIC_REGISTRY["Python"]
+         prompt = build_expert_prompt(topic_spec, "What is a decorator?")
+         self.assertIn("Dr. Data", prompt.messages[0].content)