title: Math Chatbot V2
emoji: 🌍
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
license: mit
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
MathSutra 12
MathSutra 12 is a domain-specific RAG chatbot for Class 12 Mathematics. It uses your 13 chapter PDFs, stores embeddings in ChromaDB, runs the LLM locally with Ollama, and answers with:
- a detailed explanation
- the most likely chapter
- the most likely topic
- supporting source chunks
Why this project setup
This project is best built in VS Code, not Colab.
- Ollama runs locally on your machine, which fits VS Code perfectly.
- ChromaDB persists data locally, so your vector database stays available between runs.
- Streamlit is easier to demo from a local machine than from Colab.
Use Colab only if you want a cloud notebook experiment. For this final project, local development in VS Code is the better choice.
Recommended models
- LLM: qwen2.5:7b-instruct
- Embedding model: nomic-embed-text
Why this pair:
- qwen2.5:7b-instruct is strong for reasoning and generally performs well on academic question answering.
- nomic-embed-text is lightweight and reliable for semantic retrieval in local RAG setups.
If your laptop is powerful, you can also try qwen2.5:14b.
Models available right now in this workspace
Your local Ollama already has:
- mistral:latest
- embeddinggemma:latest
The included .env is configured to use these installed models so you can run the project immediately. You can switch to the recommended pair later by updating .env after pulling those models.
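As a rough illustration of how the model names might be read from the environment, here is a minimal sketch. The variable names CHAT_MODEL and EMBED_MODEL and the helper load_model_settings are assumptions made for this example; check .env.example for the keys the project actually uses.

```python
import os

# Hypothetical variable names; the real keys are defined in .env.example.
DEFAULT_CHAT_MODEL = "mistral:latest"
DEFAULT_EMBED_MODEL = "embeddinggemma:latest"


def load_model_settings(env=None):
    """Return chat and embedding model names, falling back to the installed models."""
    env = os.environ if env is None else env
    return {
        "chat_model": env.get("CHAT_MODEL", DEFAULT_CHAT_MODEL),
        "embed_model": env.get("EMBED_MODEL", DEFAULT_EMBED_MODEL),
    }


# Switching to the recommended pair only changes two values.
recommended = load_model_settings(
    {"CHAT_MODEL": "qwen2.5:7b-instruct", "EMBED_MODEL": "nomic-embed-text"}
)
print(recommended)
```

If you change the embedding model, re-run ingestion, because vectors produced by different embedding models are not comparable.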
Project flow
- Put the PDFs in a data folder in the project root.
- Install Python dependencies.
- Pull the Ollama models.
- Run ingestion to chunk, embed, and store data in ChromaDB.
- Launch Streamlit and ask questions.
Setup
Create and activate a virtual environment, then install dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Create your environment file:
cp .env.example .env
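Install Ollama and start the server, then pull the models. The pull commands need a running server, so run ollama serve in a separate terminal first (skip it if the Ollama app is already running):
ollama serve
ollama pull qwen2.5:7b-instruct
ollama pull nomic-embed-text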
Ingest the PDFs
python ingest.py --reset
What this does:
- discovers all chapter PDFs
- cleans extracted text
- chunks the content
- adds chapter and topic metadata
- creates embeddings with Ollama
- stores everything in ChromaDB
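The cleaning, chunking, and metadata steps above can be sketched in plain Python. This is only an illustration, not the code in ingest.py: the function names, the character-based chunk size, and the overlap value are all assumptions, and only chapter metadata is attached here (topic detection is left out).

```python
import re


def clean_text(raw):
    """Collapse runs of whitespace left over from PDF extraction."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping character windows, cutting at spaces where possible."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer to cut at the last space inside the window so words stay whole.
            cut = text.rfind(" ", start, end)
            if cut > start + overlap:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step back by the overlap so neighbouring chunks share some context.
        start = end - overlap
    return chunks


def build_records(raw_text, chapter, chunk_size=800, overlap=100):
    """Attach chapter metadata and an id to every chunk (hypothetical record shape)."""
    chunks = chunk_text(clean_text(raw_text), chunk_size, overlap)
    return [
        {"id": f"{chapter}-{i}", "text": chunk, "metadata": {"chapter": chapter}}
        for i, chunk in enumerate(chunks)
    ]


sample = "A matrix is an ordered rectangular array of numbers. " * 10
records = build_records(sample, chapter="Matrices", chunk_size=120, overlap=20)
print(len(records), records[0]["id"])
```

Records shaped like this would then be embedded and written to the vector store. The overlap keeps a definition that straddles a chunk boundary retrievable from either side.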
Run the app
streamlit run app.py
Optional CLI chat
python chat.py
File overview
- app.py: Streamlit showcase app
- ingest.py: data ingestion pipeline
- chat.py: terminal chat interface
- test.py: quick smoke test for a single question
- src/edurag_math_bot/config.py: settings and paths
- src/edurag_math_bot/catalog.py: clean chapter metadata
- src/edurag_math_bot/pdf_processing.py: PDF extraction and chunking
- src/edurag_math_bot/ollama_client.py: Ollama API wrapper
- src/edurag_math_bot/vector_store.py: ChromaDB wrapper
- src/edurag_math_bot/rag_chain.py: retrieval + prompt + answer generation
Suggested final project explanation
You can explain your system like this:
- Data collection: 13 NCERT Class 12 Mathematics PDFs
- Data cleaning: text extraction and cleanup from PDFs
- Chunking: splitting long text into meaningful chunks
- Embedding: converting chunks into vectors using nomic-embed-text
- Vector storage: storing vectors in ChromaDB
- Retrieval: finding the most relevant chunks for a user question
- LLM generation: sending retrieved context to qwen2.5:7b-instruct
- Output formatting: answering in detail with chapter and topic prediction
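To make the retrieval and generation steps concrete, here is a small sketch that ranks stored chunks by cosine similarity and assembles a prompt. In the real project ChromaDB performs the similarity search and Ollama produces the vectors; the hand-written 3-dimensional vectors, the function names, and the prompt wording below are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, top_k=2):
    """Return the top_k stored chunks most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, chunks):
    """Assemble retrieved chunks and the question into a single prompt string."""
    context = "\n\n".join(f"[{c['chapter']}] {c['text']}" for c in chunks)
    return (
        "Answer the question using only the context below. "
        "Also name the most likely chapter and topic.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy store: real embeddings have hundreds of dimensions, not three.
store = [
    {"chapter": "Matrices", "text": "A matrix is a rectangular array.", "vector": [1.0, 0.0, 0.0]},
    {"chapter": "Integrals", "text": "Integration reverses differentiation.", "vector": [0.0, 1.0, 0.0]},
    {"chapter": "Probability", "text": "Bayes' theorem relates conditionals.", "vector": [0.0, 0.0, 1.0]},
]
top = retrieve([0.9, 0.1, 0.0], store, top_k=2)
prompt = build_prompt("What is a matrix?", top)
print(prompt)
```

The chapter label carried with each chunk is what lets the final answer report a likely chapter alongside the explanation.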
Notes
- The chapter name is detected from the PDF filename.
- The topic is inferred from chunk headings and nearby text.
- This works best on text-based PDFs. If your PDFs are image scans, add OCR first.
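A minimal sketch of the first and last notes, assuming filenames such as ch05_continuity_and_differentiability.pdf. That naming pattern, both function names, and the 50-character threshold are hypothetical; the project's own logic may differ.

```python
import re
from pathlib import Path


def chapter_from_filename(path):
    """Derive a readable chapter name from a PDF filename (assumed naming pattern)."""
    stem = Path(path).stem
    # Drop a leading chapter marker such as "ch05_", "chapter_3-" or "07 ".
    stem = re.sub(r"^(?:ch(?:apter)?[\s_-]*)?\d+[\s_-]*", "", stem, flags=re.IGNORECASE)
    words = re.split(r"[\s_-]+", stem)
    return " ".join(word.capitalize() for word in words if word)


def looks_scanned(page_texts, min_chars_per_page=50):
    """Heuristic: pages that yield almost no text are probably image scans needing OCR."""
    if not page_texts:
        return True
    average = sum(len(text.strip()) for text in page_texts) / len(page_texts)
    return average < min_chars_per_page


print(chapter_from_filename("data/ch05_continuity_and_differentiability.pdf"))
print(looks_scanned(["", "  "]))
```

Running a check like looks_scanned before ingestion gives an early warning that a file needs OCR, rather than silently producing empty chunks.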