# MaterialMind

Decisions, not summaries. MaterialMind is an LLM-powered materials-selection assistant that turns engineering requirements into an evidence-backed, ranked shortlist with page-level citations. It runs locally (Flask + Ollama) and searches your own PDF corpus via a lightweight RAG index.
## What MaterialMind does

- Retrieves → Ranks → Explains: finds the most relevant pages, scores candidate materials against your constraints, and explains trade-offs with citations.
- Local and private: your PDFs never leave your machine.
- Robust ingestion: handles real-world PDFs (papers, standards, reports, textbooks); incremental updates.
- Friendly UI: clean dark theme, background tiles, live weight controls (accepts 40/30/20/10 or 0.4/0.3/0.2/0.1).
- Exact math: the backend normalizes weights to sum to exactly 1.0000.
## Repository layout

```
materialmind/        # your corpus lives here
├─ sources/          # drop your PDFs here
└─ index/            # vector DB; auto-created
rag_mini.py          # RAG: index/search/ask/answer/export
app_user.py          # Flask app for end users
templates/           # base.html, index.html, results.html
static/              # styles.css and images: 11.png, 22.jpg
```

Note: MaterialMind works with the PDFs you drop into materialmind/sources (papers, standards, reports). You don't need vendor datasheets; if you have them as PDFs, drop them in like any other PDF.
## Prerequisites

- Python 3.10+
- pip and a virtual environment
- Ollama (for model-generated answers). Example model: qwen2.5:7b-instruct
- Optional: ocrmypdf + tesseract (only for scanned PDFs with no selectable text)
## Installation

**1. Create and activate a virtual environment.**

macOS / Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

Windows (PowerShell):

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

**2. Install dependencies.**

Apple Silicon (macOS):

```bash
pip install -U fastembed onnxruntime-silicon chromadb pypdf pymupdf markdown filelock flask flask-cors
```

Linux/Windows:

```bash
pip install -U fastembed onnxruntime chromadb pypdf pymupdf markdown filelock flask flask-cors
```

**3. Install and start Ollama; pull a local model.**

macOS (example):

```bash
brew install ollama
ollama serve &
ollama pull qwen2.5:7b-instruct
```
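Optionally, confirm that Ollama is running and the model has been pulled by querying Ollama's local HTTP API (GET /api/tags on the default port 11434). A small sketch:

```python
import json
import urllib.request

# Ollama lists locally pulled models at /api/tags (default port 11434).
with urllib.request.urlopen("http://127.0.0.1:11434/api/tags") as resp:
    models = [m["name"] for m in json.load(resp).get("models", [])]

print("Ollama is running. Pulled models:", models)
if not any(name.startswith("qwen2.5:7b-instruct") for name in models):
    print("Missing model - run: ollama pull qwen2.5:7b-instruct")
```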
**4. Confirm paths.**

rag_mini.py uses a folder named `materialmind` next to the script:

```python
BASE_DIR = Path(__file__).resolve().parent / "materialmind"
```

Ensure the folder exists: create `materialmind/sources` and drop your PDFs there.
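If you prefer to create the folder from Python rather than by hand, a one-off sketch that mirrors the same convention (the index/ folder is created automatically during indexing):

```python
from pathlib import Path

# Mirror rag_mini.py's convention: "materialmind" sits next to the script.
BASE_DIR = Path(__file__).resolve().parent / "materialmind"
(BASE_DIR / "sources").mkdir(parents=True, exist_ok=True)  # drop PDFs here
print("PDF folder ready:", BASE_DIR / "sources")
```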
## Add your PDFs

Drop research papers, standards, reports, and textbooks into `materialmind/sources`.

Tips:

- Prefer publisher PDFs (selectable text).
- If a PDF is scanned (no selectable text), OCR it, then place output_ocr.pdf in materialmind/sources:

```bash
ocrmypdf input.pdf output_ocr.pdf
```
## Build or update the index

First time (full index):

```bash
python rag_mini.py --rebuild
```

Later (only changed/new PDFs):

```bash
python rag_mini.py --update
```

Optional backup:

```bash
python rag_mini.py --backup
```

Optional export to JSONL (for inspection):

```bash
python rag_mini.py --export-json ./materialmind_dump.jsonl
```
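The JSONL dump contains one record per line. The exact fields depend on rag_mini.py, so the sketch below only peeks at the first few records without assuming a schema:

```python
import json

# Print the field names of the first three exported records.
with open("materialmind_dump.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(sorted(record.keys()))
        if i >= 2:
            break
```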
## Quick retrieval check (no model required)

Example:

```bash
python rag_mini.py --ask "Which alloys resist pitting in seawater better than 316L?"
```

You will see the top-k snippets with file:page citations from your PDFs.
## Run the UI

Start the Flask app:

```bash
python app_user.py
```

Open http://127.0.0.1:5000/ in your browser.

Usage:

- Fill in the environment, temperature, and other constraints.
- Set weights as percentages (e.g., 40/30/20/10) or fractions (0.4/0.3/0.2/0.1). The form enables the button only when the sum equals 100% (or 1.0).
- Click "Get ranked shortlist" to see the ranked table, material cards, and page-level citations.
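A minimal sketch of the weight handling described above (percentages or fractions, normalized server-side to sum to 1.0); `parse_weights` is a hypothetical helper, not necessarily the name used in app_user.py:

```python
def parse_weights(raw_values):
    """Accept weights as percentages (40/30/20/10) or fractions (0.4/0.3/0.2/0.1)
    and normalize them so they sum to 1.0."""
    weights = [float(v) for v in raw_values]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must be positive")
    # Dividing by the sum handles both 100-based and 1.0-based inputs.
    return [w / total for w in weights]

print(parse_weights(["40", "30", "20", "10"]))  # -> [0.4, 0.3, 0.2, 0.1]
```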
## How it works

- Retrieval: FastEmbed (ONNX) creates embeddings; Chroma stores and retrieves chunks with file + page metadata.
- Decision layer: your constraints and weights guide the model to produce a structured shortlist JSON (name, score, reasons, trade-offs, citations).
- Local LLM: by default qwen2.5:7b-instruct via Ollama; you can swap models freely.
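For orientation, here is a minimal end-to-end sketch of that pipeline. It is independent of rag_mini.py's internals: the collection name, the metadata keys ("file", "page"), the prompt wording, and the `ask_shortlist` helper are illustrative assumptions; only the Chroma path, embedding model, and Ollama endpoint follow this README.

```python
import json
import urllib.request

import chromadb
from fastembed import TextEmbedding

# 1) Retrieval: embed the question with FastEmbed and pull top-k chunks from Chroma.
embedder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
client = chromadb.PersistentClient(path="materialmind/index/chroma_v3")
collection = client.get_or_create_collection("materialmind")  # collection name is an assumption

def retrieve(question, k=6):
    query_vec = list(embedder.embed([question]))[0].tolist()
    hits = collection.query(query_embeddings=[query_vec], n_results=k)
    # Each chunk carries file + page metadata so the answer can cite file:page.
    # The metadata key names ("file", "page") are assumptions about rag_mini.py.
    return list(zip(hits["documents"][0], hits["metadatas"][0]))

# 2) Decision layer: constraints plus retrieved evidence go into one prompt asking
#    for a structured shortlist (name, score, reasons, trade-offs, citations).
# 3) Local LLM: the prompt is sent to Ollama's /api/generate endpoint (default port 11434).
def ask_shortlist(question, model="qwen2.5:7b-instruct"):
    evidence = "\n\n".join(
        f"[{meta.get('file')}:{meta.get('page')}] {text}" for text, meta in retrieve(question)
    )
    prompt = (
        "Using only the evidence below, return JSON with a 'shortlist' list of "
        "{name, score, reasons, trade_offs, citations}.\n\n"
        f"Evidence:\n{evidence}\n\nRequirements: {question}\n"
    )
    req = urllib.request.Request(
        "http://127.0.0.1:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

print(ask_shortlist("Which alloys resist pitting in seawater better than 316L?"))
```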
## CLI reference (rag_mini.py)

```
--rebuild              rebuild the entire index from PDFs in materialmind/sources
--update               incremental index of only changed/new PDFs
--backup               copy the current index to a timestamped folder
--export-json PATH     dump records to JSONL (optional)
--ask "question"       retrieval only with citations
--answer "question" --model NAME --k N --show
                       retrieval plus local LLM answer (Ollama required)
```
## Customization

Change the model (Ollama):

```bash
ollama pull mistral:7b-instruct
```

then set "Model" in the UI to mistral:7b-instruct.

Tune retrieval (rag_mini.py): CHUNK_CHARS, CHUNK_OVERLAP, DEFAULT_TOPK.

Swap the embedding model (rag_mini.py): EMB_MODEL = "BAAI/bge-small-en-v1.5" is a good default; other FastEmbed models also work.
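The constant names above come from rag_mini.py; the values below are illustrative starting points, not the shipped defaults:

```python
# In rag_mini.py -- example values only; tune for your corpus:
CHUNK_CHARS   = 1200  # characters per chunk (larger = more context per hit)
CHUNK_OVERLAP = 200   # overlap between neighboring chunks so sentences aren't cut
DEFAULT_TOPK  = 6     # how many chunks are retrieved per question
EMB_MODEL     = "BAAI/bge-small-en-v1.5"  # any FastEmbed-supported model name
```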
Theme colors: edit the CSS variables in static/styles.css (:root { … }).
## Troubleshooting

**"Get ranked shortlist" button disabled:** enter weights as 40/30/20/10 or 0.4/0.3/0.2/0.1 until the sum reads 100%.

**Ollama not found / model errors:** install, start, and pull as described above. Retrieval still works without Ollama; only model-generated answers require it.

**Chroma lock / mutex errors:** close other processes using the index. If it stays stuck:

```bash
python rag_mini.py --backup
rm -rf materialmind/index/chroma_v3
python rag_mini.py --rebuild
```

**No text in a PDF:** OCR it with ocrmypdf, then re-index.

**Port already in use:** change the host/port in app_user.py.
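app_user.py presumably starts Flask's development server via `app.run()`; assuming that, the change might look like the sketch below (5001 is an arbitrary free port):

```python
from flask import Flask

app = Flask(__name__)  # in app_user.py, reuse the existing app object instead

if __name__ == "__main__":
    # Pick any free port; 5001 here is arbitrary.
    app.run(host="127.0.0.1", port=5001)
```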
## Privacy

Everything runs locally by default. Your PDFs are indexed on disk and never uploaded. Remove PDFs from materialmind/sources and run --update to de-index them.
## Roadmap

Team/on-prem sharing, rule/property packs (e.g., PREN, oxidation windows, standards checks), an uploads & tagging UI, a LoRA adapter for stricter JSON, and an evaluation harness for grounding and JSON validity.
## License

MIT (or your chosen permissive license). Add a LICENSE file to the repository.
## One-minute demo

1. Drop a few corrosion or seawater PDFs into materialmind/sources.
2. Build the index: `python rag_mini.py --rebuild`
3. Run the UI: `python app_user.py`
4. Query: seawater at 20–25 °C, UTS ≥ 600 MPa, weights 50/30/10/10.
5. Show the ranked shortlist and open a citation (file:page).
6. Add another PDF, run `--update`, rerun the query, and show the new citation.