MaterialMind
Decisions, not summaries. MaterialMind is an LLM-powered materials-selection assistant that turns engineering requirements into an evidence-backed, ranked shortlist with page-level citations. It runs locally (Flask + Ollama) and searches your own PDF corpus via a lightweight RAG index.
What MaterialMind does
• Retrieves → Ranks → Explains: finds the most relevant pages, scores candidate materials against your constraints, and explains trade-offs with citations.
• Local and private: your PDFs never leave your machine.
• Robust ingestion: handles real-world PDFs (papers, standards, reports, textbooks); incremental updates.
• Friendly UI: clean dark theme, background tiles, live weight controls (accepts 40/30/20/10 or 0.4/0.3/0.2/0.1).
• Exact math: backend normalizes weights to sum to exactly 1.0000.
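The weight normalization described above can be sketched as follows (a minimal illustration; the actual function names in app_user.py may differ):

```python
def normalize_weights(raw):
    """Accept percentages (40/30/20/10) or fractions (0.4/0.3/0.2/0.1)
    and scale them so they sum to exactly 1.0000 (4 decimal places)."""
    values = [float(v) for v in raw]
    total = sum(values)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return [round(v / total, 4) for v in values]
```

Because each weight is divided by the running total, percentages and fractions produce identical normalized output.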
Repository layout
materialmind/ (your corpus lives here)
├─ sources/ (drop your PDFs here)
└─ index/ (vector DB; auto-created)
rag_mini.py (RAG: index/search/ask/answer/export)
app_user.py (Flask app for end users)
templates/ (base.html, index.html, results.html)
static/ (styles.css and images: 11.png, 22.jpg)
Note: MaterialMind works with whatever PDFs you drop into materialmind/sources (papers, standards, reports). Vendor datasheets are not required; if you have them as PDFs, drop them in like any other PDF.
Prerequisites
• Python 3.10+
• pip and a virtual environment
• Ollama (for model-generated answers). Example model: qwen2.5:7b-instruct
• Optional: ocrmypdf + tesseract (only for scanned PDFs with no selectable text)
Installation
Create and activate a virtual environment.
macOS / Linux:
python3 -m venv .venv
source .venv/bin/activate
Windows (PowerShell):
python -m venv .venv
.venv\Scripts\Activate.ps1
Install dependencies.
Apple Silicon (macOS):
pip install -U fastembed onnxruntime-silicon chromadb pypdf pymupdf markdown filelock flask flask-cors
Linux/Windows:
pip install -U fastembed onnxruntime chromadb pypdf pymupdf markdown filelock flask flask-cors
Install and start Ollama; pull a local model.
macOS (example):
brew install ollama
ollama serve &
ollama pull qwen2.5:7b-instruct
Confirm paths.
rag_mini.py uses a folder named “materialmind” next to the script:
BASE_DIR = Path(__file__).resolve().parent / "materialmind"
Ensure the folder exists:
materialmind/sources (create this and drop PDFs here)
Add your PDFs
Drop research papers, standards, reports, and textbooks into:
materialmind/sources
Tips:
• Prefer publisher PDFs (selectable text).
• If a PDF is scanned (no text), OCR it:
ocrmypdf input.pdf output_ocr.pdf
then place output_ocr.pdf in materialmind/sources
Build or update the index
First time (full index):
python rag_mini.py --rebuild
Later (only changed/new PDFs):
python rag_mini.py --update
Optional backup:
python rag_mini.py --backup
Optional export to JSONL (for inspection):
python rag_mini.py --export-json ./materialmind_dump.jsonl
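A quick way to inspect the JSONL dump is to count indexed chunks per source file. This sketch assumes each exported line is a JSON object with a "source" field; adjust the key to match rag_mini.py's actual export schema:

```python
import json
from collections import Counter

def summarize_dump(path):
    """Count records per source file in a JSONL dump.
    The 'source' key is an assumption about the export schema."""
    counts = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            counts[record.get("source", "unknown")] += 1
    return counts
```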
Quick retrieval check (no model required)
Example:
python rag_mini.py --ask "Which alloys resist pitting in seawater better than 316L?"
You will see top-k snippets with file:page citations from your PDFs.
Run the UI
Start the Flask app:
python app_user.py
Open:
http://127.0.0.1:5000/
Usage:
• Fill environment/temperature/constraints.
• Set weights as percentages (e.g., 40/30/20/10) or fractions (0.4/0.3/0.2/0.1). The form enables the button only when the sum equals 100% (or 1.0).
• Click “Get ranked shortlist” to see the ranked table, material cards, and page-level citations.
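The gating rule that enables the button can be expressed as a small check (a sketch of the logic; the UI implements this client-side, possibly under different names):

```python
def weights_valid(values, tol=1e-6):
    """Return True when the weights sum to 100 (percentages)
    or to 1.0 (fractions), within a small tolerance."""
    total = sum(float(v) for v in values)
    return abs(total - 100.0) <= tol or abs(total - 1.0) <= tol
```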
How it works
• Retrieval: FastEmbed (ONNX) creates embeddings; Chroma stores and retrieves chunks with file + page metadata.
• Decision layer: your constraints and weights guide the model to produce a structured shortlist JSON (name, score, reasons, trade-offs, citations).
• Local LLM: by default qwen2.5:7b-instruct via Ollama; you can swap models freely.
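An illustrative shortlist record (field names follow the list above; the exact schema, scores, and citation file are hypothetical):

```json
{
  "materials": [
    {
      "name": "Super duplex 2507 (UNS S32750)",
      "score": 0.87,
      "reasons": ["High PREN resists pitting in seawater"],
      "trade_offs": ["Harder to machine and weld than 316L"],
      "citations": ["corrosion_handbook.pdf:214"]
    }
  ]
}
```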
CLI reference (rag_mini.py)
--rebuild rebuild the entire index from PDFs in materialmind/sources
--update incremental index of only changed/new PDFs
--backup copy the current index to a timestamped folder
--export-json PATH dump records to JSONL (optional)
--ask "question" retrieval only with citations
--answer "question" [--model NAME] [--k N] [--show]
retrieval plus a local LLM answer (Ollama required)
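Under the hood, --answer amounts to posting a prompt to Ollama's /api/generate endpoint. A minimal stdlib-only sketch (prompt wording and helper names are illustrative, not rag_mini.py's actual code):

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default endpoint

def build_request(question, context_snippets, model="qwen2.5:7b-instruct"):
    """Assemble the non-streaming payload /api/generate expects."""
    prompt = "Answer using only the sources below.\n\n"
    prompt += "\n\n".join(context_snippets) + "\n\nQuestion: " + question
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(payload):
    """POST the payload and return the model's answer text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```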
Customization
Change model (Ollama):
ollama pull mistral:7b-instruct
then set “Model” in the UI to mistral:7b-instruct
Tune retrieval (rag_mini.py):
CHUNK_CHARS, CHUNK_OVERLAP, DEFAULT_TOPK
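These knobs control a sliding-window chunker along these lines (a sketch; the default sizes shown are illustrative, not the values in rag_mini.py):

```python
def chunk_text(text, chunk_chars=1200, overlap=150):
    """Split text into overlapping character windows, mirroring the
    CHUNK_CHARS / CHUNK_OVERLAP settings."""
    if overlap >= chunk_chars:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_chars - overlap
    return [text[i:i + chunk_chars]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger chunks give the model more context per hit; larger overlap reduces the chance that an answer is split across a chunk boundary, at the cost of a bigger index.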
Swap embedding model (rag_mini.py):
EMB_MODEL = "BAAI/bge-small-en-v1.5" (good default). Other FastEmbed models also work.
Theme colors:
Edit CSS variables in static/styles.css (:root { … }).
Troubleshooting
“Get ranked shortlist” button disabled:
Enter weights as 40/30/20/10 or 0.4/0.3/0.2/0.1 so that the sum equals 100% (or 1.0).
Ollama not found / model errors:
Install/start/pull as above. Retrieval still works without Ollama; only model-generated answers require it.
Chroma lock / mutex errors:
Close other processes using the index. If stuck:
python rag_mini.py --backup
rm -rf materialmind/index/chroma_v3
python rag_mini.py --rebuild
No text in a PDF:
OCR it with ocrmypdf, then re-index.
Port already in use:
Change the host/port in app_user.py.
Privacy
Everything runs locally by default. Your PDFs are indexed on disk and never uploaded. Remove PDFs from materialmind/sources and run --update to de-index.
Roadmap
• Team/on-prem sharing
• Rule/property packs (e.g., PREN, oxidation windows, standards checks)
• Uploads & tagging UI
• LoRA adapter for stricter JSON output
• Evaluation harness for grounding and JSON validity
License
MIT (or your chosen permissive license). Add a LICENSE file to the repository.
One-minute demo
Drop a few corrosion or seawater PDFs into materialmind/sources.
Build: python rag_mini.py --rebuild
Run UI: python app_user.py
Query seawater at 20–25 °C, UTS ≥ 600 MPa, weights 50/30/10/10.
Show ranked shortlist, open a citation (file:page).
Add another PDF, run --update, rerun the query, and show the new citation.