title: Polymer Datasheet Crawler Agent
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
license: mit
🧪 Polymer Datasheet Crawler Agent
A LangGraph-powered agent that automatically crawls the web for commercial polymer datasheets, extracts structured material properties using LLaMA 3.1, and builds a searchable database.
Features
- Web Search: Uses Tavily to find datasheets from official manufacturer sites and accredited sources (MatWeb, UL Prospector, Omnexus, etc.)
- LLM Extraction: Sends raw content to LLaMA 3.1 via HuggingFace Inference API to extract 40+ material properties into structured JSON
- PDF Upload: Users can upload their own PDF datasheets for extraction
- Searchable Database: All records are stored in SQLite with full-text search and filtering
- CSV Export: One-click export of the entire database
Architecture
┌──────────────┐
│  Gradio UI   │
└──────┬───────┘
       │
┌──────▼───────┐
│  LangGraph   │  Workflow Orchestration
│  Workflow    │
└──────┬───────┘
       │
  ┌────▼─────┐     ┌──────────────────┐
  │  Router  │────►│ Web Search       │  (Tavily API)
  │          │     │ OR Upload Parse  │  (PyMuPDF)
  └────┬─────┘     └───────┬──────────┘
       │                   │
  ┌────▼───────────────────▼───┐
  │   LLM Parse (LLaMA 3.1)    │  (HuggingFace Inference)
  └────────────┬───────────────┘
  ┌────────────▼───────────────┐
  │   Store in SQLite DB       │  (SQLAlchemy)
  └────────────┬───────────────┘
  ┌────────────▼───────────────┐
  │  Return Structured Output  │
  └────────────────────────────┘
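The flow in the diagram can be sketched in plain Python. This is a minimal illustration, not the project's actual LangGraph code: the function names (`route`, `web_search`, `upload_parse`, `llm_parse`, `store`, `run_pipeline`) and the state keys are assumptions, and the real Tavily, PyMuPDF, LLaMA, and SQLAlchemy calls are replaced by placeholders.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def route(state: State) -> str:
    """Pick the branch: parse an uploaded PDF if one is given, else search the web."""
    return "upload_parse" if state.get("pdf_path") else "web_search"


def web_search(state: State) -> State:
    # Placeholder for the web search step (Tavily in the real app).
    return {**state, "raw_text": f"search results for {state['query']}"}


def upload_parse(state: State) -> State:
    # Placeholder for the PDF text-extraction step (PyMuPDF in the real app).
    return {**state, "raw_text": f"text extracted from {state['pdf_path']}"}


def llm_parse(state: State) -> State:
    # Placeholder for the LLM extraction step; a real version would fill many fields.
    return {**state, "record": {"source_text": state["raw_text"]}}


def store(state: State, db: List[dict]) -> State:
    # Placeholder for the database write; here the "database" is a plain list.
    db.append(state["record"])
    return {**state, "stored": True}


BRANCHES: Dict[str, Callable[[State], State]] = {
    "web_search": web_search,
    "upload_parse": upload_parse,
}


def run_pipeline(state: State, db: List[dict]) -> State:
    """Router -> (web search | upload parse) -> LLM parse -> store -> return."""
    state = BRANCHES[route(state)](state)
    state = llm_parse(state)
    return store(state, db)
```

Calling `run_pipeline({"query": "SABIC Polycarbonate"}, db)` takes the web search branch, while a state containing `pdf_path` takes the upload branch. Both branches converge on the same parse and store steps, as in the diagram.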
Property Categories Extracted
| Category | Properties |
|---|---|
| General | Material name, trade name, manufacturer, polymer family, grade, description, processing method, features, applications |
| Mechanical | Tensile strength/modulus, elongation at break, flexural strength/modulus, impact strength (Charpy/Izod), hardness (Shore D/Rockwell), compressive strength |
| Thermal | Melting temp, glass transition temp, HDT, Vicat softening temp, service temp, thermal conductivity, CTE, flammability rating |
| Physical | Density, MFI, water/moisture absorption, specific gravity, transparency, color |
| Electrical | Dielectric strength/constant, volume/surface resistivity, dissipation factor |
| Chemical | Acid/alkali/solvent/UV resistance, weatherability |
| Regulatory | FDA, RoHS, REACH, UL94 |
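LLM replies often wrap the requested JSON in extra prose, so an extraction step usually needs a tolerant parser before the properties above can be stored. The following is a minimal sketch, assuming the model is asked to return a single JSON object. The helper name `extract_json_object` is hypothetical and is not taken from the project's code.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON at this brace; try the next opening brace.
        start = text.find("{", start + 1)
    return None
```

A reply such as `Sure! {"material_name": "Example grade"} Hope this helps.` would yield the inner dictionary, and a reply with no JSON object yields `None`, which the caller can treat as a failed extraction. The key `material_name` here is only an illustrative field name.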
Setup
1. Environment Variables
Create a .env file (see .env.example):
TAVILY_API_KEY=tvly-xxxxxxxxxxxxxxxxxxxxx
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
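A quick startup check can catch a missing key before the first API call fails. This is a minimal sketch, assuming the two variables have already been loaded into the process environment (for example by the shell or a dotenv loader). The helper name `missing_keys` is hypothetical.

```python
import os
from typing import List, Mapping

# The two variables named in the .env example above.
REQUIRED_KEYS = ("TAVILY_API_KEY", "HF_TOKEN")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

An application could call `missing_keys()` at startup and exit with a clear message if the returned list is not empty.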
2. Install Dependencies
pip install -r requirements.txt
3. Run Locally
python app.py
Open http://localhost:7860 in your browser.
4. Deploy to HuggingFace Spaces
- Create a new Space on HuggingFace (SDK: Gradio)
- Add TAVILY_API_KEY and HF_TOKEN as Space secrets
- Push this repository to the Space
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/polymer-datasheet-agent
git push hf main
Usage
Search & Add
- Enter a manufacturer name (e.g., "SABIC") and/or polymer family (e.g., "Polycarbonate")
- Optionally specify a grade (e.g., "Lexan 141R")
- Click Search & Add. The agent searches the web, extracts properties, and stores the record
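One way the three form inputs could be combined into a single search query is sketched below. The function name `build_query`, the parameter names, and the trailing "technical datasheet" keyword are assumptions for illustration; the query the agent actually sends to the search API may differ.

```python
def build_query(manufacturer: str = "", polymer_family: str = "", grade: str = "") -> str:
    """Join the non-empty inputs into one datasheet search query."""
    manufacturer, polymer_family, grade = (
        value.strip() for value in (manufacturer, polymer_family, grade)
    )
    # The form asks for a manufacturer and/or a polymer family; grade is optional.
    if not (manufacturer or polymer_family):
        raise ValueError("Provide a manufacturer and/or a polymer family")
    parts = [p for p in (manufacturer, polymer_family, grade) if p]
    return " ".join(parts + ["technical datasheet"])
```

With the example values from the steps above, `build_query("SABIC", "Polycarbonate", "Lexan 141R")` returns `"SABIC Polycarbonate Lexan 141R technical datasheet"`.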
Upload Datasheet
- Upload a PDF datasheet
- Click Parse & Add. The agent extracts the text, parses properties via the LLM, and stores the record
Database Browser
- Search across all fields with free text
- Filter by manufacturer or polymer family
- Export the full database to CSV
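The three browser actions map onto simple queries. The sketch below uses the standard-library `sqlite3` module with `LIKE` matching, not the project's SQLAlchemy layer or SQLite full-text search. The table name `materials`, its three columns, and the demo rows are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List, Optional


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with two made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE materials (material_name TEXT, manufacturer TEXT, polymer_family TEXT)"
    )
    conn.executemany(
        "INSERT INTO materials VALUES (?, ?, ?)",
        [
            ("Example PC grade", "Maker A", "Polycarbonate"),
            ("Example PP grade", "Maker B", "Polypropylene"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, text: str = "",
           manufacturer: Optional[str] = None,
           polymer_family: Optional[str] = None) -> List[tuple]:
    """Free-text match across all columns, plus optional exact-match filters."""
    sql = "SELECT material_name, manufacturer, polymer_family FROM materials WHERE 1=1"
    params: List[str] = []
    if text:
        sql += " AND (material_name LIKE ? OR manufacturer LIKE ? OR polymer_family LIKE ?)"
        params += [f"%{text}%"] * 3
    if manufacturer:
        sql += " AND manufacturer = ?"
        params.append(manufacturer)
    if polymer_family:
        sql += " AND polymer_family = ?"
        params.append(polymer_family)
    return conn.execute(sql, params).fetchall()


def export_csv(conn: sqlite3.Connection) -> str:
    """Dump the whole table to CSV text, header row first."""
    cur = conn.execute("SELECT * FROM materials")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return buf.getvalue()
```

All user input is passed as bound parameters, never interpolated into the SQL string, which keeps the search safe from injection regardless of what is typed into the free-text box.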
Project Context
This is Part 1 of the Plinity: Infinite Recyclable Polymers project. The database built here will be consumed by a downstream Material Matching Agent that recommends polymers based on application requirements.
Tech Stack
- LangGraph: Workflow orchestration
- Tavily: Web search API
- LLaMA 3.1 8B Instruct: LLM via HuggingFace Inference API
- SQLite + SQLAlchemy: Database
- Gradio: Web UI
- PyMuPDF: PDF parsing
- Pandas: Data manipulation