---
title: Polymer Datasheet Crawler Agent
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
license: mit
---
# 🧪 Polymer Datasheet Crawler Agent

A LangGraph-powered agent that automatically crawls the web for commercial polymer datasheets, extracts structured material properties using LLaMA 3.1, and builds a searchable database.
## Features

- **Web Search**: Uses [Tavily](https://tavily.com) to find datasheets from official manufacturer sites and accredited sources (MatWeb, UL Prospector, Omnexus, etc.)
- **LLM Extraction**: Sends raw content to [LLaMA 3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) via the HuggingFace Inference API to extract 40+ material properties into structured JSON
- **PDF Upload**: Users can upload their own PDF datasheets for extraction
- **Searchable Database**: All records are stored in SQLite with full-text search and filtering
- **CSV Export**: One-click export of the entire database
## Architecture

```
┌──────────────┐
│  Gradio UI   │
└──────┬───────┘
       │
┌──────▼───────┐
│  LangGraph   │  Workflow Orchestration
│  Workflow    │
└──────┬───────┘
       │
  ┌────▼─────┐     ┌──────────────────┐
  │  Router  │────►│ Web Search       │  (Tavily API)
  │          │     │ OR Upload Parse  │  (PyMuPDF)
  └────┬─────┘     └────────┬─────────┘
       │                    │
  ┌────▼────────────────────▼──┐
  │   LLM Parse (LLaMA 3.1)    │  (HuggingFace Inference)
  └─────────────┬──────────────┘
  ┌─────────────▼──────────────┐
  │    Store in SQLite DB      │  (SQLAlchemy)
  └─────────────┬──────────────┘
  ┌─────────────▼──────────────┐
  │  Return Structured Output  │
  └────────────────────────────┘
```
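The flow in the diagram can be sketched in plain Python. This is only an illustration of the routing logic, not the project's LangGraph code: the state fields and every function name below are hypothetical, and each step is a stub standing in for the real Tavily, PyMuPDF, LLaMA, and SQLite calls.

```python
from typing import Callable, Dict, Optional, TypedDict


class AgentState(TypedDict, total=False):
    """State passed between steps (field names are illustrative)."""
    query: Optional[str]     # set for the web-search path
    pdf_path: Optional[str]  # set for the upload path
    raw_text: str            # text gathered by search or PDF parsing
    record: Dict[str, str]   # structured properties from the LLM step
    stored: bool             # True once the record has been saved


def route(state: AgentState) -> str:
    """Router: take the upload path when a PDF is present, else search."""
    return "upload_parse" if state.get("pdf_path") else "web_search"


def web_search(state: AgentState) -> AgentState:
    # Stub standing in for the Tavily call.
    return {**state, "raw_text": f"search results for {state['query']}"}


def upload_parse(state: AgentState) -> AgentState:
    # Stub standing in for PyMuPDF text extraction.
    return {**state, "raw_text": f"text extracted from {state['pdf_path']}"}


def llm_parse(state: AgentState) -> AgentState:
    # Stub standing in for the LLaMA 3.1 extraction call.
    return {**state, "record": {"source_text": state["raw_text"]}}


def store(state: AgentState) -> AgentState:
    # Stub standing in for the SQLite write.
    return {**state, "stored": True}


SOURCES: Dict[str, Callable[[AgentState], AgentState]] = {
    "web_search": web_search,
    "upload_parse": upload_parse,
}


def run(state: AgentState) -> AgentState:
    """Router -> (search | upload) -> LLM parse -> store -> return."""
    state = SOURCES[route(state)](state)
    return store(llm_parse(state))
```

Calling `run({"query": "SABIC Polycarbonate"})` follows the search branch, while `run({"pdf_path": "sheet.pdf"})` follows the upload branch; both converge on the same parse and store steps, as in the diagram.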
## Property Categories Extracted

| Category | Properties |
|----------|-----------|
| **General** | Material name, trade name, manufacturer, polymer family, grade, description, processing method, features, applications |
| **Mechanical** | Tensile strength/modulus, elongation at break, flexural strength/modulus, impact strength (Charpy/Izod), hardness (Shore D/Rockwell), compressive strength |
| **Thermal** | Melting temp, glass transition temp, HDT, Vicat softening temp, service temp, thermal conductivity, CTE, flammability rating |
| **Physical** | Density, MFI, water/moisture absorption, specific gravity, transparency, color |
| **Electrical** | Dielectric strength/constant, volume/surface resistivity, dissipation factor |
| **Chemical** | Acid/alkali/solvent/UV resistance, weatherability |
| **Regulatory** | FDA, RoHS, REACH, UL94 |
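
One way the LLM's JSON reply might be normalised into these categories is sketched below. The snake_case category keys and the `parse_llm_json` helper are assumptions made for illustration; the actual schema used by the agent is not shown in this README.

```python
import json
from typing import Any, Dict

# Category names follow the table above; the lowercase keys are assumed.
CATEGORIES = ("general", "mechanical", "thermal", "physical",
              "electrical", "chemical", "regulatory")


def parse_llm_json(reply: str) -> Dict[str, Dict[str, Any]]:
    """Decode the JSON object embedded in an LLM reply and normalise it.

    Models often wrap JSON in prose or code fences, so the text is sliced
    from the first '{' to the last '}' before decoding. Categories that
    are missing or null in the reply become empty dicts.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    return {cat: dict(data.get(cat) or {}) for cat in CATEGORIES}
```

Returning every category, even when empty, keeps downstream code free of key checks when a datasheet reports only some of the properties.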
## Setup

### 1. Environment Variables

Create a `.env` file (see `.env.example`):

```bash
TAVILY_API_KEY=tvly-xxxxxxxxxxxxxxxxxxxxx
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```
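
A startup check along the following lines can catch a missing key before the first API call fails. It is a sketch only: `missing_keys` is a hypothetical helper, and it assumes the values from `.env` have already been loaded into the process environment (how `app.py` loads them is not described here).

```python
import os
from typing import List, Mapping

# The two variables named in the .env example above.
REQUIRED_KEYS = ("TAVILY_API_KEY", "HF_TOKEN")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` with no argument inspects the real environment; an empty list means both keys are present.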
### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run Locally

```bash
python app.py
```

Open `http://localhost:7860` in your browser.
### 4. Deploy to HuggingFace Spaces

1. Create a new Space on HuggingFace (SDK: Docker, to match the `sdk: docker` setting in this README's metadata)
2. Add `TAVILY_API_KEY` and `HF_TOKEN` as Space secrets
3. Push this repository to the Space

```bash
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/polymer-datasheet-agent
git push hf main
```
## Usage

### Search & Add

1. Enter a manufacturer name (e.g., "SABIC") and/or polymer family (e.g., "Polycarbonate")
2. Optionally specify a grade (e.g., "Lexan 141R")
3. Click **Search & Add**. The agent will search the web, extract properties, and store the record
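
The form fields above could be combined into a single search string roughly as follows. `build_query` and the trailing "technical datasheet" keyword are hypothetical; the query the agent actually sends to Tavily may be built differently.

```python
from typing import Optional


def build_query(manufacturer: Optional[str] = None,
                polymer_family: Optional[str] = None,
                grade: Optional[str] = None) -> str:
    """Join the non-empty form fields into one search string.

    At least one of manufacturer or polymer family is required;
    the grade is optional and only narrows the search.
    """
    fields = [f.strip() for f in (manufacturer, polymer_family) if f and f.strip()]
    if not fields:
        raise ValueError("enter a manufacturer and/or a polymer family")
    if grade and grade.strip():
        fields.append(grade.strip())
    return " ".join(fields + ["technical datasheet"])
```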

### Upload Datasheet

1. Upload a PDF datasheet
2. Click **Parse & Add**. The agent will extract text, parse properties via the LLM, and store the record

### Database Browser

- Search across all fields with free text
- Filter by manufacturer or polymer family
- Export the full database to CSV
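
The three browser operations can be illustrated with the standard-library `sqlite3` and `csv` modules. This is a sketch under stated assumptions: the `datasheets` table and its four columns are invented for the example, the project itself uses SQLAlchemy, and free-text search is approximated here with `LIKE` rather than a full-text index.

```python
import csv
import io
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical, reduced column set; the real table holds 40+ properties.
COLUMNS = ("manufacturer", "polymer_family", "grade", "description")


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE datasheets ({', '.join(c + ' TEXT' for c in COLUMNS)})")
    return conn


def search(conn: sqlite3.Connection, text: str = "",
           manufacturer: Optional[str] = None,
           polymer_family: Optional[str] = None) -> List[Tuple]:
    """Free-text match across all columns, plus optional exact filters."""
    clauses, params = [], []
    if text:
        # One LIKE per column, OR-ed together; values are bound, not inlined.
        clauses.append("(" + " OR ".join(f"{c} LIKE ?" for c in COLUMNS) + ")")
        params.extend([f"%{text}%"] * len(COLUMNS))
    if manufacturer:
        clauses.append("manufacturer = ?")
        params.append(manufacturer)
    if polymer_family:
        clauses.append("polymer_family = ?")
        params.append(polymer_family)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT {', '.join(COLUMNS)} FROM datasheets{where} ORDER BY manufacturer, grade"
    return conn.execute(sql, params).fetchall()


def export_csv(conn: sqlite3.Connection) -> str:
    """Serialise every row to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(search(conn))
    return buf.getvalue()
```

Binding user input through `?` placeholders keeps the search box safe from SQL injection; only the fixed column names are interpolated into the statement.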

## Project Context

This is Part 1 of the **Plinity: Infinite Recyclable Polymers** project. The database built here will be consumed by a downstream **Material Matching Agent** that recommends polymers based on application requirements.

## Tech Stack

- **LangGraph**: Workflow orchestration
- **Tavily**: Web search API
- **LLaMA 3.1 8B Instruct**: LLM via HuggingFace Inference API
- **SQLite + SQLAlchemy**: Database
- **Gradio**: Web UI
- **PyMuPDF**: PDF parsing
- **Pandas**: Data manipulation