---
title: Polymer Datasheet Crawler Agent
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
license: mit
---

# 🧪 Polymer Datasheet Crawler Agent

A LangGraph-powered agent that automatically crawls the web for commercial polymer datasheets, extracts structured material properties using LLaMA 3.1, and builds a searchable database.

## Features

- **Web Search**: Uses [Tavily](https://tavily.com) to find datasheets from official manufacturer sites and accredited sources (MatWeb, UL Prospector, Omnexus, etc.)
- **LLM Extraction**: Sends raw content to [LLaMA 3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) via HuggingFace Inference API to extract 40+ material properties into structured JSON
- **PDF Upload**: Users can upload their own PDF datasheets for extraction
- **Searchable Database**: All records are stored in SQLite with full-text search and filtering
- **CSV Export**: One-click export of the entire database

## Architecture

```
┌──────────────┐
│  Gradio UI   │
└──────┬───────┘
       │
┌──────▼───────┐
│  LangGraph   │  Workflow Orchestration
│  Workflow    │
└──────┬───────┘
       │
  ┌────▼─────┐     ┌──────────────────┐
  │  Router  │────►│  Web Search      │  (Tavily API)
  │          │     │  OR Upload Parse │  (PyMuPDF)
  └────┬─────┘     └───────┬──────────┘
       │                   │
  ┌────▼───────────────────▼──┐
  │  LLM Parse (LLaMA 3.1)    │  (HuggingFace Inference)
  └────────────┬──────────────┘
  ┌────────────▼──────────────┐
  │  Store in SQLite DB       │  (SQLAlchemy)
  └────────────┬──────────────┘
  ┌────────────▼──────────────┐
  │  Return Structured Output │
  └───────────────────────────┘
```

## Property Categories Extracted

| Category | Properties |
|----------|-----------|
| **General** | Material name, trade name, manufacturer, polymer family, grade, description, processing method, features, applications |
| **Mechanical** | Tensile strength/modulus, elongation at break, flexural strength/modulus, impact strength (Charpy/Izod), hardness (Shore D/Rockwell), compressive strength |
| **Thermal** | Melting temp, glass transition temp, HDT, Vicat softening temp, service temp, thermal conductivity, CTE, flammability rating |
| **Physical** | Density, MFI, water/moisture absorption, specific gravity, transparency, color |
| **Electrical** | Dielectric strength/constant, volume/surface resistivity, dissipation factor |
| **Chemical** | Acid/alkali/solvent/UV resistance, weatherability |
| **Regulatory** | FDA, RoHS, REACH, UL94 |

## Setup

### 1. Environment Variables

Create a `.env` file (see `.env.example`):

```bash
TAVILY_API_KEY=tvly-xxxxxxxxxxxxxxxxxxxxx
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run Locally

```bash
python app.py
```

Open `http://localhost:7860` in your browser.

### 4. Deploy to HuggingFace Spaces

1. Create a new Space on HuggingFace (SDK: Gradio)
2. Add `TAVILY_API_KEY` and `HF_TOKEN` as Space secrets
3. Push this repository to the Space

```bash
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/polymer-datasheet-agent
git push hf main
```

## Usage

### Search & Add

1. Enter a manufacturer name (e.g., "SABIC") and/or polymer family (e.g., "Polycarbonate")
2. Optionally specify a grade (e.g., "Lexan 141R")
3. Click **Search & Add**. The agent will search the web, extract properties, and store the record

### Upload Datasheet

1. Upload a PDF datasheet
2. Click **Parse & Add**. The agent will extract text, parse properties via LLM, and store the record

### Database Browser

- Search across all fields with free text
- Filter by manufacturer or polymer family
- Export the full database to CSV

## Project Context

This is Part 1 of the **Plinity - Infinite Recyclable Polymers** project. The database built here will be consumed by a downstream **Material Matching Agent** that recommends polymers based on application requirements.

## Tech Stack

- **LangGraph**: Workflow orchestration
- **Tavily**: Web search API
- **LLaMA 3.1 8B Instruct**: LLM via HuggingFace Inference API
- **SQLite + SQLAlchemy**: Database
- **Gradio**: Web UI
- **PyMuPDF**: PDF parsing
- **Pandas**: Data manipulation
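
The Router and LLM Parse steps in the Architecture section can be pictured with a short, dependency-free sketch. It is written in plain Python rather than against the LangGraph API, and every name in it (`AgentState`, `route`, `build_query`, `extract_json`, the step labels) is a hypothetical stand-in, not the project's actual code. It assumes the router prefers an uploaded PDF over a web search, and that the LLM reply may wrap its JSON in extra text.

```python
import json
import re
from typing import Optional, TypedDict


class AgentState(TypedDict, total=False):
    """Hypothetical state passed between workflow steps."""
    manufacturer: str
    polymer_family: str
    grade: str
    pdf_path: str


def route(state: AgentState) -> str:
    """Pick the next step: parse an uploaded PDF if present, else search the web."""
    if state.get("pdf_path"):
        return "upload_parse"
    if state.get("manufacturer") or state.get("polymer_family"):
        return "web_search"
    raise ValueError("Provide a PDF, a manufacturer, or a polymer family")


def build_query(state: AgentState) -> str:
    """Join the non-empty search fields into a single search query."""
    parts = [state.get("manufacturer"), state.get("polymer_family"), state.get("grade")]
    words = [p.strip() for p in parts if p and p.strip()]
    return " ".join(words + ["datasheet"])


def extract_json(llm_text: str) -> Optional[dict]:
    """Return the JSON object embedded in an LLM reply, or None if there is none."""
    # Greedy match from the first "{" to the last "}" so nested objects survive.
    match = re.search(r"\{.*\}", llm_text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

In a real graph, the string returned by `route` would select the next node, and `extract_json` would guard the storage step against replies that are not valid JSON.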
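
The store, search, filter, and CSV export behaviour described under Features and Database Browser can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's schema: it uses `sqlite3` directly instead of SQLAlchemy, substring matching with `LIKE` instead of real full-text search, and a hypothetical four-column table (`datasheets`) in place of the 40+ extracted properties. The records in the usage example are made up.

```python
import csv
import io
import sqlite3

# Hypothetical subset of columns; the real database stores many more properties.
COLUMNS = ["material_name", "manufacturer", "polymer_family", "density_g_cm3"]
TEXT_COLUMNS = COLUMNS[:3]


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with one datasheet table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasheets (id INTEGER PRIMARY KEY, material_name TEXT, "
        "manufacturer TEXT, polymer_family TEXT, density_g_cm3 REAL)"
    )
    return conn


def add_record(conn: sqlite3.Connection, record: dict) -> None:
    """Insert one record; properties missing from the dict are stored as NULL."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO datasheets ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [record.get(c) for c in COLUMNS],
    )


def search(conn: sqlite3.Connection, text: str = "", manufacturer=None) -> list:
    """Substring match across the text columns, optionally filtered by manufacturer."""
    where = " OR ".join(f"COALESCE({c}, '') LIKE ?" for c in TEXT_COLUMNS)
    sql = f"SELECT {', '.join(COLUMNS)} FROM datasheets WHERE ({where})"
    params = [f"%{text}%"] * len(TEXT_COLUMNS)
    if manufacturer:
        sql += " AND manufacturer = ?"
        params.append(manufacturer)
    return conn.execute(sql + " ORDER BY id", params).fetchall()


def export_csv(conn: sqlite3.Connection) -> str:
    """Return the whole table as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(conn.execute(f"SELECT {', '.join(COLUMNS)} FROM datasheets ORDER BY id"))
    return buf.getvalue()


if __name__ == "__main__":
    db = create_db()
    # Made-up record for illustration only, not taken from a real datasheet.
    add_record(db, {"material_name": "Example PC 100", "manufacturer": "ExampleCo",
                    "polymer_family": "Polycarbonate", "density_g_cm3": 1.5})
    print(search(db, "carbonate"))
    print(export_csv(db))
```

Parameterized queries keep user-entered search text out of the SQL string, which matters because the search box accepts free text.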