---
title: Polymer Datasheet Crawler Agent
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
license: mit
---
# 🧪 Polymer Datasheet Crawler Agent
A LangGraph-powered agent that automatically crawls the web for commercial polymer datasheets, extracts structured material properties using LLaMA 3.1, and builds a searchable database.
## Features
- **Web Search**: Uses [Tavily](https://tavily.com) to find datasheets from official manufacturer sites and accredited sources (MatWeb, UL Prospector, Omnexus, etc.)
- **LLM Extraction**: Sends raw content to [LLaMA 3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) via HuggingFace Inference API to extract 40+ material properties into structured JSON
- **PDF Upload**: Users can upload their own PDF datasheets for extraction
- **Searchable Database**: All records are stored in SQLite with full-text search and filtering
- **CSV Export**: One-click export of the entire database
## Architecture
```
┌──────────────┐
│  Gradio UI   │
└──────┬───────┘
       │
┌──────▼───────┐
│  LangGraph   │  Workflow Orchestration
│  Workflow    │
└──────┬───────┘
       │
  ┌────▼─────┐     ┌──────────────────┐
  │  Router  │────►│ Web Search       │  (Tavily API)
  │          │     │ OR Upload Parse  │  (PyMuPDF)
  └────┬─────┘     └───────┬──────────┘
       │                   │
  ┌────▼───────────────────▼───┐
  │  LLM Parse (LLaMA 3.1)     │  (HuggingFace Inference)
  └────────────┬───────────────┘
  ┌────────────▼───────────────┐
  │  Store in SQLite DB        │  (SQLAlchemy)
  └────────────┬───────────────┘
  ┌────────────▼───────────────┐
  │  Return Structured Output  │
  └────────────────────────────┘
```
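The control flow in the diagram can be sketched in plain Python. This is only an illustration of the routing order, not the project's LangGraph graph: the state keys (`query`, `pdf_path`, `raw_text`, `record`, `stored`) and every function name are hypothetical, and each node is a stub standing in for the real Tavily, PyMuPDF, LLM, and database calls.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]  # hypothetical shared state passed between nodes


def route(state: State) -> str:
    """Pick the branch: an uploaded PDF takes priority over a web search."""
    return "upload_parse" if state.get("pdf_path") else "web_search"


def web_search(state: State) -> State:
    # Stub for the web search step (Tavily in the real app).
    return {**state, "raw_text": f"search results for {state['query']}"}


def upload_parse(state: State) -> State:
    # Stub for the PDF text extraction step (PyMuPDF in the real app).
    return {**state, "raw_text": f"text extracted from {state['pdf_path']}"}


def llm_parse(state: State) -> State:
    # Stub for the LLM extraction step; records only the input size.
    return {**state, "record": {"source_chars": len(state["raw_text"])}}


def store(state: State) -> State:
    # Stub for the database write.
    return {**state, "stored": True}


BRANCHES: Dict[str, Callable[[State], State]] = {
    "web_search": web_search,
    "upload_parse": upload_parse,
}


def run_workflow(state: State) -> State:
    """Router -> (search | upload parse) -> LLM parse -> store -> return."""
    state = BRANCHES[route(state)](state)
    for step in (llm_parse, store):
        state = step(state)
    return state
```

Calling `run_workflow({"query": "SABIC Polycarbonate"})` takes the search branch, while a state containing `pdf_path` takes the upload branch; both converge on the same parse and store steps, as in the diagram.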
## Property Categories Extracted
| Category | Properties |
|----------|-----------|
| **General** | Material name, trade name, manufacturer, polymer family, grade, description, processing method, features, applications |
| **Mechanical** | Tensile strength/modulus, elongation at break, flexural strength/modulus, impact strength (Charpy/Izod), hardness (Shore D/Rockwell), compressive strength |
| **Thermal** | Melting temp, glass transition temp, HDT, Vicat softening temp, service temp, thermal conductivity, CTE, flammability rating |
| **Physical** | Density, MFI, water/moisture absorption, specific gravity, transparency, color |
| **Electrical** | Dielectric strength/constant, volume/surface resistivity, dissipation factor |
| **Chemical** | Acid/alkali/solvent/UV resistance, weatherability |
| **Regulatory** | FDA, RoHS, REACH, UL94 |
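To illustrate how an LLM reply might be turned into a record grouped by these categories, here is a minimal sketch. The field names in `EXPECTED_FIELDS` are hypothetical (the actual schema is not shown in this README), and the sketch assumes the model returns one JSON object, possibly wrapped in extra prose.

```python
import json
import re
from typing import Any, Dict

# Hypothetical subset of field names, grouped like the table above.
EXPECTED_FIELDS = {
    "general": ["material_name", "manufacturer", "polymer_family"],
    "mechanical": ["tensile_strength_mpa"],
    "thermal": ["melting_temp_c"],
}


def extract_json(reply: str) -> Dict[str, Any]:
    """Return the JSON object spanning the first '{' to the last '}' in a reply."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM reply")
    return json.loads(match.group(0))


def normalize(record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Keep only expected fields; anything missing becomes None."""
    result: Dict[str, Dict[str, Any]] = {}
    for category, names in EXPECTED_FIELDS.items():
        section = record.get(category)
        if not isinstance(section, dict):
            section = {}  # tolerate null or malformed sections
        result[category] = {name: section.get(name) for name in names}
    return result
```

Filling absent values with `None` rather than dropping them keeps every record the same shape, which is convenient when rows are later written to a fixed set of database columns.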
## Setup
### 1. Environment Variables
Create a `.env` file (see `.env.example`):
```bash
TAVILY_API_KEY=tvly-xxxxxxxxxxxxxxxxxxxxx
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```
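A startup check along these lines can surface a missing key early. This is a sketch, not the app's actual code: `load_settings` is a hypothetical helper, and it assumes the variables are already present in the process environment (for example, loaded from `.env` by a tool such as python-dotenv, or injected as Space secrets).

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = ("TAVILY_API_KEY", "HF_TOKEN")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required keys, or fail fast listing the ones that are unset."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```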
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
### 3. Run Locally
```bash
python app.py
```
Open `http://localhost:7860` in your browser.
### 4. Deploy to HuggingFace Spaces
1. Create a new Space on HuggingFace, choosing the SDK that matches the `sdk` field in this README's front matter (currently `docker`)
2. Add `TAVILY_API_KEY` and `HF_TOKEN` as Space secrets
3. Push this repository to the Space
```bash
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/polymer-datasheet-agent
git push hf main
```
## Usage
### Search & Add
1. Enter a manufacturer name (e.g., "SABIC") and/or polymer family (e.g., "Polycarbonate")
2. Optionally specify a grade (e.g., "Lexan 141R")
3. Click **Search & Add**: the agent searches the web, extracts properties, and stores the record
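The form inputs above could be combined into a single search string roughly as follows. `build_query` and the trailing "technical datasheet" keyword are assumptions made for illustration; the query the agent actually sends is not documented here.

```python
def build_query(manufacturer: str = "", family: str = "", grade: str = "") -> str:
    """Join the non-empty form fields into one web search query."""
    manufacturer, family, grade = (s.strip() for s in (manufacturer, family, grade))
    if not manufacturer and not family:
        # The form requires a manufacturer and/or a polymer family.
        raise ValueError("Provide a manufacturer and/or a polymer family")
    parts = [p for p in (manufacturer, family, grade) if p]
    return " ".join(parts + ["technical datasheet"])
```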
### Upload Datasheet
1. Upload a PDF datasheet
2. Click **Parse & Add**: the agent extracts text, parses properties via the LLM, and stores the record
### Database Browser
- Search across all fields with free text
- Filter by manufacturer or polymer family
- Export the full database to CSV
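The browser operations can be approximated with the standard library alone. This sketch uses `sqlite3` directly instead of SQLAlchemy and a simple `LIKE` match as a stand-in for full-text search; the `datasheets` table, its three columns, and the second demo row are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List, Optional, Tuple

COLUMNS = ("material_name", "manufacturer", "polymer_family")  # hypothetical subset


def demo_db() -> sqlite3.Connection:
    """Build a small in-memory database for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasheets (material_name TEXT, manufacturer TEXT, polymer_family TEXT)"
    )
    conn.executemany(
        "INSERT INTO datasheets VALUES (?, ?, ?)",
        [("Lexan 141R", "SABIC", "Polycarbonate"), ("Example PA66", "ExampleCo", "Polyamide")],
    )
    return conn


def search(conn: sqlite3.Connection, text: str = "", manufacturer: Optional[str] = None) -> List[Tuple]:
    """Free-text match across all columns, optionally filtered by manufacturer."""
    sql = "SELECT material_name, manufacturer, polymer_family FROM datasheets WHERE 1=1"
    params: List[str] = []
    if text:
        like = f"%{text}%"
        sql += " AND (material_name LIKE ? OR manufacturer LIKE ? OR polymer_family LIKE ?)"
        params += [like, like, like]
    if manufacturer:
        sql += " AND manufacturer = ?"
        params.append(manufacturer)
    return conn.execute(sql + " ORDER BY material_name", params).fetchall()


def export_csv(conn: sqlite3.Connection) -> str:
    """Serialize every row to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(search(conn))
    return buffer.getvalue()
```

Parameterized queries (`?` placeholders) keep user-entered search text from being interpreted as SQL.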
## Project Context
This is Part 1 of the **Plinity - Infinite Recyclable Polymers** project. The database built here will be consumed by a downstream **Material Matching Agent** that recommends polymers based on application requirements.
## Tech Stack
- **LangGraph**: Workflow orchestration
- **Tavily**: Web search API
- **LLaMA 3.1 8B Instruct**: LLM via HuggingFace Inference API
- **SQLite + SQLAlchemy**: Database
- **Gradio**: Web UI
- **PyMuPDF**: PDF parsing
- **Pandas**: Data manipulation