metadata
title: Polymer Datasheet Crawler Agent
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
license: mit

🧪 Polymer Datasheet Crawler Agent

A LangGraph-powered agent that automatically crawls the web for commercial polymer datasheets, extracts structured material properties using LLaMA 3.1, and builds a searchable database.

Features

  • Web Search: Uses Tavily to find datasheets from official manufacturer sites and accredited sources (MatWeb, UL Prospector, Omnexus, etc.)
  • LLM Extraction: Sends raw content to LLaMA 3.1 via HuggingFace Inference API to extract 40+ material properties into structured JSON
  • PDF Upload: Users can upload their own PDF datasheets for extraction
  • Searchable Database: All records are stored in SQLite with full-text search and filtering
  • CSV Export: One-click export of the entire database
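
The LLM extraction step has to turn a free-form model reply into a JSON object before anything can be stored. The following is a minimal sketch of that parsing step using only the standard library. The function name `parse_llm_json` and the example field names are hypothetical and are not taken from this repository.

```python
import json
from typing import Any, Dict, Optional


def parse_llm_json(reply: str) -> Optional[Dict[str, Any]]:
    """Return the JSON object embedded in an LLM reply, or None if there is none."""
    # Models often wrap the object in prose or a code fence, so keep only
    # the text between the first "{" and the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return None
    # Only a JSON object (dict) is a usable record.
    return parsed if isinstance(parsed, dict) else None


if __name__ == "__main__":
    # Illustrative reply; the field names are assumptions.
    reply = 'Here is the data:\n{"material_name": "Example PC", "density_g_cm3": 1.2}\nDone.'
    print(parse_llm_json(reply))
```

Returning `None` instead of raising lets the caller decide whether to retry the model call or skip the record.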

Architecture

┌──────────────┐
│   Gradio UI  │
└──────┬───────┘
       │
┌──────▼───────┐
│  LangGraph   │   Workflow Orchestration
│  Workflow    │
└──────┬───────┘
       │
  ┌────▼──────┐     ┌──────────────────┐
  │  Router   │────►│  Web Search      │ (Tavily API)
  │           │     │  OR Upload Parse │ (PyMuPDF)
  └────┬──────┘     └───────┬──────────┘
       │                    │
  ┌────▼────────────────────▼──┐
  │  LLM Parse (LLaMA 3.1)     │ (HuggingFace Inference)
  └────────────┬───────────────┘
  ┌────────────▼───────────────┐
  │  Store in SQLite DB        │ (SQLAlchemy)
  └────────────┬───────────────┘
  ┌────────────▼───────────────┐
  │  Return Structured Output  │
  └────────────────────────────┘
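
The control flow in the diagram can be sketched in plain Python. This is a stand-in for illustration only: it does not use the LangGraph API. The node names, the `pdf_path` state key, and the rule that an uploaded PDF takes priority over web search are all assumptions.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Node = Callable[[State], State]


def route(state: State) -> str:
    """Choose the ingestion branch (assumption: an uploaded PDF wins over search)."""
    return "upload_parse" if state.get("pdf_path") else "web_search"


def run_pipeline(state: State, nodes: Dict[str, Node]) -> State:
    """Run router -> ingestion -> LLM parse -> store -> respond, in that order."""
    state = nodes[route(state)](state)
    for name in ("llm_parse", "store", "respond"):
        state = nodes[name](state)
    return state


def make_stub(name: str) -> Node:
    """Build a placeholder node that only records that it ran."""
    def node(state: State) -> State:
        return {**state, "trace": [*state.get("trace", []), name]}
    return node


# Hypothetical node names mirroring the boxes in the diagram.
STUB_NODES = {
    name: make_stub(name)
    for name in ("web_search", "upload_parse", "llm_parse", "store", "respond")
}

if __name__ == "__main__":
    print(run_pipeline({"query": "SABIC Polycarbonate"}, STUB_NODES)["trace"])
    print(run_pipeline({"pdf_path": "sheet.pdf"}, STUB_NODES)["trace"])
```

In the real application, each stub would be replaced by the corresponding Tavily, PyMuPDF, LLM, or database call.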

Property Categories Extracted

| Category | Properties |
|---|---|
| General | Material name, trade name, manufacturer, polymer family, grade, description, processing method, features, applications |
| Mechanical | Tensile strength/modulus, elongation at break, flexural strength/modulus, impact strength (Charpy/Izod), hardness (Shore D/Rockwell), compressive strength |
| Thermal | Melting temp, glass transition temp, HDT, Vicat softening temp, service temp, thermal conductivity, CTE, flammability rating |
| Physical | Density, MFI, water/moisture absorption, specific gravity, transparency, color |
| Electrical | Dielectric strength/constant, volume/surface resistivity, dissipation factor |
| Chemical | Acid/alkali/solvent/UV resistance, weatherability |
| Regulatory | FDA, RoHS, REACH, UL94 |
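
One way to picture a stored record is as a typed structure with one optional field per property. The sketch below covers only a handful of the categories above. The class name, field names, and units are hypothetical and do not reflect the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PolymerRecord:
    """Illustrative subset of the extracted properties (names are assumptions)."""
    # General
    material_name: str
    manufacturer: Optional[str] = None
    polymer_family: Optional[str] = None
    applications: List[str] = field(default_factory=list)
    # Mechanical
    tensile_strength_mpa: Optional[float] = None
    # Thermal
    glass_transition_temp_c: Optional[float] = None
    # Physical
    density_g_cm3: Optional[float] = None
    # Regulatory
    rohs_compliant: Optional[bool] = None


def missing_fields(record: PolymerRecord) -> List[str]:
    """List the properties the extraction step could not fill in."""
    return [name for name, value in asdict(record).items() if value is None]


if __name__ == "__main__":
    # Placeholder values for illustration, not data from a real datasheet.
    record = PolymerRecord(material_name="Example PC", density_g_cm3=1.2)
    print(missing_fields(record))
```

Keeping every property optional reflects that a single datasheet rarely reports all of the categories.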

Setup

1. Environment Variables

Create a .env file (see .env.example):

TAVILY_API_KEY=tvly-xxxxxxxxxxxxxxxxxxxxx
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
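
A startup check along these lines can report missing keys early. This is a sketch, not the app's actual code. It only reads variables that are already in the environment and does not parse the .env file itself, which the app may do with a loader such as python-dotenv (an assumption).

```python
import os
from typing import Dict, Mapping

# The two variables named in this README.
REQUIRED_KEYS = ("TAVILY_API_KEY", "HF_TOKEN")


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required credentials, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


if __name__ == "__main__":
    try:
        creds = load_credentials()
        print("Loaded:", ", ".join(creds))
    except RuntimeError as exc:
        print(exc)
```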

2. Install Dependencies

pip install -r requirements.txt

3. Run Locally

python app.py

Open http://localhost:7860 in your browser.

4. Deploy to HuggingFace Spaces

  1. Create a new Space on HuggingFace (SDK: Docker, to match the sdk: docker setting in this README's metadata)
  2. Add TAVILY_API_KEY and HF_TOKEN as Space secrets
  3. Push this repository to the Space:

git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/polymer-datasheet-agent
git push hf main

Usage

Search & Add

  1. Enter a manufacturer name (e.g., "SABIC") and/or polymer family (e.g., "Polycarbonate")
  2. Optionally specify a grade (e.g., "Lexan 141R")
  3. Click Search & Add: the agent will search the web, extract properties, and store the record
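
The form inputs above must be combined into a search query before the web search runs. The sketch below shows one plausible way to do this. The function name and the "technical datasheet" suffix are assumptions, not the agent's actual query format.

```python
from typing import Optional


def build_search_query(
    manufacturer: Optional[str] = None,
    polymer_family: Optional[str] = None,
    grade: Optional[str] = None,
) -> str:
    """Join the non-empty form fields into one search string."""
    # At least one of manufacturer or polymer family is required; grade is optional.
    if not (manufacturer or "").strip() and not (polymer_family or "").strip():
        raise ValueError("Provide a manufacturer and/or a polymer family")
    parts = [p.strip() for p in (manufacturer, polymer_family, grade) if p and p.strip()]
    return " ".join(parts + ["technical datasheet"])


if __name__ == "__main__":
    print(build_search_query("SABIC", "Polycarbonate", "Lexan 141R"))
```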

Upload Datasheet

  1. Upload a PDF datasheet
  2. Click Parse & Add: the agent will extract text, parse properties via LLM, and store the record

Database Browser

  • Search across all fields with free text
  • Filter by manufacturer or polymer family
  • Export the full database to CSV
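
The browser operations map onto simple SQL queries. The sketch below uses the standard-library `sqlite3` and `csv` modules rather than SQLAlchemy, and a `LIKE` substring match rather than a true full-text index. The table name `polymers` and its three columns are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List, Optional, Sequence, Tuple

COLUMNS = ("material_name", "manufacturer", "polymer_family")


def search(
    conn: sqlite3.Connection, text: str = "", manufacturer: Optional[str] = None
) -> List[Tuple]:
    """Free-text match across all columns, with an optional manufacturer filter."""
    sql = "SELECT material_name, manufacturer, polymer_family FROM polymers WHERE 1=1"
    params: List[str] = []
    if text:
        like = f"%{text}%"
        sql += " AND (material_name LIKE ? OR manufacturer LIKE ? OR polymer_family LIKE ?)"
        params += [like, like, like]
    if manufacturer:
        sql += " AND manufacturer = ?"
        params.append(manufacturer)
    return conn.execute(sql + " ORDER BY material_name", params).fetchall()


def to_csv(rows: Sequence[Tuple], header: Sequence[str] = COLUMNS) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE polymers (material_name TEXT, manufacturer TEXT, polymer_family TEXT)")
    # Placeholder rows for illustration only.
    conn.executemany(
        "INSERT INTO polymers VALUES (?, ?, ?)",
        [("Grade A", "Maker One", "Polycarbonate"), ("Grade B", "Maker Two", "Polyamide")],
    )
    print(search(conn, text="carbonate"))
    print(to_csv(search(conn)))
```

Parameterized queries keep user-entered search text from being interpreted as SQL.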

Project Context

This is Part 1 of the Plinity: Infinite Recyclable Polymers project. The database built here will be consumed by a downstream Material Matching Agent that recommends polymers based on application requirements.

Tech Stack

  • LangGraph: Workflow orchestration
  • Tavily: Web search API
  • LLaMA 3.1 8B Instruct: LLM via HuggingFace Inference API
  • SQLite + SQLAlchemy: Database
  • Gradio: Web UI
  • PyMuPDF: PDF parsing
  • Pandas: Data manipulation